Python Basics

You can execute commands in the Python shell, line by line.

You can also create Python scripts in text files with the extension .py

You can create variables, perform operations and then call print() to generate output from a script :

savings = 100
growth_multiplier = 1.1
result = savings * growth_multiplier
print(result)

What is a variable : a specific, case-sensitive name. You call up the value through the variable name.
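
Because names are case-sensitive, height and Height are two different variables. A minimal sketch of what that means in practice (the lookup_failed flag is only there for illustration):

```python
# Names are case-sensitive: "height" and "Height" are different variables.
height = 1.79

try:
    Height  # this name was never defined, so Python raises NameError
    lookup_failed = False
except NameError:
    lookup_failed = True

print(height)
print(lookup_failed)
```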

#In Python Shell
height = 1.79
weight = 68.7
height

#prints out 1.79

#Calculate BMI
#BMI = weight / height^2
height = 1.79
weight = 68.7
bmi = weight / height ** 2
#**2 squares the value, **3 cubes it and so on
print(bmi)

Python Types

type(bmi)
#gives the type of the variable
#in this case it will be a float

day_of_week=5
type(day_of_week)
#in this case it will be an int (integer)

x='body mass index'
y='this works too'
type(y)
#in this case it will be a str (string)

z=True
type(z)
#this will be a bool (boolean), so either True or False

2+3 
#this will print 5 in shell

'ab'+'cd' 
#this will print 'abcd'
#Different types = different behaviour

#if you want to print ints and strings together you need to call the function
# str() on the numbers, example :

print("I started with $" + str(savings) + " and now have $" + str(result) + ". Awesome!")
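
Here is a small self-contained sketch of that conversion. The f-string on the second message is an alternative that these notes do not cover (it needs Python 3.6 or later); the variable names message and message_f are just for illustration.

```python
savings = 100
growth_multiplier = 1.1
result = savings * growth_multiplier

# Adding a str and a number directly raises TypeError,
# so each number is converted with str() first.
message = "I started with $" + str(savings) + " and now have $" + str(result) + ". Awesome!"

# An f-string formats the values in place; :.2f keeps two decimals.
message_f = f"I started with ${savings} and now have ${result:.2f}. Awesome!"

print(message)
print(message_f)
```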

Python Lists

#Python Data Types
# float - real numbers
# int - integer numbers
# str - string, text
# bool - True, False

height= 1.73
tall = True

#Each variable represents a single value
#This is a problem when we have many data points, as is usually the case in data science

height1 = 1.73
height2 = 1.68
height3 = 1.71
height4 = 1.89

#We can store those in a list, written as [a, b, c]
fam = [1.73, 1.68, 1.71, 1.89]

#A list gives a name to a collection of values
#It can contain any type
#It can contain different types at once

fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
print(fam)
#prints :
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

fam2 = [["liz", 1.73],
["emma", 1.68],
["mom", 1.71],
["dad", 1.89]]

#stores the same data as a list of lists, which may be more convenient

type(fam)
#the type of fam is 'list'
#Each type has its own specific functionality and behavior

Subsetting lists

fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam[3]
#taking index 3 of the list gets the value 1.68, as the index starts at 0; fam[0] selects the first value, 'liz'

fam[-1]
#this will take the last element of the list, so 1.89 in our example

#you can also slice the list, take a part of it in one go : 
fam[3:5]

#this will select everything from the index 3 (inclusive) to index 5 
#(exclusive)
#In our example : [1.68,'mom']

#you can also leave out one side of the ':' to get either all the elements before an index or all the elements from it onward

fam[:4]
fam[4:]

#in the first case it takes all the elements before index 4 (exclusive)
['liz', 1.73, 'emma', 1.68]
#in the second case it takes all the elements from index 4 onward (inclusive)
['mom', 1.71, 'dad', 1.89]
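
To double-check the slicing rules, here is a short sketch on the same fam list. The variable names (middle, first_four, from_four, last_two, rebuilt) are only for illustration.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# [start:end] includes start and excludes end
middle = fam[3:5]       # elements at index 3 and 4
first_four = fam[:4]    # index 0 up to, but not including, 4
from_four = fam[4:]     # index 4 through the end
last_two = fam[-2:]     # negative indexes count from the end

# Two slices split at the same index rebuild the original list
rebuilt = first_four + from_four

print(middle)
print(from_four)
print(rebuilt == fam)
```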

List Manipulation

Changing list elements

fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

#if you want to change an element you can do that by assigning a new value to a specific index

fam[7]=1.86

#or by changing multiple values by slicing 

fam[0:2] = ["lisa", 1.74]

Adding new elements to a list

#You can add elements by concatenating lists with +
fam_1 = fam + ["me", 1.79]

#And you can remove elements from a list using the del statement
del fam[2]
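
One detail worth seeing in action: + builds a new list, while del changes the existing list in place. A minimal sketch on the same data:

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# + builds a new list and leaves fam untouched
fam_1 = fam + ["me", 1.79]

# del removes the element at index 2 ("emma") from fam itself
del fam[2]

print(len(fam_1))
print(fam)
```

After the del, every later element moves one index to the left, so fam[2] is now 1.68.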

List of lists

#You can also do lists of lists

# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom",bath]]
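
To read from a list of lists you index twice: first the inner list, then the element inside it. A short sketch with the same house data (the total_area calculation is an extra, added only for illustration):

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

living_room = house[2]        # the whole inner list
living_area = house[2][1]     # second element of that inner list
last_room_name = house[-1][0]

# Sum the second element (the area) of every inner list
total_area = sum(room[1] for room in house)

print(living_room)
print(living_area)
print(total_area)
```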

Functions

fam = [1.73, 1.68, 1.71, 1.89]
print(max(fam))

#The function max() returns the highest number in the list; you can also store the result in a variable :

tallest = max(fam)

round(1.68,1)

#round() rounds the number to the number of decimals given in the second argument; in our example it is 1, so it rounds the number to 1.7

#You can use help(round) to get instructions on how to use the function

#some other functions : 

# Create variables var1 and var2
var1 = [1, 2, 3, 4]
var2 = True

# Print out type of var1
print(type(var1))

# Print out length of var1
print(len(var1))

# Convert var2 to an integer: out2
out2=int(var2)

# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste together first and second: full
full=first+second

# Sort full in descending order: full_sorted
full_sorted=sorted(full,reverse=True)

If you are doing a standard task, a function probably exists to do it !

List methods :

fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
fam.index("mom")

#returns the index of the element 'mom' in the list fam, here 4

fam.count(1.73)

#returns the number of times the element specified appears in the list

fam.append('me')

#will add a new element 'me' at the end of the list

#Some additional methods :
#append(), which adds an element to the end of the list it is called on,
#remove(), which removes the first element of a list that matches the input, and
#reverse(), which reverses the order of the elements in the list it is called on.
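
Here is a small sketch of those three methods on the fam list. All three change the list in place and return None, so there is nothing useful to store from the call itself.

```python
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

# append() adds one element at the end
fam.append("me")
fam.append(1.79)

# remove() deletes the first element equal to its argument
fam.remove("emma")

# reverse() flips the order of the list in place
fam.reverse()

print(fam)
```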

Str methods

sister='liza'
sister.capitalize()

#Returns 'Liza', with the first character capitalized

sister.replace("z", "sa")

#Returns 'lisaa', with the 'z' in the string replaced by 'sa'

sister.index('z')
#returns 2, as this is the index of 'z' in 'liza'


place = "poolhouse"
# Use upper() on place: place_up
place_up=place.upper()

#Will convert all the letters in the variable place to uppercase
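
Unlike list methods, string methods never change the original string; they return a new one. A quick sketch to check this (the names on the left are just for illustration):

```python
sister = "liza"

capitalized = sister.capitalize()
replaced = sister.replace("z", "sa")
position = sister.index("z")

place = "poolhouse"
place_up = place.upper()

# The originals are unchanged: strings are immutable
print(sister, capitalized, replaced, position)
print(place, place_up)
```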

Packages

Functions and methods are powerful. A package is a directory of Python scripts, and each script is a module. Packages give you access to new functions, methods and types to use in your code. You have to install a package and then import it before you can use what is built into it.

Some popular packages :

NumPy, Matplotlib, scikit-learn

Download pip :

#In the terminal, after downloading the get-pip.py script :
python3 get-pip.py

#Once you have pip (the package installer tool) you can install packages :
pip3 install numpy

In the script you then have to import the packages

import numpy as np
np.array([1,2,3])

#You imported the numpy package and can now use arrays ! 

import numpy
numpy.array([1,2,3])

The math package

# Definition of radius
r = 0.43

# Import the math package
import math

# Calculate C
C = 2*math.pi*r

# Calculate A
A =math.pi*r**2

# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))

# Definition of radius
r = 192500

# Import radians function of math package
from math import radians

# Travel distance of Moon over 12 degrees. Store in dist.
dist=r*radians(12)

# Print out dist
print(dist)

Numpy

NumPy is a fundamental Python package for doing data science efficiently

You cannot do element-by-element calculations directly on regular Python lists, for example :

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

weight / height ** 2

#You get an error : TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

NumPy stands for Numerical Python. It offers an alternative to Python lists : the NumPy array. It can do calculations over entire arrays, and it is easy and fast. You just have to install the numpy package with :

#In terminal
pip3 install numpy

#In the script
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

np_height = np.array(height)
#np_height = array([ 1.73, 1.68, 1.71, 1.89, 1.79])

np_weight = np.array(weight)
#np_weight = array([ 65.4, 59.2, 63.6, 88.4, 68.7])

bmi = np_weight / np_height ** 2
print(bmi)

#array([ 21.852, 20.975, 21.75 , 24.747, 21.441])

This time it worked ! We could take the whole collection of data and do the operations element by element across the two arrays
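
For comparison, here is a sketch of the same calculation with plain lists, using a list comprehension and zip() (neither is covered in these notes), next to the NumPy version. Both give the same numbers; the array version is simply shorter.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Plain Python: pair up the values and compute one BMI at a time
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# NumPy: the operators apply to every element at once
bmi_array = np.array(weight) / np.array(height) ** 2

print(bmi_list)
print(bmi_array)
print(np.allclose(bmi_list, bmi_array))
```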

Numpy arrays can only contain one type of data

#If you do : 
np.array([1.0, "is", True])

#then all the elements will be converted to string automatically
# -> array(['1.0', 'is', 'True'],
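
You can inspect the dtype attribute to see which single type NumPy settled on. A minimal sketch (the array names are made up for illustration):

```python
import numpy as np

# Float, string and bool together: everything becomes a string
mixed = np.array([1.0, "is", True])

# Int and float together: everything becomes a float
ints_and_floats = np.array([1, 2.5])

print(mixed, mixed.dtype)
print(ints_and_floats, ints_and_floats.dtype)
```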

Different types, different behaviors :

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

python_list + python_list
#[1, 2, 3, 1, 2, 3]

numpy_array + numpy_array
#array([2, 4, 6])

Numpy Subsetting

bmi
#array([ 21.852, 20.975, 21.75 , 24.747, 21.441])

bmi[1]
#20.975

bmi > 23
#array([False, False, False, True, False], dtype=bool)
#returns an array with booleans that say if element is bigger than 23 or not

bmi[bmi > 23]
#array([ 24.747]) return the element for which the condition is True
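
The same idea as a self-contained sketch, with the BMI values rounded as shown above. Storing the boolean array in a variable (here called mask, a name chosen for illustration) makes it easy to reuse or to count the matches.

```python
import numpy as np

bmi = np.array([21.852, 20.975, 21.75, 24.747, 21.441])

# One True/False per element
mask = bmi > 23

# Keep only the elements where the mask is True
selected = bmi[mask]

# True counts as 1, so the sum is the number of matches
count = mask.sum()

print(mask)
print(selected)
print(count)
```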

2D Numpy Arrays

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
[65.4, 59.2, 63.6, 88.4, 68.7]])

np_2d.shape
#Returns the number of rows and columns
#in this case (2, 5) : 2 rows and 5 columns, as there are 2 lists of 5 elements each

Subsetting

          0     1     2     3     4
array([[ 1.73, 1.68, 1.71, 1.89, 1.79],  0
       [ 65.4, 59.2, 63.6, 88.4, 68.7]]) 1

np_2d[0]
#Returns first array
# array([ 1.73, 1.68, 1.71, 1.89, 1.79])

np_2d[0][2]
#Return element with index 2 of the first row 
#1.71

np_2d[0,2]
#Same returns 1.71

np_2d[:,1:3]
#Returns the columns with index 1 and 2 (index 3 is excluded) from both rows
#array([[ 1.68, 1.71],
#       [ 59.2 , 63.6 ]])

np_2d[1,:]
#Returns everything from second row
#array([ 65.4, 59.2, 63.6, 88.4, 68.7])
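
A few more combinations of row and column selection, as a sketch on the same array (the variable names are only for illustration):

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# All rows, first column only
first_column = np_2d[:, 0]

# Second row, last column
last_weight = np_2d[1, -1]

# All rows, columns with index 1 and 2
middle_block = np_2d[:, 1:3]

print(first_column)
print(last_weight)
print(middle_block.shape)
```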

You can also multiply arrays element by element. Imagine you have a 2D array where each row represents some dude, with 3 columns : one for the height, one for the weight and one for the age. Now you want to convert the height and the weight for all the dudes from heretic American units to glorious European ones.

You can create an array with the conversion factors and just multiply both arrays !

# baseball is available as a regular list of lists
# updated is available as 2D numpy array

# Import numpy package
import numpy as np

# Create np_baseball (3 cols: height, weight, age)
np_baseball = np.array(baseball)

# Create numpy array: conversion (inches to meters, pounds to kilograms, age unchanged)
conversion = np.array([0.0254, 0.453592, 1])

# Print out product of np_baseball and conversion
print(np_baseball*conversion)
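
Since baseball is not defined in these notes, here is a runnable sketch with three made-up players (the numbers are invented, not real data). Each row is multiplied by the 3 conversion factors, column by column.

```python
import numpy as np

# Hypothetical sample: [height in inches, weight in pounds, age in years]
baseball = [[74, 180, 23],
            [72, 210, 31],
            [69, 170, 27]]

np_baseball = np.array(baseball)

# One factor per column: inches to meters, pounds to kilograms, age unchanged
conversion = np.array([0.0254, 0.453592, 1])

# The 3-element array is applied to every row of the 3x3 array
np_metric = np_baseball * conversion

print(np_metric)
```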

Numpy : basic statistics

When you have huge amounts of data, it is impossible to just look at it and spot problems or trends. You can use the statistics tools available in NumPy instead.

#City wide implementation

import numpy as np
np_city = ... # Implementation left out
np_city

#the array contains thousands of rows, one for each citizen with height and weight, like this :

array([[1.64, 71.78],
[1.37, 63.35],
[1.6 , 55.09],
...,
[2.04, 74.85],
[2.04, 68.72],
[2.01, 73.57]])


np.mean(np_city[:,0])
#returns the mean for all rows, for the first column
#returns for example : 1.7472

#mean is the average

np.median(np_city[:,0])
#returns the median for all rows, for the first column
#returns for example : 1.75

#median is the value where half of the people are below and half are above

np.corrcoef(np_city[:,0], np_city[:,1])
#Corrcoef calculates correlation coefficients
#array([[ 1. , -0.01802],
#[-0.01803, 1. ]])

np.std(np_city[:,0])
#computes the standard deviation for first column for all rows 
#for instance : 0.1992 in our dataset

#You can also use np.sum(), np.sort()..
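
To try these functions without the full city data, here is a sketch on a tiny made-up array of 4 people (the values are invented so the results are easy to check by hand).

```python
import numpy as np

# Hypothetical data: [height in meters, weight in kilograms]
np_city = np.array([[1.60, 55.0],
                    [1.70, 65.0],
                    [1.80, 75.0],
                    [1.90, 85.0]])

heights = np_city[:, 0]
weights = np_city[:, 1]

mean_height = np.mean(heights)
median_height = np.median(heights)
std_height = np.std(heights)

# 2x2 matrix; the off-diagonal entry is the height/weight correlation
correlation = np.corrcoef(heights, weights)

print(mean_height, median_height, std_height)
print(correlation)
```

In this made-up sample the weight grows in a straight line with the height, so the correlation comes out at 1.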

#selecting specific values
#Imagine you have one array with the heights
#and another array with the position of the player (Goal keeper, defense..)
#Now you want to get all the heights of the goal keepers, you do : 

gk_heights=np_heights[np_positions == 'GK']

#the comparison returns an array of True/False values, one per position, which is then
#used to select the matching values in the heights array
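
A runnable sketch of that selection with five made-up players (positions and heights are invented for illustration):

```python
import numpy as np

# Hypothetical data: one position and one height (in cm) per player
np_positions = np.array(["GK", "M", "A", "D", "GK"])
np_heights = np.array([191, 184, 185, 180, 188])

# Boolean array: True where the player is a goalkeeper
is_gk = np_positions == "GK"

gk_heights = np_heights[is_gk]
other_heights = np_heights[np_positions != "GK"]

print(gk_heights)
print(other_heights)
```

The two arrays must have the same length and the same order, otherwise the True/False values would not line up with the right heights.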

Generate data

#Arguments for np.random.normal()
#distribution mean
#distribution standard deviation
#number of samples

height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))
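
The values are random, so every run gives different data. A sketch with a fixed seed (the value 42 is arbitrary) makes the result repeatable and lets you check that the generated data looks as expected:

```python
import numpy as np

# Fix the seed so the same "random" numbers come out on every run
np.random.seed(42)

# mean, standard deviation, number of samples
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)

# Put the two 1D arrays side by side as columns
np_city = np.column_stack((height, weight))

print(np_city.shape)
print(np.mean(np_city[:, 0]))
print(np.std(np_city[:, 0]))
```

With 5000 samples the mean and standard deviation of the first column should land close to 1.75 and 0.20.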

Brax

Dude in his 30s starting his digital notepad