Matplotlib is used to plot your data as histograms, line plots, scatter plots and other graphs. As you can imagine, it is important for data scientists: it gives you a visual of your data so you can explore it and get insights.
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
plt.plot(year, pop)
plt.show()
Scatter plot
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
plt.scatter(year, pop)  # one marker per data point, no connecting line
plt.show()
To put the x-axis on a logarithmic scale, call this before plt.show():
plt.xscale('log')
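A minimal sketch of the scatter plot above with a log-scaled x-axis. It reuses the four year/population points from the first example; `matplotlib.use("Agg")` is only there so it runs without a display. On data spanning such a narrow range the log scale changes little; it pays off for values spanning several orders of magnitude, such as GDP per capita in the exercise further down.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

plt.scatter(year, pop)
plt.xscale('log')  # set the scale before plt.show()
ax = plt.gca()
print(ax.get_xscale())  # prints: log
# plt.show()  # with an interactive backend, this displays the plot
```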
Histogram
Histograms are used to explore a dataset and get an idea of how its values are distributed.
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]
plt.hist(values, bins=3)
plt.show()
The bins argument sets the number of bins (the bars of the histogram); here we get 3. If you leave it out, Matplotlib uses 10 bins by default.
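plt.hist() also returns what it drew: the count in each bin, the bin edges and the bar artists. This short sketch on the same values list shows how bins=3 splits the range 0 to 6 into three equal-width bins. The last bin includes its right edge, so the maximum value 6 is counted in the third bin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# counts: number of values per bin; edges: the bin boundaries; patches: the bars
counts, edges, patches = plt.hist(values, bins=3)
print(edges)   # [0. 2. 4. 6.]
print(counts)  # [4. 6. 2.]
```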
To clear the current figure so you can start fresh between two plots:
plt.clf()
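A sketch of why plt.clf() matters: pyplot keeps drawing on the current figure, so without clearing it, a histogram drawn after a line plot lands on the same axes as the line. After plt.clf() the figure has no axes left, and the next plotting call starts from an empty one. The data are the four points from the first example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

plt.plot(year, pop)
fig = plt.gcf()
print(len(fig.axes))  # 1: the axes holding the line plot

plt.clf()  # clear the current figure
print(len(fig.axes))  # 0: nothing left to draw over

plt.hist(pop, bins=3)  # creates fresh axes in the same figure
print(len(plt.gca().lines))  # 0: the old line is gone
```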
Customization
There are many ways to customize a visualization, from the plot type to the labels, title and ticks. The right choice depends on your data and on the story you want to tell.
Basic plot
import matplotlib.pyplot as plt

# "..." stands for the yearly values in between, which are not reproduced here
year = [1950, 1951, 1952, ..., 2100]
pop = [2.538, 2.57, 2.62, ..., 10.85]
plt.plot(year, pop)
plt.show()
You can customize the axis labels. Set them before calling plt.show(), otherwise they are not part of the displayed plot:
import matplotlib.pyplot as plt

year = [1950, 1951, 1952, ..., 2100]
pop = [2.538, 2.57, 2.62, ..., 10.85]
plt.plot(year, pop)
plt.xlabel('Year')
plt.ylabel('Population')
plt.show()
You can also add a title:
import matplotlib.pyplot as plt

year = [1950, 1951, 1952, ..., 2100]
pop = [2.538, 2.57, 2.62, ..., 10.85]
plt.plot(year, pop)
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('World Population Projections')
plt.show()
You can also change the tick positions on an axis and the labels displayed at them. plt.yticks() takes a list of tick positions and, optionally, a list of display labels of the same length; plt.xticks() works the same way for the x-axis.
import matplotlib.pyplot as plt

year = [1950, 1951, 1952, ..., 2100]
pop = [2.538, 2.57, 2.62, ..., 10.85]
plt.plot(year, pop)
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('World Population Projections')
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])
plt.show()
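Here are the label, title and tick customizations combined into one runnable sketch. Since the full 1950 to 2100 lists are elided above, it substitutes the four data points from the first example. The getters at the end only confirm that the customizations were applied; `matplotlib.use("Agg")` and the explicit `canvas.draw()` are there so it runs headless and the tick label text is filled in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() with an interactive one
import matplotlib.pyplot as plt

# Four data points from the first example, standing in for the elided lists
year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

plt.plot(year, pop)
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('World Population Projections')
# Tick positions first, then the text to display at each position
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])

ax = plt.gca()
ax.figure.canvas.draw()  # render once so the tick label text is populated
print(ax.get_xlabel(), ax.get_ylabel(), ax.get_title())  # Year Population World Population Projections
print([t.get_text() for t in ax.get_yticklabels()])
```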
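Example from the exercises: we have a list with GDP per capita, another with life expectancy, a third with each country's total population and a list of colors, col. The code below plots them as a bubble chart, where marker size (s) follows population, c sets the colors and alpha sets the opacity: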
import matplotlib.pyplot as plt
import numpy as np

# gdp_cap, life_exp, pop and col are provided by the exercise

# Specify c and alpha inside plt.scatter()
plt.scatter(x=gdp_cap, y=life_exp, s=np.array(pop) * 2, c=col, alpha=0.8)

# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000], ['1k', '10k', '100k'])

# Additional customizations: text labels at (x, y) data coordinates
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')

# Add grid() call
plt.grid(True)

# Show the plot
plt.show()
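The exercise data (gdp_cap, life_exp, pop, col) are not reproduced here, so this sketch shows the same size/color/alpha/text/grid pattern on the four year/population points from the first example. The color list and the size factor of 20 are arbitrary choices made for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]
col = ['red', 'green', 'blue', 'orange']  # hypothetical: one color per point

# s sets the marker area, c the colors, alpha the opacity (0 transparent, 1 opaque)
sc = plt.scatter(x=year, y=pop, s=np.array(pop) * 20, c=col, alpha=0.8)
plt.text(2010, 6.972, '2010')  # text label placed at data coordinates (x, y)
plt.grid(True)  # draw grid lines

print(sc.get_sizes())  # marker areas: population times 20
```

Because s is an area, doubling a population doubles the marker's area rather than its diameter, which keeps large values from visually dominating the chart.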