Data visualization is extremely important in every field of science, especially when it comes to data science. It is easier for the human brain to understand and remember pictures than it is to remember numbers and words. Visualization also makes it effortless to detect trends, patterns, and relationships in groups of data.
Matplotlib is one of the most popular Python visualization libraries. It focuses on generating static, publication-quality 2D and 3D graphs, and it also supports animated and interactive visualizations.
Matplotlib has many advantages. It is easy to get started with, yet extremely powerful, because it lets users create numerous and diverse plot types. It can be used in a variety of environments, such as IPython shells, Python scripts, Jupyter notebooks, web applications, and GUI toolkits. It supports LaTeX-formatted labels and text and offers control over every aspect of a figure or plot. It also supports high-quality output in various formats, including PNG, SVG, and PDF.
One of the key features of Matplotlib that I find valuable is its programmatic approach: graphs are created by writing code, so you control every aspect of their appearance instead of building them by hand in a graphical user interface. This is extremely important because programmatically created graphics are reproducible, are easily adjusted when the data is updated, and save time, as there is no need to redo lengthy and tedious procedures in a GUI. Finally, Matplotlib is open source, so data scientists and developers can use it for free.
%matplotlib notebook
%matplotlib inline
%matplotlib notebook -> interactive features
%matplotlib inline -> prints in the notebook directly
import numpy as np
import matplotlib.pyplot as plt
average_monthly_temperatures = [39.1, 40.1, 48.0, 50.4, 60.3, 73.7, 80.0, 76.9, 68.8, 57.9, 53.0, 39.2]
fig = plt.figure()
plt.plot(average_monthly_temperatures)
plt.show()
average_monthly_temperatures = [39.1, 40.1, 48.0, 50.4, 60.3, 73.7, 80.0, 76.9, 68.8, 57.9, 53.0, 39.2]
months=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
fig = plt.figure()
plt.plot(months,average_monthly_temperatures)
plt.title("Average monthly temperatures")
plt.xlabel("months")
plt.ylabel("temperature")
plt.show()
fig.savefig('average_monthly_temperatures.png')
fig.savefig('average_monthly_temperatures.pdf')
!ls -lh average_monthly_temperatures.png
-rw-r--r-- 1 evancarr staff 31K Oct 22 16:17 average_monthly_temperatures.png
!ls -lh average_monthly_temperatures.pdf
-rw-r--r-- 1 evancarr staff 13K Oct 22 16:17 average_monthly_temperatures.pdf
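As a rough sketch of the output formats mentioned earlier, the same figure can be written as PNG, SVG, and PDF by passing format= to savefig. Writing to in-memory buffers (io.BytesIO) and the dpi value of 150 are choices made only for this illustration, so that no files are left behind; they are not part of the original example.

```python
import io
import matplotlib.pyplot as plt

temperatures = [39.1, 40.1, 48.0, 50.4, 60.3, 73.7,
                80.0, 76.9, 68.8, 57.9, 53.0, 39.2]

fig, ax = plt.subplots()
ax.plot(temperatures)

# Write the same figure in three formats. In-memory buffers are used
# here only so that the sketch leaves no files behind.
outputs = {}
for fmt in ("png", "svg", "pdf"):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt, dpi=150)
    outputs[fmt] = buffer.getvalue()

# Report the size of each encoded figure
for fmt, data in outputs.items():
    print(fmt, len(data), "bytes")
```

When a file name is passed instead of a buffer, savefig infers the format from the extension, which is what the two savefig calls above rely on.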
A figure is, by definition, a high-level Matplotlib object that contains all the elements of the output graph. We can arrange multiple graphs in different ways to form a figure, and every element of a figure is customizable. As we can see from the picture, an axes is a subsection of a figure where a graph is plotted.
An axes contains a title, an x-label, and a y-label. Our figure contains only one axes, but a figure can have several, each holding one or more graphs. The important thing to notice is the difference between axis and axes. An axis is a number line that shows the scale of the plotted graph. Our two-dimensional graph had two of them, the x-axis and the y-axis; a three-dimensional graph has three. A useful thing to add to a graph is a grid drawn along the major ticks of the x-axis and y-axis, so we can easily read the coordinates of various points and understand the function that is drawn.
x = np.arange(3)
plt.plot(x,x)
plt.plot(x,2*x)
plt.plot(x,3*x)
plt.grid(True)
plt.show()
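To make the figure, axes, and axis distinction more concrete, here is a minimal sketch that builds one figure holding two axes and then inspects the objects involved. The variable names left and right and the plotted numbers are illustrative only.

```python
import matplotlib.pyplot as plt

# One figure that holds two axes, side by side
fig, (left, right) = plt.subplots(1, 2)
left.plot([0, 1, 2], [0, 1, 4])
right.plot([0, 1, 2], [0, 2, 1])
left.grid(True)  # grid on the left axes only

print(len(fig.axes))              # how many axes the figure contains
print(type(left).__name__)        # an axes object (the plotting area)
print(type(left.xaxis).__name__)  # the x-axis belonging to that axes
print(type(left.yaxis).__name__)  # the y-axis belonging to that axes
plt.show()
```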
x = np.linspace(0,5,5)
y=2*x
plt.plot(x,y)
plt.show()
The central principle is to create figure objects and then call methods on them. We start by typing fig = plt.figure(). When we run the code, the figure object is created. We can then add axes to the figure by typing axes = fig.add_axes(). As its argument, add_axes takes a list of four floats: the left edge, the bottom edge, the width, and the height of the axes, each given as a fraction of the figure size, so we know exactly where the axes will be placed. Next, we plot on that set of axes by typing axes.plot(x, y), followed by plt.show().
fig = plt.figure()
axes = fig.add_axes([0.1,0.1,0.8,0.8])
axes.plot(x,y)
plt.show()
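Because add_axes takes an explicit [left, bottom, width, height] rectangle, it can also place a second, smaller axes inside the first one. The following is a minimal sketch of that idea; the inset position and the plotted functions are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)

fig = plt.figure()
# [left, bottom, width, height], each as a fraction of the figure size
main_axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
inset_axes = fig.add_axes([0.2, 0.55, 0.3, 0.3])

main_axes.plot(x, x**2)
main_axes.set_title("main axes")
inset_axes.plot(x, np.sqrt(x), "r")
inset_axes.set_title("inset")
plt.show()
```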
Subplots are groups of smaller axes that can exist together within a single figure. They can be insets, grids of plots, or other more complicated layouts.
There are two ways to create them: the plt.subplot() function, which creates a single subplot within a grid, and the plt.subplots() function, which creates a full grid of subplots at once.
The plt.subplot() function takes three arguments: the number of rows, the number of columns, and the index of the plot we are referring to. The index starts at 1 in the upper-left corner and increases from left to right, row by row.
fig=plt.figure()
x=np.arange(3)
y=2*x
plt.subplot(2,2,1)
plt.plot(x,y,'b')
plt.subplot(2,2,2)
plt.plot(x,1-y,'r')
plt.subplot(2,2,3)
plt.plot(x,2-y,'g')
plt.subplot(2,2,4)
plt.plot(x,y,'y')
plt.show()
This way of creating subplots can become tedious when we are creating a large grid of subplots. The way to do this elegantly is with the plt.subplots() function. We can create all four subplots with just one line of code by typing fig, axs = plt.subplots(2, 2, figsize=(6,6)).
Notice that we still have to call the plot method four times, once per subplot, to draw our four graphs.
fig, axs = plt.subplots(2, 2, figsize=(6,6))
axs[0, 0].plot(x, y, 'b')
axs[0, 1].plot(x, 1-y, 'r')
axs[1, 0].plot(x, 2-y, 'g')
axs[1, 1].plot(x, y, 'y')
plt.show()
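One possible way to avoid repeating the plot call is to loop over the array of axes returned by plt.subplots(). This sketch assumes the data and colors are collected in a list named series, a name introduced here only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(3)
y = 2 * x

# One (data, color) pair per subplot, in row-major order
series = [(y, "b"), (1 - y, "r"), (2 - y, "g"), (y, "y")]

fig, axs = plt.subplots(2, 2, figsize=(6, 6))
# axs is a 2x2 array of axes; axs.flat walks through it row by row
for ax, (values, color) in zip(axs.flat, series):
    ax.plot(x, values, color)
plt.show()
```

The loop scales to any grid size, as long as series holds one entry per subplot.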
The legend describes each of the graphs on a given axes. Matplotlib has a built-in function for creating a legend, plt.legend().
x = np.linspace(1,10)
first_line = plt.plot(x, x+1, label= 'y=x+1')
plt.legend()
<matplotlib.legend.Legend at 0x7fbf72603460>
second_line, = plt.plot(x,x+2,linestyle='solid')
second_line.set_label('y=x+2')
third_line, = plt.plot(x,x+3,linestyle='dashed')
third_line.set_label('y=x+3')
plt.legend()
<matplotlib.legend.Legend at 0x7fbf7266f9d0>
The bbox_to_anchor keyword gives a great degree of control for manual legend placement. For example, if you want your axes legend located at the figure's top right-hand corner instead of the axes' corner, simply specify the corner's location and the coordinate system of that location:
ax.legend(bbox_to_anchor=(1, 1),
bbox_transform=fig.transFigure)
first_plot,=plt.plot([1,2,3],label='first plot')
second_plot,=plt.plot([3,2,1],label='second plot')
third_plot,=plt.plot([2,2,2],label='third plot')
plt.legend(bbox_to_anchor=(1.02, 1.0), borderaxespad=0);
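The figure-corner snippet above refers to ax and fig without defining them. A self-contained sketch might look like the following; the loc="upper right" argument is an addition made here so that it is explicit which corner of the legend box is pinned to the anchor point.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], label="first plot")
ax.plot([3, 2, 1], label="second plot")

# loc names the corner of the legend box that is pinned to the anchor point.
# bbox_transform says the anchor is given in figure coordinates (0 to 1),
# so (1, 1) is the top right-hand corner of the whole figure.
legend = ax.legend(loc="upper right",
                   bbox_to_anchor=(1, 1),
                   bbox_transform=fig.transFigure)
plt.show()
```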
SECTION CHALLENGE: plot a graph according to directions given in video
first_student = [1, 4, 7, 3, 8, 2, 5, 3, 6, 8, 3, 2]
second_student = [4, 2, 7, 5, 2, 3, 1, 7, 5, 3, 4, 3]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
fig = plt.figure(figsize=(10,8), facecolor='#4a4a4a')
ax = plt.axes()
ax.set_facecolor('#f0f2fc')
plt.plot(months, first_student, label="Student 1", linestyle='dashed', linewidth=3, marker='d', markersize=9);
plt.plot(months, second_student, label="Student 2", linestyle='dotted', linewidth=3, marker='h', markersize=9);
ax.set_ylabel("Number of Books Read", fontsize=20, color="white")
ax.set_xlabel('Months', fontsize=20, color="white")
plt.title("Books Read By Students", fontsize=24, color="white", pad=14)
plt.xticks(fontsize = 15, color="white")
plt.yticks(fontsize = 15, color="white")
plt.legend(bbox_to_anchor=(1.25, 1.0), borderaxespad=0, fontsize=14);
plt.show()
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(dpi=720)
first_student_books=[2,4,7,3,1,5,1,0,2,3,6,4]
second_student_books=[0,5,3,1,6,4,1,1,3,4,3,2]
first_line=plt.plot(range(1,13),first_student_books)
second_line=plt.plot(range(1,13),second_student_books)
plt.xlabel('months')
plt.ylabel('books read')
plt.legend(['books first student','books second student'],loc=1)
plt.title('Books read by students')
plt.show()
first_figure = plt.figure()
x = np.linspace(1, 10)
y = np.linspace(1, 10)
ax=first_figure.add_axes([0,0,1,1])
ax.plot(x,y, color='red');
second_figure = plt.figure()
ax=second_figure.add_axes([0,0,1,1])
ax.plot(x,y, color='g');
third_figure = plt.figure()
ax=third_figure.add_axes([0,0,1,1])
ax.plot(x,y, color='#FF00FF')
[<matplotlib.lines.Line2D at 0x7fbf81a100a0>]
plt.plot(x,2*x,linestyle='solid')
plt.plot(x,3*x,linestyle='dashed')
plt.plot(x,4*x,linestyle='dashdot')
plt.plot(x,5*x,linestyle='dotted')
[<matplotlib.lines.Line2D at 0x7fbf726c2dd0>]
plt.plot(x,2*x,linestyle='-')
plt.plot(x,3*x,linestyle='--')
plt.plot(x,4*x,linestyle='-.')
plt.plot(x,5*x,linestyle=':')
[<matplotlib.lines.Line2D at 0x7fbf72739fc0>]
plt.plot(x, 3*x ,'-.g');
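The last call above uses a format string, which packs the line style and the color into one short argument. As a small sketch, the two plot calls below should produce the same styling, once with the format string and once with explicit keyword arguments; the variable names short_line and long_line are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1, 10)

fig, ax = plt.subplots()
# Format string: line style '-.' followed by the color code 'g'
(short_line,) = ax.plot(x, 3 * x, "-.g")
# The same styling spelled out with keyword arguments
(long_line,) = ax.plot(x, 4 * x, linestyle="dashdot", color="g")

# Both lines report the same line style
print(short_line.get_linestyle(), long_line.get_linestyle())
plt.show()
```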
When we use Matplotlib for plotting, it creates a linear scale by default. Sometimes a linear scale won't give us clear and useful results, for example when the values span several orders of magnitude. In that case we can use one of three non-linear scales: logarithmic, symmetrical logarithmic, or logit.
We use a logarithmic scale when we have a series of values where each value equals the previous value multiplied by a constant. In that case, the values are represented by equidistant ticks on the logarithmic scale.
A symmetrical logarithmic scale is used when the data include zero or negative numbers, which a plain logarithmic scale cannot represent. The logarithmic scale is one of the most used non-linear scales. Usually we use powers of 10, but we can also use other bases, which narrow or widen the spacing of the plotted elements.
Use:
plt.xscale()
plt.yscale()
Change the base of the scale by passing base= (called basex= and basey= in Matplotlib versions before 3.3).
# Text between $ signs is rendered as math, so symbols appear in italics
x = np.linspace(1, 10, 1024)
plt.xscale('log')
plt.yscale('log')
plt.plot(x, x, label ='$f(x)=x$')
plt.plot(x, 10**x, label ='$f(x)=10^x$')
plt.plot(x, np.log(x),label ='$f(x)=log(x)$')
plt.legend()
plt.show()
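The example above only uses the plain logarithmic scale. As a hedged sketch of the other options discussed, the following draws one plot on a symmetrical logarithmic scale and one on a base-2 logarithmic scale. It assumes Matplotlib 3.3 or newer, where the keywords are named linthresh and base; the data are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-100, 100, 401)

fig, (ax_sym, ax_log2) = plt.subplots(1, 2, figsize=(8, 4))

# Symmetrical log: linear inside [-linthresh, linthresh] and logarithmic
# outside it, so zero and negative values can be drawn.
ax_sym.plot(x, x**3)
ax_sym.set_yscale("symlog", linthresh=1)
ax_sym.set_title("symlog")

# Logarithmic y-axis with base 2 instead of the default base 10
powers = np.arange(1, 11)
ax_log2.plot(powers, 2.0**powers, "o-")
ax_log2.set_yscale("log", base=2)
ax_log2.set_title("log, base 2")
plt.show()
```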
matplotlib.ticker is a Matplotlib module that provides a general tick management system, so we can have full control of tick placement and formatting using different locator and formatter classes.
See the Matplotlib documentation for details on setting a formatter and on using these classes.
from matplotlib import ticker
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator)
x = np.arange(0.0, 50.0, 0.1)
y = x**2
fig, ax = plt.subplots()
ax.plot(x,y)
formatter = ticker.FormatStrFormatter('%1.2f')
ax.xaxis.set_major_locator(MultipleLocator(10))
ax.xaxis.set_major_formatter(formatter)
ax.xaxis.set_minor_locator(MultipleLocator(2))
plt.show()
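The example above imports AutoMinorLocator without using it. A possible variation, sketched below, lets AutoMinorLocator place the minor ticks and uses FuncFormatter for custom labels. The helper function thousands is hypothetical and exists only for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, AutoMinorLocator, FuncFormatter

x = np.arange(0.0, 50.0, 0.1)
y = x**2

fig, ax = plt.subplots()
ax.plot(x, y)

ax.xaxis.set_major_locator(MultipleLocator(10))  # major tick every 10 units
ax.xaxis.set_minor_locator(AutoMinorLocator(5))  # 5 minor intervals per major one

# FuncFormatter calls this with the tick value and its position
# and uses the returned string as the tick label.
def thousands(value, position):
    return f"{value / 1000:.1f}k"

ax.yaxis.set_major_formatter(FuncFormatter(thousands))
plt.show()
```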
Setting axis limits:
ax.set_xlim([low, high])
ax.set_ylim([low, high])
x = np.arange(0.0, 50.0, 0.1)
y = x**2
fig, ax = plt.subplots()
ax.plot(x,y)
ax.set_xlim([0, 50])
ax.set_ylim([0, 2500])
plt.show()
x = np.arange(0.0, 50.0, 0.1)
y = x**2
fig, ax = plt.subplots()
ax.plot(x,y)
ax.set_xticks([0,5,10,15,20,25,30,35,40,45,50])
ax.set_yticks([0,250,500,750,1000,1250,1500,1750,2000,2250,2500])
plt.show()
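Typing long tick lists by hand is error-prone. One alternative, sketched here, is to generate the same evenly spaced tick positions with np.arange and combine them with explicit axis limits.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0.0, 50.0, 0.1)
y = x**2

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlim(0, 50)
ax.set_ylim(0, 2500)
# np.arange excludes the stop value, so stop one step past the last tick
ax.set_xticks(np.arange(0, 51, 5))
ax.set_yticks(np.arange(0, 2501, 250))
plt.show()
```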
Annotations are used to describe specific details on the plot so we can draw attention to points of interest on the graph, call out surprising features, or explain significance of a wiggle. Matplotlib provides a few modules to add text, arrows, and shapes on our plot. So we can add text annotations, arrows, graphical annotations, and even image annotations.
Use:
ax.annotate()
arrowprops=dict(linewidth=, arrowstyle='')
x = np.linspace(0, 10)
y1 = x
y2 = 8-x
fig, ax = plt.subplots()
plt.plot(x,y1,label='supply')
plt.plot(x,y2,label='demand')
ax.annotate("Equilibrium", xy=(4,4), xytext=(3,2), \
fontsize=12, fontweight='semibold',\
arrowprops=dict(linewidth=2, arrowstyle="<|-"))
plt.xlabel('quantity',fontsize=12)
plt.ylabel('price',fontsize=12)
plt.legend()
plt.show()
x = np.linspace(0, 10)
y1 = x
y2 = 8-x
# Plot the data
fig, ax = plt.subplots()
plt.plot(x,y1,label='supply')
plt.plot(x,y2,label='demand')
# Annotate the equilibrium point with arrow and text
bbox_props = dict(boxstyle="rarrow", fc=(0.8, 0.9, 0.9), lw=2)
t = ax.text(2,4, "equilibrium", ha="center", va="center", rotation=0,
size=10,bbox=bbox_props)
# Label the axes
plt.xlabel('quantity',fontsize=12)
plt.ylabel('price',fontsize=12)
plt.legend()
plt.show()
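In the examples above the equilibrium coordinates are typed in by hand. As a sketch of the programmatic approach, the crossing point of these two particular lines can be computed first and then reused for the marker, the arrow target, and the label text. The names eq_quantity, eq_price, and note are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10)
supply = x       # price as a function of quantity
demand = 8 - x

# The lines cross where x = 8 - x, that is at quantity 4 and price 4
eq_quantity = 8 / 2
eq_price = eq_quantity

fig, ax = plt.subplots()
ax.plot(x, supply, label="supply")
ax.plot(x, demand, label="demand")
ax.plot(eq_quantity, eq_price, "ko")  # mark the point itself

# xy is the point being annotated, xytext is where the label is placed
note = ax.annotate(f"Equilibrium ({eq_quantity:g}, {eq_price:g})",
                   xy=(eq_quantity, eq_price), xytext=(5.5, 6.5),
                   arrowprops=dict(arrowstyle="->", linewidth=2))
ax.set_xlabel("quantity")
ax.set_ylabel("price")
ax.legend()
plt.show()
```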
We can also add graphical annotations using the Matplotlib module called patches. The most used shapes are circle, ellipse, wedge, and polygon.
First, import Circle and Polygon (and Ellipse or Wedge, if you need them) from the patches module, and PatchCollection from the collections module.
We define fig and ax by calling the plt.subplots() function and create an empty list named patches. Draw a circle by passing the coordinates of its center and its radius. For the triangle, pass the coordinates of its three points to the Polygon patch.
Lastly, we just need to draw the patches and show our figure.
from matplotlib.patches import Circle, Polygon
from matplotlib.collections import PatchCollection
fig, ax = plt.subplots()
patches = []
# draw circle and triangle
circle = Circle((.42,.75),0.12)
triangle = Polygon([[.1,.5],[.2,.7],[.3,.54]], closed=True)
patches += [circle,triangle]
# Draw the patches
colors = 100*np.random.rand(len(patches)) # set random colors
p = PatchCollection(patches)
p.set_array(np.array(colors))
ax.add_collection(p)
# Show the figure
plt.show()
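The ellipse and wedge shapes are mentioned above but not drawn. A minimal sketch, using ax.add_patch directly instead of a PatchCollection, might look like this; the positions, sizes, and colors are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse, Wedge

fig, ax = plt.subplots()

# Ellipse: center, full width, full height, rotation in degrees
ellipse = Ellipse((0.3, 0.5), width=0.4, height=0.2, angle=30,
                  facecolor="tab:blue", alpha=0.6)
# Wedge: center, radius, start angle and end angle in degrees
wedge = Wedge((0.7, 0.5), 0.2, 30, 270, facecolor="tab:orange")

for patch in (ellipse, wedge):
    ax.add_patch(patch)

ax.set_aspect("equal")  # keep the shapes from looking stretched
plt.show()
```

Adding patches one at a time keeps each shape's own color and transparency, whereas a PatchCollection is convenient when many shapes share a colormap.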
preferred_workoption = [10.7, 47.6, 38.8, 2.9]
colors = ['b', 'g', 'r', 'c']
labels = ['Collocated', 'Hybrid', 'Fully remote', 'Not applicable']
explode = (0.1, 0.2, 0.1, 0.1) # 0 leaves wedge in place, 0+ pushes it out that much
plt.pie(preferred_workoption, colors=colors, labels=labels,
explode=explode, autopct='%1.1f%%',
counterclock=False, shadow=True)
plt.title('Preferred workoption')
plt.show()
preferred_workoption = [10.7, 47.6, 38.8, 2.9]
colors = ['b', 'g', 'r', 'c']
labels = ['Collocated', 'Hybrid', 'Fully remote', 'Not applicable']
widths= [0.6, 0.6, 0.6, 0.6]
plt.bar(range(0, 4), preferred_workoption, width=widths, color=colors, align='center', tick_label=labels)
plt.title('Preferred workoption')
plt.show()
from mpl_toolkits.mplot3d import Axes3D # For 3D plots
X = np.random.randn(10000)
plt.hist(X, bins = 20)
plt.show()
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
theta = np.linspace(-3 * np.pi, 3 * np.pi, 200)
z = np.linspace(-3, 3, 200)
r = z**3 + 1
x = r * np.sin(theta)
y = r * np.cos(theta)
ax.plot(x, y, z, label='Parametric Curve')
ax.legend()
plt.show()