NumPy has a rich collection of universal functions, or ufuncs, that you can use to eliminate loops and optimize your code. Universal functions are Python objects that are instances of the numpy.ufunc class and encapsulate the behavior of a function. You have already seen some of them in NumPy part 1, where we covered arithmetic and statistical functions. Many other universal functions cover trigonometry, summary statistics, and comparison operations.
from __future__ import print_function
import numpy as np
numbers=np.arange(1,11)
numbers
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
np.sin() and np.log() are applied to an array in an element-by-element fashion:
np.sin(numbers)
array([ 0.84147098, 0.90929743, 0.14112001, -0.7568025 , -0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849, -0.54402111])
np.log(numbers)
array([0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458, 2.30258509])
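To make the "eliminate loops" point concrete, here is a minimal sketch that compares an explicit Python loop with the equivalent ufunc call. The variable names looped and vectorized are only illustrative.

```python
import math
import numpy as np

numbers = np.arange(1, 11)

# Explicit Python loop: one math.sin call per element.
looped = np.array([math.sin(n) for n in numbers])

# Ufunc: the same computation without a Python-level loop.
vectorized = np.sin(numbers)

print(isinstance(np.sin, np.ufunc))      # np.sin is a ufunc object
print(np.allclose(looped, vectorized))   # both approaches agree
```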
np.frompyfunc()
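You can create your own universal function in three simple steps: define an ordinary Python function, convert it with the np.frompyfunc() function, and call the result on an array. The documentation for np.frompyfunc() describes its parameters and return value: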
Parameters
----------
func : Python function object
An arbitrary Python function.
nin : int
The number of input arguments.
nout : int
The number of objects returned by `func`.
identity : object, optional
The value to use for the `~numpy.ufunc.identity` attribute of the resulting
object. If specified, this is equivalent to setting the underlying
C ``identity`` field to ``PyUFunc_IdentityValue``.
If omitted, the identity is set to ``PyUFunc_None``. Note that this is
_not_ equivalent to setting the identity to ``None``, which implies the
operation is reorderable.
Returns
-------
out : ufunc
Returns a NumPy universal function (``ufunc``) object.
# creating numpy array
integers = np.arange(1, 101)
print("integers :", *integers)
# creating own function
def modulo(val):
    return (val % 10)
# adding into numpy
mod_10=np.frompyfunc(modulo, 1, 1)
# using function over numpy array
mod_integers=mod_10(integers)
print("mod_integers :", *mod_integers)
integers : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
mod_integers : 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
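One detail worth knowing about np.frompyfunc() is that the resulting ufunc returns arrays of dtype object. The sketch below shows this and compares the result with the built-in np.mod ufunc. The names wrapped, as_int, and native are illustrative only.

```python
import numpy as np

def modulo(val):
    return val % 10

mod_10 = np.frompyfunc(modulo, 1, 1)
integers = np.arange(1, 101)

wrapped = mod_10(integers)
print(wrapped.dtype)               # object: elements are Python-level objects

# Cast back to an integer dtype before further numeric work.
as_int = wrapped.astype(np.int64)

# The built-in remainder ufunc gives the same values without the wrapper.
native = np.mod(integers, 10)
print(np.array_equal(as_int, native))
```

For simple arithmetic like this, a built-in ufunc is usually the better choice; np.frompyfunc() is more useful for logic that has no NumPy equivalent.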
Strides are the indexing scheme of ndarrays: for each dimension, they specify the number of bytes to jump to reach the next element along that dimension. They also give insight into how the data is laid out in memory.
numbers = np.arange(10, dtype = np.int8)
numbers
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int8)
numbers.strides # This tells us each element is 1 byte apart from the next
(1,)
numbers.shape = 2,5
numbers
array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], dtype=int8)
numbers.strides # first integer is bytes to next row, second is bytes to next column
(5, 1)
first_array = np.zeros((100000,))
first_array
array([0., 0., 0., ..., 0., 0., 0.])
second_array = np.zeros((100000 * 100, ))[::100]
second_array
array([0., 0., 0., ..., 0., 0., 0.])
first_array.shape
(100000,)
second_array.shape
(100000,)
first_array.strides
(8,)
second_array.strides
(800,)
%timeit first_array.sum()
20.3 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit second_array.sum()
171 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
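The two arrays have the same shape and values, yet the second one sums more slowly because its elements are 800 bytes apart in memory. A minimal sketch of how to inspect this, and one possible remedy (copying into a contiguous buffer with np.ascontiguousarray), follows. The name compact is illustrative.

```python
import numpy as np

first_array = np.zeros((100000,))
second_array = np.zeros((100000 * 100,))[::100]

# The sliced array is a view that skips 99 elements between neighbours.
print(first_array.flags['C_CONTIGUOUS'])    # True
print(second_array.flags['C_CONTIGUOUS'])   # False
print(second_array.base is not None)        # it shares memory with the big array

# Copying into a fresh contiguous buffer restores the 8-byte stride.
compact = np.ascontiguousarray(second_array)
print(compact.strides)                      # (8,)
```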
NumPy has a special type of array called a structured array, or record array. Structured arrays are effective when you're performing computations and want to keep closely related data together, because they can group data of different types and sizes. They do this through named containers called fields. Each field holds data of a single type and size, and different fields can have different types.
student_records = np.array([('Lazaro','Oneal', '0526993', 2009, 2.33), ('Dorie','Salinas', '0710325', 2006, 2.26), ('Mathilde','Hooper', '0496813', 2000, 2.56),('Nell','Gomez', '0740631', 2003, 2.22),('Lachelle','Jordan', '0490888', 2003, 2.13),('Claud','Waller', '0922492', 2004, 3.60),('Bob','Steele', '0264843', 2002, 2.79),('Zelma','Welch', '0885463', 2007, 3.69)],
dtype=[('name', (np.str_, 10)),('surname', (np.str_, 10)), ('id', (np.str_,7)),('graduation_year', np.int32), ('gpa', np.float64)])
student_records
array([('Lazaro', 'Oneal', '0526993', 2009, 2.33), ('Dorie', 'Salinas', '0710325', 2006, 2.26), ('Mathilde', 'Hooper', '0496813', 2000, 2.56), ('Nell', 'Gomez', '0740631', 2003, 2.22), ('Lachelle', 'Jordan', '0490888', 2003, 2.13), ('Claud', 'Waller', '0922492', 2004, 3.6 ), ('Bob', 'Steele', '0264843', 2002, 2.79), ('Zelma', 'Welch', '0885463', 2007, 3.69)], dtype=[('name', '<U10'), ('surname', '<U10'), ('id', '<U7'), ('graduation_year', '<i4'), ('gpa', '<f8')])
student_records[['id','graduation_year']]
array([('0526993', 2009), ('0710325', 2006), ('0496813', 2000), ('0740631', 2003), ('0490888', 2003), ('0922492', 2004), ('0264843', 2002), ('0885463', 2007)], dtype={'names': ['id', 'graduation_year'], 'formats': ['<U7', '<i4'], 'offsets': [80, 108], 'itemsize': 120})
np.sort()
Another useful feature of structured arrays is that you can sort them with np.sort() by passing the order parameter. Its value is the name of the field by which we want to sort the array.
students_sorted_by_surname = np.sort(student_records, order='surname')
print('Students sorted according to the surname :\n', students_sorted_by_surname)
Students sorted according to the surname : [('Nell', 'Gomez', '0740631', 2003, 2.22) ('Mathilde', 'Hooper', '0496813', 2000, 2.56) ('Lachelle', 'Jordan', '0490888', 2003, 2.13) ('Lazaro', 'Oneal', '0526993', 2009, 2.33) ('Dorie', 'Salinas', '0710325', 2006, 2.26) ('Bob', 'Steele', '0264843', 2002, 2.79) ('Claud', 'Waller', '0922492', 2004, 3.6 ) ('Zelma', 'Welch', '0885463', 2007, 3.69)]
students_sorted_by_grad_year = np.sort(student_records, order='graduation_year')
print('Students sorted according to the graduation year :\n', students_sorted_by_grad_year)
Students sorted according to the graduation year : [('Mathilde', 'Hooper', '0496813', 2000, 2.56) ('Bob', 'Steele', '0264843', 2002, 2.79) ('Lachelle', 'Jordan', '0490888', 2003, 2.13) ('Nell', 'Gomez', '0740631', 2003, 2.22) ('Claud', 'Waller', '0922492', 2004, 3.6 ) ('Dorie', 'Salinas', '0710325', 2006, 2.26) ('Zelma', 'Welch', '0885463', 2007, 3.69) ('Lazaro', 'Oneal', '0526993', 2009, 2.33)]
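The order parameter also accepts a list of field names, which breaks ties using the later fields. The sketch below uses a shortened copy of the student data (four records, without the id field) purely for illustration, and also shows that a single field can drive a boolean mask.

```python
import numpy as np

dtype = [('name', (np.str_, 10)), ('surname', (np.str_, 10)),
         ('graduation_year', np.int32), ('gpa', np.float64)]
records = np.array([('Nell', 'Gomez', 2003, 2.22),
                    ('Lachelle', 'Jordan', 2003, 2.13),
                    ('Claud', 'Waller', 2004, 3.60),
                    ('Mathilde', 'Hooper', 2000, 2.56)], dtype=dtype)

# Sort by graduation year first, then by GPA within the same year.
by_year_then_gpa = np.sort(records, order=['graduation_year', 'gpa'])
print(by_year_then_gpa['surname'])

# A single field behaves like an ordinary array, so it can build a mask.
high_gpa = records[records['gpa'] > 3.0]
print(high_gpa['name'])
```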
Dates and times are critical, especially in time series analytics, which is a specific way of analyzing a sequence of data points collected over an interval of time. Time series analytics is used in many different industries, such as finance, retail, and economics. Examples of time series include weather data like rainfall measurements and temperature readings, heart rate monitoring, financial data, quarterly sales, stock prices and interest rates, and industry forecasts.
If you're familiar with Python, you know that the datetime module provides date and time types. NumPy has a similar date-time type called datetime64. A datetime64 object is constructed from a string in the ISO 8601 date format. The supported date units are years, months, weeks, and days, while the time units include hours, minutes, seconds, and milliseconds. It also accepts the string NaT, which stands for Not a Time.
np.datetime64('2022-03-01')
numpy.datetime64('2022-03-01')
np.datetime64('2022-03')
numpy.datetime64('2022-03')
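As a small illustration of how datetime64 values behave, the sketch below subtracts two dates, inspects the unit inferred from the input string, and checks for NaT. The variable names are illustrative.

```python
import numpy as np

start = np.datetime64('2022-01-01')
end = np.datetime64('2022-03-01')

# Subtracting two dates gives a timedelta64 in the shared unit (days here).
print(end - start)                       # 59 days

# The unit is inferred from the string: month precision here.
print(np.datetime64('2022-03').dtype)    # datetime64[M]

# NaT marks a missing date and is detected with np.isnat.
missing = np.datetime64('NaT')
print(np.isnat(missing))                 # True
```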
np.busday_count()
NumPy has many useful functions that deal with dates, called busday functions, short for business day functions. For instance, to find the number of business days in 2022, we call the np.busday_count() function and pass the current year, 2022, and the following year, 2023, as arguments.
To find the number of weekdays in June 2022, we pass the two arguments 2022-06 and 2022-07.
Parameters
begindates : array_like of datetime64[D]
The array of the first dates for counting.
enddates : array_like of datetime64[D]
The array of the end dates for counting, which are excluded
from the count themselves.
print('Number of weekdays in 2022:')
print(np.busday_count('2022','2023'))
Number of weekdays in 2022: 260
print('Number of weekdays in June 2022:')
np.busday_count('2022-06', '2022-07')
Number of weekdays in June 2022:
22
print("Number of weekdays in October 2022: ", np.busday_count('2022-10', '2022-11'))
Number of weekdays in October 2022: 21
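np.busday_count() also takes an optional holidays argument, which removes specific dates from the count. In this sketch the holiday list is hypothetical and chosen only for illustration (20 June 2022 is a Monday).

```python
import numpy as np

# Default: Monday to Friday count as business days.
plain = np.busday_count('2022-06', '2022-07')

# Hypothetical holiday list, chosen only for illustration.
holidays = ['2022-06-20']
with_holiday = np.busday_count('2022-06', '2022-07', holidays=holidays)

print(plain, with_holiday)
```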
np.is_busday()
Another useful function is np.is_busday(). As its name suggests, it checks whether the date passed as an argument is a valid business day.
Parameters
----------
dates : array_like of datetime64[D]
The array of dates to process.
np.is_busday(np.datetime64('2022-06-05'))
False
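Because the dates parameter is array_like, np.is_busday() can check several dates at once. The sketch below also passes the optional weekmask argument to treat Saturday as a working day. The date range is an arbitrary example.

```python
import numpy as np

# Friday 3 June to Monday 6 June 2022.
dates = np.arange(np.datetime64('2022-06-03'), np.datetime64('2022-06-07'))

# Default weekmask: Monday to Friday.
default_mask = np.is_busday(dates)
print(default_mask)

# Custom weekmask (Mon..Sun, 1 = business day) that also counts Saturday.
with_saturday = np.is_busday(dates, weekmask='1111110')
print(with_saturday)
```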
The amount of data needed for machine learning and deep learning models has increased tremendously, creating the need for vectorized, or matrix, operations.
Linear algebra is the field of mathematics that deals with linear equations and their representations in vector spaces and through matrices. NumPy provides different types of objects for solving mathematical problems, and the np.linalg package contains linear algebra functions. In this lesson, we'll cover only the matrix object, leaving scalars, vectors, and tensors aside. Matrix objects inherit all the attributes and functions of ndarray, with two notable differences: a matrix is always two-dimensional, while an ndarray can have any number of dimensions, and the * operator on matrices performs matrix multiplication rather than element-wise multiplication.
import numpy as np
first_array = np.arange(16).reshape(4,4)
first_array
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]])
np.matrix()
first_matrix = np.matrix(first_array)
first_matrix
matrix([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]])
np.identity()
An identity matrix is a matrix in which every diagonal element is one and all other elements are zero. It is denoted by a capital letter I with a subscript that gives its size. We create one by typing second_matrix = np.matrix(np.identity(4)).
second_matrix = np.matrix(np.identity(4))
second_matrix
matrix([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]])
np.matmul()
This function returns the matrix product of two arrays. For 2-D arrays it returns the ordinary matrix product. If either argument has more than two dimensions, it is treated as a stack of matrices residing in the last two indexes and is broadcast accordingly.
On the other hand, if either argument is a 1-D array, it is promoted to a matrix by adding a 1 to its dimensions (prepended for the first argument, appended for the second), and that dimension is removed after multiplication.
matrix_a=np.random.randint(5,size=(2,3))
matrix_a
array([[0, 2, 4], [0, 0, 3]])
matrix_b=np.random.randint(5,size=(3,2))
matrix_b
array([[0, 0], [3, 0], [2, 0]])
np.matmul(matrix_a,matrix_b)
array([[14, 0], [ 6, 0]])
np.matmul(matrix_b, matrix_a)
array([[ 0, 0, 0], [ 0, 6, 12], [ 0, 4, 8]])
ones = np.ones(5).reshape(5, 1)
ones
array([[1.], [1.], [1.], [1.], [1.]])
another = np.array([[3, 6, 13, 4, 4]])
anotherone = np.array([3, 6, 13, 4, 4])
anotherone.shape
(5,)
another.shape
(1, 5)
answer = np.matmul(ones, another)
answer
array([[ 3., 6., 13., 4., 4.], [ 3., 6., 13., 4., 4.], [ 3., 6., 13., 4., 4.], [ 3., 6., 13., 4., 4.], [ 3., 6., 13., 4., 4.]])
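The promotion and stacking rules described for np.matmul() can be checked by looking at result shapes. This is a minimal sketch with arbitrary example arrays.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)      # shape (2, 3)
vector = np.array([1, 0, 2])             # shape (3,)

# The 1-D vector is treated as shape (3, 1); the extra axis is dropped again.
product = np.matmul(matrix, vector)
print(product.shape)                     # (2,)

# A 3-D array is a stack of matrices; the 2-D operand is broadcast over it.
stack = np.ones((4, 2, 3))
other = np.ones((3, 5))
stacked_product = np.matmul(stack, other)
print(stacked_product.shape)             # (4, 2, 5)
```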
np.linalg.inv()
We use the numpy.linalg.inv() function to calculate the inverse of a matrix. Multiplying a matrix by its inverse gives the identity matrix.
The matrix must be square, with the same number of rows and columns, and it must be non-singular (its determinant must be nonzero).
matrix_c=np.matrix("0 1 2;1 0 3;4 -3 8")
matrix_c
matrix([[ 0, 1, 2], [ 1, 0, 3], [ 4, -3, 8]])
inverse = np.linalg.inv(matrix_c)
inverse
matrix([[-4.5, 7. , -1.5], [-2. , 4. , -1. ], [ 1.5, -2. , 0.5]])
print(matrix_c*inverse)
[[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]]
np.mat() interprets a given input as a matrix (in NumPy 2.0 and later, use np.asmatrix() instead).
np.linalg.solve() gives the solution of linear equations in the matrix form.
A = np.mat("1 -2 1;0 2 -8;-4 5 9")
A
matrix([[ 1, -2, 1], [ 0, 2, -8], [-4, 5, 9]])
b = np.array([0, 16, -18])
b
array([ 0, 16, -18])
x = np.linalg.solve(A, b)
print("Solution", x)
Solution [58. 32. 6.]
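A quick way to gain confidence in a solution is to substitute it back into the system. The sketch below repeats the example with plain ndarrays instead of matrix objects, which is an assumption made here only to keep the check short.

```python
import numpy as np

A = np.array([[1, -2, 1], [0, 2, -8], [-4, 5, 9]])
b = np.array([0, 16, -18])

x = np.linalg.solve(A, b)

# Substituting x back into the system should give b (up to rounding).
print(np.allclose(A @ x, b))

# solve() requires an invertible A; the determinant is a quick sanity check.
print(np.linalg.det(A))
```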
Matrix decomposition, or matrix factorization, is the process of splitting a matrix into parts. You probably recall prime factorization from math, where you found which prime numbers multiply together to make the original number.
Well, this is quite similar. The most famous matrix decomposition techniques are lower-upper decomposition, singular value decomposition, QR decomposition, and Cholesky factorization. We'll start by understanding the basics of eigenvalues and eigenvectors, and then explore the most commonly used decompositions: singular value decomposition and QR decomposition.
Eigenvalues are the scalar solutions λ to the equation Ax = λx, where A is a two-dimensional square matrix, x is a nonzero one-dimensional vector called an eigenvector, and λ is the corresponding eigenvalue.
NumPy is equipped with the np.linalg subpackage, which has two functions for this: np.linalg.eig(), which returns a tuple of eigenvalues and eigenvectors, and np.linalg.eigvals(), which returns only the eigenvalues.
first_matrix=np.matrix([[4,8],[10,14]])
print("Matrix:\n",first_matrix)
Matrix: [[ 4 8] [10 14]]
eigenvalues, eigenvectors = np.linalg.eig(first_matrix)
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
Eigenvalues: [-1.24695077 19.24695077] Eigenvectors: [[-0.83619408 -0.46462222] [ 0.54843365 -0.885509 ]]
eigenvalues= np.linalg.eigvals(first_matrix)
print("Eigenvalues:", eigenvalues)
Eigenvalues: [-1.24695077 19.24695077]
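The defining equation Ax = λx can be verified numerically for each pair returned by np.linalg.eig(). This sketch uses a plain ndarray with the same values as first_matrix; note that the eigenvectors are stored as columns.

```python
import numpy as np

A = np.array([[4, 8], [10, 14]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index.
for i, lam in enumerate(eigenvalues):
    v = eigenvectors[:, i]
    print(lam, np.allclose(A @ v, lam * v))
```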
np.linalg.svd()
Singular Value Decomposition, or SVD, decomposes a matrix into singular vectors and singular values. It's used in computer vision, signal processing, natural language processing, and machine learning.
When a is a 2D array, it is factorized as u @ np.diag(s) @ vh, where u and vh are 2D unitary arrays and s is a 1D array of a's singular values. The numpy.linalg.svd() function computes this factorization.
It returns a tuple that contains the matrices U and V (the latter already transposed, as vh) and the diagonal values of the matrix Sigma.
Syntax : numpy.linalg.svd(a, full_matrices=True, compute_uv=True, hermitian=False)
Parameters :
a (…, M, N) array : A real or complex array with a.ndim >= 2.
full_matrices(bool, optional) : If True (default), u and vh have the shapes (…, M, M)
and (…, N, N), respectively. Otherwise, the shapes are (…, M, K) and (…, K, N),
respectively, where K = min(M, N).
compute_uv(bool, optional) : Whether or not to compute u and vh in addition to s.
Its default value is True.
hermitian(bool, optional) : If True, a is assumed to be Hermitian (symmetric if real-valued),
enabling a more efficient method for finding singular values. Its default value is False.
A = np.mat("3 1 4;1 5 9;2 6 5")
print("A\n", A)
U, Sigma, V = np.linalg.svd(A, full_matrices=False)
print("U: ",U)
print("Sigma : ",Sigma)
print("V : ", V)
A [[3 1 4] [1 5 9] [2 6 5]] U: [[-0.32463251 0.79898436 0.50619929] [-0.75307473 0.1054674 -0.64942672] [-0.57226932 -0.59203093 0.56745679]] Sigma : [13.58235799 2.84547726 2.32869289] V : [[-0.21141476 -0.55392606 -0.80527617] [ 0.46331722 -0.78224635 0.41644663] [ 0.86060499 0.28505536 -0.42202191]]
print("Product\n", U * np.diag(Sigma) * V)
Product [[3. 1. 4.] [1. 5. 9.] [2. 6. 5.]]
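The same reconstruction works for non-square input and for plain ndarrays, where the @ operator is needed because * is element-wise. The 2 x 3 array below is an arbitrary example chosen to show the reduced shapes produced by full_matrices=False.

```python
import numpy as np

A = np.array([[3, 1, 4], [1, 5, 9]])          # shape (2, 3)

U, s, Vh = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vh.shape)             # (2, 2) (2,) (2, 3)

# With ndarrays, * is element-wise, so use @ for the matrix product.
reconstructed = U @ np.diag(s) @ Vh
print(np.allclose(reconstructed, A))

# Singular values are returned in descending order.
print(s)
```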
M=Q*R
QR decomposition is used to decompose a square or rectangular matrix M into an orthogonal matrix Q and an upper triangular matrix R.
np.linalg.qr()
QR factorization of a matrix is the decomposition of a matrix, say A, into A = QR, where Q is orthogonal and R is an upper-triangular matrix. We factorize the matrix using the numpy.linalg.qr() function.
Syntax : numpy.linalg.qr(a, mode='reduced')
Parameters :
a : matrix(M,N) which needs to be factored.
A
matrix([[3, 1, 4], [1, 5, 9], [2, 6, 5]])
b = np.array([1,2,3]).reshape(3,1)
q, r = np.linalg.qr(A)
x = np.dot(np.linalg.inv(r), np.dot(q.T, b))
x
matrix([[ 0.26666667], [ 0.46666667], [-0.06666667]])
np.linalg.solve(A,b)
array([[ 0.26666667], [ 0.46666667], [-0.06666667]])
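As a variation on the cell above, the sketch below checks the two defining properties of Q and R and then solves R x = Qᵀ b with np.linalg.solve() instead of forming the inverse of R explicitly. It uses plain ndarrays; that choice, and the variable names, are assumptions made for illustration.

```python
import numpy as np

A = np.array([[3, 1, 4], [1, 5, 9], [2, 6, 5]], dtype=float)
b = np.array([1, 2, 3], dtype=float)

q, r = np.linalg.qr(A)

# Q has orthonormal columns and R is upper triangular.
print(np.allclose(q.T @ q, np.eye(3)))
print(np.allclose(r, np.triu(r)))

# A x = b becomes R x = Q^T b; solve that system instead of inverting R.
x = np.linalg.solve(r, q.T @ b)
print(np.allclose(A @ x, b))
```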
np.polynomial()
The word polynomial comes from the Greek word poly, meaning many, and the Latin-derived nomial, meaning term. So its name means many terms. As a quick reminder from math, a polynomial is an expression involving a sum of powers in one or more variables multiplied by coefficients.
Examples of polynomial functions are linear, quadratic, cubic, and quartic functions. NumPy contains the submodule polynomial, which provides functions and classes for working with polynomials.
from numpy.polynomial import polynomial
np.polynomial.Polynomial()
Parameters:
coef : array_like
Polynomial coefficients, in increasing order.
For example, (1, 2, 3) implies P_0 + 2P_1 + 3P_2, where the P_i are a graded polynomial basis.
first_polynomial = np.polynomial.Polynomial([2, -3, 1])
first_polynomial
np.polynomial.Polynomial.fromroots()
Generate a monic polynomial with given roots.
Return the coefficients of the polynomial
p(x) = (x - r_0) * (x - r_1) * ... * (x - r_n),
where the r_n are the roots specified in roots. If a zero has multiplicity n, then it must appear in roots n times. For instance, if 2 is a root of multiplicity three and 3 is a root of multiplicity 2, then roots looks something like [2, 2, 2, 3, 3]. The roots can appear in any order.
If the returned coefficients are c, then
p(x) = c_0 + c_1 * x + ... + x^n
The coefficient of the last term is 1 for monic polynomials in this form.
Parameters
roots : array_like
Sequence containing the roots.
second_polynomial = np.polynomial.Polynomial.fromroots([1, 2])
second_polynomial
np.roots()
Return the roots of a polynomial with coefficients given in p. The values in the rank-1 array p are coefficients of a polynomial. If the length of p is n+1, then the polynomial is described by:
p[0] * x**n + p[1] * x**(n-1) + … + p[n-1]*x + p[n]
Syntax : numpy.roots(p)
Parameters :
p : [array_like] Rank-1 array of polynomial coefficients.
Return : [ndarray] An array containing the roots of the polynomial.
first_polynomial.roots()
array([1., 2.])
second_polynomial.roots()
array([1., 2.])
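Both polynomials have the same roots because they are the same polynomial, 2 - 3x + x². The sketch below confirms this and also highlights that np.roots() expects coefficients in the opposite (decreasing) order from the Polynomial class. Variable names are illustrative.

```python
import numpy as np

# Polynomial class: coefficients in increasing order, 2 - 3x + x**2.
from_coefficients = np.polynomial.Polynomial([2, -3, 1])
from_roots = np.polynomial.Polynomial.fromroots([1, 2])
print(np.allclose(from_coefficients.coef, from_roots.coef))

# np.roots: coefficients in decreasing order, x**2 - 3x + 2.
legacy_roots = np.roots([1, -3, 2])
print(np.sort(legacy_roots))
```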
np.polyval(poly, x)
Evaluates a polynomial at specific values.
If N is the length of the polynomial p, then this function returns the value
p[0]*x**(N-1) + p[1]*x**(N-2) + … + p[N-2]*x + p[N-1]
Parameters :
p : [array_like or poly1d] polynomial coefficients, given in decreasing order
of powers. (For np.poly1d itself, if the second parameter r is set to True, the array values are instead the
roots of the polynomial equation.)
For example : poly1d([3, 2, 6]) = 3x^2 + 2x + 6
x : [array_like or poly1d] A number or an array of numbers at which to evaluate p.
np.polyval([5,4,3,2,1], 1)
15
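Because np.polyval() and the Polynomial class order their coefficients differently, the same polynomial is written with reversed lists. A minimal sketch, evaluating 5x⁴ + 4x³ + 3x² + 2x + 1 at x = 2 with both interfaces:

```python
import numpy as np

# np.polyval: highest power first, so this is 5x^4 + 4x^3 + 3x^2 + 2x + 1.
legacy_value = np.polyval([5, 4, 3, 2, 1], 2)

# Polynomial: lowest power first, so the same polynomial needs reversed input.
poly = np.polynomial.Polynomial([1, 2, 3, 4, 5])
new_value = poly(2)

print(legacy_value, new_value)
```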
third_polynomial = np.polynomial.Polynomial([1,2,3,4,5])
third_polynomial
np.polynomial.polynomial.Polynomial.integ(m=1, k=[], lbnd=None)
Return a series instance that is the definite integral of the current series.
Parameters
m : non-negative int
The number of integrations to perform.
k : array_like
Integration constants. The first constant is applied to the first integration,
the second to the second, and so on. The list of values must be less than or equal to m
in length and any missing values are set to zero.
lbnd : Scalar
The lower bound of the definite integral.
Returns
new_series : series
A new series representing the integral. The domain is the same as the domain of the
integrated series.
integral=third_polynomial.integ()
integral
np.polynomial.polynomial.Polynomial.deriv(m=1)
Return a series instance that is the derivative of the current series.
Parameters
m : non-negative int
Find the derivative of order m.
Returns
new_series : series
A new series representing the derivative. The domain is the same as the domain of the
differentiated series.
integral.deriv()
derivative=third_polynomial.deriv()
derivative
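Since the rendered polynomial output is not always easy to read, inspecting the coef attribute is a simple way to see what integ() and deriv() do. This sketch reuses the coefficients of third_polynomial; the name poly is illustrative.

```python
import numpy as np

poly = np.polynomial.Polynomial([1, 2, 3, 4, 5])

# Integrating raises the degree; the integration constant defaults to 0.
integral = poly.integ()
print(integral.coef)            # coefficients 0, 1, 1, 1, 1, 1

# Differentiating the integral recovers the original coefficients.
print(np.allclose(integral.deriv().coef, poly.coef))

# deriv() multiplies each coefficient by its power and drops the constant.
derivative = poly.deriv()
print(derivative.coef)          # coefficients 2, 6, 12, 20
```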
Linear regression is one of the simplest, most important, and most widely used models in machine learning. Machine learning algorithms are divided into two categories: supervised and unsupervised.
For unsupervised learning algorithms, the system uses only input data without any labels. For supervised learning algorithms, the input data set and the corresponding output, or true prediction, are available, and the algorithms try to find the relationship between inputs and outputs.
Linear regression is a supervised machine learning method most commonly used for finding the relationship between variables and for forecasting. By definition, linear regression estimates an unknown variable in a linear fashion, based on some other known variables.
Visually, we fit a line, or a hyperplane in higher dimensions, through our data points. It can be applied to various kinds of business and scientific problems, for example predictions of stock prices, property prices, sales, and GDP growth rate.
Let's explore a simplified version of one of the most famous problems, the prediction of property prices. We can get statistical data for the house price indices and pick the average house price for each year from 2012 to 2021. We want to estimate the average house price for the year 2022. We can assume the relation is quadratic, so we want to find the polynomial y = ax^2 + bx + c that represents it, where y is the price at year x. In the code, the years 2012 to 2021 are numbered 1 to 10, so 2022 corresponds to x = 11.
year = np.arange(1,11)
price = np.array([129000, 133000, 138000, 144000, 142000, 141000, 150000, 135000, 134000, 137000])
year
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
np.polyfit(x, y, deg)
Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y).
Returns a vector of coefficients p that minimises the squared error in the order
deg, deg-1, … 0.
The Polynomial.fit class method is recommended for new code as it is more
stable numerically. See the documentation of the method for more information.
Parameters
x : array_like, shape (M,)
x-coordinates of the M sample points (x[i], y[i]).
y : array_like, shape (M,) or (M, K)
y-coordinates of the sample points. Several data sets of sample points sharing the same
x-coordinates can be fitted at once by passing in a 2D-array that contains one
dataset per column.
deg : int
Degree of the fitting polynomial
a, b, c = np.polyfit(year, price, 2)
print ("a:",a)
print ("b:",b)
print ("c:",c)
a: -594.696969696968 b: 7032.575757575749 c: 122516.66666666664
print("Estimated price for 2022:",a*11**2 + b*11 + c )
Estimated price for 2022: 127916.66666666674
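The np.polyfit() documentation quoted earlier recommends Polynomial.fit for new code. As a hedged sketch of what that might look like for the same data, the example below fits the parabola both ways and compares the predictions. The name fitted is illustrative, and convert() is used to express the result in ordinary powers of x.

```python
import numpy as np

year = np.arange(1, 11)
price = np.array([129000, 133000, 138000, 144000, 142000,
                  141000, 150000, 135000, 134000, 137000])

# Legacy interface: coefficients from the highest power down.
a, b, c = np.polyfit(year, price, 2)

# Class-based interface: returns a callable Polynomial object.
fitted = np.polynomial.Polynomial.fit(year, price, 2)

# Both describe the same least-squares parabola, so predictions agree.
print(fitted(11), a * 11**2 + b * 11 + c)

# convert() maps the scaled internal domain back to plain powers of x.
print(fitted.convert().coef)     # order: c, b, a
```

Polynomial.fit scales the x values to an internal window before fitting, which is why the raw coef attribute of fitted differs from a, b, c until convert() is called.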
import matplotlib.pyplot as plt
plt.plot(year,price, color = 'blue')
plt.scatter(year,price, color = 'blue')
plt.scatter(11, a*11**2 + b*11 + c ,color='red')
plt.title('Linear regression')
plt.xlabel('year')
plt.ylabel('average house price')
Text(0, 0.5, 'average house price')