Python Data Analysis - NumPy - 2. Fundamentals of NumPy

1. NumPy array object

The ndarray in NumPy is a multidimensional array object that consists of two parts:
1. The actual data
2. Metadata describing the data
Many array operations, such as reshaping or slicing, only change the metadata and leave the underlying data untouched.
NumPy arrays are generally homogeneous (the one exception is an array with a custom data type, which is heterogeneous and is covered later in this section), meaning that all elements must have the same type. Because every element has the same type, the storage space required for the array is easy to determine.
As in Python, NumPy array indices start at 0.
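
The split between data and metadata can be illustrated with a minimal sketch. It assumes a small array built with arange; the variable names are illustrative only. Reshaping gives a second array object with different shape metadata that still refers to the same data.

```python
import numpy as np

a = np.arange(6)          # one block of six integers
b = a.reshape(2, 3)       # new shape metadata, same underlying data

shares = np.shares_memory(a, b)
b[0, 0] = 99              # writing through the reshaped array...
first = a[0]              # ...is visible in the original array
print(shares, first)
```

Because only the metadata differs, no elements are copied when b is created.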

# ndarray is a class in Python
import numpy as np
a = np.arange(5, dtype='int64')
print(a)
print(a.dtype)  # data type of the array elements
print(a.shape)  # shape (size of each dimension) of the array
print(type(a))  # type of the object

The shape property of an array returns a tuple whose elements give the size of each dimension of the NumPy array. The array in the above example is one-dimensional, so the tuple has only one element.
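
As a rough illustration of how the shape tuple relates to the number of dimensions, the sketch below builds three small arrays (the names scalar, vector and matrix are chosen here for illustration) and prints ndim next to shape.

```python
import numpy as np

scalar = np.array(5)                         # zero-dimensional: empty shape tuple
vector = np.arange(5)                        # one-dimensional: one entry in the tuple
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # two-dimensional: two entries

for arr in (scalar, vector, matrix):
    print(arr.ndim, arr.shape)
```

The length of the shape tuple always equals ndim.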

2. Create a multidimensional array

The array() function creates an array from a given object, which should be array-like (for example, a Python list). This array-like object is the only required argument of array(); the many other parameters are optional and have default values.
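
A short sketch of two of the optional parameters, dtype and ndmin, may help here. The input list is an arbitrary example, and the inferred integer type depends on the platform, so it is printed rather than assumed.

```python
import numpy as np

data = [1, 2, 3]
default = np.array(data)                      # dtype inferred from the data
as_float = np.array(data, dtype=np.float64)   # explicit element type
two_d = np.array(data, ndmin=2)               # force at least two dimensions

print(default.dtype, as_float.dtype, two_d.shape)
```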

# create a 2 x 3 array
# and display its shape
import numpy as np
# the arrays created by arange are used as list elements, and the list is passed to array(), creating a 2 x 3 array
a = np.array([np.arange(0, 3), np.arange(0, 3)])
print(a)
print(a.shape)
print("Type used to store the shape attribute of an ndarray:", type(a.shape))  # the shape property returns a tuple
print(a.dtype)

# create a 3 x 3 array
# and display its shape
import numpy as np
a = np.array([np.arange(0, 3), (2, 33, 44), [11, 21, 23]])
print(a)
print(a.shape)
print(a.dtype)

# select array elements
import numpy as np
a = np.array([np.arange(0, 2), (11, 22)])
print(a)
print(a[0, 0])  # for a two-dimensional array a, a[m, n] selects the element in row m, column n
print(a[0, 1])
print(a[1, 0])
print(a[1, 1])

3. NumPy data types

Python supports integer, floating-point, and complex data types, but these are not sufficient for scientific computing, so NumPy adds many more. Practical applications need data types of different precision, which also occupy different amounts of memory.
In NumPy, most data type names end with a number giving the number of bits the type occupies in memory.

data type	description
bool	Boolean (True or False), stored as one byte
int_	default integer whose size depends on the platform (usually int32 or int64)
int8	integer, range from -128 to 127
int16	integer, range from -32,768 to 32,767
int32	integer, range from -2,147,483,648 to 2,147,483,647
int64	integer, range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
uint8	unsigned integer, range from 0 to 255
uint16	unsigned integer, range from 0 to 65,535
uint32	unsigned integer, range from 0 to 4,294,967,295
uint64	unsigned integer, range from 0 to 18,446,744,073,709,551,615
float16	half-precision floating-point number (16 bits): 1 sign bit, 5 exponent bits, 10 mantissa bits
float32	single-precision floating-point number (32 bits): 1 sign bit, 8 exponent bits, 23 mantissa bits
float64	double-precision floating-point number (64 bits): 1 sign bit, 11 exponent bits, 52 mantissa bits; corresponds to Python's float
complex64	complex number whose real and imaginary parts are each a 32-bit floating-point number
complex128	complex number whose real and imaginary parts are each a 64-bit floating-point number; corresponds to Python's complex
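
The ranges and bit widths in the table can be checked from NumPy itself. The following is a minimal sketch using np.iinfo for integer types and the itemsize of a dtype for the others; the selection of types is arbitrary.

```python
import numpy as np

# Integer types: np.iinfo reports the bit width and the value range
for int_type in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.bits, info.min, info.max)

# itemsize is in bytes, so multiply by 8 to compare with the number in the name
for name in ("float16", "float32", "float64", "complex64"):
    print(name, np.dtype(name).itemsize * 8)
```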

import numpy as np
# each data type has a corresponding conversion function
a = np.float64(42)
print(a.dtype)
print(a)
b = np.int8(42.0)
print(b.dtype)
print(b)
c = np.bool_(42)  # np.bool_ is NumPy's boolean type; any nonzero value becomes True
print(type(c))
print(c)
d = np.bool_(0)
print(type(d))
print(d)
e = np.bool_(42.0)
print(type(e))
print(e)
f = np.float64(True)  # the old aliases np.float, np.int, and np.complex were removed in NumPy 1.24
print(type(f))
print(f)
g = np.float64(False)
print(type(g))
print(g)
h = np.arange(7, dtype=np.uint16)  # the data type can also be specified through a function's dtype parameter
print(h.dtype)
print(h)
# np.int64(42.0 + 1.j)    # a complex number cannot be converted to an integer
# np.float64(42.0 + 1.j)  # a complex number cannot be converted to a floating-point number
j = np.complex128(1.5)  # a floating-point number can be converted to a complex number; the imaginary part is 0
print(type(j))
print(j)

import numpy as np
# a data type object is an instance of the numpy.dtype class; it can report the number of bytes a single array element occupies in memory
# itemsize attribute of the dtype class:
a = np.array([[1, 2], [3, 4]])
print(a.dtype.itemsize)  # size in bytes of one element of this data type
print(a.dtype.byteorder)  # character indicating the byte order of this data type

Character codes: NumPy can also represent data types with character codes, kept for compatibility with Numeric, the predecessor of NumPy. Their use is not recommended, but they still appear occasionally. Prefer dtype objects over character codes when specifying data types.

data type	character code
integer	i
unsigned integer	u
single-precision floating-point number	f
double-precision floating-point number	d
boolean value	? (or b1 with an explicit size; a bare b is read as int8)
complex	D
character string	S
unicode character string	U
void (empty)	V
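
To show that a character code is only another spelling of a dtype object, the sketch below compares a few codes with the corresponding NumPy types. The particular pairs are chosen for illustration; codes with an explicit byte count (such as i8 and u2) are used where the bare letter would depend on the platform.

```python
import numpy as np

pairs = {
    "f": np.float32,
    "d": np.float64,
    "D": np.complex128,
    "?": np.bool_,
    "i8": np.int64,
    "u2": np.uint16,
}
for code, scalar_type in pairs.items():
    # np.dtype accepts both spellings and the results compare equal
    print(code, np.dtype(code) == np.dtype(scalar_type))
```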

import numpy as np
a = np.array([np.arange(0, 3)], dtype="d")
print(a)
print(a.dtype)
print(a.shape)

import numpy as np
a = np.array([np.arange(0, 3)], dtype="D")
print(a)
print(a.dtype)
print(a.shape)

import numpy as np
a = np.array([np.arange(0, 3)], dtype="float")  # floating-point type (float64)
b = np.array([np.arange(0, 3)], dtype="f")      # single-precision floating-point type
c = np.array([np.arange(0, 3)], dtype="d")      # double-precision floating-point type
# a two-character code can also be used: the first character is the data type, the second the number of bytes the type occupies in memory
# for floating-point numbers, 2, 4, and 8 give precisions of 16, 32, and 64 bits respectively
d = np.array([np.arange(0, 3)], dtype="f8")
e = np.array([np.arange(0, 3)], dtype="i8")
print(a.dtype)
print(b.dtype)
print(c.dtype)
print(d.dtype)
print(e.dtype)

# the complete list of NumPy data type names can be found in sctypeDict.keys()
print(np.sctypeDict.keys())

# attributes of the dtype class
import numpy as np
a = np.array([np.arange(0, 3)], dtype="int16")
print(a)
print(a.dtype.char)  # character code of the data type
print(a.dtype.type)  # type object of the array elements
print(a.dtype.str)  # string representation of the data type

A custom data type is a heterogeneous data type that can serve as a record structure, like one row of a spreadsheet or database table. Example: the following code creates a custom heterogeneous data type consisting of a name stored as a string, a number of items stored as an integer, and a price stored as a floating-point number.

import numpy as np
# create the data type
t = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
print(t)
# view the data type (the data type of a single field can also be viewed)
print(t['name'])
# if no dtype is passed, array() infers one from the data; for these mixed tuples it would produce a plain array of strings
# so to create an array with the custom data type, the dtype must be specified explicitly in the parameters
item = np.array([('Meaning of life DVD', 42, 3.14),('Butter', 13, 2.72)], dtype=t)
print(item)
print(item[1])
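
Once such an array exists, its fields can be read by name. The sketch below rebuilds the same record type and sample rows so that it runs on its own; the variable names items, names and total are illustrative.

```python
import numpy as np

# Same record layout and sample rows as in the example above
t = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=t)

names = items['name']            # all values of one field, as an array
total = items['numitems'].sum()  # fields behave like ordinary arrays
print(names, total)
print(items[1]['price'])         # one field of one record
```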

4. Indexing and slicing of arrays (one-dimensional, multidimensional)

The slicing operation of a one-dimensional array is similar to the slicing operation of a Python list.

import numpy as np
a = np.arange(9)
print(a)
print(a[3:7])
print(a[:7:2])
print(a[::-1])

s = slice(3,7,2)
print(type(s))
print(s.start)
print(s.stop)
print(s.step)
print(a[s])
s = slice(None,None,-1)
print(a[s])
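
One difference from Python lists is worth a small sketch: slicing a list produces an independent copy, while a basic slice of a NumPy array is a view onto the same data. The names below are illustrative.

```python
import numpy as np

lst = list(range(9))
arr = np.arange(9)

lst_part = lst[3:7]   # a new, independent list
arr_part = arr[3:7]   # a view onto the data of arr

lst_part[0] = -1      # does not affect lst
arr_part[0] = -1      # also changes arr[3]
print(lst[3], arr[3])
```

If an independent array is needed, arr[3:7].copy() can be used instead.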

ndarray supports slicing operations on multidimensional arrays. For convenience, an ellipsis (...) can be used to stand for all the remaining dimensions.

# first create an array with the arange function, then change its dimensions with reshape
a = np.arange(24).reshape(2, 3, 4)
print(a.shape)
print(a)

# any element can be selected with three coordinates: layer, row, and column
print(a[0, 0, 0])

# select the element in the first row and first column of every layer
print(a[:, 0, 0])
# select all elements of the first layer
print(a[0, :, :])
# several colons can be replaced by an ellipsis (...); the code above is equivalent to
print(a[0, ...])
# select all elements in the second row of the first layer
print(a[0, 1])

# a step can be used to select every other element of a slice
print(a[0, 1, ::2])

# second column of every layer
print(a[:, :, 1])
print(a[..., 1])
# second row of every layer
print(a[:, 1, :])
# second column of the first layer
print(a[0, :, 1])

# last column of the first layer
print(a[0, :, -1])
# last column of the first layer, in reverse order
print(a[0, ::-1, -1])
# every other element of the last column of the first layer
print(a[0, ::2, -1])
# like reversing a one-dimensional array, this reverses the order along the first axis (the layers)
print(a[::-1])

5. Change the dimension of the array

ravel() method: flattens the array.

a = np.arange(24).reshape(2, 3, 4)
print(a)
print(a.ravel())

flatten() method: does the same job as ravel(). However, flatten() always allocates new memory to hold the result, whereas ravel() returns a view of the array whenever possible.

a = np.arange(24).reshape(2, 3, 4)
print(a)
print(a.flatten())
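
The difference between the two methods can be made visible with a small sketch, assuming a contiguous array like the one used above (for which ravel() can return a view).

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
r = a.ravel()      # a view here, because the memory layout allows it
f = a.flatten()    # always a fresh copy

print(np.shares_memory(a, r), np.shares_memory(a, f))
r[0] = 100         # changes a as well
f[1] = 200         # leaves a untouched
print(a[0, 0, 0], a[0, 0, 1])
```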

shape property: set the dimensions with a tuple. Besides using the reshape function, you can assign a tuple of positive integers directly to the shape property to change the dimensions of an array.

a = np.arange(24).reshape(2, 3, 4)
print(a)
a.shape = (3, 8)
print(a)
b = a.reshape(4,2,3)
print(b)

transpose() method: transposing a matrix is a common operation in linear algebra.

a = np.arange(24).reshape(2, 3, 4)
a.shape = (3, 8)
print(a)
print(a.transpose())

resize() method: resize() does the same job as reshape(), but it modifies the array in place instead of returning a new one.

a = np.arange(24).reshape(2, 3, 4)
print(a)
a.resize((2, 12))
print(a)
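
A brief sketch of that difference, using two separate arrays (a and c are illustrative names) so that each call can be observed on its own:

```python
import numpy as np

a = np.arange(24)
b = a.reshape(2, 12)         # a keeps its shape; b is a new array object
print(a.shape, b.shape)

c = np.arange(24)
result = c.resize((2, 12))   # changes c itself and returns None
print(c.shape, result)
```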

6. Combination of arrays

NumPy arrays can be combined in several ways, including horizontal, vertical, and depth-wise combination. The functions vstack, dstack, hstack, column_stack, row_stack, and concatenate perform these combinations.

import numpy as np
a = np.arange(12).reshape(3, 4)
b = a ** 3
print(a)
print(b)

#horizontal combination 
c1 = np.hstack((a, b))
print(c1)
c2 = np.concatenate((a, b), axis=1)
print(c2)

#vertical combination 
d1 = np.vstack((a, b))
print(d1)
d2 = np.concatenate((a, b), axis=0)
print(d2)

Depth combination: passing the same tuple to the dstack function performs a depth combination of the arrays. Depth combination stacks a series of arrays along the third axis (the depth direction).

e = np.dstack((a, b))
print(e)
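
To see what stacking along the depth axis means for the shape, here is a minimal sketch that rebuilds the same two arrays. The comparison with np.stack is added here as a cross-check and is not part of the original example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a ** 3
e = np.dstack((a, b))

print(e.shape)     # a new third axis of length 2 is added
print(e[1, 2])     # the pair a[1, 2], b[1, 2]
same = np.array_equal(e, np.stack((a, b), axis=2))
print(same)
```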

Column combination: column_stack() function.

# one-dimensional arrays are stacked as the columns of a two-dimensional array
a1 = np.arange(2)
a2 = a1 * 2
print(a1)
print(a2)
a3 = np.column_stack((a1, a2))
print(a3)

# for two-dimensional arrays, column_stack has the same effect as hstack
a = np.arange(12).reshape(3, 4)
b = a ** 3
c1 = np.column_stack((a, b))
c2 = np.hstack((a, b))
print(c1)
print(c2)
print(c1 == c2)  # element-wise comparison: every entry is True

Row combination: row_stack() function. Note that row_stack is an alias of vstack and is deprecated as of NumPy 2.0.

# two one-dimensional arrays are stacked directly as the rows of a two-dimensional array
a1 = np.arange(2)
a2 = a1 * 2
print(a1)
print(a2)
a3 = np.row_stack((a1, a2))
print(a3)

# for two-dimensional arrays, row_stack has the same effect as vstack
a = np.arange(12).reshape(3, 4)
b = a ** 3
c1 = np.row_stack((a, b))
c2 = np.vstack((a, b))
print(c1)
print(c2)
print(c1 == c2)  # element-wise comparison: every entry is True

7. Splitting arrays

NumPy arrays can be split horizontally, vertically, or depth-wise, using the functions hsplit, vsplit, dsplit, and split. An array can be divided into sub-arrays of equal size, or the positions at which to split the original array can be specified.

# horizontal split
a = np.arange(12).reshape(3, 4)
print(a)
b1 = np.hsplit(a, 4)  # split the array horizontally into four equally sized sub-arrays
print(b1)
b2 = np.split(a, 4, axis=1)  # the same split, written with split and axis=1
print(b2)

#vertical segmentation 
a = np.arange(12).reshape(3, 4)
print(a)
b1 = np.vsplit(a, 3)
print(b1)
b2 = np.split(a, 3, axis=0)
print(b2)

# depth-wise split
a = np.arange(27).reshape(3,3,3)
print(a)
b = np.dsplit(a,3)
print(b)
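
The examples above all split into equal parts. Splitting at specified positions can be sketched as follows, passing a list of indices instead of a count; the indices 1 and 3 are an arbitrary choice.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# Split before column 1 and before column 3, giving widths 1, 2 and 1
left, middle, right = np.split(a, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)
print(middle)
```

np.hsplit(a, [1, 3]) gives the same result, since hsplit on a two-dimensional array is split with axis=1.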

8. Properties of arrays

In addition to the shape and dtype properties, the ndarray object has many other properties.

a = np.arange(12).reshape(2, 2, 3)
print(a)

print(a.ndim)  # ndim attribute: the number of dimensions (axes) of the array

print(a.size)  # size attribute: the total number of elements in the array

print(a.dtype.itemsize)  # itemsize attribute: the number of bytes each element occupies in memory

print(a.nbytes)  # nbytes attribute: the storage space occupied by the whole array
print(a.size * a.dtype.itemsize)  # the same value: size multiplied by itemsize

a.resize(3, 4)
print(a)
print(a.T)  # the T attribute has the same effect as the transpose function

a = np.array([1+1.j, 3+2.j])  # in NumPy, the imaginary part of a complex number is written with j
print(a)
print(a.real)  # real attribute: the real part of a complex array
print(a.imag)  # imag attribute: the imaginary part of a complex array
print(a.dtype)
print(a.dtype.str)

a = np.array([1+1.j, 3+2.j])
print(a)
print(a.dtype)
b1 = a.tolist()  #convert to list 
print(b1)
print(type(b1))

a = np.array([1.+1.j,  3.+2.j])
print(a)
print(a.astype(int))  # the imaginary part is discarded (NumPy emits a ComplexWarning)
print(a.astype('complex'))

Summary

Many properties describe an array, and the data type is one of them. In NumPy, data types are represented by dtype objects. Like Python lists, NumPy arrays can easily be sliced and indexed, and NumPy is particularly strong when working with multidimensional arrays. Many operations change the dimensions of an array: combining, resizing, reshaping, and splitting.