Python_Lecture9_DataStuctureswithPandas.pptx

LECTURE 9
DATA STRUCTURES
AND ANALYSIS WITH
PANDAS

By convention, pandas is imported with import pandas as pd. Thus, whenever you see pd. in code, it refers to pandas. You may also find it easier to import Series and DataFrame into the local namespace, since they are used so frequently:
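A minimal sketch of the import convention described above. NumPy is imported as np as well, since it is the usual companion alias and later examples use it.

```python
# Conventional aliases for NumPy and pandas
import numpy as np
import pandas as pd

# Pull the two most frequently used classes into the local namespace
from pandas import Series, DataFrame
```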

Introduction to pandas Data Structures
Series
■ A Series is a one-dimensional array-like object containing a sequence of values (of types similar to NumPy types) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data:
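A short sketch of the simplest Series, built from a plain list. When no index is passed, pandas assigns a default integer index from 0 to N-1; the example values are illustrative only.

```python
import pandas as pd

# Series built from only an array (here a list) of data
obj = pd.Series([4, 7, -5, 3])
print(obj)

print(obj.values)  # the underlying array of values
print(obj.index)   # RangeIndex(start=0, stop=4, step=1)

# An explicit index attaches labels to each value
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
print(obj2["a"])   # -5
```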

■ Using NumPy functions or NumPy-like operations, such as
filtering with a boolean array, scalar multiplication, or applying
math functions, will preserve the index-value link:
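The three operations named above (boolean filtering, scalar multiplication, a math function) can be sketched as follows; in each case the result keeps the labels attached to the surviving values. The data is illustrative.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

positive = obj2[obj2 > 0]  # boolean filtering: labels d, b, c survive
doubled = obj2 * 2         # scalar multiplication, element-wise
exps = np.exp(obj2)        # NumPy ufunc applied element-wise

print(positive)
print(doubled)
print(exps)
```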

■ Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values. It can be used in many contexts where you might use a dict:
■ Should you have data contained in a Python dict, you can create a Series from it by passing the dict:
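A sketch of both points above: the in operator tests index labels the way it tests dict keys, and a dict passed to the constructor becomes a Series whose index is the dict's keys. Passing an explicit index as well reorders the data, and labels with no matching key become NaN. The state figures are made-up sample values.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])
# Membership checks the index labels, like dict keys
print("b" in obj2)  # True
print("e" in obj2)  # False

# Keys become the index, values become the data
sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)

# An explicit index selects and orders labels; "California" has no value -> NaN
states = ["California", "Ohio", "Oregon", "Texas"]
obj4 = pd.Series(sdata, index=states)
print(obj4.isna())
```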

DataFrame
■ A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.).
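One common way to build a DataFrame is from a dict of equal-length lists, shown in this sketch with sample data. Each key becomes a column, and each column keeps its own dtype, so integer, float and string columns coexist.

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}
frame = pd.DataFrame(data)

# Each column has its own value type
print(frame.dtypes)

# Retrieving one column by name yields a Series
print(frame["pop"].head(3))
```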

■ When you are assigning lists or arrays to a column, the value’s
length must match the length of the DataFrame. If you assign a
Series, its labels will be realigned exactly to the DataFrame’s index,
inserting missing values in any holes:
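A sketch of the two assignment rules above, using a small illustrative frame. A list must have exactly as many elements as the frame has rows (otherwise pandas raises ValueError), while a Series is aligned on its labels and rows without a matching label receive NaN.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"year": [2000, 2001, 2002], "pop": [1.5, 1.7, 3.6]},
    index=["one", "two", "three"],
)

# A list must match the number of rows exactly
frame["debt"] = [10.0, 12.5, 9.0]

# A Series is realigned to the frame's index; "two" is missing, so it gets NaN
val = pd.Series([-1.2, -1.5], index=["one", "three"])
frame["debt"] = val
print(frame)

# A list of the wrong length is rejected
try:
    frame["bad"] = [1, 2]
except ValueError as exc:
    print("length mismatch:", exc)
```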

Possible data inputs to the DataFrame constructor
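The original table of constructor inputs did not survive extraction. As a partial illustration only, this sketch shows three inputs the constructor is known to accept: a 2D ndarray, a dict of dicts, and a list of dicts. It is not the complete list.

```python
import numpy as np
import pandas as pd

# 2D ndarray, with optional column labels
df_arr = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["a", "b", "c"])

# Dict of dicts: outer keys become columns, inner keys become row labels
pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}
df_nested = pd.DataFrame(pop)

# List of dicts: each dict is a row; keys missing from a dict become NaN
df_records = pd.DataFrame([{"x": 1, "y": 2}, {"x": 3}])

print(df_arr, df_nested, df_records, sep="\n\n")
```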

Index Objects
■ pandas’s Index objects are responsible for holding the axis labels
and other metadata (like the axis name or names). Any array or
other sequence of labels you use when constructing a Series or
DataFrame is internally converted to an Index:
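A sketch showing that the labels passed to the constructor come back as an Index object, which supports slicing but cannot be modified in place (pandas raises TypeError). Immutability is what makes it safe to share an Index among several data structures.

```python
import pandas as pd

obj = pd.Series(range(3), index=["a", "b", "c"])
index = obj.index
print(index)
print(index[1:])  # slicing returns a new Index

# Index objects are immutable
try:
    index[1] = "d"
except TypeError as exc:
    print("immutable:", exc)
```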

Reindexing
■ An important method on pandas objects is reindex, which creates a new object with the data conformed to a new index.
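A sketch of reindex on sample data. Values are rearranged to follow the new labels, a label with no existing value becomes NaN, and for ordered data such as a time series the method="ffill" option forward-fills the gaps. The original Series is not modified.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])
# Conform to a new index; "e" did not exist, so it is NaN
obj2 = obj.reindex(["a", "b", "c", "d", "e"])
print(obj2)

# Forward-fill missing positions while reindexing ordered data
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
obj4 = obj3.reindex(range(6), method="ffill")
print(obj4)
```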

Dropping Entries from an Axis
■ Dropping one or more entries from an axis is easy if you already have an index array
or list without those entries. As that can require a bit of munging and set logic, the
drop method will return a new object with the indicated value or values deleted from
an axis:
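A sketch of drop on illustrative data. It returns a new object and leaves the original alone; on a DataFrame it drops row labels by default, and axis="columns" drops columns instead.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5.0), index=["a", "b", "c", "d", "e"])
new_obj = obj.drop("c")        # new Series; obj still has 5 entries
fewer = obj.drop(["d", "c"])   # several labels at once

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])
no_rows = data.drop(["Colorado", "Ohio"])             # row labels by default
no_cols = data.drop(["two", "four"], axis="columns")  # column labels
```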

Indexing, Selection and Filtering
■ Series indexing (obj[...]) works analogously to NumPy array indexing, except that you can use the Series's index values instead of only integers.
■ Slicing with labels behaves differently from normal Python slicing in that the end point is inclusive.
■ Setting values using these methods modifies the corresponding section of the Series.
■ Indexing into a DataFrame retrieves one or more columns, using either a single label or a sequence of labels.
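A sketch of the four indexing rules above on sample data: selection by label, list of labels and boolean mask; an inclusive label slice; assignment through a slice; and column retrieval from a DataFrame.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=["a", "b", "c", "d"])
one = obj["b"]              # single label -> scalar
picked = obj[["b", "a", "d"]]  # list of labels -> Series
small = obj[obj < 2]        # boolean mask

# Label slicing includes the end point
sliced = obj["b":"c"]       # contains both "b" and "c"

# Assigning through a label slice modifies obj itself
obj["b":"c"] = 5

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])
col = data["two"]              # one label -> Series
cols = data[["three", "one"]]  # sequence of labels -> DataFrame
```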

Selection with loc and iloc
■ For label-based indexing on the rows of a DataFrame, pandas provides the special indexing operators loc and iloc. They enable you to select a subset of the rows and columns from a DataFrame with NumPy-like notation, using either axis labels (loc) or integer positions (iloc).
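A sketch contrasting loc and iloc on a small illustrative frame. The first two selections pick the same cells, once by label and once by position; label slices with loc include the end point, and either operator can be combined with a boolean filter.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=["Ohio", "Colorado", "Utah", "New York"],
                    columns=["one", "two", "three", "four"])

row = data.loc["Colorado", ["two", "three"]]  # by labels
same = data.iloc[1, [1, 2]]                   # same cells by integer position
block = data.loc[:"Utah", "two"]              # label slice, "Utah" included
filtered = data.iloc[:, :3][data.three > 5]   # first 3 columns, then filter rows
print(row, same, block, filtered, sep="\n\n")
```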

Indexing options with DataFrame

Arithmetic and Data Alignment
■ An important pandas feature for some applications is the behavior of arithmetic
between objects with different indexes. When you are adding together objects, if any
index pairs are not the same, the respective index in the result will be the union of
the index pairs. This is similar to an automatic outer join on the index labels.
■ The internal data alignment introduces missing values in the label locations that
don’t overlap. Missing values will then propagate in further arithmetic computations.
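A sketch of automatic alignment with two sample Series whose indexes only partly overlap. The sum is indexed by the union of the labels; labels present in only one operand (d, f, g) become NaN, and that NaN carries through any further arithmetic.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=["a", "c", "d", "e"])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=["a", "c", "e", "f", "g"])

total = s1 + s2   # index: a, c, d, e, f, g
print(total)

# Missing values propagate through later computations
print((total * 2).isna().sum())  # 3
```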

Function Application and Mapping
■ NumPy ufuncs (element-wise array methods) also work with pandas objects.
■ Another frequent operation is applying a function on one-dimensional arrays to each column or row. DataFrame's apply method does exactly this.
■ Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
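A sketch of the three points above on a small sample frame. np.abs is applied element-wise and keeps the labels; apply calls a function once per column (or once per row with axis="columns"); and a built-in reduction such as sum needs no apply at all. The helper span is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[1.0, -2.0, 3.0], [-4.0, 5.0, -6.0]],
                     columns=["b", "d", "e"], index=["Utah", "Ohio"])

abs_frame = np.abs(frame)  # ufunc, element-wise, labels preserved

def span(x):
    # hypothetical helper: x is one column (default) or one row (axis="columns")
    return x.max() - x.min()

col_span = frame.apply(span)                  # one value per column
row_span = frame.apply(span, axis="columns")  # one value per row

col_sums = frame.sum()  # built-in method; no apply needed
```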

Sorting and Ranking
■ Sorting a dataset by some criterion is another important built-in operation. To sort
lexicographically by row or column index, use the sort_index method, which returns a new,
sorted object:
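A sketch of sort_index with sample labels. On a Series it sorts by index label; on a DataFrame it sorts row labels by default, column labels with axis="columns", and ascending=False reverses the order.

```python
import numpy as np
import pandas as pd

obj = pd.Series(range(4), index=["d", "a", "b", "c"])
by_label = obj.sort_index()

frame = pd.DataFrame(np.arange(8).reshape(2, 4),
                     index=["three", "one"], columns=["d", "a", "b", "c"])
rows_sorted = frame.sort_index()                # row labels
cols_sorted = frame.sort_index(axis="columns")  # column labels
desc = frame.sort_index(axis="columns", ascending=False)
```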

■ Ranking assigns ranks from one through the number of valid data points in an array. The
rank methods for Series and DataFrame are the place to look; by default rank breaks ties by
assigning each group the mean rank:
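A sketch of the default ranking on sample values. The two 4s would occupy ranks 4 and 5, so each receives 4.5; the two 7s would occupy ranks 6 and 7, so each receives 6.5.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
ranks = obj.rank()  # ties get the mean of the ranks they span
print(ranks.tolist())  # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
```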

Tie-breaking methods with rank
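The original table of tie-breaking methods was lost in extraction. This sketch compares the method options that rank accepts ("average", "min", "max", "first", "dense") side by side on the same sample data.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per tie-breaking method
comparison = pd.DataFrame({m: obj.rank(method=m) for m in methods})
comparison["value"] = obj
print(comparison)
```

Here "first" ranks tied values by order of appearance, and "dense" behaves like "min" but never skips a rank between groups.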

Axis Indexes with Duplicate Labels
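Index labels are not required to be unique. A sketch with sample data: is_unique reports whether any label repeats, and selecting a duplicated label returns a Series of all matching entries, while a unique label returns a scalar.

```python
import pandas as pd

obj = pd.Series(range(5), index=["a", "a", "b", "b", "c"])
print(obj.index.is_unique)  # False

both_a = obj["a"]  # duplicated label -> Series with two entries
just_c = obj["c"]  # unique label -> scalar
```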

Summarizing and Computing Descriptive Statistics

Descriptive and summary statistics
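The original table of statistics methods did not survive extraction. This sketch exercises a few common ones on a sample frame containing NaN: reductions skip missing values by default (skipna=False turns that off), idxmax returns the label of each column's maximum, cumsum accumulates, and describe produces several summary statistics at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=["a", "b", "c", "d"], columns=["one", "two"])

col_totals = df.sum()                           # NaN skipped by default
row_means = df.mean(axis="columns")             # row "c" is all NaN -> NaN
strict = df.mean(axis="columns", skipna=False)  # any NaN in a row -> NaN
peak_rows = df.idxmax()                         # label of each column's max
running = df.cumsum()                           # cumulative sums
summary = df.describe()                         # count, mean, std, quartiles...
print(summary)
```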

Unique Values, Value Counts and Membership
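A sketch of the three operations in this slide title, on sample data: unique returns distinct values in order of first appearance, value_counts tallies how often each value occurs (sorted by count, descending), and isin returns a boolean mask usable for filtering.

```python
import pandas as pd

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

uniques = obj.unique()       # distinct values, order of first appearance
counts = obj.value_counts()  # frequency of each value
mask = obj.isin(["b", "c"])  # element-wise membership test
print(obj[mask])
```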
