Python Pandas Objects - Pandas Series and Pandas Dataframe

In the last post, we discussed the introduction to and installation of Python Pandas. In this post, we will learn about pandas' data structures/objects. Pandas provides two types of data structures:-

Pandas Series

Pandas Dataframe

Pandas Series

A Pandas Series is a one-dimensional, indexed array that can hold data types such as integer, string, boolean, float, Python object etc. A Pandas Series has a single data type (dtype) at a time; values of mixed types are upcast to a common dtype or stored under the generic object dtype. The axis labels of the data are called the index of the series. The labels need not be unique but must be of a hashable type. The index of the series can consist of integers, strings and even timestamps. In general, a Pandas Series is comparable to a single column of an Excel sheet, with the row labels being the index of the series.
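As a small illustration of the two properties described above (a single dtype per Series, and index labels that may repeat), here is a minimal sketch. The variable names mixed and dup are only illustrative.

```python
import pandas as pd

# Integers mixed with a float are upcast to one common dtype.
mixed = pd.Series([1, 2, 3.5])
print(mixed.dtype)  # float64

# Index labels may repeat; looking up a repeated label returns every match.
dup = pd.Series([10, 20, 30], index=['a', 'b', 'a'])
print(dup.index.is_unique)  # False
print(dup['a'])             # a Series holding the values 10 and 30
```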

Different ways of creating/constructing a Pandas Series

We can create a Pandas Series by using the following pandas.Series() constructor:-

pandas.Series([data, index, dtype, name, copy, …])

The parameters for the constructor of a Python Pandas Series are detailed as under:-

Parameters and remarks:

data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series. Changed in version 0.23.0: If data is a dict, argument order is maintained for Python 3.6 and later.

index : array-like or Index (1d)
    Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.

dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.

copy : bool, default False
    Copy input data.
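To make the parameters a little more concrete, the following sketch passes data, index, dtype and name together. The values and the name 'price' are made up for illustration.

```python
import pandas as pd

# data: a list of integers; index: explicit labels;
# dtype: force float64 instead of the inferred int64; name: label for the Series.
prices = pd.Series([1, 2, 3], index=['x', 'y', 'z'], dtype='float64', name='price')

print(prices.dtype)        # float64
print(prices.name)         # price
print(list(prices.index))  # ['x', 'y', 'z']
```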

How to create an empty Pandas Series?

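You can create an empty Pandas Series using pandas.Series() as under. The default dtype of an empty Series differs between pandas versions, so the dtype is passed explicitly here:-

    import pandas as pd

    # Pass dtype explicitly so the result is the same across pandas versions
    empty_series = pd.Series(dtype='float64')
    print(empty_series)

    # Output
    # Series([], dtype: float64)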

How to create a Pandas Series from a list?

You can create a Pandas Series from a Python list by passing the list to pandas.Series() as under. In this case, pandas will set the default index of the Series:-

    import pandas as pd

    data = pd.Series(['a', 'b', 'c', 'd'])
    print(data)

    # Output
    # 0    a
    # 1    b
    # 2    c
    # 3    d
    # dtype: object

Values and index of a Pandas Series

A Pandas Series consists of two parts: an index and values. You can check the values and index of a Pandas Series using Series.values and Series.index as under:-

    print(data.values)
    print(data.index)

    # Output
    # ['a' 'b' 'c' 'd']
    # RangeIndex(start=0, stop=4, step=1)

Setting an explicit index of a Pandas Series

In the above example, we did not specify any index for our Pandas Series, so a default index ranging from 0 to n-1 (n being the length of the data) was created. You can also explicitly define the index of a Pandas Series as under:-

    import pandas as pd

    data_2 = pd.Series(['One', 'Two', 'Three', 'Four'], index=['a', 'b', 'c', 'd'])
    print(data_2)

    # Output
    # a      One
    # b      Two
    # c    Three
    # d     Four
    # dtype: object

How to create a Pandas Series from Numpy array?

You can also create a Pandas Series from a Numpy array by passing the Numpy array to pandas.Series() as under:-

    import numpy as np
    import pandas as pd

    data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
    a_series = pd.Series(data)
    print(a_series)

    # Output
    # 0    a
    # 1    b
    # 2    c
    # 3    d
    # 4    e
    # 5    f
    # dtype: object

How to create a Pandas Series from a Python Dictionary?

You can create a Pandas Series from a dictionary by passing the dictionary to pandas.Series() as under. In this case, the index of the Pandas Series will be the keys of the dictionary and the values will be the values of the dictionary. It can be inferred that a Pandas Series is like a specialisation of a Python dictionary. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Pandas Series is a structure that maps typed keys to a set of typed values:-

    import pandas as pd

    a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
    a_series = pd.Series(a_dict)
    print(a_series)

    # Output
    # one      1
    # two      2
    # three    3
    # four     4
    # dtype: int64
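The dictionary analogy can be tried out directly. The sketch below shows a few dict-style operations that a Series also supports (membership test on labels, lookup by label, and iteration over keys and items), using the same example data as above.

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
a_series = pd.Series(a_dict)

# Membership tests check the index labels, like dictionary keys.
print('two' in a_series)        # True

# Lookup by label, like a dictionary lookup by key.
print(a_series['two'])          # 2

# keys() returns the index; items() yields (label, value) pairs.
print(list(a_series.keys()))    # ['one', 'two', 'three', 'four']
pairs = [(label, int(value)) for label, value in a_series.items()]
print(pairs)
```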

You can also create a Pandas Series from only the desired/selected keys of a Python dictionary by explicitly passing the desired labels as the index argument to pd.Series() as under:-

    import pandas as pd

    a_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
    a_series = pd.Series(a_dict, index=['one', 'three'])
    print(a_series)

    # Output
    # one      1
    # three    3
    # dtype: int64

In the above example, although we passed the whole dictionary to pd.Series(), the resulting Pandas Series ignored the key/value pairs whose keys are missing from the index argument.
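The reverse case is also worth knowing: a label in the index argument that is not a key of the dictionary is kept, and its value becomes NaN. A minimal sketch, where the label 'five' is deliberately absent from the dictionary:

```python
import pandas as pd

a_dict = {'one': 1, 'two': 2}

# 'five' is not a key of a_dict, so its value is filled with NaN.
# Because NaN is a float, the whole Series is upcast to float64.
a_series = pd.Series(a_dict, index=['one', 'five'])
print(a_series)
print(a_series.isna().tolist())  # [False, True]
```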

How to create a Pandas Series from scalar data?

You can also create a Pandas Series from scalar data. If you pass a single value together with multiple index labels, the value is repeated for every label:-

    import pandas as pd

    a_series = pd.Series(5, index=[100, 200, 300])
    print(a_series)

    # Output
    # 100    5
    # 200    5
    # 300    5
    # dtype: int64

Pandas Dataframe

The Pandas DataFrame is the primary data structure of pandas. It is a two-dimensional, size-mutable tabular structure with both flexible row indices and flexible column names. In general, it is just like an Excel sheet or an SQL table. It can also be seen as a Python dict-like container for Series objects.
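The "dict-like container for Series objects" view can be checked with a short sketch. The column names x and y are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

# Selecting a column by its label returns a Series, like a dict lookup.
col = df['x']
print(type(col).__name__)  # Series

# The column labels behave like dictionary keys.
print('x' in df)           # True
print(list(df.keys()))     # ['x', 'y']

# Each column keeps its own dtype.
print(df.dtypes)
```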

Different ways of creating a Pandas Dataframe

A Pandas Dataframe can be created/constructed using the following pandas.DataFrame() constructor:-

pd.DataFrame([data, index, columns, dtype, copy, …])

A Pandas Dataframe can be created from:-

Dict of 1D ndarrays, lists, dicts, or Series

2-D numpy.ndarray

Structured or record ndarray

A Series

Another DataFrame

The parameters for the constructor of a Pandas Dataframe are detailed as under:-

Parameters and remarks:

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects. Changed in version 0.23.0: If data is a dict, column order follows insertion-order for Python 3.6 and later. Changed in version 0.25.0: If data is a list of dicts, column order follows insertion-order for Python 3.6 and later.

index : Index or array-like
    Index to use for resulting frame. Will default to RangeIndex if no indexing information is part of the input data and no index is provided.

columns : Index or array-like
    Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.

copy : bool, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input.
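As a quick illustration of these parameters, the sketch below passes data, index, columns and dtype together. The labels r1, r2, a and b are hypothetical.

```python
import pandas as pd

# data: a list of rows; index: row labels; columns: column labels;
# dtype: force every column to float64 instead of the inferred int64.
df = pd.DataFrame([[1, 2], [3, 4]],
                  index=['r1', 'r2'],
                  columns=['a', 'b'],
                  dtype='float64')

print(df)
print(df.dtypes)
```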

How to create an empty Pandas Dataframe in Python?

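You can create an empty Pandas Dataframe by calling pandas.DataFrame() with no arguments. Columns can be added later one at a time with df['column_name'] = values, or declared up front with pd.DataFrame(columns=[list of column names]). Note that assigning df.columns = [list of column names] to a frame that has no columns raises a length-mismatch error. Rows can then be added, for example with df.loc[label] = values or pd.concat().

    >>> import pandas as pd
    >>> df = pd.DataFrame()
    >>> df
    Empty DataFrame
    Columns: []
    Index: []
    >>>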
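One possible way to fill an initially empty Dataframe is sketched below. It assumes the column labels are known in advance; the names name and score and the row values are made up for illustration.

```python
import pandas as pd

# Declare the column labels up front; the frame still has zero rows.
df = pd.DataFrame(columns=['name', 'score'])
print(df.empty)  # True

# Add rows by assigning to a new index label with .loc.
df.loc[0] = ['Ann', 90]
df.loc[1] = ['Bob', 85]
print(df)
```

Adding rows one at a time like this is fine for small examples; for larger data it is usually cheaper to collect the rows in a list and build the Dataframe once.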

How to create a Pandas Dataframe from a single Series object?

We can create a Pandas Dataframe from a single Pandas Series by passing the series to pd.DataFrame(). The index of the series will become the index of the dataframe, and pandas will automatically set 0 as the column name of the Dataframe:-

    import pandas as pd

    population_dict = {'California': 38332521,
                       'Texas': 26448193,
                       'New York': 19651127,
                       'Florida': 19552860,
                       'Illinois': 12882135}
    population = pd.Series(population_dict)
    df = pd.DataFrame(population)
    print(df)

    # Output
    #                    0
    # California  38332521
    # Texas       26448193
    # New York    19651127
    # Florida     19552860
    # Illinois    12882135

Since we did not pass the columns argument and the Series has no name, the single column has been given the default label 0.
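If a more meaningful column label than 0 is wanted, one option is to give the Series a name before building the Dataframe, or to call Series.to_frame() with a name. A minimal sketch with a shortened version of the data above:

```python
import pandas as pd

population_dict = {'California': 38332521, 'Texas': 26448193}

# Option 1: a named Series; the name becomes the column label.
population = pd.Series(population_dict, name='population')
df = pd.DataFrame(population)
print(list(df.columns))   # ['population']

# Option 2: convert an unnamed Series with to_frame(name=...).
df2 = pd.Series(population_dict).to_frame(name='population')
print(list(df2.columns))  # ['population']
```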

How to create a Pandas Dataframe from a dictionary of two or more (multiple) Pandas Series?

We can create a Pandas Dataframe from multiple Pandas Series by passing a dictionary of the series to pd.DataFrame() as under. The keys of the dictionary become the columns of the Pandas Dataframe:-

    import pandas as pd

    area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
                 'Florida': 170312, 'Illinois': 149995}
    population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127,
                       'Florida': 19552860, 'Illinois': 12882135}

    area = pd.Series(area_dict)
    population = pd.Series(population_dict)
    states = pd.DataFrame({'population': population, 'area': area})
    print(states)

    # Output
    #             population    area
    # California    38332521  423967
    # Texas         26448193  695662
    # New York      19651127  141297
    # Florida       19552860  170312
    # Illinois      12882135  149995

The resulting index is the union of the indexes of the individual series. Here both series share the same labels, so nothing is missing; if a label were present in only one series, the missing value in the other column would be replaced by NaN (not a number). You can optionally pass the index (row labels) and columns (column labels) arguments as well. A dict of series along with a specific index will discard all data not matching the passed index.
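The union behaviour and the effect of the index argument are easier to see when the two series do not share all their labels. The sketch below uses a shortened, deliberately mismatched version of the data above.

```python
import pandas as pd

# 'California' only has an area; 'Florida' only has a population.
area = pd.Series({'California': 423967, 'Texas': 695662})
population = pd.Series({'Texas': 26448193, 'Florida': 19552860})

# The row index is the union of both indexes; gaps are filled with NaN.
states = pd.DataFrame({'population': population, 'area': area})
print(states)

# Passing index keeps only the matching rows and discards the rest.
subset = pd.DataFrame({'population': population, 'area': area}, index=['Texas'])
print(subset)
```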

How to create a Pandas Dataframe from a list of Python Dictionaries?

We can create a Pandas Dataframe from Python dictionaries by passing a list of the dictionaries to pd.DataFrame():-

    import pandas as pd

    df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
    print(df)

    # Output
    #      a  b    c
    # 0  1.0  2  NaN
    # 1  NaN  3  4.0

Here, the Pandas Dataframe has been constructed with columns that are the union of the keys of the dictionaries, and the missing values have been filled in as NaN. Columns a and c contain NaN and are therefore stored as floats, while column b remains an integer column.

How to create a Pandas Dataframe from 2D Numpy array?

A Pandas Dataframe can also be created from a two-dimensional Numpy array by using the following code (the values are random, so your output will differ):-

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.rand(3, 2))
    print(df)

    # Output
    #           0         1
    # 0  0.059926  0.119440
    # 1  0.548637  0.232405
    # 2  0.343573  0.809589

Since we have not passed the columns and index arguments, default integer labels have been used for both. Alternatively, we can pass the columns and index in the constructor itself:-

    df = pd.DataFrame(np.random.rand(3, 2), index=['a', 'b', 'c'], columns=['x', 'y'])
    print(df)

    # Output
    #           x         y
    # a  0.854185  0.871370
    # b  0.419274  0.123717
    # c  0.989986  0.811176

How to create a Pandas Dataframe from a Dictionary of Numpy arrays or list?

Alternatively, a Pandas Dataframe can also be created from a dictionary of ndarrays or lists. The keys of the dictionary become the columns of the dataframe, and the dataframe gets the default integer index if no index is passed. All arrays or lists must have the same length.

    import pandas as pd

    a_dict = {'one': [1., 2., 3., 4.],
              'two': [4., 3., 2., 1.]}
    df = pd.DataFrame(a_dict)
    print(df)

    # Output
    #    one  two
    # 0  1.0  4.0
    # 1  2.0  3.0
    # 2  3.0  2.0
    # 3  4.0  1.0

How to create Pandas Dataframe from a Numpy structured array?

We can create a Pandas Dataframe from a Numpy structured array using the following code:-

    import pandas as pd
    import numpy as np

    # 'S10' is a 10-byte string field (the older 'a10' alias is not accepted by recent Numpy versions)
    data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
    data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
    df = pd.DataFrame(data)
    print(df)

    # Output
    #    A    B         C
    # 0  1  2.0  b'Hello'
    # 1  2  3.0  b'World'

How to check the Index and columns of a Pandas Dataframe?

You can get the index and columns of a Pandas Dataframe using the following code:-