Selecting a subset of data is an essential skill in data analysis with pandas, and one you should master before moving on to other things. In this post, I will show you how to select data in a clear and easy way.

For this demo, we will be working with a small version of the “IMDB_movie” data set, which contains only 3 columns: movie_title, director_name, and title_year.
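
As a rough sketch of the setup, the snippet below loads a three-column table with read_csv(). The CSV text is a tiny hand-written stand-in rather than the real data file, and the names csv_text and df are my own choices; with the actual file you would pass its path to read_csv() instead.

```python
import io

import pandas as pd

# Stand-in CSV content (assumption): four well-known films, not the real file.
csv_text = """movie_title,director_name,title_year
Avatar,James Cameron,2009
Spectre,Sam Mendes,2015
The Dark Knight Rises,Christopher Nolan,2012
Titanic,James Cameron,1997
"""

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (4, 3)
print(list(df.columns))  # ['movie_title', 'director_name', 'title_year']
```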

Overview

Selecting a subset of the above data frame falls into one of the following cases:

select one column (with the . operator)

select multiple columns (with the [] operator)

select one or multiple rows (with the .loc or .iloc operator)

select one or multiple rows within one or multiple columns (with the .loc or .iloc operator)

We will go through each operator (., [], .loc, and .iloc) and see how it can be used.

. operator to select one column

After loading the data with the read_csv() function above, every column of the data frame becomes an attribute of it, so it is natural to access a single column with the . operator, just as an object accesses its own attributes.
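
A minimal sketch of attribute-style access, assuming a small hand-built data frame named df as a stand-in for the movie data:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

titles = df.movie_title    # attribute-style access to one column
same = df["movie_title"]   # bracket access gives the same Series

print(type(titles).__name__)  # Series
print(titles.equals(same))    # True
```

Note that the . operator only works when the column name is a valid Python identifier and does not clash with an existing data frame attribute or method; otherwise use df["column name"].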

[ ] operator to select multiple columns

To select multiple columns, you use the [] operator and pass in a list of column names, which is why you will see double brackets [[]].
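
For illustration, here is a short sketch of the double-bracket form, again on a hand-built stand-in data frame (the name df and its rows are assumptions):

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

# The outer [] is the selection operator; the inner [] is a plain Python list.
subset = df[["movie_title", "title_year"]]

print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['movie_title', 'title_year']
```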

.loc operator to select multiple rows within multiple columns

Before using .loc, we should understand pandas data frame labels. The image below shows the column labels and row labels of a data frame.
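
The labels can also be inspected directly. This is a small sketch on a hand-built stand-in data frame (an assumption, not the real file): row labels live in df.index and column labels in df.columns.

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

row_labels = list(df.index)       # default row labels count up from 0
column_labels = list(df.columns)  # column labels are the column names

print(row_labels)     # [0, 1, 2, 3]
print(column_labels)  # ['movie_title', 'director_name', 'title_year']
```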

The .loc operator uses the following structure to access rows and columns:
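
In outline, the structure is df.loc[row labels, column labels], where a bare : means "everything" on that axis. The sketch below assumes a hand-built stand-in data frame named df:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

# General shape: df.loc[<row labels>, <column labels>]
row_1 = df.loc[1, :]                   # row labelled 1, all columns
all_titles = df.loc[:, "movie_title"]  # all rows, one column

print(row_1["director_name"])  # Sam Mendes
print(len(all_titles))         # 4
```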

Let's try an example with one row and one column:
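
A possible version of that example, using a hand-built stand-in data frame (assumption). Passing single labels returns a scalar, while wrapping them in lists keeps the result as a data frame:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

value = df.loc[0, "movie_title"]      # single labels give back the cell value
frame = df.loc[[0], ["movie_title"]]  # lists of labels give back a 1x1 DataFrame

print(value)        # Avatar
print(frame.shape)  # (1, 1)
```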

Another example selects multiple rows within multiple columns:
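
One way this might look, on a hand-built stand-in data frame (assumption). Labels can be given as a list or as a slice, and a .loc slice includes its end label:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

# Explicit lists of row labels and column labels.
picked = df.loc[[0, 2], ["movie_title", "title_year"]]

# Label slices: both ends are included, so rows 1, 2 and 3 come back.
ranged = df.loc[1:3, "movie_title":"director_name"]

print(picked.shape)        # (2, 2)
print(list(ranged.index))  # [1, 2, 3]
```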

.iloc operator to select multiple rows within multiple columns

Before using .iloc, we should understand pandas data frame positions. The image below shows the system of row positions and column positions. Imagine each row and each column has a position number, and this number starts from 0.

Unlike .loc, which works with labels, .iloc selects rows and columns by position.

Try the example below to select one row and one column:
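
A minimal sketch of single-cell selection by position, assuming a hand-built stand-in data frame named df:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

first_title = df.iloc[0, 0]  # row position 0, column position 0
last_year = df.iloc[-1, -1]  # negative positions count from the end

print(first_title)  # Avatar
print(last_year)    # 1997
```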

and to select multiple rows within multiple columns:
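
Here is a sketch with lists and slices of positions, on a hand-built stand-in data frame (assumption). Unlike .loc, an .iloc slice excludes its end position, as ordinary Python slicing does:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

# Explicit lists of row positions and column positions.
picked = df.iloc[[0, 2], [0, 2]]

# Position slices: the end is excluded, so only rows 1 and 2 come back.
ranged = df.iloc[1:3, 0:2]

print(list(picked.columns))  # ['movie_title', 'title_year']
print(list(ranged.index))    # [1, 2]
```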

Wrapping Up

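The examples above show that single rows, or lists of rows, are selected the same way with .loc and .iloc. That is because in this case (and by default) the row labels and row positions are the same: [0, 1, 2, …]. Slices are the exception: a slice such as 0:2 includes the end label with .loc but excludes the end position with .iloc.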

But remember that pandas row labels can be reassigned so that they no longer follow the default [0, 1, 2, …] sequence; when that happens, row labels and row positions differ.
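
To make the difference concrete, the sketch below reassigns the row labels of a hand-built stand-in data frame (the new labels 10, 20, 30, 40 are arbitrary choices of mine). After that, .loc needs the new labels while .iloc still counts positions from 0:

```python
import pandas as pd

# Hand-built stand-in for the three-column movie data (assumption).
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "The Dark Knight Rises", "Titanic"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan", "James Cameron"],
    "title_year": [2009, 2015, 2012, 1997],
})

# Replace the default labels 0..3 with arbitrary new ones.
df.index = [10, 20, 30, 40]

by_label = df.loc[10, "movie_title"]  # label 10 is the first row
by_position = df.iloc[0, 0]           # position 0 is still the first row

print(by_label == by_position)  # True

# The old label 0 no longer exists, so .loc raises KeyError for it.
try:
    df.loc[0, "movie_title"]
    old_label_found = True
except KeyError:
    old_label_found = False

print(old_label_found)  # False
```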

That's it

I hope you enjoyed this and had fun. I have a series on pandas and will be publishing it soon. If you want to be notified, you can follow me.

Full course on master data sciences

Other posts in my series:

Pandas Made Easy: groupby