In data analysis, filtering data is a crucial task. This post will help you understand how to filter data with pandas.

First, let’s load some data for our demo. This is a movie data set that contains 5 columns: ‘movie_title’, ‘director_name’, ‘imdb_score’, ‘duration’, and ‘genres’.
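The data file itself is not included here, so the following is only a minimal sketch: it builds a tiny stand-in DataFrame with the same five columns. The rows are hand-typed for illustration and are not the real data set, and the file name `movies.csv` in the comment is hypothetical.

```python
import pandas as pd

# Hand-typed stand-in rows for illustration only, NOT the post's real data set.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "director_name": ["James Cameron", "Francis Ford Coppola",
                      "James Cameron", "John Lasseter"],
    "imdb_score": [7.9, 9.2, 7.7, 8.3],
    "duration": [178, 175, 194, 81],
    "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Crime|Drama",
               "Drama|Romance", "Adventure|Animation|Comedy"],
})

# With a real file you would load it instead (hypothetical file name):
# movies = pd.read_csv("movies.csv")

print(movies.head())
```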

how filtering works

Filtering is based on the result of a logical operation, and that’s it. For example, say we need to find all movies directed by ‘James Cameron’. Comparing the ‘director_name’ column with that name returns a Series of True and False values, one per row.
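As a rough sketch of that comparison, using a few hand-typed stand-in rows rather than the real data set:

```python
import pandas as pd

# Stand-in rows for illustration only.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "director_name": ["James Cameron", "Francis Ford Coppola",
                      "James Cameron", "John Lasseter"],
})

# The comparison is evaluated row by row and yields a boolean Series.
is_cameron = movies["director_name"] == "James Cameron"
print(is_cameron)
```

The Series has the same index as the DataFrame, which is what lets pandas line it up with the rows later.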

Now if we put the result of the above logical expression inside the selection operator, pandas will select only the rows where the expression is True.
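A minimal sketch of that selection, again on hand-typed stand-in rows:

```python
import pandas as pd

# Stand-in rows for illustration only.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "director_name": ["James Cameron", "Francis Ford Coppola",
                      "James Cameron", "John Lasseter"],
})

# Passing the boolean Series to [] keeps only the rows marked True.
cameron_movies = movies[movies["director_name"] == "James Cameron"]
print(cameron_movies)
```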

what are good movies to see?

The IMDb score of a movie represents how much people like that movie. So let’s find the movies with imdb_score > 9 only. I see ‘The Godfather’ here :)
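The same pattern works with a numeric comparison. This sketch uses hand-typed stand-in rows, so the scores are illustrative only:

```python
import pandas as pd

# Stand-in rows for illustration only.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "imdb_score": [7.9, 9.2, 7.7, 8.3],
})

# Keep only the rows whose score is strictly greater than 9.
good_movies = movies[movies["imdb_score"] > 9]
print(good_movies)
```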

find all movies with a duration from 60 to 90 minutes

To answer this question, we need to combine 2 logical conditions, but it works the same way: pandas selects the rows where the combined result is True.
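One way to combine the two conditions is the element-wise `&` operator, sketched here on hand-typed stand-in rows. Each condition needs its own parentheses because `&` binds more tightly than `>=` and `<=`. Whether the bounds should be inclusive is an assumption on my part.

```python
import pandas as pd

# Stand-in rows for illustration only.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "duration": [178, 175, 194, 81],
})

# Both conditions must hold for a row to be selected.
in_range = (movies["duration"] >= 60) & (movies["duration"] <= 90)
short_movies = movies[in_range]
print(short_movies)

# Series.between gives the same mask (both bounds inclusive by default).
same_mask = movies["duration"].between(60, 90)
```

Python's `and` keyword does not work here, since it tries to reduce a whole Series to a single True or False.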

Now let’s move on to something more complicated.

find all sci-fi movies

First of all, let’s look at some data from the ‘genres’ column.
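A quick way to peek at a column is `head()`. In this sketch the rows are hand-typed stand-ins, and the `|`-separated genre format is an assumption about how the real column is stored.

```python
import pandas as pd

# Stand-in rows for illustration only; the "|" separator is an assumption.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Crime|Drama",
               "Drama|Romance", "Adventure|Animation|Comedy"],
})

# Show the first few values of the column.
first_genres = movies["genres"].head(2)
print(first_genres)
```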

Each movie can have multiple genres, so we should write a separate function that checks whether ‘Sci-Fi’ is among a movie’s genres.
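Such a function might look like the sketch below. The name `is_sci_fi` is hypothetical, and the code assumes that genres are stored as a single `|`-separated string per movie.

```python
def is_sci_fi(genres):
    """Return True if 'Sci-Fi' is one of the "|"-separated genres."""
    # Missing values (e.g. NaN) are not strings; treat them as "not sci-fi".
    if not isinstance(genres, str):
        return False
    return "Sci-Fi" in genres.split("|")


print(is_sci_fi("Action|Adventure|Fantasy|Sci-Fi"))
print(is_sci_fi("Crime|Drama"))
```

Splitting first and then testing membership checks for an exact genre name, rather than matching ‘Sci-Fi’ anywhere inside a longer string.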

The apply() function returns a Series of True or False values indicating whether each movie is a sci-fi movie. Now let’s pass that result to the selection operator to get the final result.
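Putting the two steps together, as a sketch on hand-typed stand-in rows (the helper `is_sci_fi` and the `|` separator are assumptions, as before):

```python
import pandas as pd

# Stand-in rows for illustration only.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Crime|Drama",
               "Drama|Romance", "Adventure|Animation|Comedy"],
})


def is_sci_fi(genres):
    # Hypothetical helper: assumes "|"-separated genre strings.
    return isinstance(genres, str) and "Sci-Fi" in genres.split("|")


# apply() calls the function once per value and collects the results.
sci_fi_mask = movies["genres"].apply(is_sci_fi)

# The boolean Series then filters the rows like any other condition.
sci_fi_movies = movies[sci_fi_mask]
print(sci_fi_movies)
```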

so how many sci-fi movies do we have in total?

Just call sum() on the result of apply() and we get the answer, because each True counts as 1.
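A short sketch of the count, using the same hand-typed stand-in rows and hypothetical helper, so the number printed says nothing about the real data set:

```python
import pandas as pd

# Stand-in rows for illustration only.
movies = pd.DataFrame({
    "movie_title": ["Avatar", "The Godfather", "Titanic", "Toy Story"],
    "genres": ["Action|Adventure|Fantasy|Sci-Fi", "Crime|Drama",
               "Drama|Romance", "Adventure|Animation|Comedy"],
})


def is_sci_fi(genres):
    # Hypothetical helper: assumes "|"-separated genre strings.
    return isinstance(genres, str) and "Sci-Fi" in genres.split("|")


# Summing a boolean Series counts the True values.
sci_fi_count = movies["genres"].apply(is_sci_fi).sum()
print(sci_fi_count)
```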

that’s it

I hope you now have a clear understanding of how to filter data with pandas. If you have any questions or comments, please leave them below.