groupby() is one of the most powerful functions in pandas. Using groupby() together with just one more function, we can answer a fairly complicated question.

Let's say we have a data frame about movies that contains 3 columns: “director_name”, “movie_title”, and “movie_facebook_likes”.
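To make this concrete, here is a minimal sketch of such a data frame. The name df_imdb matches the code later in this post, but the rows and the like counts are made-up placeholder numbers for illustration, not real figures.

```python
import pandas as pd

# Hypothetical sample data: the like counts are invented for illustration.
df_imdb = pd.DataFrame({
    "director_name": ["James Cameron", "Quentin Tarantino", "Christopher Nolan",
                      "James Cameron", "Quentin Tarantino", "Christopher Nolan"],
    "movie_title": ["Avatar", "Django Unchained", "Inception",
                    "Titanic", "Pulp Fiction", "Interstellar"],
    "movie_facebook_likes": [1000, 4000, 3000, 2000, 2500, 1500],
})

print(df_imdb)
```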

The million dollar question

Suppose we are analyzing the data table above and need to answer this question: “Of the 3 directors in the data, who has the largest total number of Facebook likes?”

So let’s think about the possible steps to answer the question:

Step 1: Go through every row and find all rows where the director name is “James Cameron”

Step 2: Add up the Facebook likes of the rows found in step 1

Step 3: Repeat steps 1 and 2 for the remaining directors: “Quentin Tarantino” and “Christopher Nolan”
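The three steps above can be sketched as a plain loop. This is only an illustration of the manual approach, using a small df_imdb whose like counts are made-up placeholder numbers; the totals dictionary and the winner variable are names introduced here.

```python
import pandas as pd

# Hypothetical sample data: the like counts are invented for illustration.
df_imdb = pd.DataFrame({
    "director_name": ["James Cameron", "Quentin Tarantino", "Christopher Nolan",
                      "James Cameron", "Quentin Tarantino", "Christopher Nolan"],
    "movie_title": ["Avatar", "Django Unchained", "Inception",
                    "Titanic", "Pulp Fiction", "Interstellar"],
    "movie_facebook_likes": [1000, 4000, 3000, 2000, 2500, 1500],
})

totals = {}
for director in df_imdb["director_name"].unique():
    # Step 1: keep only the rows that belong to this director
    rows = df_imdb[df_imdb["director_name"] == director]
    # Step 2: add up the likes of those rows
    totals[director] = int(rows["movie_facebook_likes"].sum())
    # Step 3: the loop repeats steps 1 and 2 for the next director

# Pick the director with the highest total
winner = max(totals, key=totals.get)
print(totals)
print(winner)
```

It works, but we have to write the filtering and the bookkeeping ourselves.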

Is there a better way to do this with pandas?

Groupby

Pandas has a function called groupby() that groups together the rows that share the same value in the ‘director_name’ column.

We can imagine that, after groupby() is applied, the original table is split into multiple small tables, one for each unique value in the ‘director_name’ column.
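One way to see these "small tables" is to loop over the grouped object, which yields each group key together with its rows. The sketch below assumes a small df_imdb with made-up like counts; grouped and cameron_rows are names introduced for this example.

```python
import pandas as pd

# Hypothetical sample data: the like counts are invented for illustration.
df_imdb = pd.DataFrame({
    "director_name": ["James Cameron", "Quentin Tarantino", "Christopher Nolan",
                      "James Cameron", "Quentin Tarantino", "Christopher Nolan"],
    "movie_title": ["Avatar", "Django Unchained", "Inception",
                    "Titanic", "Pulp Fiction", "Interstellar"],
    "movie_facebook_likes": [1000, 4000, 3000, 2000, 2500, 1500],
})

# groupby() itself computes nothing yet; it only records how rows are grouped
grouped = df_imdb.groupby("director_name")

# Each iteration gives one director and the sub-table of that director's rows
for director, sub_table in grouped:
    print(director)
    print(sub_table)

# A single sub-table can also be fetched by its key
cameron_rows = grouped.get_group("James Cameron")
```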

Apply

After grouping all rows that have the same director_name, we need to apply the sum() function to add up the Facebook likes for each director. Once sum() has run, we get a new data frame whose index is made of the values from the ‘director_name’ column.
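Here is a small sketch of the apply step on its own. It assumes a df_imdb with made-up like counts, and it selects the likes column before calling sum() so that the text column is left out; likes_per_director is a name introduced for this example.

```python
import pandas as pd

# Hypothetical sample data: the like counts are invented for illustration.
df_imdb = pd.DataFrame({
    "director_name": ["James Cameron", "Quentin Tarantino", "Christopher Nolan",
                      "James Cameron", "Quentin Tarantino", "Christopher Nolan"],
    "movie_title": ["Avatar", "Django Unchained", "Inception",
                    "Titanic", "Pulp Fiction", "Interstellar"],
    "movie_facebook_likes": [1000, 4000, 3000, 2000, 2500, 1500],
})

# Group by director, keep only the likes column, then sum within each group
likes_per_director = df_imdb.groupby("director_name")[["movie_facebook_likes"]].sum()

# The director names are now the index of the new data frame
print(likes_per_director.index.name)
print(likes_per_director)
```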

The Answer

So now we are ready to answer the question above. There is just one more step: calling sort_values() on the ‘movie_facebook_likes’ column. So it seems “Quentin Tarantino” is the winner.

df_imdb.groupby('director_name').sum(numeric_only=True).sort_values('movie_facebook_likes', ascending=False)
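To try the whole chain end to end, here is a self-contained sketch. The like counts are made-up placeholder numbers chosen only for illustration, and ranking and top_director are names introduced here.

```python
import pandas as pd

# Hypothetical sample data: the like counts are invented for illustration.
df_imdb = pd.DataFrame({
    "director_name": ["James Cameron", "Quentin Tarantino", "Christopher Nolan",
                      "James Cameron", "Quentin Tarantino", "Christopher Nolan"],
    "movie_title": ["Avatar", "Django Unchained", "Inception",
                    "Titanic", "Pulp Fiction", "Interstellar"],
    "movie_facebook_likes": [1000, 4000, 3000, 2000, 2500, 1500],
})

# Split by director, sum the likes, then sort from most to fewest likes
ranking = (
    df_imdb.groupby("director_name")[["movie_facebook_likes"]]
    .sum()
    .sort_values("movie_facebook_likes", ascending=False)
)
print(ranking)

# After the descending sort, the first index label is the top director
top_director = ranking.index[0]
print(top_director)
```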

That’s it

I hope you had some fun with the pandas groupby() function, one of its most powerful tools.

Other posts in my series, Pandas Made Easy:

Pandas Made Easy: Selecting Data