Python tutorial May 23, 2019

Data science best practices with pandas (video tutorial)

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task.

In this in-depth tutorial, which I presented at PyCon 2019, you'll use pandas to answer questions about a real-world dataset. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent at using pandas to correctly and efficiently answer your own data science questions.

"This tutorial is one of the best courses on pandas, if not the best, especially for people who don't have advanced level in pandas. I have been using pandas for some time but I discovered things in this course that were amazing for me." - S.R.

This is an intermediate level tutorial, so if you're new to pandas, I recommend starting with my other video series: Easier data analysis with pandas.

If you want to follow along with the exercises at home, you can download the dataset and notebook from GitHub.

Here are some of the topics covered in the video:

adjusting for bias in your dataset

handling missing values

choosing an appropriate plot

customizing your plot

using the datetime data type

filtering using loc versus query

using multiple aggregation functions

checking for small sample sizes

method chaining

verifying your results using random samples

evaluating a "stringified" Python container

applying a custom function to a Series

writing lambda functions
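
To give a flavor of a few of these topics, here is a minimal sketch. It uses a small made-up DataFrame (the `talks` table and its column names are hypothetical, not the dataset from the video), and it shows one way to approach each task rather than the exact code from the tutorial:

```python
import ast

import pandas as pd

# Hypothetical toy data for illustration only (not the tutorial's dataset)
talks = pd.DataFrame({
    "speaker": ["Ana", "Ben", "Cy", "Dee", "Eli"],
    "event": ["A", "A", "B", "B", "B"],
    "views": [1200, 300, 4500, 800, 150],
    "tags": ["['data', 'python']", "['design']", "['science', 'data']", "[]", "['python']"],
    "published": ["2019-01-05", "2019-02-10", "2019-02-20", "2019-03-01", "2019-03-15"],
})

# Using the datetime data type: convert strings so the .dt accessor is available
talks["published"] = pd.to_datetime(talks["published"])
publish_months = talks["published"].dt.month

# Filtering using loc versus query: two ways to express the same condition
popular_loc = talks.loc[talks["views"] > 500]
popular_query = talks.query("views > 500")

# Using multiple aggregation functions: count is included alongside the mean
# so that groups with small sample sizes are easy to spot
summary = talks.groupby("event")["views"].agg(["count", "mean"])

# Evaluating a "stringified" Python container: turn "['a', 'b']" into a real list
talks["tags"] = talks["tags"].apply(ast.literal_eval)

# Applying a function to a Series, written as a lambda
talks["num_tags"] = talks["tags"].apply(lambda tags: len(tags))

print(summary)
print(talks[["speaker", "published", "num_tags"]])
```

Here `ast.literal_eval` is used instead of `eval` because it only evaluates Python literals, which makes it a safer choice for strings that come from a data file.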

Let me know if you have any questions. I'm happy to answer them!

P.S. If you like this video, you should check out my interactive pandas course, Analyzing Police Activity with pandas.