I’m a big consumer of podcasts. Ever since I started living on my own in graduate school, I’ve found that having funny and interesting people in my ears helps me get through the day. Even now that I’m cohabiting with my wife, I haven’t left my trusty podcasts behind. They’re great for the long commutes back and forth to San Diego, for reducing my stress while stuck in LA traffic, and for making me laugh while I cook, clean, and exercise.

I’m a fan and donor (support the things you love!) of one podcast network in particular, Maximum Fun, so much so that I’m a semi-regular participant in their Facebook group and subreddit. Recently, someone in the Facebook group asked for some data about the network, in particular the number of shows that have been published, in order to visualize the network’s growth. As someone who’s keen to keep my data analysis skills fresh and nimble, I took this as an opportunity to dive back into R. Here’s what I’ve done so far:

Podcasts are unique in that they’re basically just a simple feed of audio files. That feed has data embedded in it that we can access and save. I’m pretty new to web scraping, but I was able to find a really nice example of how to scrape an RSS feed in R here. I adapted that to scrape and save data from each of the podcasts in the Maximum Fun network.
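To give a flavor of what scraping a feed looks like, here is a minimal sketch. It is not the code from the example I linked; it assumes the xml2 package, and the inline feed, the show name, and the column names are all made up for illustration. With a real show you would pass the feed URL to read_xml() instead of a string.

```r
library(xml2)

# A tiny made-up feed standing in for a real podcast RSS feed.
feed_xml <- '<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 02 Jan 2017 08:00:00 +0000</pubDate>
      <itunes:duration>01:02:03</itunes:duration>
    </item>
    <item>
      <title>Episode 2</title>
      <pubDate>Mon, 09 Jan 2017 08:00:00 +0000</pubDate>
      <itunes:duration>45:10</itunes:duration>
    </item>
  </channel>
</rss>'

feed  <- read_xml(feed_xml)          # a URL or file path also works here
ns    <- xml_ns(feed)                # needed to resolve the itunes: prefix
items <- xml_find_all(feed, "//item")

# One row per episode. xml_find_first() keeps one result per item, so an
# episode with a missing field gets NA instead of shifting the column.
episodes <- data.frame(
  show     = xml_text(xml_find_first(feed, "//channel/title")),
  title    = xml_text(xml_find_first(items, "./title")),
  pub_date = xml_text(xml_find_first(items, "./pubDate")),
  duration = xml_text(xml_find_first(items, "./itunes:duration", ns)),
  stringsAsFactors = FALSE
)

print(episodes)
```

The dates and durations come out as plain text at this stage, which is why they need cleaning later on.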

I probably could have created a function to run through all the shows, but instead I processed each show individually. That turned out to be useful, as a few of the shows had missing episodes, or titles and durations that didn’t match up.
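For the curious, the function route might look something like the sketch below. This is not what I actually ran. It assumes the xml2 package, and scrape_show(), the show names, and the two inline feeds are hypothetical. The idea is to check each show for the title/duration mismatch described above and set it aside for manual handling rather than letting it break the loop.

```r
library(xml2)

# Hypothetical helper: scrape one feed and refuse to build a data frame
# when the number of titles and durations disagree.
scrape_show <- function(show, source) {
  feed      <- read_xml(source)   # URL, file path, or literal XML
  titles    <- xml_text(xml_find_all(feed, "//item/title"))
  durations <- xml_text(xml_find_all(feed, "//item/itunes:duration", xml_ns(feed)))
  if (length(titles) != length(durations)) {
    message(show, ": ", length(titles), " titles but ",
            length(durations), " durations; needs a manual look")
    return(NULL)
  }
  data.frame(show = show, title = titles, duration = durations,
             stringsAsFactors = FALSE)
}

# Two made-up feeds; in practice these would be feed URLs.
ns_attr <- 'xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"'
feeds <- c(
  "Show A" = paste0('<rss ', ns_attr, '><channel>',
    '<item><title>Ep 1</title><itunes:duration>30:00</itunes:duration></item>',
    '</channel></rss>'),
  "Show B" = paste0('<rss ', ns_attr, '><channel>',
    '<item><title>Ep 1</title><itunes:duration>30:00</itunes:duration></item>',
    '<item><title>Ep 2</title></item>',
    '</channel></rss>')
)

# Scrape every show, then stack the ones that passed the check.
results      <- Map(scrape_show, names(feeds), feeds)
all_episodes <- do.call(rbind, Filter(Negate(is.null), results))

print(all_episodes)
```

Here the second show is missing a duration, so it is flagged and left out of the combined data.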

Edit: I was able to find the feed for The Goosedown and the entire backlog of Bullseye/The Sound of Young America, and have updated the data and visualizations to include them.

Once I had all the data scraped from the feeds, I was able to combine it into one dataset of 4,202 episodes from 25 different shows. The date and duration variables were pretty messy, so I noodled around a bit and cleaned them up into something manageable. I’ve saved that final data in Rdata and .csv formats if you want to play with it yourself.
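Here is a rough sketch of the kind of cleanup involved, using base R only. The two rows, the column names, and the file names are made up, and it assumes dates in the usual RSS style and durations written as HH:MM:SS, MM:SS, or plain seconds. Real feeds are messier than this. Parsing day and month names also depends on an English locale.

```r
# Made-up rows standing in for the combined scraped data.
episodes <- data.frame(
  show     = c("Example Show", "Example Show"),
  pub_date = c("Mon, 02 Jan 2017 08:00:00 +0000",
               "Tue, 14 Feb 2017 17:30:00 -0800"),
  duration = c("01:02:03", "45:10"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM:SS", "MM:SS", or plain seconds into a number of seconds.
duration_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # The last piece is seconds, the one before minutes, then hours.
    sum(parts * 60^(rev(seq_along(parts)) - 1))
  }, numeric(1))
}

# Parse the RSS-style date text into a proper date-time (stored as UTC).
episodes$published <- as.POSIXct(episodes$pub_date,
                                 format = "%a, %d %b %Y %H:%M:%S %z",
                                 tz = "UTC")
episodes$duration_sec <- duration_to_seconds(episodes$duration)
episodes$month        <- format(episodes$published, "%Y-%m")

print(episodes[, c("show", "month", "duration_sec")])

# Save in both formats (hypothetical file names, written to a temp folder).
out_dir <- tempdir()
save(episodes, file = file.path(out_dir, "episodes.Rdata"))
write.csv(episodes, file.path(out_dir, "episodes.csv"), row.names = FALSE)
```

Having a month column like this makes counting episodes per month straightforward.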

Visualizations

Once we have all the data in a good format, creating visualizations is actually pretty easy! Let’s start with a simple bar chart that plots the number of shows per month: