Quantifying the types of topics discussed on r/JoeRogan with Python

As you all know, Reddit houses several different communities in the form of subreddits. Actually, several is an understatement. There is a Reddit community for every topic you could think of. I spent some time thinking about which subreddit would contain the most interesting topics, and I thought, what could be better than the sub dedicated to the Joe Rogan Experience?

The Joe Rogan Experience is well known for discussing a wide range of topics, from hunting and Mixed Martial Arts to psychedelic philosophy. Using praw and an NLP library called textblob, I sought to quantify the most frequently discussed topics on the subreddit dedicated to the JRE, r/JoeRogan. Looking at the last 1440 posts, I populated a database with the post text and the nouns mentioned in that text. I also pulled the adjectives describing those nouns. What nouns and adjectives are used most frequently? That question can be answered with a wordcloud…
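To make the noun and adjective extraction concrete, here is a minimal sketch of how it could be done with textblob's part-of-speech tagger. It is an illustration under assumptions, not necessarily the exact approach used for this project: the function names are hypothetical, and it simply keeps every word whose Penn Treebank tag starts with NN (nouns, including proper nouns) or JJ (adjectives). It does not try to link each adjective to the noun it describes.

```python
def tag_text(text):
    """Return (word, POS tag) pairs for a post using textblob.

    Assumes textblob is installed and its corpora have been downloaded
    (python -m textblob.download_corpora).
    """
    from textblob import TextBlob  # third-party, imported lazily
    return TextBlob(text).tags


def nouns_and_adjectives(tagged):
    """Split (word, tag) pairs into nouns and adjectives.

    Penn Treebank tags: NN, NNS, NNP, NNPS are nouns; JJ, JJR, JJS are adjectives.
    """
    nouns = [word for word, tag in tagged if tag.startswith("NN")]
    adjectives = [word for word, tag in tagged if tag.startswith("JJ")]
    return nouns, adjectives
```

For a single post this would be called as nouns_and_adjectives(tag_text(post_text)). Keeping the filtering separate from the tagging makes it easy to check on hand-tagged input.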

Wordclouds (a.k.a. tag clouds) are graphical representations of text data. They are often used to visualize the frequency of words in a corpus (body of text). The frequency/importance of each word is denoted by the size of each word on the graph. The largest words are the words used most frequently. Wordclouds can be generated easily in Python thanks to the wordcloud package by Andreas C. Müller, a core developer of the well-known Python machine learning library scikit-learn. First, let's look at a wordcloud depicting the most common nouns (including names) mentioned in the last 1440 posts:

Wordcloud of Nouns

Here we see some of our favorite podcast guests. Eddie Bravo (most likely because of his interesting conspiracy theories), Alex Jones, and Theo Von seem to be the most frequently discussed podcast guests, with Joey Diaz right behind. We also see Jamie and “Young Jamie” appear frequently, which is no surprise. Trump and Roseanne also show up frequently in the last 1440 posts. I had to smile when I saw “DMT” make an appearance in the wordcloud. I remember first hearing Joe talk about DMT way back when I was in high school, and DMT-like philosophy has remained a common theme on his podcast.

The method here is obviously not perfect. Some names appear in the wordcloud more than once (e.g. Joe and Joe Rogan), and a name can be counted several times if it is repeated within one post. With that said, it still seems that the wordcloud is a good representation of the various topics discussed on the JRE. Let's look at what adjectives are frequently used in r/JoeRogan posts:

Wordcloud of Adjectives

I had to laugh when I saw “dire physical” show up in the wordcloud, because I believe “consequences” shows up somewhere in the nouns wordcloud. For those of you who don’t know, Joe Rogan often describes MMA as “high level problem solving with dire physical consequences”. He uses this phrase so often that it has started to annoy some listeners. We see “high” is a frequently used adjective, which is not surprising. Another one that made me laugh was “vegan”. Rogan has expressed his feelings on veganism several times on the podcast, and I’m sure many of his listeners feel the same way that he does.

Speaking of listeners, my next question was about who contributes to r/JoeRogan. Are only a few Reddit users responsible for the majority of the posts on r/JoeRogan, or is the contribution fairly spread out? To answer this question, I plotted the top ten most frequent posters on r/JoeRogan below:

Number of Posts

The top ten most frequent posters from the last 1440 posts make up about 17% of the total posts. There are 891 unique posters in the sample, so the top 1.1% of posters (in terms of post quantity) are responsible for 17% of the overall posts. The user “Chimpgainz” (aside from having one of the greatest usernames I’ve seen) is at the top of the leaderboard with 59 posts. Curious to see what topics Chimpgainz was posting about, I decided to generate a wordcloud populated only by posts from Chimpgainz:
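The arithmetic behind those percentages is simple: 10 of 891 unique posters is about 1.1% of users, and their combined post count divided by 1440 gives the share of posts. A minimal sketch of that calculation, assuming the authors of all sampled posts are available as a plain list of usernames (the function name is hypothetical):

```python
from collections import Counter


def top_poster_share(authors, n=10):
    """Return the top-n posters plus their share of posts and of users.

    authors: one username per post, so a user with 59 posts appears 59 times.
    """
    counts = Counter(authors)
    top = counts.most_common(n)  # [(username, post_count), ...]
    share_of_posts = sum(count for _, count in top) / len(authors)
    share_of_users = len(top) / len(counts)
    return top, share_of_posts, share_of_users
```

The list returned as top is also what a bar chart of the most frequent posters would be drawn from.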

Wordcloud of Chimpgainz’s Posts

This user seems to be a big fan of Mixed Martial Arts. Most of the names he mentions are names of fighters. He mentions Rose Namajunas, Robert Whittaker, Robbie Lawler, and many more. Also, we see the word “consequences” appear, which I am willing to bet comes from Rogan’s trademark phrase that I mentioned earlier. The two most commonly used words, well, you all can see them above for yourselves. To be fair, given the small sample size (59 posts), this may mean he only used that phrase 3 or 4 times. Wordclouds behave differently with smaller samples, because a handful of repeats is enough to make a word stand out. And yes, the wordcloud above was modeled over an image of a chimpanzee wearing headphones.

Thank you all for reading! At the very least, I hope it gave you a good laugh. I had a lot of fun with this mini-project, and I plan on doing some more projects like this in the future. Reddit is full of interesting and goofy data. If you are interested, you can find the code I wrote for this project below.

To create the database:
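As a rough illustration of this step, here is a minimal sketch that pulls recent posts with praw and stores them in a SQLite database. It is a sketch under assumptions, not necessarily the script used for this project: the table layout, function names, credential placeholders, and the choice of SQLite are all hypothetical, and the noun and adjective lists are assumed to have been extracted already.

```python
import json
import sqlite3


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id TEXT PRIMARY KEY,
               author TEXT,
               body TEXT,
               nouns TEXT,
               adjectives TEXT
           )"""
    )
    return conn


def save_post(conn, post_id, author, body, nouns, adjectives):
    """Store one post; the word lists are serialized as JSON strings."""
    conn.execute(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)",
        (post_id, author, body, json.dumps(nouns), json.dumps(adjectives)),
    )
    conn.commit()


def fetch_posts(limit=100):
    """Yield (id, author, text) for the newest r/JoeRogan posts.

    Requires praw and Reddit API credentials; the values below are placeholders.
    """
    import praw  # third-party, imported lazily

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="jre-topic-sketch",
    )
    for submission in reddit.subreddit("JoeRogan").new(limit=limit):
        # Deleted accounts come back with author set to None.
        author = submission.author.name if submission.author else "[deleted]"
        yield submission.id, author, f"{submission.title} {submission.selftext}"
```

Using the post id as the primary key means the script can be re-run without storing the same post twice.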

And to create the graphs:
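As a rough illustration of this step, here is a minimal sketch that turns a list of words into a wordcloud image with the wordcloud package. It is a sketch under assumptions, not necessarily the script used for this project: the function names, image size, and background color are hypothetical choices. The optional mask argument is how a cloud can be shaped around an image, such as the chimpanzee above; it is expected to be a NumPy array in which white areas are left empty.

```python
from collections import Counter


def word_frequencies(words):
    """Count how often each word occurs; word size in the cloud follows these counts."""
    return Counter(words)


def save_wordcloud(frequencies, out_path, mask=None):
    """Render a wordcloud from a word -> count mapping and write it to an image file.

    Requires the third-party wordcloud package. When a mask is given,
    the width and height settings are ignored in favor of the mask's shape.
    """
    from wordcloud import WordCloud  # third-party, imported lazily

    cloud = WordCloud(width=800, height=400, background_color="white", mask=mask)
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
```

For example, save_wordcloud(word_frequencies(all_nouns), "nouns.png") would produce the nouns cloud, where all_nouns is a hypothetical list holding every noun from every post.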

Thanks!!