Reddit's Vocabulary - Default Subreddits

Based on some work I did on the Vocabulary of Rolling Stone's Top 100 Artists, I thought it might be interesting to do a similar analysis of the comments in the 50 default subreddits. This is what I ended up with.

I used the Reddit JSON API to collect the comments from the top 100 hot threads on each default sub. The collection was done around 30 July 2014.
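For anyone wanting to try something similar, here is a minimal sketch of how that kind of collection could look in Python. It is not the script I used. The function names and the User-Agent string are made up for illustration, and it assumes the usual shape of Reddit's public JSON listings: a "Listing" whose children of kind "t1" are comments, each with a "body" and an optional nested "replies" listing.

```python
import json
import urllib.request

# Hypothetical identifier; Reddit asks clients to send a descriptive User-Agent.
USER_AGENT = "vocab-sketch/0.1"


def hot_threads_url(subreddit, limit=100):
    """Build the URL for a subreddit's hot listing as JSON."""
    return f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}"


def fetch_json(url):
    """Download a URL and parse the response as JSON (not called here)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_comment_bodies(listing):
    """Yield the text of every comment in a listing, including nested replies.

    Assumes comments have kind "t1" and that "replies" is either another
    listing or an empty string when there are none.
    """
    if not isinstance(listing, dict):
        return
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            continue  # skip "more" placeholders and anything that is not a comment
        data = child["data"]
        yield data.get("body", "")
        yield from extract_comment_bodies(data.get("replies"))
```

The idea would be to fetch the hot listing for each sub, then fetch each thread's comment page and pass its comment listing to extract_comment_bodies. Anything hidden behind a "more" placeholder is simply skipped in this sketch.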

Below is a visualisation of the comments I collected. I've filtered out the subreddits where I collected fewer than 50,000 words, which left me with 35 subs. The left set of bars shows the number of different words used in each subreddit's first 50,000 words. Click a subreddit's bar on the left and a bubble chart with that subreddit's top N words is presented; N can be adjusted to show more or fewer words. There is also a set of radio buttons that lets you filter out sets of words. "None" includes all words in the subreddit's top list. "Common" filters out all words in Wikipedia's list of the 100 most common words (just base words, not lexemes). "Pronouns" filters out pronouns. The right set of bars shows the number of times each word appears when the top 100 words for all subreddits are combined. The bar colour changes to match the colour of the word in the bubble chart.
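The counting behind those bars and bubbles can be sketched roughly as follows. This is an illustration under my own assumptions, not the code behind the chart: the tokenizer is a simple regular expression, the function names are invented, and the two word sets are short stand-ins rather than the real Wikipedia list or a full pronoun list.

```python
import re
from collections import Counter

WORD_LIMIT = 50_000  # only the first 50,000 words of each subreddit are compared

# Illustrative stand-ins only; the real "Common" filter uses Wikipedia's
# list of the 100 most common words.
COMMON_WORDS = {"the", "be", "to", "of", "and", "a", "in", "that", "have", "i"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us", "them"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def distinct_words(words, limit=WORD_LIMIT):
    """Count the different words among the first `limit` words."""
    return len(set(words[:limit]))


def top_words(words, n, excluded=frozenset()):
    """Return the n most frequent words, skipping any word in `excluded`."""
    counts = Counter(word for word in words if word not in excluded)
    return counts.most_common(n)
```

With this, the "None" filter corresponds to calling top_words with no excluded set, while "Common" and "Pronouns" correspond to passing COMMON_WORDS or PRONOUNS as excluded.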

So that's what I've got at the moment. It's a work in progress, and I've got a few more ideas about what I can do with this set of data. I'll see where it goes next.

Let me know what you think, notrommit -at- gmail -dot- com