For our project, we used every publicly available Reddit comment from August 2015, obtained from a data dump shared on Reddit. The dataset is very large: it comprises over 58 million comments, each represented as a JSON object, and takes up 28 GB uncompressed. We looked at the top 100 words (excluding stopwords) for each of our selected subreddits and collected count and score data for each word/subreddit combination. Because some words appear in the top 100 of multiple subreddits, we ended up with 304 distinct words in our final dataset. Additionally, since certain subreddits have a much larger user base, we normalized the count and score statistics to allow for a fair comparison across subreddit groupings.
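The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: it assumes each comment is one JSON object per line with `subreddit`, `body`, and `score` fields, it uses a tiny placeholder stopword list, and it assumes one plausible normalization (a word's share of all counted words in its subreddit, plus the mean comment score per occurrence). The function names `top_words` and `shared_vocabulary` are hypothetical.

```python
import json
import re
from collections import Counter, defaultdict

# Placeholder stopword list for illustration; the project's actual list is not given.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "in", "i"}


def top_words(lines, subreddits, n=100, stopwords=STOPWORDS):
    """Return the top-n words per subreddit with raw and normalized statistics.

    `lines` is any iterable of JSON strings, one comment per string.
    Field names ("subreddit", "body", "score") are assumptions about the dump.
    """
    counts = defaultdict(Counter)  # subreddit -> word -> occurrences
    scores = defaultdict(Counter)  # subreddit -> word -> summed comment score
    for line in lines:
        comment = json.loads(line)
        sub = comment.get("subreddit")
        if sub not in subreddits:
            continue
        body = comment.get("body", "").lower()
        for word in re.findall(r"[a-z']+", body):
            if word in stopwords:
                continue
            counts[sub][word] += 1
            scores[sub][word] += comment.get("score", 0)

    result = {}
    for sub, counter in counts.items():
        total = sum(counter.values())  # all non-stopword tokens in this subreddit
        result[sub] = {
            word: {
                "count": count,
                # Assumed normalization: share of the subreddit's counted words.
                "norm_count": count / total,
                # Assumed normalization: average comment score per occurrence.
                "mean_score": scores[sub][word] / count,
            }
            for word, count in counter.most_common(n)
        }
    return result


def shared_vocabulary(result):
    """Union of the top words across subreddits (overlaps are counted once)."""
    return sorted(set().union(*(words.keys() for words in result.values())))
```

Because the input is consumed line by line, a dump of this size could be streamed from disk rather than loaded into memory at once. The union in `shared_vocabulary` is what makes the final word list shorter than 100 words per subreddit when top words overlap.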

Our visualization provides the user with a means to visually explore differences (and similarities) in language across different Reddit communities (subreddits). Specifically, we want to explore differences between communities that are typically considered “opposites,” such as cats and dogs. We pulled word usage data from Reddit’s publicly available comment database for eight popular subreddits, forming four pairs of “opposed” subreddits.

Our first visualization shows side-by-side wordclouds for each pair of opposed subreddits, with each word sized in proportion to its normalized count in that particular subreddit. Use the buttons at the top to choose subreddits for comparison. Hover over a word to see its position in the opposing wordcloud, as well as summary statistics in the center box.
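As a rough illustration of sizing words by normalized count, the sketch below maps a word's normalized count linearly onto a font-size range. The linear scale, the pixel bounds, and the function name `font_size` are assumptions made for this example; the visualization itself may use a different scale.

```python
def font_size(norm_count, max_norm_count, min_px=12, max_px=72):
    """Map a normalized count to a font size in pixels (linear scale).

    The most frequent word in a cloud gets `max_px`; a word with a
    normalized count of zero gets `min_px`. All bounds are illustrative.
    """
    if max_norm_count <= 0:
        # No usable reference value, so fall back to the smallest size.
        return min_px
    share = norm_count / max_norm_count
    return min_px + share * (max_px - min_px)
```

Scaling against the largest value within each cloud keeps the two side-by-side clouds visually comparable even when one subreddit has far more comments than the other.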
