Two weeks ago (at the time of publishing), our team set out with a simple idea: to make a cool visualization of Reddit that would let us explore the relationships between different communities. Such visualizations have been done before, of course, but only in a static form that presents an overwhelming amount of information. Our goal, above all, was to present the data in a way that would be useful.

This is our product.

For those who are unfamiliar, Reddit is a social news aggregation site where you can comment on and rate content submitted by users. Content is divided into topics or areas of interest called subreddits, which Reddit users (Redditors) can subscribe to.

The Reddit Visualizer (RV) site pulls data from Reddit and visualizes the connections between subreddits as an interactive graph. Each subreddit is represented by a sized and colored node. Node size indicates the subreddit's relative subscriber count, and color encodes whether a node is expandable, repeated, active, not safe for work (nsfw), or currently selected. Expanding a node shows its top related subreddits, linked by lines. Wider lines represent stronger connections, and connections are determined by shared Redditor activity.
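To make the expand behaviour concrete, here is a minimal Python sketch, assuming connection strengths are stored as directed integer weights per subreddit. `expand`, `TOP_K`, `MIN_WIDTH`, `MAX_WIDTH`, and the linear width scale are hypothetical choices for illustration; the post does not say how many related subreddits RV reveals or how it maps strength to line width.

```python
from typing import Dict, List, Tuple

TOP_K = 5        # hypothetical: related subreddits revealed per expansion
MIN_WIDTH = 1.0  # hypothetical floor line width, in pixels
MAX_WIDTH = 6.0  # hypothetical line width of the strongest shown edge


def expand(
    weights: Dict[str, Dict[str, int]], node: str, k: int = TOP_K
) -> List[Tuple[str, float]]:
    """Return up to k (neighbor, line_width) pairs for an expanded node.

    Only outgoing edges of `node` are considered, strongest first (ties
    broken alphabetically). Width grows linearly with weight relative to
    the strongest edge shown.
    """
    outgoing = ((dst, w) for dst, w in weights.get(node, {}).items() if w > 0)
    top = sorted(outgoing, key=lambda kv: (-kv[1], kv[0]))[:k]
    if not top:
        return []
    strongest = top[0][1]
    return [
        (dst, MIN_WIDTH + (MAX_WIDTH - MIN_WIDTH) * w / strongest)
        for dst, w in top
    ]


# Hypothetical sample weights, keyed by source subreddit.
weights = {"wow": {"diablo": 4, "hearthstone": 2, "overwatch": 1}}
print(expand(weights, "wow", k=2))  # [('diablo', 6.0), ('hearthstone', 3.5)]
print(expand(weights, "diablo"))    # [] since diablo has no outgoing edges here
```

Capping the expansion at a few strongest edges is one simple way to keep an expanded node readable while most stored connections stay hidden.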

Specifically, for each subreddit we gathered the fifty hottest posts at the time of collection and checked the last 100 comments of each Redditor who made those posts. Each post that Redditor made in another subreddit counted as a single point toward that connection. Connections are also directional: a small subreddit leading to a larger subreddit does not mean that the larger subreddit leads back to the smaller one.
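The scoring rule above can be sketched in a few lines of Python, assuming the hot posts and each author's recent comments have already been fetched into plain dictionaries. `connection_weights`, `HOT_POSTS`, `RECENT_COMMENTS`, and the input shapes are hypothetical names, not RV's actual code, and the sketch assumes an author with several hot posts in the same subreddit is only checked once.

```python
from collections import Counter, defaultdict
from typing import Dict, List

HOT_POSTS = 50         # hottest posts sampled per subreddit
RECENT_COMMENTS = 100  # most recent comments checked per author


def connection_weights(
    hot_post_authors: Dict[str, List[str]],
    recent_activity: Dict[str, List[str]],
) -> Dict[str, Counter]:
    """Return directed weights where weights[src][dst] is the score src -> dst.

    hot_post_authors: subreddit -> author of each hot post, hottest first.
    recent_activity: author -> subreddit of each recent comment, newest first.
    """
    weights: Dict[str, Counter] = defaultdict(Counter)
    for src, authors in hot_post_authors.items():
        # Assumption: an author with several hot posts is only checked once.
        for author in dict.fromkeys(authors[:HOT_POSTS]):
            for sub in recent_activity.get(author, [])[:RECENT_COMMENTS]:
                if sub != src:  # only activity in *other* subreddits counts
                    weights[src][sub] += 1  # one point per item
    return weights


# Hypothetical sample data to show that the scores are directional.
hot = {"wow": ["alice", "bob", "alice"], "diablo": ["carol"]}
activity = {
    "alice": ["wow", "diablo", "diablo", "hearthstone"],
    "bob": ["overwatch", "wow"],
    "carol": ["diablo", "wow"],
}
w = connection_weights(hot, activity)
# wow -> diablo scores 2 (both from alice), but diablo -> wow scores only 1.
print(w["wow"]["diablo"], w["diablo"]["wow"])
```

Because each source subreddit keeps its own counter, the two directions of a pair are scored independently, which is exactly why a small subreddit can point at a large one without the reverse being true.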

Big data was a large driving force behind RV. It was built with two major goals: to produce an aesthetic visualization of real, meaningful relationships between subreddits, and to create an interactive, explorable graph. As of now, there are over 7,000 subreddits represented on the map, with well over 43,000 connections in total (although not all connections are displayed).

Some Interesting Data

Looking at the visualization, some interesting trends emerge. For example, you can see here a tight coupling between subreddits dedicated to Blizzard games, which is what you’d expect:

Other trends are interesting too, like the clustering of nsfw subreddits. As you’d expect, people don’t usually tie their nsfw purchasing/watching/commenting habits to their day-to-day lives, or to accounts that people in their lives might discover. As a result we see “clustering”: accounts that post to nsfw subreddits post to a *lot* of those subreddits, hinting at dedicated nsfw accounts.

There are also relationships you can’t see: connections you would expect that aren’t there. MOBA (multiplayer online battle arena) subreddits, for instance, seem to be decoupled. This seems counterintuitive, since you’d expect similar types of games to cluster. It hints that MOBA players tend to focus on one specific game (a known phenomenon, mostly due to the investment of time, and potentially money, that most MOBAs require).

Two levels deep

If you’re interested in the technical bits, keep reading; otherwise, you may want to skip ahead to the bugs/bloopers near the bottom of the page.