This file contains the posting preferences for over 850,000 active reddit users. This sample was taken in mid-2013.

This data was used to generate the interactive visualization, "redditviz," and will be analyzed in detail in an upcoming research article.

Please cite our paper "Navigating the massive world of reddit" if you use this data in your work. URL: http://arxiv.org/abs/1312.3387

The file is organized as follows:

Each line is an entry for one anonymous user. Each user was randomly assigned a unique ID, which appears as the first field of the line.

The user ID is followed by a comma-separated list of the subreddits (i.e., interests) that the user regularly posts in. A user counts as "active" in a subreddit if they posted or commented there at least 10 times in their last 1,000 posts and comments.
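
The layout described above can be read with a few lines of Python. This is only a minimal sketch, not the code used to build the dataset. It assumes plain comma-separated fields with no header row, and the function names (parse_user_lines, subreddit_counts, active_subreddits) and the in-memory sample are hypothetical, chosen for illustration. The last function restates the "active" rule for a made-up list of a user's recent activity; the published file already has this filter applied.

```python
import csv
import io
from collections import Counter


def parse_user_lines(lines):
    """Map each user ID (first field) to the list of subreddits that follow it."""
    users = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        user_id = row[0].strip()
        users[user_id] = [s.strip() for s in row[1:] if s.strip()]
    return users


def subreddit_counts(users):
    """Count how many users list each subreddit."""
    counts = Counter()
    for subreddits in users.values():
        counts.update(set(subreddits))  # count each user once per subreddit
    return counts


def active_subreddits(recent_activity, min_count=10, window=1000):
    """Apply the stated rule to a hypothetical activity list (most recent first):
    keep subreddits with at least `min_count` entries among the last `window`."""
    tally = Counter(recent_activity[:window])
    return sorted(name for name, n in tally.items() if n >= min_count)


# Hypothetical sample in the described layout: user ID, then subreddits.
sample = io.StringIO("1,askreddit,python\n2,python\n3,askreddit,science,python\n")
users = parse_user_lines(sample)
counts = subreddit_counts(users)
```

To read the real file, pass an open file handle instead of the sample, for example `with open(path, newline="") as f: users = parse_user_lines(f)`, where `path` is wherever the downloaded file was saved.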