I have been experimenting with data recently. I used Python to take a snapshot of Reddit (“The Front Page of the Internet”) once a minute over a number of days. They have an API for this, which is cool. I pushed the data into a time series database, analyzed the results, and plotted them in R. The chart shows the rank of a post on the front page of Reddit as a function of time, with each post's time axis shifted so that t=0 is the moment the post first reaches its highest ranking.
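For anyone wanting to reproduce the time normalisation, here is a minimal Python sketch of the idea. It is not the code behind the chart: the function name `align_to_peak` and the input shape (a list of `(minute, rank)` pairs per post) are assumptions made for illustration.

```python
def align_to_peak(observations):
    """Shift one post's timeline so t=0 is when it first hits its best rank.

    `observations` is a list of (time, rank) pairs for a single post,
    where a smaller rank number means a higher position on the page.
    (Hypothetical input format, chosen for this sketch.)
    """
    ordered = sorted(observations)                 # sort by time
    best_rank = min(rank for _, rank in ordered)   # highest position reached
    # First moment at which the best rank was observed.
    t_peak = next(t for t, rank in ordered if rank == best_rank)
    return [(t - t_peak, rank) for t, rank in ordered]


if __name__ == "__main__":
    snapshots = [(0, 40), (1, 10), (2, 3), (3, 3), (4, 8)]
    print(align_to_peak(snapshots))
```

After this shift, negative times are the climb and positive times are the decline, so posts that peaked at different moments can be overlaid on one axis.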

I examined every post that reached the top ten during this period (475 unique posts in total), and plotted their rise and fall from a lowly ranking of 500, up to the front page, and then back down again. The chart is broken down to show an overall average, as well as results for the “most successful” subreddits (all those with n>=30 posts in the sample). On the way up there is much less variance than on the way down. For the first couple of hours after reaching the top, r/todayilearned hangs around the longest and r/funny falls off the fastest. Before long both have fallen away, and r/gifs holds up best for about ten hours. In the end, r/aww is the only one of these top subs beating the overall Reddit average.
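The per-subreddit averages described above could be computed along these lines. This is a rough sketch under assumptions, not the analysis actually used for the chart (which was done in R): the function name `mean_rank_by_subreddit` and the input shape are made up for illustration, and only the n>=30 cutoff comes from the post.

```python
from collections import defaultdict
from statistics import mean


def mean_rank_by_subreddit(posts, min_posts=30):
    """Average rank at each time offset, per subreddit.

    `posts` is a list of (subreddit, trajectory) pairs, where each
    trajectory is a list of (t, rank) pairs already shifted so that
    t=0 is the post's peak. (Hypothetical input format.)
    Subreddits with fewer than `min_posts` posts are skipped.
    """
    by_sub = defaultdict(list)
    for subreddit, trajectory in posts:
        by_sub[subreddit].append(trajectory)

    averages = {}
    for subreddit, trajectories in by_sub.items():
        if len(trajectories) < min_posts:
            continue  # too few posts for a meaningful average
        ranks_at_t = defaultdict(list)
        for trajectory in trajectories:
            for t, rank in trajectory:
                ranks_at_t[t].append(rank)
        averages[subreddit] = {t: mean(r) for t, r in sorted(ranks_at_t.items())}
    return averages


if __name__ == "__main__":
    sample = [
        ("aww", [(0, 1), (1, 5)]),
        ("aww", [(0, 3), (1, 7)]),
        ("funny", [(0, 2)]),
    ]
    # A small cutoff is used here only so the toy data passes the filter.
    print(mean_rank_by_subreddit(sample, min_posts=2))
```

One thing to keep in mind with this approach: posts are observed for different lengths of time, so the averages far from t=0 are based on fewer posts than those near the peak.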

One interesting observation which doesn’t show up too well in the averages is that quite a lot of posts have a “second page” effect: their ranking improves steadily over time until they reach the second page (inside the top 50), after which they are suddenly catapulted to the top. So it is not necessarily the front page of the internet which matters as much as people think, but the front two pages.
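One way to look for this “second page” effect in individual posts is to compare how long a post takes to reach the top 50 with how long it then takes to reach the top ten. The sketch below is only an illustration: the function name `climb_times` and the input shape are assumptions, and the thresholds of 50 and 10 are simply the second-page and top-ten cutoffs mentioned in this post.

```python
def climb_times(trajectory, page_two_rank=50, top_rank=10):
    """Split a post's climb into two phases and time each one.

    `trajectory` is a list of (minute, rank) pairs for one post
    (hypothetical input format). Returns a pair
    (minutes from first sighting to the top 50,
     minutes from the top 50 to the top 10),
    or None if the post never reached both thresholds.
    """
    ordered = sorted(trajectory)
    if not ordered:
        return None
    first_seen = ordered[0][0]
    # First time the post is on the second page or better.
    t_page_two = next((t for t, r in ordered if r <= page_two_rank), None)
    # First time the post is in the top ten.
    t_top = next((t for t, r in ordered if r <= top_rank), None)
    if t_page_two is None or t_top is None:
        return None
    return (t_page_two - first_seen, t_top - t_page_two)


if __name__ == "__main__":
    post = [(0, 400), (60, 200), (120, 60), (125, 45), (130, 8)]
    print(climb_times(post))
```

A post showing the effect would have a long first phase and a much shorter second one. How much shorter counts as “catapulted” is a judgment call that this sketch leaves open.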

I have collected a decent amount of data for this chart and plan to publish more on this in the next week or two. Shout out if there is anything you think would be interesting to look at.