How it works

The table above shows the 20 new Reddit posts that are most likely to catch fire.



New posts are defined as being under 130 minutes old.



A post catches fire if it receives a final score of 6000 or higher.
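As a rough sketch, the two definitions above can be written as simple checks. The constant and function names here are illustrative assumptions, not taken from the project's code; only the two thresholds come from the text.

```python
# Thresholds taken from the definitions above; the names are hypothetical.
NEW_POST_MAX_AGE_MINUTES = 130   # "new" means under 130 minutes old
VIRAL_SCORE_THRESHOLD = 6000     # "catches fire" means a final score of 6000 or higher


def is_new(age_minutes: float) -> bool:
    """Return True if the post still counts as new."""
    return age_minutes < NEW_POST_MAX_AGE_MINUTES


def caught_fire(final_score: int) -> bool:
    """Return True if the post's final score reached the viral threshold."""
    return final_score >= VIRAL_SCORE_THRESHOLD
```

For example, a post that is 129 minutes old is still new, and a post that ends at exactly 6000 points counts as having caught fire.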



Clicking any post in the table takes you to it.



We trained a RandomForestClassifier from the scikit-learn library on Reddit posts that we scraped. Scraping a post meant logging it while it was new, then going back the next day to record its final score. With this labeled data, the model can predict whether a new post is going to blow up.
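The training step might look roughly like the sketch below. It is not the project's actual code: the data is synthetic, only three illustrative feature columns are used, the hyperparameters are assumptions, and ranking posts by predicted probability to pick the top 20 is a guess at how the table is built.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the scraped data: one row per post, logged while new.
# Columns (illustrative subset): age in minutes, score, number of comments.
n_posts = 500
X = np.column_stack([
    rng.integers(1, 130, n_posts),
    rng.integers(0, 2000, n_posts),
    rng.integers(0, 300, n_posts),
])

# Synthetic "next day" final scores, turned into labels with the 6000 threshold.
final_scores = X[:, 1] * rng.integers(1, 8, n_posts)
y = (final_scores >= 6000).astype(int)

# Hyperparameters are assumptions; the source does not state them.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Probability of the "viral" class for each post. In practice this would be
# computed on freshly scraped new posts, not on the training rows.
viral_probability = model.predict_proba(X)[:, 1]

# Assumed ranking step: indices of the 20 posts with the highest probability.
top_20 = np.argsort(viral_probability)[::-1][:20]
```

Sorting by the predicted probability rather than the hard 0/1 prediction gives a usable ranking even when very few posts are predicted to go viral.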



A single input to the model has the following features:

post's age in minutes

post's score

post's number of comments

subscriber count of the subreddit the post is in

active user count of the subreddit the post is in

post's rank in /hot in the subreddit

post's rank by score per minute in /new in the subreddit
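One way to picture a single model input is as a small record that is flattened into a numeric vector. The class, field names, and example values below are hypothetical; only the seven features themselves come from the list above.

```python
from dataclasses import dataclass, astuple


@dataclass
class PostSnapshot:
    """One new post as logged at scrape time (field names are illustrative)."""
    age_minutes: float
    score: int
    num_comments: int
    subreddit_subscribers: int
    subreddit_active_users: int
    hot_rank: int                   # rank in /hot in the subreddit
    new_score_per_minute_rank: int  # rank by score per minute in /new


def to_feature_vector(snapshot: PostSnapshot) -> list[float]:
    """Flatten a snapshot into the ordered numeric vector fed to the model."""
    return [float(value) for value in astuple(snapshot)]


# Made-up example values, for illustration only.
example = PostSnapshot(45, 310, 27, 1_200_000, 8_500, 3, 1)
vector = to_feature_vector(example)
```

Keeping the field order fixed in one place helps ensure that training rows and live prediction rows line up column by column.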



To give an idea of our model's accuracy, here is the classification_report output sklearn gives:

label                precision  recall  f1-score  support
not viral - train           98     100        99     6544
viral - train               68      22        33        9
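To help read the report, the f1-score column is the harmonic mean of precision and recall. The short check below recomputes it from the reported numbers, assuming the values are percentages; the function name is our own. A recall of 22 on 9 viral posts corresponds to roughly 2 of the 9 being caught, so the model misses most viral posts but is fairly precise when it does flag one.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


not_viral_f1 = f1_from(98, 100)  # about 98.99
viral_f1 = f1_from(68, 22)       # about 33.24

print(round(not_viral_f1), round(viral_f1))  # prints: 99 33
```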



We are also tracking a variety of other features, such as the content tags of posted images and videos, NLP-related features, and more generic categorical data that captures time of day, day of week, subreddit, etc. We will keep adding to this list and expanding the model with these features. GitHub link