A team of Stanford computer scientists has cracked Reddit using math.

Ph.D. student Himabindu Lakkaraju, computer science post-doc Julian McAuley, and assistant professor Jure Leskovec released a paper earlier this summer outlining the algorithm they used to figure out exactly what makes a Reddit title perform well.

Here's the concept. Reddit is made up of a vast number of communities, or subreddits, where people submit content around a common theme.

The /r/Pics subreddit is a clearinghouse for pictures, /r/Funny is a community focused on comedy, and /r/Gaming is all about video games. Other subreddits get more specific: /r/Gifs, technically a subsection of /r/Pics, is all about GIF images, and /r/GifSound is all about pairing appropriate music with GIFs. The rabbit hole goes deep.

Every image submitted to Reddit has four central elements: the community it is submitted to, when it is submitted, the value of the content itself, and the title.

The Stanford team set out to tackle a fascinating problem relevant to any kind of media: once you factor out the content's merits, what goes into a successful title?

The team developed a number of algorithms to isolate the effect of the title, and the way they did it is clever.

"If you browse Reddit long enough," said McAuley, "you see the same content posted multiple times, and often when it makes it to the front page it's not actually the first time that you've seen it."

People often submit the same piece of content to Reddit multiple times in the form of reposts. A repost appears some time after the original submission, and it often carries a new title or lands in a different community.

This data is available on KarmaDecay, a site that inventories successful content and reposts. Without going into too much detail about the models themselves (the full paper explains the strategy), Lakkaraju, McAuley, and Leskovec scraped this data and built models to determine what, precisely, makes a good title once the content itself is held constant.

In effect, they were able to isolate the impact of the title alone, factoring out the community the content was submitted to, the time it was submitted, and especially the inherent "value" of the image itself.

The results were fascinating, to say the least.

"First, of course, the community that you're trying to target," said Lakkaraju. "For example, if we look at more niche communities like /r/atheism, you need to focus on the community topics, or the topics that the community considers as important."

"On the other hand, when you go to other communities such as /r/pics, you know you need to pick the titles that are more catchy. You don't need to tailor it to specific content."

That may sound basic (niche communities like niche topics), but the specifics of their findings were quite amusing.

Take a look at this chart sent to us by the Stanford team (chart credit: Himabindu Lakkaraju, Julian McAuley, Jure Leskovec).

The x-axis measures how similar a submission's title is to the other titles in its community, where 0 means completely unrelated and 1 means exactly like the rest.

The y-axis shows the probability that a submission is successful.
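The article doesn't say how the Stanford team computed this similarity score. Purely to illustrate what a 0-to-1 similarity scale can mean, here is a minimal Python sketch that uses bag-of-words cosine similarity. That technique is an assumption chosen for illustration, not the paper's method, and the tokenizer and the function names `tokenize`, `cosine`, and `community_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption, not the paper's): lowercase, keep runs of
    # letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two word-count vectors. Counts are never negative,
    # so the result always falls between 0 and 1.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title or community shares nothing
    return dot / (norm_a * norm_b)


def community_similarity(title: str, community_titles: list[str]) -> float:
    # Hypothetical measure, not the Stanford team's: pool every existing title
    # in the community into one word-count profile, then score the new title
    # against that profile.
    profile: Counter = Counter()
    for existing in community_titles:
        profile.update(tokenize(existing))
    return cosine(Counter(tokenize(title)), profile)


if __name__ == "__main__":
    community = ["my dog", "a cat nap"]
    print(round(community_similarity("my cat", community), 3))  # prints 0.632
```

A title that shares no words with the community scores 0, and one whose word counts match the pooled profile exactly scores 1. Pooling all titles into one profile is a shortcut: a title scores high simply by reusing the community's most common words, and this sketch counts filler words like "a" and "the" the same as any other word.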

Notice that for the generic communities (funny, GIFs, and pics) there's something of a bell curve. If your title is too unique, the post does poorly. If the title is like every other title, the post also does poorly.

"The idea is, be different, but not too different," said Leskovec. "If you are too different then you are off topic. If you're not different then you are the same and again, you don't get noticed."

Now look at the niche communities, atheism and gaming. The more your title resembles all the other posts, the better it does, with no drop-off at the high end. There's a word for this effect on Reddit.

So, obviously, you have to tailor your title to the community. What else?

"It's good if you use some of the positive sentiment words in the title," said Lakkaraju. Posts with positive words or ideas perform much better than posts with negative ones.

Speaking of word choice, each community likes to see different parts of speech in its post titles.

"Gaming actually prefers usage of nouns much higher than, for instance, the funny community," said Lakkaraju.

"When we come to adjectives, communities like you having good sentiment words. Positive adjectives are preferred by communities like funny and pics."

Finally, sentence composition also has a major impact.

"Shorter sentences are better and sentences that are questions are better," said Leskovec.

"Particularly for pics and funny, they put a lot more emphasis on shorter titles and also the ones which are phrased like a question," said Lakkaraju.

This makes a ton of sense, especially for the funny community, they said. When you tell a joke in real life, the format is often a question followed by a punchline. It's the same on Reddit, except that instead of a verbal punchline you've got a picture.

"It's easier to capture imagination," said Leskovec. "If I ask you a question, you want to know, 'So what's here?' Right?"

Of course, since they studied reposts, timing is also crucial.

First off, resubmissions work when you take something from a small community, like GifSound, and then post it to a larger community, like Pics. The strategy falls apart completely the other way around.

But what if you want to resubmit content to the big communities?

"If people are interested in submitting already-submitted content, I think they should at least wait for the stipulated number of days," said Lakkaraju. According to Leskovec, that waiting period is between 20 and 40 days.

Still, according to McAuley, there's no replacement for original content. "If you do really want to make a good submission to reddit, the best thing to do is actually to make original content. Really the best thing to do, is to create something new."

If you don't believe that they were able to do all this with a mere algorithm, don't worry. They did a test run.

The team selected 85 images from their database and gave each one two titles: one their model considered "bad" and one it considered "good."

They then submitted each image under both titles at around the same time to two similar subreddits (for instance, Pics and Funny) and compared how the two versions performed.

The verdict: two of the pictures with "good" titles made the front page, and over the course of a day the good titles earned roughly three times the score of the bad ones, 10,959 points versus 3,438.

The point? Even though a picture might say a thousand words, you'll need a few extra to build the well-crafted title that really makes people click.
