Data analysis

Pierre de Wulf, 24 March 2020 (5 min read)

Introduction

For those of you who don't know, Hacker News is a successful social news website focused on computer science and entrepreneurship, visited by more than 10 million people per month (source: SimilarWeb).

Founded by Paul Graham, it works similarly to Reddit: users submit content, which can be upvoted by the community. The most upvoted content, mostly links, then reaches the front page, resulting in tens of thousands of visits for the lucky website.

But the competition for the front page is fierce. Around 1,000 posts are submitted each day and, more importantly, HN readers are known for placing a high value on content quality.

Knowing all this, and for the sake of experiment, I wondered if some titles performed better than others and decided to see what the best HN post title would look like, statistically speaking.

To write this post, I analyzed all HN submission titles (more than 2.6 million) from 2006 through the end of 2019, taken from this dataset.

This is what I found.

General findings

Number of submissions

According to SimilarWeb, HN has more than 10 million visitors per month.

Here is the number of posts per quarter since the launch of the website. As you can see, this number has been pretty stable for the last 6 years.

Points per submission

As said earlier, it is hard to have your post liked by the community. By default, all posts have 1 point.

Some users have the ability to downvote posts and so some posts have a score of 0.

Below is the distribution of post scores.

As you can see, almost 66% of all posts don't manage to get more than 2 points. Since every post begins with a score of 1, this means that 66% of all posts receive at most one upvote from another person.
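A minimal sketch of how such a score distribution and the "at most 2 points" share could be computed. The `scores` list is a made-up sample and `share_at_most` is a hypothetical helper, not the code used for the analysis.

```python
from collections import Counter

# Made-up sample of post scores (assumption); the analysis used the full dataset.
scores = [1, 1, 2, 1, 3, 2, 15, 1, 0, 120]


def share_at_most(scores, threshold):
    """Return the fraction of posts whose score is <= threshold."""
    return sum(1 for s in scores if s <= threshold) / len(scores)


# Map each score to the number of posts that ended up with it.
distribution = Counter(scores)
low_share = share_at_most(scores, 2)

for score, n_posts in sorted(distribution.items()):
    print(score, n_posts)
print(f"{low_share:.0%} of posts have at most 2 points")
```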

Title length distribution

I also wanted to know the title length distribution of posts.

The distribution is almost perfectly centered on seven words. A number that low is not really a surprise, since a post title on HN can't exceed 80 characters.
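As a rough illustration, title length can be measured by splitting each title on whitespace and counting the pieces. The titles below are invented, and splitting on whitespace is an assumption about how words were counted.

```python
from collections import Counter

# Invented titles (assumption); the analysis covered every submission in the dataset.
titles = [
    "Hyperloop",
    "Show HN: A tiny static site generator",
    "Ask HN: How do you organize your notes?",
    "Why we moved our monolith back to one server",
]


def word_count(title):
    """Count whitespace-separated words in a title."""
    return len(title.split())


# Map each title length (in words) to the number of titles of that length.
length_distribution = Counter(word_count(t) for t in titles)

for n_words, n_posts in sorted(length_distribution.items()):
    print(n_words, n_posts)
```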

Knowing all this, let's now move to the most interesting part.

The quest for the perfect title

The importance of title length

I decided to plot the median score per number of words in the post title.

Unsurprisingly, no matter your title length, your median score will be 2 or 3. The fact that more than 66% of all posts never reach more than 2 points explains it.

But if we choose to discard posts that don't perform, let's say posts with fewer than 5 points, the chart is not really the same.
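The grouping and filtering described above can be sketched as follows. The `(title, score)` pairs are invented and `median_score_by_word_count` is a hypothetical helper; the `min_score` parameter plays the role of the 5-point cutoff.

```python
from collections import defaultdict
from statistics import median

# Invented (title, score) pairs (assumption), not taken from the dataset.
posts = [
    ("Rust", 300),
    ("Example", 1),
    ("Show HN: Foo", 2),
    ("Ask HN: Bar?", 40),
    ("A long title about something boring", 1),
    ("Another six word title right here", 10),
]


def median_score_by_word_count(posts, min_score=0):
    """Return {title word count: median score}, ignoring posts below min_score."""
    groups = defaultdict(list)
    for title, score in posts:
        if score >= min_score:
            groups[len(title.split())].append(score)
    return {n_words: median(s) for n_words, s in sorted(groups.items())}


print(median_score_by_word_count(posts))               # every post
print(median_score_by_word_count(posts, min_score=5))  # only posts with 5+ points
```

Running both calls side by side shows how much the low-scoring majority drags every median down.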

A good way to phrase this finding:

No matter your title length, your post probably won't perform well, but if it does, you'd want your title to be as short as possible.

I was curious about those “1 word title” submissions that perform so well. Here are some fun ones:

2048: release of the famous game, 2903 points

Hyperloop

Atom: Atom announcement

Best categories

Some submissions on Hacker News relate to precise topics and follow a strict title format. They can be split into 5 categories:

| Title format | Category |
| --- | --- |
| Ask HN: <question>? | People asking HN readers a particular question. |
| Ask HN: who is hiring? | Opportunity for companies to post job listings. |
| Show HN: <project name> | Showing HN readers something you recently built: open-source project, side project or company. |
| <something> YC YYYY <something>? | A post related to a company that went through Y Combinator. |
| | Classic post |

It is worth noting that a post can fall into multiple categories. For example, a post whose title is “Scale (YC S16) is hiring engineers to build infrastructure for AI” is counted as both a “YC” post and an “is hiring” post.
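One way to implement this kind of overlapping categorization is to test each title against a set of regular expressions. The patterns and category names below are assumptions made for illustration, and may not match the exact rules used for the analysis.

```python
import re

# Assumed patterns, one per category; a title may match several of them.
CATEGORY_PATTERNS = {
    "ask": re.compile(r"^Ask HN:", re.IGNORECASE),
    "show": re.compile(r"^Show HN:", re.IGNORECASE),
    "hiring": re.compile(r"\bis hiring\b", re.IGNORECASE),
    "yc": re.compile(r"\(YC [SW]\d{2}\)"),  # e.g. "(YC S16)"
}


def categorize(title):
    """Return the set of categories a title matches, or {"classic"} if none."""
    title = title.strip()
    matches = {
        name for name, pattern in CATEGORY_PATTERNS.items() if pattern.search(title)
    }
    return matches or {"classic"}


print(categorize("Scale (YC S16) is hiring engineers to build infrastructure for AI"))
print(categorize("Hyperloop"))
```

Returning a set rather than a single label is what lets one post count toward several categories at once.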

Posts related to YC companies seem to perform better than the others. However, if we only take into account posts with more than 5 upvotes, those about companies currently hiring are the champions.

Top performing words

Now let's find out whether some words, when they appear in a title, are correlated with posts performing better.

For every post in the dataset, I split its title into single words, removing stop words (a, the, an, etc.).

I ended up with a big array whose size is the number of words per title times the number of posts, where each row is one word of one post's title associated with the score of said post.

Finally, I aggregated everything and was able to find the 50 words with the best scores.
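The steps above can be sketched in a few lines. The stop-word list, the tokenization and the sample posts are all assumptions made for the example, and `top_words` is a hypothetical helper that ranks words by the median score of the posts they appear in.

```python
import string
from collections import defaultdict
from statistics import median

# Tiny illustrative stop-word list (assumption); a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "for", "and", "is"}


def tokenize(title):
    """Lowercase, split on whitespace, strip surrounding punctuation, drop stop words."""
    words = (w.strip(string.punctuation) for w in title.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def top_words(posts, min_occurrences=1, limit=50):
    """Return (word, median score, occurrences) tuples, best median first."""
    scores_by_word = defaultdict(list)
    for title, score in posts:
        for word in tokenize(title):  # one (word, score) row per word of each title
            scores_by_word[word].append(score)
    ranked = [
        (word, median(scores), len(scores))
        for word, scores in scores_by_word.items()
        if len(scores) >= min_occurrences
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:limit]


# Invented (title, score) pairs (assumption).
posts = [
    ("JMAP: a modern mail protocol", 100),
    ("The JMAP standard", 50),
    ("A boring post", 1),
]
print(top_words(posts, min_occurrences=2))
```

Requiring a minimum number of occurrences keeps a single lucky post from putting a rare word at the top of the ranking.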

I must admit it took quite a bit of time and almost killed my 2013 MacBook Pro. Below is the scatter plot of my findings.

The word “zwiebelfreunde” appeared in 7 submission titles, and the median score of those submissions was 121 points. Impressive.

As you can see, some words really stand out.

“Profit/Month” is here because of those posts talking about the history of FipLab.

“Mathics” is an open-source alternative to Mathematica.

“Jmap” is an open API standard for modern mail clients.

Conclusion

So here is what we know can make a post successful on HN:

a short title

talking about a YC startup

talking about hiring

JMAP or LinuxInsides in the title

Hence our result for the best HN post title possible: “JMAP”.

…

OK, this was a bit of a letdown. Let's try something else: “JMAP (YC S10) Linux Inside is hiring” and see how it goes 🤷‍♂️.

Bonus

While writing this blog post I've made other fun discoveries.

Those were not big enough to deserve their own article but were interesting nonetheless.

Most common programming language

Here are the 25 programming languages appearing most often in HN post titles.

C is not here as it was considered a stop-word.
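A count like this can be approximated by checking each title against a fixed vocabulary of language names. The shortlist and titles below are invented for the example, and a real vocabulary would have to deal with ambiguous names such as "go".

```python
import string
from collections import Counter

# Hypothetical shortlist of language names (assumption), all lowercase.
LANGUAGES = {"python", "javascript", "rust", "go", "java"}


def language_counts(titles):
    """Count, for each language, the number of titles that mention it."""
    counts = Counter()
    for title in titles:
        words = {w.strip(string.punctuation) for w in title.lower().split()}
        counts.update(words & LANGUAGES)  # each language counted once per title
    return counts


# Invented titles (assumption).
titles = ["Why Rust?", "Rust vs Go", "Learning Python the hard way"]
print(language_counts(titles).most_common(25))
```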

HN trend

I've also analysed how the number of occurrences of particular words in post titles has evolved over time.

Here are a few: