Hey! If you are a web developer, you should know about CatchJS. It’s a service for tracking and logging errors in JavaScript, with some pretty exciting features.

“F.D.R.’s War Plans!” reads a headline from a 1941 issue of the Chicago Daily Tribune. Had this article been written today, it might instead have read “21 War Plans F.D.R. Does Not Want You To Know About. Number 6 may shock you!”. Modern writers have become very good at squeezing the maximum clickability out of every headline. But this sort of writing seems formulaic and unoriginal. What if we could automate the writing of these, thus freeing up clickbait writers to do useful work?

If this sort of writing truly is formulaic and unoriginal, we should be able to produce it automatically. Using Recurrent Neural Networks, we can try to pull this off.

Standard artificial neural networks are prediction machines that can learn how to map some input to some output, given enough examples of each. Recently, as people have figured out how to train deep (multi-layered) neural nets, very powerful models have been created, increasing the hype surrounding this so-called deep learning. In some sense the deepest of these models are Recurrent Neural Networks (RNNs), a class of neural nets that feed their state at the previous timestep into the current timestep. These recurrent connections make these models well suited for operating on sequences, like text.

We can show an RNN a bunch of sentences, and get it to predict the next word, given the previous words. So, given a string of words like “Which Disney Character Are __”, we want the network to produce a reasonable guess like “You”, rather than, say, “Spreadsheet”. If this model can learn to predict the next word with some accuracy, we get a language model that tells us something about the texts we trained it on. If we ask this model to guess the next word, and then add that word to the sequence and ask it for the next word after that, and so on, we can generate text of arbitrary length. During training, we tweak the weights of this network so as to minimize the prediction error, maximizing its ability to guess the right next word. Thus RNNs operate on the opposite principle of clickbait: What happens next may not surprise you.
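The guess-then-feed-back loop described above is simple enough to sketch. Here is a minimal Python illustration; `predict_next_word` is a hypothetical stand-in for the trained RNN (here it just picks a random word from a made-up vocabulary):

```python
import random

def predict_next_word(words, vocab):
    # Stand-in for the trained RNN: the real model would return the
    # next word sampled from its predicted probability distribution,
    # given the sequence of words seen so far.
    return random.choice(vocab)

def generate_headline(seed, vocab, max_len=12, end_token="<eos>"):
    """Generate text by repeatedly predicting the next word and
    appending it to the input sequence."""
    words = list(seed)
    while len(words) < max_len:
        next_word = predict_next_word(words, vocab)
        if next_word == end_token:
            break
        words.append(next_word)
    return " ".join(words)

vocab = ["You", "Disney", "Character", "Are", "Which", "<eos>"]
headline = generate_headline(["Which", "Disney", "Character", "Are"], vocab)
```

In the real model, sampling from the predicted distribution (rather than always taking the single most likely word) is what keeps the generated headlines varied.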

I based this on Andrej Karpathy’s wonderful char-rnn library for Lua/Torch, but modified it to be more of a “word-rnn”, so it predicts word-by-word rather than character-by-character. (The code is available on GitHub.) Predicting word-by-word uses more memory, but means the model does not need to learn how to spell before it learns how to perform modern journalism. (It still needs to learn some notion of grammar.) Some more changes were useful for this particular use case. First, each input word was represented as a dense vector of numbers. The hope is that a continuous rather than discrete representation for words will allow the network to make better mistakes, as long as similar words get similar vectors. Second, the Adam optimizer was used for training. Third, the word vectors went through a particular training rigmarole: they received two stages of pretraining, and were then frozen in the final architecture – more details on this later in the article.

The final network architecture looked like this:

One Neat Trick Every 90s Connectionist Will Know

Whereas traditional neural nets are built around stacks of simple units that do a weighted sum followed by some simple non-linear function (like a tanh), we’ll use a more complicated unit called Long Short-Term Memory (LSTM). This is something two Germans came up with in the late 90s that makes it easier for RNNs to learn long-term dependencies through time. The LSTM units give the network memory cells with read, write and reset operations. These operations are differentiable, so that during training, the network can learn when it should remember data and when it should throw it away.
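A single LSTM timestep can be sketched in a few lines. This is a minimal numpy illustration of the gating idea; the weight layout and toy sizes are made up, and this is not the exact formulation char-rnn uses:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM timestep. The gates are smooth (sigmoid) functions of
    the input and previous state, so gradients flow through the
    decisions to write, keep, or expose the memory cell."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b  # all four gates in one matmul
    i = sigmoid(z[0:n])        # input gate: how much new info to write
    f = sigmoid(z[n:2*n])      # forget gate: how much old memory to keep
    o = sigmoid(z[2*n:3*n])    # output gate: how much memory to expose
    g = np.tanh(z[3*n:4*n])    # candidate values to write
    c = f * c_prev + i * g     # new cell state
    h = o * np.tanh(c)         # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = rng.normal(size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Because the forget gate `f` can stay close to 1 for many timesteps, the cell state can carry information across long spans of text, which is exactly the long-term dependency problem the LSTM was designed to solve.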

To generate clickbait, we’ll train such an RNN on ~2 000 000 headlines, scraped from Buzzfeed, Gawker, Jezebel, Huffington Post and Upworthy.

How realistic can we expect the output of this model to be? Even if it can learn to generate text with correct syntax and grammar, it surely can’t produce headlines that contain any new knowledge of the real world? It can’t do reporting? This may be true, but it’s not clear that clickbait needs to have any relation to the real world in order to be successful. When this work began, the top story on BuzzFeed was “50 Disney Channel Original Movies, Ranked By Feminism”. More recently they published “22 Faces Everyone Who Has Pooped Will Immediately Recognized”. It’s not clear that these headlines are much more than a semi-random concatenation of topics their userbase likes, and as seen in the latter case, 100% correct grammar is not a requirement.

The training converges after a few days of number crunching on a GTX980 GPU. Let’s take a look at the results.

Early on in the training, the model strings together words with very little overall coherence. This is what it produces after having seen about 40,000 headlines:

2 0 Million 9 0 1 3 Say Hours To Stars The Kids For From Internet

Adobe ‘ s Saving New Japan

Real Walk Join Their Back For Plane To French Sarah York

State 7

Dr 5 Gameplay : Oscars Strong As The Dead

Economic Lessons To Actress To Ex – Takes A App

You ‘ s Schools ‘ : A Improve Story

However, after having had multiple passes through the data, the training converges and the results are remarkably better. Here are its first outputs after completed training:

John McCain Warns Supreme Court To Stand Up For Birth Control Reform

Earth Defense Force : Record Olympic Fans

Kate Middleton , Prince William & Prince George Leave Kate For The Queen

The Most Creative Part Of U . S . History

Biden Responds To Hillary Clinton ‘ s Speech

The Children Of Free Speech

Adam Smith And Jennifer Lawrence ( And Tiger Woods ” Break The Ice Ball , For This Tornado )

Romney Camp : ‘ I Think You Are A Bad President ‘

Here ‘ s What A Boy Is Really Doing To Women In Prison Is Amazing

L . A . ‘ S First Ever Man Review

Why Health Care System Is Still A Winner

Why Are The Kids On The Golf Team Changing The World ?

2 1 Of The Most Life – Changing Food Magazine Moments Of 2 0 1 3

More Problems For ‘ Breaking Bad ‘ And ‘ Real Truth ‘ Before Death

Raw : DC Helps In Storm Victims ‘ Homes

U . S . Students ‘ Latest Aid Problem

Beyonce Is A Major Woman To Right – To – Buy At The Same Time

Taylor Swift Becomes New Face Of Victim Of Peace Talks

Star Wars : The Old Force : Gameplay From A Picture With Dark Past ( Part 2 )

Sarah Palin : ‘ If I Don ‘ t Have To Stop Using ‘ Law , Doesn ‘ t Like His Brother ‘ s Talk On His ‘ Big Media ‘

Israeli Forces : Muslim – American Wife ‘ s Murder To Be Shot In The U . S .

And It ‘ s A ‘ Celebrity ‘

Mary J . Williams On Coming Out As A Woman

Wall Street Makes $ 1 Billion For America : Of Who ‘ s The Most Important Republican Girl ?

How To Get Your Kids To See The Light

Kate Middleton Looks Into Marriage Plans At Charity Event

Adorable High – Tech Phone Is Billion – Dollar Media

Tips From Two And A Half Men : Getting Real

Hawaii Has Big No Place To Go

‘ American Child ‘ Film Clip

How To Get T – Pain

How To Make A Cheese In A Slow – Cut

WATCH : Mitt Romney ‘ s New Book

Iran ‘ s President Warns Way To Hold Nuclear Talks As Possible

Official : ‘ Extreme Weather ‘ Of The Planet Of North Korea

How To Create A Golden Fast Look To Greece ‘ s Team

Sony Super Play G 5 Hands – On At CES 2 0 1 2

1 5 – Year – Old , Son Suicide , Is Now A Non – Anti

” I ” s From Hell ”

God Of War : The World Gets Me Trailer

How To Use The Screen On The IPhone 3 Music Player

World ‘ s Most Dangerous Plane

The 1 9 Most Beautiful Fashion Tips For ( Again ) Of The Vacation

Miley Cyrus Turns 1 3

This Guy Thinks His Cat Was Drunk For His Five Years , He Gets A Sex Assault At A Home

Job Interview Wins Right To Support Gay Rights

Chef Ryan Johnson On ” A . K . A . M . C . D . ” : ” ” They Were Just Run From The Late Inspired ”

Final Fantasy X / X – 2 HD : Visits Apple

A Tour Of The Future Of Hot Dogs In The United States

Man With Can – Fired Down After Top – Of – The – Box Insider Club Finds

WATCH : Gay Teens Made Emotional Letter To J . K . Williams

It surprised me how good these headlines turned out. Most of them are grammatically correct, and a lot of them even make sense.

Consider the sentence “Mary J. Williams On Coming Out As A Woman”. I suspected this might be a case where the network had simply memorized a headline from the dataset. It turns out this was not the case. The only thing similar to “Coming Out As A Woman” is the headline “Former Marine Chronicles Journey Coming Out As A Trans Woman On YouTube”. The name “Mary J. Williams” does not appear in the dataset. The network has apparently learned that this is a plausible name, and also that such a name is the type of thing that can come out as a woman.

Another good one is “Romney Camp: ‘I Think You Are A Bad President'”. It’s suspiciously good – it wouldn’t surprise me if this was a real headline that some website had published. But it’s not in the dataset, not even close. While “Romney Camp” occurs 17 times in the dataset, none of these contain any statement about the president (or even the word president). “Bad President” occurs only once in the dataset, in the headline “Rubio: Obama Is A ‘Bad President'”. Yet, the network knows that the Romney Camp criticizing the president is a plausible headline. The network knows something about language, and it has some level of knowledge about the world by knowing what words are semantically associated.

Kim Kardashian Is Married With A Baby In New Mexico

Let’s investigate these semantic associations. By seeding the model with the start of a sentence, and getting the RNN to complete it, we can get a peek into what the model knows. For example, we can ask it to complete “Barack Obama Says” and “Kim Kardashian Says”, and compare the outputs.

Here are the 10 first completions of “Barack Obama Says”:

Barack Obama Says It’s Wrong To Talk About Iraq

Barack Obama Says He’s Like ‘A Single Mother’ And ‘Over The Top’

Barack Obama Says He Is “The First Threat Gay Woman In The World”

Barack Obama Says About Ohio Law

Barack Obama Says He Is Wrong

Barack Obama Says He Will Get The American Idol

Barack Obama Says Himself Are “Doing Well Around The World”

Barack Obama Says As He Leaves Politics With His Wife

Barack Obama Says He Did 48 Things Over

Barack Obama Says GOP Needs To Be Key To New Immigration Policy

And here are the 10 first completions of “Kim Kardashian Says”:

Kim Kardashian Says She Wants To Sign Again

Kim Kardashian Says ‘Idea’ To Her Mean Baby!

Kim Kardashian Says North West Is Even More ‘Important’

Kim Kardashian Says She Would Love Kanye

Kim Kardashian Says She’s A Hero

Kim Kardashian Says She Looks Fake

Kim Kardashian Says It Was Over Before They Call Her

Kim Kardashian Says Her Book Used To Lose Her Cool

Kim Kardashian Says She’s Married With A Baby In New Mexico

Kim Kardashian Says Kanye West Needs A Break From Her

Question Answering

By getting the RNN to complete our sentences, we can effectively ask questions of the model. Ilya Sutskever and Geoff Hinton trained a character level RNN on Wikipedia, and asked it to complete the phrase “The meaning of life is”. The RNN essentially answered “human reproduction”. It’s funny that you can get an RNN to read Wikipedia for a month, and have it essentially tell you that the meaning of life is to have sex. It’s probably also a correct answer from a biological perspective.

We can’t directly replicate this experiment on the clickbait model, because the word “meaning” is not in its vocabulary. But we can ask it to complete the phrase “Life Is About”, for similar effect. These are the first 10 results:

Life Is About The Weather!

Life Is About The (Wild) Truth About Human-Rights

Life Is About The True Love Of Mr. Mom

Life Is About Where He Were Now

Life Is About Kids

Life Is About What It Takes If Being On The Spot Is Tough

Life Is About A Giant White House Close To A Body In These Red Carpet Looks From Prince William’s Epic ‘Dinner With Johnny’

Life Is About — Or Still Didn’t Know Me

Life Is About… An Eating Story

Life Is About The Truth Now

Network details

With some experimentation, I ended with the following architecture and training procedure. The initial RNN had 2 recurrent layers, each containing 1200 LSTM units. Each word was represented as a 200 dimensional word vector, connected to the rest of the network via a tanh. These word vectors were initialized to the pretrained GloVe vectors released by its inventors, trained on 6 billion tokens from Wikipedia. GloVe, like word2vec, is a way of obtaining representations of words as vectors. These vectors were trained for a related task on a very big dataset, so they should provide a good initial representation for our words. During training, we can follow the gradient down into these word vectors and fine-tune the vector representations specifically for the task of generating clickbait, thus further improving the generalization accuracy of the complete model.
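The initialization step can be sketched like so (illustrative Python; the `pretrained` dictionary is a stand-in for the real 200-dimensional GloVe files, and the three-word vocabulary is made up):

```python
import numpy as np

# Hypothetical pretrained vectors: in the real setup these are loaded
# from the released GloVe files, keyed by word.
pretrained = {"obama": np.full(200, 0.1), "says": np.full(200, 0.2)}
vocab = ["obama", "says", "kardashian"]

dim = 200
rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.01, size=(len(vocab), dim))
for i, word in enumerate(vocab):
    if word in pretrained:  # initialize from GloVe where possible
        embeddings[i] = pretrained[word]
# Words missing from GloVe keep their small random initialization.
# During training, gradients from the prediction loss flow back into
# this matrix, fine-tuning the vectors for the clickbait task; to
# "freeze" the vectors, one simply stops applying these updates.
```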

It turns out that if we then take the word vectors learned from this model of 2 recurrent layers, and stick them in an architecture with 3 recurrent layers, and then freeze them, we get even better performance. Trying to backpropagate into the word vectors through the 3 recurrent layers turned out to actually hurt performance.

To summarize the word vector story: Initially, some good guys at Stanford invented GloVe, ran it over 6 billion tokens, and got a bunch of vectors. We then took these vectors, stuck them under 2 recurrent LSTM layers, and optimized them for generating clickbait. Finally, we froze the vectors, and put them in a 3 LSTM layer architecture.

The network was trained with the Adam optimizer. I found this to be a Big Deal: It cut the training time almost in half, and found better optima, compared to using rmsprop with exponential decay. It’s possible that similar results could be obtained with rmsprop had I found a better learning and decay rate, but I’m very happy not having to do that tuning.
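For reference, the Adam update itself fits in a few lines. This is a minimal sketch of the published algorithm; the learning rate and the toy quadratic objective are illustrative, not the settings used for this model:

```python
def adam_update(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: exponential moving averages of the gradient (m)
    and squared gradient (v), bias-corrected for the early steps, give
    each parameter its own adaptive step size."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

# Minimize the toy objective f(w) = w^2, starting from w = 5.0.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_update(w, 2 * w, m, v, t)  # gradient of w^2 is 2w
```

Because the step size adapts per parameter, Adam is much less sensitive to the choice of learning rate than plain SGD or rmsprop with a hand-tuned decay schedule, which is exactly the tuning this saved.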

Building The Website

While many headlines produced by this model are good, some of them are rambling nonsense. To filter out the nonsense, we can do what Reddit does and crowdsource the problem.

To this end, I created Click-o-Tron, possibly the first website in the world where all articles are written in their entirety by a Recurrent Neural Network. New articles are published every 20 minutes.

Any user can vote articles up and down. Each article gets an associated score determined by the number of votes and views the article has gotten. This score is then taken into account when ordering the front page. To get a trade-off between clickbaitiness and freshness, we can use the Hacker News ranking algorithm, which divides an article’s score by a power of its age: hotness = score / (1 + age/3h)^1.5.

In practice, this can look like the following in PostgreSQL:

CREATE FUNCTION hotness(articles) RETURNS double precision
    LANGUAGE sql STABLE
AS $_$
    SELECT $1.score
         / POW(1 + EXTRACT(EPOCH FROM (NOW() - $1.publish_date)) / (3 * 3600), 1.5)
$_$;
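The same ranking function in Python, to make the trade-off concrete (an illustrative port of the SQL above; the parameter names are mine):

```python
def hotness(score, age_hours, gravity=1.5, scale_hours=3.0):
    """Hacker News-style ranking: divide an article's score by a power
    of its age, so fresh articles outrank stale ones of equal score.
    Mirrors the PostgreSQL function above, with age measured in hours."""
    return score / (1 + age_hours / scale_hours) ** gravity

# A fresh article beats an older one with the same score...
assert hotness(100, 1) > hotness(100, 24)
# ...but a sufficiently higher score keeps an older article on top.
assert hotness(1000, 24) > hotness(10, 1)
```

The exponent (1.5 here) controls how fast freshness wins out: a higher power makes the front page churn faster, a lower one lets highly voted articles linger.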

The articles are a result of three separate language models: one for the headlines, one for the article bodies, and one for the author names.

The article body neural network was seeded with the words from the headline, so that the body text has a chance to be thematically consistent with the headline. The headlines were not used during training.

For the author names, a character level LSTM-RNN was trained on a corpus of all first and last names in the US. It was then asked to produce a list of names. This list was then filtered so that the only remaining names were the ones where neither the first nor the last name was in the original corpus. This creates a nice list of plausible, yet original names, such as Flodrice Golpo and Richaldo Aariza.
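The filtering step is easy to sketch (minimal Python; the corpus sets here are tiny stand-ins for the real lists of US first and last names):

```python
def novel_names(generated, corpus_first, corpus_last):
    """Keep only generated names where neither the first nor the last
    name appears in the training corpus, so every surviving name is
    plausible (the RNN learned the character statistics) yet original."""
    kept = []
    for name in generated:
        first, last = name.split(" ", 1)
        if first not in corpus_first and last not in corpus_last:
            kept.append(name)
    return kept

corpus_first = {"John", "Mary", "Richard"}
corpus_last = {"Johnson", "Williams", "Smith"}
generated = ["Flodrice Golpo", "John Golpo", "Richaldo Smith"]
# Only "Flodrice Golpo" survives: the other two reuse a real first or
# last name from the corpus.
```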

Finally, each article’s picture is found by searching the Wikimedia API with the headline text, and selecting an image with a permissive license.

In total, this gives us an infinite source of useless journalism, available at no cost. If I remember correctly from economics class, this should drive the market value of useless journalism down to zero, forcing other producers of useless journalism to produce something else.

As they say on BuzzFeed: Win!