Making Smarter Guesses

Back in Part 1, we created a simple algorithm that estimated the value of a house based on its attributes. Given data about a house like this:

We ended up with this simple estimation function:

def estimate_house_sales_price(num_of_bedrooms, sqft, neighborhood):

price = 0 # a little pinch of this

price += num_of_bedrooms * 0.123 # and a big pinch of that

price += sqft * 0.41 # maybe a handful of this

price += neighborhood * 0.57 return price

In other words, we estimated the value of the house by multiplying each of its attributes by a weight. Then we just added those numbers up to get the house’s value.

Instead of using code, let’s represent that same function as a simple diagram:

The arrows represent the weights in our function.

However this algorithm only works for simple problems where the result has a linear relationship with the input. What if the truth behind house prices isn’t so simple? For example, maybe the neighborhood matters a lot for big houses and small houses but doesn’t matter at all for medium-sized houses. How could we capture that kind of complicated detail in our model?

To be more clever, we could run this algorithm multiple times with different of weights that each capture different edge cases:

Let’s try solving the problem four different ways

Now we have four different price estimates. Let’s combine those four price estimates into one final estimate. We’ll run them through the same algorithm again (but using another set of weights)!

Our new Super Answer combines the estimates from our four different attempts to solve the problem. Because of this, it can model more cases than we could capture in one simple model.

What is a Neural Network?

Let’s combine our four attempts to guess into one big diagram:

This is a neural network! Each node knows how to take in a set of inputs, apply weights to them, and calculate an output value. By chaining together lots of these nodes, we can model complex functions.

There’s a lot that I’m skipping over to keep this brief (including feature scaling and the activation function), but the most important part is that these basic ideas click:

We made a simple estimation function that takes in a set of inputs and multiplies them by weights to get an output. Call this simple function a neuron .

. By chaining lots of simple neurons together, we can model functions that are too complicated to be modeled by one single neuron.

It’s just like LEGO! We can’t model much with one single LEGO block, but we can model anything if we have enough basic LEGO blocks to stick together:

A grim preview of our plastic animal future? Only time can tell…

Giving our Neural Network a Memory

The neural network we’ve seen always returns the same answer when you give it the same inputs. It has no memory. In programming terms, it’s a stateless algorithm.

In many cases (like estimating the price of house), that’s exactly what you want. But the one thing this kind of model can’t do is respond to patterns in data over time.

Imagine I handed you a keyboard and asked you to write a story. But before you start, my job is to guess the very first letter that you will type. What letter should I guess?

I can use my knowledge of English to increase my odds of guessing the right letter. For example, you will probably type a letter that is common at the beginning of words. If I looked at stories you wrote in the past, I could narrow it down further based on the words you usually use at the beginning of your stories. Once I had all that data, I could use it to build a neural network to model how likely it is that you would start with any given letter.

Our model might look like this:

But let’s make the problem harder. Let’s say I need to guess the next letter you are going to type at any point in your story. This is a much more interesting problem.

Let’s use the first few words of Ernest Hemingway’s The Sun Also Rises as an example:

Robert Cohn was once middleweight boxi

What letter is going to come next?

You probably guessed ’n’ — the word is probably going to be boxing. We know this based on the letters we’ve already seen in the sentence and our knowledge of common words in English. Also, the word ‘middleweight’ gives us an extra clue that we are talking about boxing.

In other words, it’s easy to guess the next letter if we take into account the sequence of letters that came right before it and combine that with our knowledge of the rules of English.

To solve this problem with a neural network, we need to add state to our model. Each time we ask our neural network for an answer, we also save a set of our intermediate calculations and re-use them the next time as part of our input. That way, our model will adjust its predictions based on the input that it has seen recently.

Keeping track of state in our model makes it possible to not just predict the most likely first letter in the story, but to predict the most likely next letter given all previous letters.

This is the basic idea of a Recurrent Neural Network. We are updating the network each time we use it. This allows it to update its predictions based on what it saw most recently. It can even model patterns over time as long as we give it enough of a memory.

What’s a single letter good for?

Predicting the next letter in a story might seem pretty useless. What’s the point?

One cool use might be auto-predict for a mobile phone keyboard:

The next most likely letter is “t”.

But what if we took this idea to the extreme? What if we asked the model to predict the next most likely character over and over — forever? We’d be asking it to write a complete story for us!

Generating a story

We saw how we could guess the next letter in Hemingway’s sentence. Let’s try generating a whole story in the style of Hemingway.

To do this, we are going to use the Recurrent Neural Network implementation that Andrej Karpathy wrote. Andrej is a Deep-Learning researcher at Stanford and he wrote an excellent introduction to generating text with RNNs, You can view all the code for the model on github.

We’ll create our model from the complete text of The Sun Also Rises — 362,239 characters using 84 unique letters (including punctuation, uppercase/lowercase, etc). This data set is actually really small compared to typical real-world applications. To generate a really good model of Hemingway’s style, it would be much better to have at several times as much sample text. But this is good enough to play around with as an example.

As we just start to train the RNN, it’s not very good at predicting letters. Here’s what it generates after a 100 loops of training:

hjCTCnhoofeoxelif edElobe negnk e iohehasenoldndAmdaI ayio pe e h’e btentmuhgehi bcgdltt. gey heho grpiahe.

Ddelnss.eelaishaner” cot AAfhB ht ltny

ehbih a”on bhnte ectrsnae abeahngy

amo k ns aeo?cdse nh a taei.rairrhelardr er deffijha

You can see that it has figured out that sometimes words have spaces between them, but that’s about it.

After about 1000 iterations, things are looking more promising:

hing soor ither. And the caraos, and the crowebel for figttier and ale the room of me? Streat was not to him Bill-stook of the momansbed mig out ust on the bull, out here. I been soms

inick stalling that aid.

“Hon’t me and acrained on .Hw’s don’t you for the roed,” In’s pair.”

“Alough marith him.”

The model has started to identify the patterns in basic sentence structure. It’s adding periods at the ends of sentences and even quoting dialog. A few words are recognizable, but there’s also still a lot of nonsense.

But after several thousand more training iterations, it looks pretty good:

He went over to the gate of the café. It was like a country bed.

“Do you know it’s been me.”

“Damned us,” Bill said.

“I was dangerous,” I said. “You were she did it and think I would a fine cape you,” I said.

“I can’t look strange in the cab.”

“You know I was this is though,” Brett said.

“It’s a fights no matter?”

“It makes to do it.”

“You make it?”

“Sit down,” I said. “I wish I wasn’t do a little with the man.”

“You found it.”

“I don’t know.”

“You see, I’m sorry of chatches,” Bill said. “You think it’s a friend off back and make you really drunk.”

At this point, the algorithm has captured the basic pattern of Hemingway’s short, direct dialog. A few sentences even sort of make sense.

Compare that with some real text from the book:

There were a few people inside at the bar, and outside, alone, sat Harvey Stone. He had a pile of saucers in front of him, and he needed a shave.

“Sit down,” said Harvey, “I’ve been looking for you.”

“What’s the matter?”

“Nothing. Just looking for you.”

“Been out to the races?”

“No. Not since Sunday.”

“What do you hear from the States?”

“Nothing. Absolutely nothing.”

“What’s the matter?”

Even by only looking for patterns one character at a time, our algorithm has reproduced plausible-looking prose with proper formatting. That is kind of amazing!

We don’t have to generate text completely from scratch, either. We can seed the algorithm by supplying the first few letters and just let it find the next few letters.

For fun, let’s make a fake book cover for our imaginary book by generating a new author name and a new title using the seed text of “Er”, “He”, and “The S”:

The real book is on the left and our silly computer-generated nonsense book is on the right.

Not bad!

But the really mind-blowing part is that this algorithm can figure out patterns in any sequence of data. It can easily generate real-looking recipes or fake Obama speeches. But why limit ourselves human language? We can apply this same idea to any kind of sequential data that has a pattern.

Making Mario without actually Making Mario

In 2015, Nintendo released Super Mario Maker™ for the Wii U gaming system.

Every kid’s dream!

This game lets you draw out your own Super Mario Brothers levels on the gamepad and then upload them to the internet so you friends can play through them. You can include all the classic power-ups and enemies from the original Mario games in your levels. It’s like a virtual LEGO set for people who grew up playing Super Mario Brothers.

Can we use the same model that generated fake Hemingway text to generate fake Super Mario Brothers levels?

First, we need a data set for training our model. Let’s take all the outdoor levels from the original Super Mario Brothers game released in 1985:

Best Christmas Ever. Thanks Mom and Dad!

This game has 32 levels and about 70% of them have the same outdoor style. So we’ll stick to those.

To get the designs for each level, I took an original copy of the game and wrote a program to pull the level designs out of the game’s memory. Super Mario Bros. is a 30-year-old game and there are lots of resources online that help you figure out how the levels were stored in the game’s memory. Extracting level data from an old video game is a fun programming exercise that you should try sometime.

Here’s the first level from the game (which you probably remember if you ever played it):