In this article, I’m going to show how I built an AI bot that plays Tetris like a real human. It’s not 100% perfect, but it’s quite good.

To simulate a human brain, I used machine learning with a convolutional neural network (CNN).



The game is programmed in JavaScript using the Phaser 2 framework and the TensorFlow.js library. It runs directly in the browser. You can try it at the end of this article.

1. Video Trailer

Let’s start by watching this video trailer:

2. Getting Data

To train the network, I needed a high-quality dataset of the various board configurations described by images with corresponding labels where:

the image is a snapshot of the board

the label is an action that represents the final column placement and rotation of a played piece on that board

How did I get this data?

Well, there is a YouTube channel with a lot of videos showing Tetris World Championship matches. All the competitors there play at the top level and make few errors. Besides, they always aim to score the most points by clearing four rows at once.

So these matches are a great source of high-quality Tetris data! You only have to scrape it somehow. And that was my original idea for getting the training data.

So I made Tetris Data Scraper, a specific tool that collects data from videos of Tetris matches:

Based on the image processing of each video frame, this tool analyzes differences between pixel colors of the previous and current frames.

This way, it recognizes the current board configuration and the position of the currently played tetromino.
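The idea can be sketched as follows. This is a simplified illustration, not the actual code of the scraper: it assumes the frame has already been cropped to the board area, that an occupied cell is simply brighter than an empty one, and that comparing cells is enough. The names `frameToBoard`, `diffBoards` and the brightness threshold are hypothetical.

```javascript
// Sketch only: the 20×10 grid size comes from the article, everything else is assumed.
const ROWS = 20;
const COLS = 10;

// Reads the board from one frame by sampling the pixel at the center of each cell.
// `pixels` is a flat RGBA array (like ImageData.data) that covers only the board area.
function frameToBoard(pixels, width, height, brightnessThreshold = 60) {
  const cellW = width / COLS;
  const cellH = height / ROWS;
  const board = [];
  for (let r = 0; r < ROWS; r++) {
    const row = [];
    for (let c = 0; c < COLS; c++) {
      const x = Math.floor((c + 0.5) * cellW);
      const y = Math.floor((r + 0.5) * cellH);
      const i = (y * width + x) * 4; // 4 channels per pixel: R, G, B, A
      const brightness = (pixels[i] + pixels[i + 1] + pixels[i + 2]) / 3;
      row.push(brightness > brightnessThreshold ? 1 : 0);
    }
    board.push(row);
  }
  return board;
}

// Lists the cells whose state differs between the previous and the current board.
function diffBoards(prev, curr) {
  const changed = [];
  for (let r = 0; r < ROWS; r++) {
    for (let c = 0; c < COLS; c++) {
      if (prev[r][c] !== curr[r][c]) {
        changed.push({ row: r, col: c, from: prev[r][c], to: curr[r][c] });
      }
    }
  }
  return changed;
}
```

Cells that switch from 0 to 1 between two frames are candidates for the moving tetromino, while cells that stay at 1 belong to the settled part of the board.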

Using this tool, I processed 15 Tetris World Championship matches, generating around 50,000 useful records.





3. Augmenting Data

Unfortunately, 50,000 data records were not enough to train the network.

I estimated that I needed at least 1,000,000 records.

So I made the Tetris Data Augmentation tool to artificially expand the dataset by creating modified versions of the original boards.

But before processing, I changed all boards by converting all gaps into occupied fields.

Why?

Well, while testing the model, I found it was more effective to ignore all the gaps than to leave them on the boards.

The image below shows the concept:

Figure 1 shows an original 20×10 board scraped from the video.

Then in figure 2, we see there are two gaps on this board.

Finally, in figure 3, the gaps are converted into occupied fields.
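A minimal sketch of this conversion is shown below. It assumes that a gap is an empty field with at least one occupied field above it in the same column, and that a board is an array of rows where row 0 is the top, 1 means occupied and 0 means empty. The function name `fillGaps` is hypothetical.

```javascript
// Assumption: a "gap" is an empty cell (0) that has an occupied cell (1)
// somewhere above it in the same column. Row 0 is the top of the board.
function fillGaps(board) {
  const rows = board.length;
  const cols = board[0].length;
  const filled = board.map((row) => row.slice()); // work on a copy
  for (let c = 0; c < cols; c++) {
    let covered = false; // becomes true once the first occupied cell is seen
    for (let r = 0; r < rows; r++) {
      if (filled[r][c] === 1) {
        covered = true;
      } else if (covered) {
        filled[r][c] = 1; // empty cell under an occupied one: treat it as filled
      }
    }
  }
  return filled;
}
```

After this step, every column is a solid stack from its topmost block down to the floor, so only the surface shape of the board remains.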

Now, let’s look at the concept of data augmentation. Using the board shown in figure 3, the tool generated these boards:

The upper boards are created by inserting new lines at the bottom.

The lower boards are created by removing lines one by one.
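The two operations could look roughly like the sketch below. The article does not say what the inserted lines contain, so this sketch assumes each new line is a copy of the current bottom row (which keeps empty columns empty) and that lines are removed from the bottom. The function names are hypothetical, and how the labels are handled is left out.

```javascript
// Assumption: an inserted line is a copy of the current bottom row.
// The board is shifted up, so the top `count` rows are dropped (they are assumed to be empty).
function insertBottomRows(board, count) {
  const bottom = board[board.length - 1];
  const shifted = board.slice(count).map((row) => row.slice());
  for (let i = 0; i < count; i++) {
    shifted.push(bottom.slice());
  }
  return shifted;
}

// Assumption: lines are removed from the bottom and empty rows are added
// at the top to keep the board height unchanged.
function removeBottomRows(board, count) {
  const cols = board[0].length;
  const kept = board.slice(0, board.length - count).map((row) => row.slice());
  const empty = [];
  for (let i = 0; i < count; i++) {
    empty.push(new Array(cols).fill(0));
  }
  return empty.concat(kept);
}

// Produces 2 * maxShift variants of one board.
function augment(board, maxShift) {
  const variants = [];
  for (let n = 1; n <= maxShift; n++) {
    variants.push(insertBottomRows(board, n));
    variants.push(removeBottomRows(board, n));
  }
  return variants;
}
```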

After generating 1,000,000 data records with this tool, I merged all of them into one final 180MB dataset file.

Now, I could train the convolutional neural network.

4. Building CNN Model

The image below shows the complete neural network architecture used in this game:

The model is sequential, meaning it consists of a linear stack of layers with no branching.

To compile the model, I used the following parameters:

Optimizer: Adam with a learning rate of 0.0005

Loss Function: Categorical Crossentropy

Evaluation Metric: Accuracy
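In TensorFlow.js, these settings translate into a single `compile` call. The sketch below uses the compile parameters listed above, together with the 20×20 input and the 44 outputs described in sections 4.1 and 4.2. The layers in between are only a placeholder and are NOT the architecture from the diagram. The function name `buildModel` is hypothetical, and `tf` is assumed to be the TensorFlow.js namespace passed in by the caller.

```javascript
// `tf` is the TensorFlow.js namespace (for example, the global created by the library).
// The layer stack is a placeholder for illustration, not the architecture from the article.
function buildModel(tf) {
  const model = tf.sequential(); // linear stack of layers, no branching

  // Placeholder feature extractor: one convolution and one pooling layer.
  model.add(tf.layers.conv2d({
    inputShape: [20, 20, 1], // 20×20 black/white image, one channel
    filters: 32,             // assumed value
    kernelSize: 3,           // assumed value
    padding: 'same',
    activation: 'relu',
  }));
  model.add(tf.layers.maxPooling2d({ poolSize: 2 }));
  model.add(tf.layers.flatten());

  // One output per action: 11 columns × 4 rotations = 44.
  model.add(tf.layers.dense({ units: 44, activation: 'softmax' }));

  // Compile parameters as listed in the article.
  model.compile({
    optimizer: tf.train.adam(0.0005),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  return model;
}
```

Passing `tf` as an argument keeps the sketch independent of how the library is loaded (script tag or module import).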

4.1. The Input

When working with a CNN, it’s convenient to have a square input.

So the input to the model is a 20×20 black/white image representing a board configuration.

Since the images in the dataset are 20×10, they must be expanded by 5 blank columns on each side.

Here I found a smart way to use these extra columns.

Instead of keeping a piece at its initial position in the middle of the first row, it’s better to place each piece at its own specific location within the extra columns.

It seems that the CNN recognizes the different pieces better when they are placed in separate locations.

Since a picture is worth a thousand words, here are examples of inputs for the same board configuration, but with different pieces:
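Building such an input can be sketched as follows. The padding (5 columns on each side) comes from the article, and the I-piece slot in the top left corner matches the example in section 6. The other slot, the 4×4 shapes and all names (`padBoard`, `stampPiece`, `toInput`, `SHAPES`, `SLOTS`) are hypothetical, and only two pieces are listed to keep the sketch short.

```javascript
const PAD = 5; // blank columns added on each side of the 20×10 board

// Hypothetical 4×4 shapes for two of the seven pieces.
const SHAPES = {
  I: [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
  O: [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
};

// Where each piece is drawn inside the extra columns.
// I in the top left corner follows the article; the O slot is an assumption.
const SLOTS = {
  I: { row: 0, col: 0 },
  O: { row: 0, col: 15 },
};

// Expands each 10-cell row to 20 cells: 5 blanks + board + 5 blanks.
function padBoard(board) {
  return board.map((row) => [
    ...new Array(PAD).fill(0),
    ...row,
    ...new Array(PAD).fill(0),
  ]);
}

// Draws a 4×4 shape into a copy of the input at the given slot.
function stampPiece(input, shape, slot) {
  const out = input.map((row) => row.slice());
  shape.forEach((shapeRow, r) => {
    shapeRow.forEach((cell, c) => {
      if (cell === 1) out[slot.row + r][slot.col + c] = 1;
    });
  });
  return out;
}

// `board` is assumed to contain only settled blocks, without the falling piece.
function toInput(board, pieceName) {
  return stampPiece(padBoard(board), SHAPES[pieceName], SLOTS[pieceName]);
}
```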

4.2. The Output

The output from the model is one of the 44 possible actions (labels).

And why are there 44 output actions?

Well, each action represents a combination of the final column position and rotation for the piece played on the current board configuration.

So let’s first look at the structure of each piece. Every piece is defined inside a 4×4 grid of blocks and has 4 rotations, as shown in the image below:

Now, let’s go back to the dimensions of the board. As we know, its original size is 20×10.

But to show a piece at the board edges, considering that its body size is 4×4, it is essential to extend the board with hidden rows and columns.

Therefore, we need to use an extended 25×14 board as shown in the examples below:

So to place any piece in its final position, we need 11 columns (marked from 0 to 10), because a 4-column-wide piece body fits into a 14-column board in 14 − 4 + 1 = 11 ways.

Since each piece has 4 rotations (marked from 0 to 3), that is a total of 44 actions (11 * 4).
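For training with categorical crossentropy, each (column, rotation) pair has to become a one-hot vector of length 44. The sketch below shows one way to do that. The index order (rotation × 11 + column) is chosen to match the decoding formulas in section 6, and the function names `encodeAction` and `toOneHot` are hypothetical.

```javascript
const NUM_COLUMNS = 11;  // final column positions 0..10
const NUM_ROTATIONS = 4; // rotations 0..3
const NUM_ACTIONS = NUM_COLUMNS * NUM_ROTATIONS; // 44

// Maps a (column, rotation) pair to a single action index in 0..43.
function encodeAction(column, rotation) {
  if (!Number.isInteger(column) || column < 0 || column >= NUM_COLUMNS) {
    throw new RangeError('column must be an integer from 0 to 10');
  }
  if (!Number.isInteger(rotation) || rotation < 0 || rotation >= NUM_ROTATIONS) {
    throw new RangeError('rotation must be an integer from 0 to 3');
  }
  return rotation * NUM_COLUMNS + column;
}

// Turns an action index into a one-hot label vector of length 44.
function toOneHot(action) {
  const label = new Array(NUM_ACTIONS).fill(0);
  label[action] = 1;
  return label;
}
```

For example, column 5 with rotation 2 becomes action 2 × 11 + 5 = 27.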





5. Training CNN Model

To get good predictions from the CNN, we need to train it over many iterations using the previously prepared data.

During the training, the neural network was showing gradual progress but nothing spectacular until 40,000 iterations. Then it began to place pieces in the correct positions and clear the rows continuously.

After 75,000 training iterations, the bot was already playing pretty well.

Yes, of course, it still makes some funny mistakes during gameplay, but it is also able to clear four rows at once and score more than 999,999 points.

It also knows how to get out of some tricky situations most of the time.

However, it is not unbeatable and sometimes loses the game very quickly, but generally speaking, it can survive for a long time.

6. Playing Tetris with AI Bot

While playing the game, the AI bot uses a trained CNN to control pieces.

The image below shows the entire process:

The input to the CNN is a 20×20 black/white image of the current board configuration.

So at first, the bot captures a snapshot of the whole board (region of interest).

After that, the snapshot is downsampled and converted to a normalized 20×20 array.

That means all colorized (occupied) fields are mapped to value 1, and all black fields (voids) are mapped to value 0.

Here we have an exception: since we decided to ignore gaps in the training data, we also need to ignore them during gameplay. So all black fields that are gaps must be mapped to value 1 (in this example, there is one gap).

Likewise, we must also move the playing piece from its initial position to its specific location within the extra columns. In this example, we see the I-piece moved from the center to the top left corner.

After feeding the CNN with such a prepared 20×20 input, it produces an output prediction. And this prediction is one of the 44 possible actions.

To get the final column position and rotation of the piece from the output prediction, the bot uses these simple calculations:

rotation = floor(output / 11), which is one of {0, 1, 2, 3}

column = output % 11, which is one of {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
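In JavaScript, the division has to be rounded down explicitly, because `/` always produces a floating-point result. The sketch below also shows how the action could be picked from the raw network output, assuming the model returns 44 scores and the bot takes the highest one. The function names are hypothetical.

```javascript
// Returns the index of the largest value in an array of scores.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Splits an action index (0..43) into rotation and column.
function decodeAction(output) {
  return {
    rotation: Math.floor(output / 11), // integer division
    column: output % 11,
  };
}

// Assumption: `scores` holds the 44 values produced by the network for one board.
function chooseMove(scores) {
  return decodeAction(argMax(scores));
}
```

For example, output 27 decodes to rotation 2 and column 5.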





7. Live Demo

And here is the game!

Press the Load button to load a pre-trained model and enjoy the magic of artificial intelligence (although it’s not 100% perfect).

Please note that the Train button is disabled in the online version, and the dataset is not loaded. This is because the dataset file is too large to be loaded online (about 180 MB).

So you can’t train the model online. Instead, I already pre-trained it locally.






8. Conclusion

The goal of this project was to create an artificial intelligence (AI) that learns to play Tetris using a convolutional neural network (CNN).

It wasn’t an easy task, and the biggest challenge was how to generate a high-quality dataset to train the network for playing Tetris.

Furthermore, to teach the network as many skills as possible, I also needed a large amount of data.

Now I can say that I spent most of my time not on programming but on collecting and pre-processing data.

In the end, though, I was able to train the CNN with a pretty good dataset.

However, there is still no guarantee that the predictions from such a trained model will always be 100% accurate.

Due to the variability of the real world, the network doesn’t know how to handle all situations. It then makes stupid mistakes that ultimately lead to the loss of the game.

Anyhow, the predictions in this game are generally about 90% accurate, which is quite acceptable.

To conclude, with a large and high-quality dataset, we can teach a convolutional neural network to play Tetris pretty well.

9. Source Code

The source code will be available on GitHub as soon as possible.

Currently, it is not yet clean and readable enough to be ready for download.

So stay tuned, and don’t forget to like, share and subscribe.