Now that we have a framework in place to feed input to the bot and to let its output control the game, we come to the interesting part: learning game intelligence. This is done in two steps by (1) using convolution neural network for understanding the screenshot image and (2) using long short term memory networks to decide appropriate action based on the understanding of the image.

STEP 1: Training Convolution Neural Network (CNN)

CNNs are well known for their ability to detect objects in an image with high accuracy. Add to that fast GPUs and intelligent network architectures and we have a CNN model that can run in real time.

For making our bot understand the image it is given as input, I use an extremely light weight and fast CNN called MobileNet. The feature map extracted from this network represents a high level understanding of the image, like where the players and other objects of interest are located on the screen. This feature map is then used with a Single-Shot Multi-Box to detect the players on the pitch along with the ball and the goal.

STEP 2: Training Long Short Term Memory Networks (LSTM)

Now that we have an understanding of the image, we could go ahead and decide what move we want to make. However, we don’t want to look at just one frame and take action. We’d rather look at a short sequence of these images. This is where LSTMs come into picture as they are well known for being able to model temporal sequences in data. Consecutive frames are used as time steps in our sequence, and a feature map is extracted for each frame using the CNN model. These are then fed into two LSTM networks simultaneously.

The first LSTM performs the task of learning what movement the player needs to make. Thus, it’s a multi-class classification model. The second LSTM gets the same input and has to decide what action to take out of cross, through, pass and shoot: another multi-class classification model. The outputs from these two classification problems are then converted to key presses to control the actions in the game.

These networks have been trained on data collected by manually playing the game and recording the input image and the target key press. One of the few instances where gathering labelled data does not feel like a chore!

Evaluating the bot’s performance

I don’t know what accuracy measure to use in order to judge the bot’s performance, other than to let it just go out there and play the game. Based on only 400 minutes of training, the bot has learned to make runs towards the opponent’s goal, make forward passes and take shots when it detects the goal. In the beginner mode of FIFA 18, it has already scored 4 goals in about 6 games, 1 more than Paul Pogba has in the 17/18 season as of time of writing.

Video clips of the bot playing against the inbuilt bot can be found on my YouTube channel, with the video embedded below.

Conclusion

My initial impressions on this approach of building game bots are certainly positive. With limited training, the bot has already picked up on basic rules of the game: making movements towards the goal and putting the ball in the back of the net. I believe it can get very close to human level performance with many more hours of training data, something that would be easy for the game developer to collect. Moreover, extending the model training to learn from real world footage of matches played would enable the game developers to make the bot’s behavior much more natural and realistic. Now if only anyone from EA sports was reading this…