AI-powered highlights

Deep-learning the interestingness in Starcraft 2 videos

Structure vs. chaos

All movies follow a certain structure, which affects the high and low points of the story. This rise and fall of tension has been studied for more than 2000 years. Scholars such as Aristotle pinned down the idea of the three-part structure, which was extended by Gustav Freytag to the now common five parts. They are exposition, rising action, climax, falling action, and resolution. These are the building blocks of nearly every movie or TV show you have watched.


Chaos is the opposite of structure. This includes all live, unscripted forms of entertainment. Any sports match has to follow rules. But we can’t predict what kind of storyline will be created or which moments of the game will be interesting. All the excitement can be compressed into the first few minutes or it could be boring from start to finish. We just don’t know.

How can we select the interesting moments, the highlights, if there is no structure to hold on to? How can we find the bit worth sharing without watching hours of boring gameplay?

Bring in the AI

This is where artificial intelligence (AI) comes in. We can train an algorithm to learn which moments in a video humans find interesting. That is easy to say, but how can we test the claim? And what are its limitations?

We need to focus on a single category and prepare a model that understands the excitement, the high and low points in tension, within that category.

This AI will not know what we find intriguing in everyday life or in other media, only in this single category. It can only learn as much as we teach it.

We need a ton of videos, preferably with similar views of the action and a limited number of visible elements. Additionally, we don’t want the model to read text or numbers that might be shown in the video; we want to teach it the raw excitement of the visuals.

Video games

Video games are complex virtual environments, constrained by parameters such as camera angles and available units. They are a well-known testbed for machine learning models and are used heavily in reinforcement learning.

Starcraft 2 is a popular real-time strategy game in the growing eSports market. Two players compete, each playing one of three different races. They fight on strategy and timely execution, averaging 300 input actions per minute with keyboard and mouse. The matches follow no fixed structure: they can last five minutes or go on for hours. The game has a long history and dedicated fans, and it is supported by a pro-gamer scene and a lively community.

A highlight or interesting moment has multiple interpretations. This project is concerned with the visual fight sequences between the armies of two players.

This defines our pipeline: it receives any Starcraft 2 video and returns the predicted interestingness over time. Input and output are defined, but what happens in between?

From video to interest

The video is split into frames. A first algorithm decides whether each frame shows Starcraft 2 gameplay or something else, such as commentators or the crowd.

The next neural network rates each frame on its level of interest. The model is based on the VGG16 architecture with several convolutional layers added. It is trained by repeatedly feeding it several gigabytes of data. Training concludes after several days with an acceptable level of accuracy.
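To make the two stages concrete, here is a minimal Python sketch of how they could be chained. It is not the project's actual code: the function name `interest_over_time` and the two callables `is_gameplay` and `rate_interest` are hypothetical stand-ins for the two trained models, and the sketch assumes that frames rejected by the first stage simply receive no score.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Frame = TypeVar("Frame")


def interest_over_time(
    frames: Iterable[Frame],
    is_gameplay: Callable[[Frame], bool],
    rate_interest: Callable[[Frame], float],
) -> List[Optional[float]]:
    """Return one entry per frame: a score for gameplay frames, None otherwise."""
    scores: List[Optional[float]] = []
    for frame in frames:
        if is_gameplay(frame):
            # Stage 2 (hypothetical): only gameplay frames reach the interest model.
            scores.append(rate_interest(frame))
        else:
            # Stage 1 rejected the frame (commentators, crowd, anything else).
            scores.append(None)
    return scores


# Toy stand-ins: each "frame" is a (kind, value) tuple instead of an image.
toy_frames = [("game", 0.2), ("crowd", 0.9), ("game", 0.8)]
timeline = interest_over_time(
    toy_frames,
    is_gameplay=lambda frame: frame[0] == "game",
    rate_interest=lambda frame: frame[1],
)
print(timeline)  # [0.2, None, 0.8]
```

Keeping one entry per frame, even for rejected frames, means the list index still maps directly to a position in the video.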

An early example of a recent Starcraft 2 match

Next steps

The given output can be processed further to show only the moments where the levels of interest are at their peak. These highlights can be collected to give a summary of the video.
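One possible way to do this post-processing is sketched below in Python. The approach (smooth the per-frame scores with a moving average, then keep every stretch above a threshold) is an assumption for illustration, as are the names `moving_average` and `find_highlights` and the chosen threshold and window values.

```python
from typing import List, Optional, Tuple


def moving_average(scores: List[float], window: int) -> List[float]:
    """Smooth scores with a centered window that is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def find_highlights(
    scores: List[float], threshold: float, window: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where the
    smoothed interest is at or above the threshold."""
    smoothed = moving_average(scores, window)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(smoothed):
        if value >= threshold and start is None:
            start = i  # a highlight begins
        elif value < threshold and start is not None:
            segments.append((start, i))  # the highlight ended at frame i
            start = None
    if start is not None:
        # The video ended while still inside a highlight.
        segments.append((start, len(smoothed)))
    return segments


scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
highlights = find_highlights(scores, threshold=0.6, window=3)
print(highlights)  # [(2, 5)]
```

Smoothing first keeps a single noisy frame from starting or cutting a highlight; the returned ranges could then be used to cut clips for a summary.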

Take action

We need your help to refine the AI.

Contribute at highlighthero.com and win prizes!

Choose the image you find more interesting. Little or no knowledge of Starcraft is required.

With every decision you contribute your unique point of view. This helps the model represent the community as a whole and achieve better results.
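How the collected choices feed back into the model is not described here. As a purely illustrative sketch, one simple option would be to turn the pairwise votes into a per-image score by counting how often each image was preferred. The function `win_rates` and the image identifiers below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def win_rates(votes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Each vote is (preferred_image, other_image).
    Return the fraction of its comparisons that each image won."""
    wins: Dict[str, int] = defaultdict(int)
    appearances: Dict[str, int] = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {image: wins[image] / appearances[image] for image in appearances}


# Three hypothetical votes over images "a", "b" and "c".
votes = [("a", "b"), ("a", "c"), ("b", "c")]
rates = win_rates(votes)
print(rates)  # {'a': 1.0, 'b': 0.5, 'c': 0.0}
```

A win rate is crude, since it ignores which opponents an image faced, but it shows how many small individual decisions can add up to a community-wide ranking.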

Thanks so much for reading this far.

Any feedback is welcome, technical or otherwise.

Comment here or write to team [|at|] highlighthero.com

This project was started as part of the DataScienceRetreat in Berlin, where I studied in Batch 08.