Video game and e-sports streaming is a huge and fast-growing market. In last year's League of Legends (LoL) World Championship, one semifinal attracted 106 million viewers, more than the 2018 Super Bowl. Another success story is Twitch, where thousands of players broadcast their gameplay to millions of viewers. Visor, a company that provides personalized game analytics to players, wants a model that estimates a team's win probability in real time.

An Oracle that foresees the game

The model has many use cases. For players, it can provide feedback that helps them improve their skills. For audiences, it can be a great engagement tool, especially for potential viewers who are not yet familiar with the game. Last but not least, if the model can outperform human predictions, it has great potential in e-sports betting.

The great engagement of the Dota 2 TI World Championship

Introduction to Overwatch

The game I modeled is Overwatch, a team-based multiplayer online shooter. Each team has six players; each player chooses a hero (a game character, like Mario in Super Mario) from a pool of 26 heroes and fights against the other team. Each game is played on a specific map, which is determined before the game starts. A simple analogy is a football match: two teams play in a specific stadium.

Many factors are informative for predicting the outcome of a game, and most of them are categorical features. For example, hero choices can be a strong indicator of the outcome, especially in the early stage of the game. The challenge therefore lies in how to handle these categorical features. With plain one-hot encoding, the feature space easily grows to hundreds of dimensions, and it is nearly impossible to collect enough gameplay data to feed this high-dimensional monster.
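To get a feel for how quickly one-hot encoding blows up, here is a minimal sketch in Python. It assumes one one-hot block per player slot (12 slots, 26 heroes each); this layout, the function name `one_hot_picks`, and the example picks are my own illustration, not necessarily the encoding used in the actual model.

```python
import numpy as np

NUM_HEROES = 26        # size of the hero pool mentioned above
PLAYERS_PER_TEAM = 6
NUM_TEAMS = 2


def one_hot_picks(picks, num_heroes=NUM_HEROES):
    """Concatenate one one-hot block per player slot (assumed layout)."""
    picks = np.asarray(picks)
    encoded = np.zeros((picks.size, num_heroes), dtype=np.float32)
    # Set a single 1 in each row, at the column of the chosen hero.
    encoded[np.arange(picks.size), picks] = 1.0
    return encoded.ravel()


# Hypothetical hero indices for the 12 player slots (6 per team).
example_picks = [0, 3, 7, 12, 18, 25, 1, 3, 9, 12, 20, 24]
features = one_hot_picks(example_picks)

print(features.shape)       # (312,)
print(int(features.sum()))  # 12
```

Hero picks alone already take 26 × 12 = 312 dimensions under this layout, of which only 12 are non-zero for any given moment of a game. Adding the map and other categorical features pushes the count even higher.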

Prediction accuracy vs. game progress. Predictions come from logistic regression with one-hot encoding and feature selection. The model is accurate near the end of the game but close to a random guess (0.5 accuracy) at the beginning. The challenge lies in improving the early-stage predictions.

However, this notorious monster can be defeated by the “heroes”. This post focuses on how to model these game characters via embeddings, and how the embeddings improve the predictions.

For more details and the implementation, please refer to my GitHub link.