In the latest in a string of bot versus human video game showdowns, Tencent AI Lab’s Strategical Collaborative AI “Juewu” has beaten the world’s top human player in the mobile real-time strategy (RTS) game “Arena of Valor.” Tencent published the results in the paper Hierarchical Macro Strategy Model for MOBA Game AI.

The bot versus human showdown is a trend that took off in 2017, when DeepMind’s AlphaGo defeated top human player Ke Jie in the ancient Chinese board game Go. Although Go is tremendously complicated, video games bring a number of additional challenges and have become something of a testing ground for AI researchers. Just months ago, the “OpenAI Five” team of bots lost to a human team in a highly anticipated “Dota 2” showdown at a Vancouver arena. OpenAI hinted that its bots will be back.

Tencent AI’s Juewu competed against a team of Arena of Valor human players ranked among the world’s top one percent.

Mastering RTS games involves both macro strategy and delicate micro-level execution. Tencent AI Lab says that although its current technology has made considerable progress in micro-level execution, it still lags in macro strategy. Compared to Go, RTS games generally present four unique challenges:

Computation complexity: The combined action space and state space can push computational complexity up to 10^20,000, whereas Go reaches only about 10^250.

Multi-agent play: RTS games usually involve multiple agents, and coordination and cooperation between them can be key to winning the game.

Incomplete information: Go is a perfect-information game, whereas many RTS games include some sort of “fog of war” feature that hides parts of the map to increase difficulty.

Sparse and delayed rewards: A game of Go usually involves fewer than 361 steps, but a typical MOBA game such as Arena of Valor involves about 20,000 frames.
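To give a rough feel for the sparse and delayed rewards problem, the sketch below compares how much of a single end-of-game reward would reach the first decision in each game under standard discounting. The discount factor of 0.99, the function name, and the assumption that the only reward arrives on the final step are illustrative choices, not details from Tencent's paper.

```python
# Illustrative only: how strongly a single terminal reward is discounted
# by the time it is credited to the very first decision of an episode.

GAMMA = 0.99          # assumed discount factor (not from the paper)
GO_STEPS = 361        # upper bound on Go steps cited in the article
MOBA_FRAMES = 20_000  # typical Arena of Valor frame count cited in the article


def first_step_credit(episode_length: int, gamma: float = GAMMA) -> float:
    """Discounted weight of a terminal reward of 1.0 as seen from step 0."""
    return gamma ** (episode_length - 1)


go_credit = first_step_credit(GO_STEPS)
moba_credit = first_step_credit(MOBA_FRAMES)

print(f"Go:   {go_credit:.3e}")
print(f"MOBA: {moba_credit:.3e}")
```

Under these assumptions the learning signal for early MOBA decisions is many orders of magnitude weaker than for early Go moves, which is one way to see why a longer horizon makes credit assignment harder.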

Figure: Computation complexity comparison between Go and MOBA games

To build “Juewu,” Tencent AI Lab designed and developed a MOBA AI Macro Strategy Architecture inspired by the way human players make strategic decisions. When playing a MOBA game, a top human player builds their own understanding of the game phases (opening, laning, mid game and late game). During each phase, the player sends their hero (game character) to different locations depending on game developments such as map changes and the state of battle.

Tencent researchers formulated their macro strategy operation process as “phase recognition → attention prediction → execution”. The team proposed a two-layer macro strategy architecture, i.e., phase and attention:

The Phase layer aims to recognize the current game phase to advise the attention layer where to direct attention.

The Attention layer aims to predict the best region on the game map to which heroes should be dispatched.

The phase and attention layers together provide high level guidance for micro level execution.
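The “phase recognition → attention prediction → execution” flow can be pictured with a minimal sketch. This is not Tencent's implementation: the frame thresholds, region names, weights and function names are all hypothetical, and simple hand-written rules stand in for what the paper describes as learned models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Phase(Enum):
    OPENING = 0
    LANING = 1
    MID_GAME = 2
    LATE_GAME = 3


@dataclass
class GameState:
    frame: int                       # frames elapsed in the match
    region_scores: Dict[str, float]  # hypothetical importance score per map region


# Hypothetical frame thresholds, checked from latest phase to earliest.
PHASE_STARTS = [
    (15_000, Phase.LATE_GAME),
    (8_000, Phase.MID_GAME),
    (2_000, Phase.LANING),
]

# Hypothetical per-phase weights: how the phase layer biases attention.
PHASE_WEIGHTS = {
    Phase.OPENING: {"jungle": 2.0},
    Phase.LANING: {"mid_lane": 2.0},
    Phase.MID_GAME: {"dragon": 2.0},
    Phase.LATE_GAME: {"enemy_base": 2.0},
}


def recognize_phase(state: GameState) -> Phase:
    """Phase layer: map the current state to a game phase (rule-based stand-in)."""
    for start_frame, phase in PHASE_STARTS:
        if state.frame >= start_frame:
            return phase
    return Phase.OPENING


def predict_attention(state: GameState, phase: Phase) -> str:
    """Attention layer: pick the region with the highest phase-weighted score."""
    weights = PHASE_WEIGHTS[phase]
    return max(
        state.region_scores,
        key=lambda region: state.region_scores[region] * weights.get(region, 1.0),
    )


def execute(hero: str, region: str) -> str:
    """Execution: hand the macro decision to micro-level control (stubbed as text)."""
    return f"{hero} -> {region}"


def macro_step(state: GameState, hero: str) -> str:
    phase = recognize_phase(state)
    region = predict_attention(state, phase)
    return execute(hero, region)
```

For example, with the scores `{"jungle": 1.0, "mid_lane": 1.2, "dragon": 0.8, "enemy_base": 0.5}`, the same map sends a hero to the jungle at frame 100 and to the dragon at frame 9,000, because the recognized phase changes which region the attention step favors.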

With comprehensive testing on Arena of Valor, the Juewu AI achieved a 48 percent win rate against human players.