The Dota 2-playing bot team OpenAI Five has already demonstrated expert-level performance in the popular video game and has even learned effective human–AI cooperation skills. Now it has gotten even stronger, as detailed by OpenAI researchers in the new paper Dota 2 with Large Scale Deep Reinforcement Learning.

OpenAI committed to its Dota 2 project about three years ago, and this April its bot team beat 2018 Dota 2 world champions Team OG. But the progress did not stop there: OpenAI has since trained a new agent, “Rerun,” which has notched a 98 percent win rate against the OpenAI Five.

Dota 2 is a multiplayer online battle arena video game in which two teams of five players compete to destroy the “Ancient” home base of their opponents while defending their own. The game poses several challenges for AI systems, such as long time horizons, imperfect information, and continuous state-action spaces. The key to solving such a complex environment was to scale existing reinforcement learning systems to unprecedented levels, using thousands of GPUs over multiple months.
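At this scale, training is typically split between many parallel rollout workers that generate self-play experience and a central optimizer that applies gradient updates. The toy sketch below illustrates that actor-learner split on a trivial two-armed bandit with a REINFORCE-style update; the bandit, function names, and hyperparameters are all illustrative assumptions, not the paper's actual system.

```python
import concurrent.futures
import math
import random

def rollout(theta, n_steps=100):
    """One 'worker' plays a toy game: pull arm 1 with prob sigmoid(theta).

    Arm 1 pays 1.0, arm 0 pays 0.2, so the policy should learn to
    prefer arm 1. Returns the average REINFORCE gradient and reward.
    """
    p = 1 / (1 + math.exp(-theta))
    grad, reward_sum = 0.0, 0.0
    for _ in range(n_steps):
        a = 1 if random.random() < p else 0
        r = 1.0 if a == 1 else 0.2
        grad += r * (a - p)      # d/dtheta log p(a) = a - p for a Bernoulli policy
        reward_sum += r
    return grad / n_steps, reward_sum / n_steps

theta = 0.0  # policy parameter, starts indifferent between arms
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for step in range(200):
        # Workers gather experience in parallel with the current policy...
        results = list(pool.map(lambda _: rollout(theta), range(4)))
        # ...and the central optimizer averages their gradients and updates.
        grad = sum(g for g, _ in results) / len(results)
        theta += 1.0 * grad

assert theta > 1.0  # the policy now strongly prefers the better arm
```

The real system replaces the bandit with full Dota 2 games and the scalar parameter with a large LSTM policy trained with PPO, but the worker/optimizer pattern is the same.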

What’s impressive about Rerun — as also seen in DeepMind’s Alpha series — is that model performance has increased while training time and compute requirements have decreased.

In advance of the April showdown with Team OG, the OpenAI Five trained on the equivalent of 10,000 years of self-play over a 10-month period. Researchers also applied custom “surgery” tools roughly every two weeks, allowing training to resume after changes to the model and environment with minimal loss in performance, rather than retraining each new version from scratch.

“If we had trained from scratch after each of our twenty major surgeries, the project would have taken 40 months instead of 10,” note the researchers in their paper. As AI systems tackle larger and harder problems in the real world, research on enabling AI models to deal with more complex and dynamic environments will be critical.
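The core idea behind surgery is to transfer trained weights into a modified architecture so that the new model initially computes the same function as the old one. A minimal sketch of one such operation, growing a dense layer to accept new observation features, might look like the following; the function name and the zero-initialization scheme are illustrative assumptions, not the paper's exact tooling.

```python
import numpy as np

def expand_input_weights(W, b, new_in_dim):
    """Grow a dense layer's input dimension, zero-initializing the new
    columns so the layer computes the same outputs on old inputs
    (the new features initially contribute nothing)."""
    old_out, old_in = W.shape
    assert new_in_dim >= old_in
    W_new = np.zeros((old_out, new_in_dim), dtype=W.dtype)
    W_new[:, :old_in] = W  # copy the already-trained weights
    return W_new, b.copy()

# Old layer: 3 observation features -> 2 outputs
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))
b = rng.standard_normal(2)
x_old = rng.standard_normal(3)

# Surgery: the game update adds 2 new observation features
W2, b2 = expand_input_weights(W, b, new_in_dim=5)
x_new = np.concatenate([x_old, np.zeros(2)])

# Outputs are unchanged, so training can resume instead of restarting
assert np.allclose(W @ x_old + b, W2 @ x_new + b2)
```

Because the expanded model starts out behaviorally identical to the old one, training picks up where it left off rather than paying the full cost of learning from scratch.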

Rerun was trained using the OpenAI Five’s final environment, architecture, and hyperparameters, which sped things up considerably. Training was completed in two months, without any surgeries, and required only 20 percent of the resources used to train the OpenAI Five.

The OpenAI Five and now Rerun demonstrate that, when successfully scaled up, modern reinforcement learning techniques can achieve superhuman performance in competitive e-sports games. OpenAI has always stressed that its long-term goal is to tackle general and real-world problems, and that video game environments are platforms for research toward the development of artificial general intelligence (AGI).

The paper Dota 2 with Large Scale Deep Reinforcement Learning is available here.