AI agents continue to rack up wins in the video game world. Last week, OpenAI’s bots were playing Dota 2; this week, it’s Quake III, with a team of researchers from Google’s DeepMind subsidiary successfully training agents that can beat humans at a game of capture the flag.

As we’ve seen with previous examples of AI playing video games, the challenge here is training an agent that can navigate a complex 3D environment with imperfect information. DeepMind’s researchers used a method of AI training that’s also becoming standard: reinforcement learning, which is basically training by trial and error at a huge scale.

DeepMind’s bots learned by playing 450,000 games against themselves

Agents are given no instructions on how to play the game, but simply compete against themselves until they work out the strategies needed to win. Usually this means one version of the AI agent playing against an identical clone. DeepMind gave extra depth to this formula by training a whole cohort of 30 agents to introduce a “diversity” of play styles. How many games does it take to train an AI this way? Nearly half a million, each lasting five minutes.
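To give a rough sense of what training a cohort against itself looks like, here is a minimal Python sketch. It is a toy under stated assumptions, not DeepMind's method: the `Agent` class, the scalar `skill` value, the `play_match` function, and the fixed 0.01 update are all hypothetical stand-ins for a real neural network and a real reinforcement learning update. Only the cohort size of 30 comes from the research described above; the team size of two is an assumption.

```python
import math
import random

POPULATION_SIZE = 30  # cohort size described in the article
TEAM_SIZE = 2         # assumption: two agents per team


class Agent:
    """Hypothetical stand-in for a learning agent (a real one would hold a neural network)."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.skill = 0.0       # toy scalar in place of learned parameters
        self.games_played = 0

    def learn(self, won):
        # Placeholder for a reinforcement learning update:
        # nudge the toy skill value up after a win, down after a loss.
        self.skill += 0.01 if won else -0.01
        self.games_played += 1


def play_match(team_a, team_b, rng):
    """Toy match: the team with higher total skill is more likely to win.

    Returns True if team_a wins.
    """
    diff = sum(a.skill for a in team_a) - sum(b.skill for b in team_b)
    p_team_a_wins = 1.0 / (1.0 + math.exp(-diff))
    return rng.random() < p_team_a_wins


def train(num_games, seed=0):
    """Run num_games matches, drawing both teams from the same population each time."""
    rng = random.Random(seed)
    population = [Agent(i) for i in range(POPULATION_SIZE)]
    for _ in range(num_games):
        # Sample four distinct agents so teammates and opponents vary from game to game.
        players = rng.sample(population, 2 * TEAM_SIZE)
        team_a, team_b = players[:TEAM_SIZE], players[TEAM_SIZE:]
        team_a_won = play_match(team_a, team_b, rng)
        for agent in team_a:
            agent.learn(team_a_won)
        for agent in team_b:
            agent.learn(not team_a_won)
    return population
```

Calling `train(450_000)` would mirror the number of games in the pull quote above, though in this toy each "game" is a single random draw rather than five minutes of play. The point of sampling from a population, rather than pitting one agent against its clone, is that every agent keeps meeting different teammates and opponents.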

As ever, it’s impressive how such a conceptually simple technique can generate complex behavior on the part of the bots. DeepMind’s agents not only learned the basic rules of capture the flag (grab your opponents’ flag from their base and return it to your own before they do the same to you), but also strategies like guarding your own flag, camping at your opponent’s base, and following teammates around so you can gang up on the enemy.

To make the challenge harder for the agents, each game was played on a completely new, procedurally generated map. This ensured the bots weren’t learning strategies that only worked on a single map.
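As a loose illustration of the idea, the Python sketch below builds a fresh random grid map for every game from a seed. Everything in it is an assumption made for illustration: the `generate_map` function, the grid symbols, and the wall probability are hypothetical, and it does not reflect how the maps in the research were actually built. It also makes no attempt to guarantee that the two bases are reachable from each other, which a real generator would need to do.

```python
import random


def generate_map(width, height, wall_prob, seed):
    """Return a random grid map as a list of strings.

    '#' is a wall, '.' is open floor, 'A' and 'B' are the two team bases.
    Hypothetical toy generator: it does not check that the bases are connected.
    """
    rng = random.Random(seed)
    grid = [
        ["#" if rng.random() < wall_prob else "." for _ in range(width)]
        for _ in range(height)
    ]
    # Put the bases in opposite corners, overwriting whatever was generated there.
    grid[0][0] = "A"
    grid[height - 1][width - 1] = "B"
    return ["".join(row) for row in grid]


def maps_for_games(num_games, width=12, height=8, wall_prob=0.2):
    """Generate one map per game, using the game index as the seed."""
    return [generate_map(width, height, wall_prob, seed=game) for game in range(num_games)]
```

Because each game gets its own seed, an agent cannot simply memorize a route to the flag; whatever it learns has to hold up across layouts it has never seen.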

Unlike OpenAI’s Dota 2 bots, DeepMind’s agents also didn’t have access to raw numerical data about the game: feeds of numbers that represent information like the distance between opponents and health bars. Instead, they learned to play just by looking at the visual input from the screen, the same as a human. However, this does not necessarily mean that DeepMind’s bots faced a greater challenge; Dota 2 is overall a much more complex game than the stripped-down version of Quake III that was used in this research.

To test the AI agents’ abilities, DeepMind held a tournament, with two-player teams of only bots, only humans, and a mixture of bots and humans squaring off against one another. The bot-only teams were most successful, with a 74 percent win probability. That compares with a 43 percent win probability for average human players and 52 percent for strong human players. So: clearly the AI agents are the better players.

However, it’s worth noting that the greater the number of DeepMind bots on a team, the worse they did. A team of four DeepMind bots had a win probability of 65 percent, down from 74 percent for the two-bot teams, suggesting that while the researchers’ AI agents did learn some elements of cooperative play, these don’t necessarily scale up to more complex team dynamics.

As ever with research like this, the aim is not to actually beat humans at video games, but to find new ways of teaching agents to navigate complex environments while pursuing a shared goal. In other words, it’s about teaching collective intelligence — something that has (despite abundant evidence to the contrary) been integral to humanity’s success as a species. Capture the flag is just a proxy for bigger games to come.