At first, gameplay was crude. The agents scurried around — sometimes forward, sometimes backward — like confused toddlers. (To make the AI agents easy to observe, and to match the playful spirit of the game, the programmers made their creations look like little jelly monsters with oversize heads, wide smiles and bright eyes. The bots were adorable bumblers.)

The experiment went through six phases. First, motivated only by the incentive to gain points, the agents learned to chase or run away. (This was a sort of pre-tool phase.) Then, after about 25 million games, the hiders learned to make forts out of the boxes and walls. That advantage didn’t last, however. After another 75 million games, seekers learned to push a ramp to the edge of a fort, climb up and jump in. Hiders had to adopt a new strategy. In the fourth phase, 10 million games later, hiders retaliated by shoving ramps to the outer edges of the playing field — where they wouldn’t be useful — and locking them in place.

As impressive as the agents’ ingenuity was, the development of these first four strategies didn’t surprise the OpenAI team. After those 10 million games, the researchers suspected that the program had run its course. But the AI kept on changing — and learning.

Evolving Beyond Hide-and-Seek

After almost 390 million games came the fifth phase and the introduction of box surfing. During these rounds seekers learned they could still use the locked ramps by moving a box close to one, climbing the ramp and jumping onto the box. (The boxes were too high to scale without a ramp.) Once on a box, a bot could move it around the arena while remaining on top of it, effectively riding, or surfing, the box while searching for hiders. That gave seekers the advantage of height and mobility. In the sixth and final phase of the game, which emerged after 458 million rounds, the hiders finally learned to lock the boxes beforehand, preventing the surfing.

The OpenAI researchers see these unexpected but advantageous behaviors as proof that their system can discover tasks beyond what was expected, and in a setting with real-world rules. “Now you see behavior … on a computer that replicates behavior you see in a real, living being,” said Lange. “So now your head starts spinning a bit.”

The team’s next step is to see if their findings scale up to more complicated tasks in the real world. Lange thinks this is a realistic goal. “There’s nothing here that prevents this from sort of going on a path where tool usage gets more and more complex,” he said. More complex problems in the virtual world could suggest useful applications in the real world.

One way to increase complexity — and see how far self-learning can go — is to increase the number of agents playing the game. “It will definitely be a challenge to deal with if you want tens, hundreds, thousands of agents,” said Baker. Each one will require its own independent algorithm, and the project will require much more computational power. But Baker isn’t worried: The simple rules of hide-and-seek make it an energy-efficient test of AI.

Baker also says an AI system that can complete increasingly complex tasks raises questions about intelligence itself. During their post-game analysis of hide-and-seek, the OpenAI team devised and ran intelligence tests to see how the AI agents gained and organized knowledge. But despite the sophisticated tool use, the results weren’t clear. “Their behaviors seemed humanlike, but it’s not actually clear that the way that knowledge is organized in the agent’s head is humanlike,” Mordatch said.

Some experts, like the computer scientist Marc Toussaint at the University of Stuttgart in Germany, caution that these kinds of AI projects still haven’t answered a pivotal open question. “Does such work aim to mimic evolution’s ability to train or evolve agents for one particular niche? Or does it aim to mimic human and higher animals’ ability for in situ problem solving, generalization [and] dealing with never-experienced situations and learning?”

Baker doesn’t claim that the hide-and-seek game is a reliable model of evolution, or that the agents are convincingly humanlike. “Personally, I believe they’re so far from anything we would consider intelligent or sentient,” he said.

Nevertheless, the way the AI agents used self-play and competition to develop tools does look a lot like evolution — of some variety — to some researchers in the field. Leibo notes that the history of life on Earth is rich with cases in which an innovation or change by one species prompted other species to adapt. Billions of years ago, for example, tiny algaelike creatures pumped the atmosphere full of oxygen, which allowed for the evolution of larger organisms that depend on the gas. He sees a similar pattern in human culture, which has evolved by introducing and adapting to new standards and practices, from agriculture to the 40-hour workweek to the prominence of social media.

“We’re wondering if something similar had happened — if the history of life itself is a self-play process that continually responds to its own previous innovations,” Leibo said. In March, he was part of a quartet of researchers at DeepMind who released a manifesto describing how cooperation and competition in multi-agent AI systems lead to innovation. “Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work,” they wrote. In other words: When push comes to shove, shove better.

They saw it happen when AlphaGo bested the best human players at Go, and Leibo says the hide-and-seek game offers another robust example. The bots’ unexpected use of tools emerged from the increasingly difficult tasks they created for each other.

Baker similarly sees parallels between hide-and-seek and natural adaptation. “When one side learns a skill, it’s like it mutates,” he said. “It’s a beneficial mutation, and they keep it. And that creates pressure for all the other organisms to adapt.”