OpenAI, the artificial intelligence research lab backed by Elon Musk, found that AI pitted against another AI opponent could continuously learn and adapt to its foe, as well as explore new and different ways to capitalize on weaknesses.

The experiments took two versions of the same AI code, built to learn from attempting a simple task many times, and made them compete in simple virtual challenges that require complex movements, like sumo wrestling. One of the two related research papers released today focused on the method of learning between rounds, while the other studied the interaction between AI agents as they competed.

When the AI was humanoid, it figured out techniques that mirror how humans perform the activity, like crouching to gain better stability, without any coaching or prompting to do so. The AI even figured out how to deceive its opponents, luring them to the edge of the ring and then dodging out of the way as the opponent’s momentum caused it to fall.

“It does seem to be matching what humans might be doing in a similar wrestling setting, and additionally [learning] strategies like deception,” says Igor Mordatch, who led the research at OpenAI.

The basis of this research is a subset of artificial intelligence research called reinforcement learning. An AI agent is made to repeat a task over and over with slight variations until it can complete the task. Researchers tinker with how the agent carries experience from one attempt to the next, but a large part of the research is figuring out how the agent is told whether an action was good or bad, a signal called a reward. In the sumo wrestling test, the fighters were programmed to get +1,000 points if they won, -1,000 points if they lost, and -1,000 points if the match ended in a tie.
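The sparse reward scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration, not OpenAI's actual code; the function name and interface are invented for clarity. Note that losing and tying are penalized equally, so an agent gains nothing by stalling for a draw.

```python
def competition_reward(outcome: str) -> float:
    """Terminal reward for one sumo match, per the scheme described
    in the research: +1,000 for a win, -1,000 for a loss or a tie."""
    if outcome == "win":
        return 1000.0
    # Loss and tie are penalized identically, removing any
    # incentive for the agent to play for a draw.
    return -1000.0
```

Because the reward arrives only at the end of a match, the agent gets no intermediate feedback on individual moves, which is part of what makes learning complex behaviors like deception from this signal alone notable.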

In order to win, the agents naturally learned stances that made them more stable. They lowered their heads and torsos, and extended their arms to the side, similar to the stance associated with human sumo wrestlers. Arms were also used to hook opponents and drag them toward the edge of the ring.

As the agents learned, the rewards in some tests needed to be changed. In a soccer-like game, researchers first rewarded the agents for learning to walk. After thousands of tries, the agents learned to walk, and then the reward was switched to +1,000 points for successfully defending or scoring (depending on which agent) plus bonus points for standing at the end of the round.
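That switch from a dense "learn to walk" reward to the sparse competition reward can be sketched as a simple blend that shifts over training. This is a hypothetical illustration of the idea, with invented names and a made-up annealing schedule, not the researchers' implementation.

```python
def blended_reward(walk_reward: float, game_reward: float,
                   step: int, anneal_steps: int = 10_000) -> float:
    """Linearly shift weight from the dense walking reward to the
    sparse scoring/defending reward over `anneal_steps` iterations."""
    # alpha goes from 0.0 (walking reward only) to 1.0 (game reward only)
    alpha = min(step / anneal_steps, 1.0)
    return (1.0 - alpha) * walk_reward + alpha * game_reward
```

Early in training, the agent is paid almost entirely for locomotion; by the end, only winning or defending matters, which is one common way to bootstrap a sparse-reward task.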

But since the AI’s knowledge is accumulated slowly over thousands of iterations, researchers say it’s difficult to track exactly how or why the learning takes place. As the AI learned to sumo wrestle, one of the agents figured out how to fake its opponent out, deceiving it into lunging forward near the edge of the ring and then stepping out of the way. But what the team doesn’t know is whether the agent predicted that the strategy would help it win, or whether it was merely an accident that got rewarded into a successful behavior.

While the specific skills, like knowing how to walk in one particular simulation, might not be useful on their own, Mordatch says this research furthers the understanding of how agents learn complex goals in competitive games, like the lab’s work on mastering the competitive video game Dota 2.