In this game two agents, a red and a blue dot, have to gather green-dot apples.

Image: Google DeepMind/YouTube

Scientists at Google-owned DeepMind have found that their AI agents behave much the way humans do when faced with scarce resources.

In a new study, DeepMind scientists plugged AI agents trained with deep reinforcement learning into two multi-agent 2D games to model how conflict or cooperation emerges between self-interested participants in a theoretical economy.

As DeepMind explains, the researchers trained their agents to behave the way some economists model human decision-making: selfish and always rational.

"The research may enable us to better understand and control the behaviour of complex multi-agent systems such as the economy, traffic, and environmental challenges," DeepMind's researchers explain in a blog.

In one game, called Gathering, two agents, a red and a blue dot, are tasked with collecting apples represented by green dots. The agents can simply collect apples side by side, suggesting cooperation, or they can 'tag' the other agent with a beam to temporarily prevent it from collecting apples.

After several thousand rounds, they found that when there's an abundance of apples the agents collect as many as possible and leave each other alone. However, when DeepMind restricted the supply, the agents became more aggressive, figuring out that it may be optimal to block their rival to boost their chances of taking what's available.
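The scarcity effect can be illustrated with a toy simulation. The sketch below is a drastic simplification of my own, not DeepMind's environment or training code: two agents draw from a shared pool of apples, and agent A may spend its first turn tagging agent B, which freezes B for a few turns. The point it illustrates is that tagging only pays off when apples are scarce.

```python
def run_episode(total_apples, steps=20, tag_timeout=5, aggressive=False):
    """Toy model of the Gathering game (illustrative only).

    If `aggressive`, agent A spends turn 0 tagging agent B, which
    freezes B for `tag_timeout` turns. Both agents otherwise collect
    one apple per turn while any remain. Returns (score_a, score_b).
    """
    apples = total_apples
    score_a = score_b = 0
    b_frozen_until = tag_timeout + 1 if aggressive else 0
    for t in range(steps):
        # A's move: the tag itself costs A its first turn
        if not (aggressive and t == 0) and apples > 0:
            apples -= 1
            score_a += 1
        # B's move: B can only collect once unfrozen
        if t >= b_frozen_until and apples > 0:
            apples -= 1
            score_b += 1
    return score_a, score_b

# Abundant apples: tagging wastes a turn and gains A nothing.
print(run_episode(100))                   # (20, 20)
print(run_episode(100, aggressive=True))  # (19, 14)
# Scarce apples: tagging lets A grab a larger share of the pool.
print(run_episode(10))                    # (5, 5)
print(run_episode(10, aggressive=True))   # (8, 2)
```

In this crude model, aggression costs the aggressor an apple under abundance but nearly doubles its haul under scarcity, mirroring the behavior the trained agents converged on.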

"The Gathering game predicts that conflict may emerge from competition for scarce resources, but is less likely to emerge when resources are plentiful," they write in a new paper.

"These results show that agents learn aggressive policies in environments that combine a scarcity of resources with the possibility of costly action. Less aggressive policies emerge from learning in relatively abundant environments with less possibility for costly action," they note.

DeepMind also found that smarter agents, given larger networks that let them devise more complex strategies, tried to tag their fellow gatherer more frequently, regardless of how much scarcity was introduced.

However, a second game, called Wolfpack, produced different behavior when agents were equipped to devise more complex strategies.

In this game, two wolves, represented by red dots, work together to capture prey, a blue dot, while facing the risk of losing the carcass to scavengers.

If the wolves cooperate, they can get a higher reward since two wolves are better at protecting the catch than one. In this case, DeepMind found that a greater capacity to implement complex strategies resulted in more cooperation.


Image: Google DeepMind/YouTube

They also found the wolves developed two different strategies for killing the prey and protecting the carcass.

"On the one hand, the wolves could cooperate by first finding one another and then moving together to hunt the prey, while on the other hand, a wolf could first find the prey and then wait for the other wolf to arrive before capturing it," they note in the paper.

DeepMind offers this explanation for why network size made the agents more competitive in Gathering, yet more cooperative in Wolfpack.

"In Gathering, defection behavior is more complex and requires a larger network size to learn than cooperative behavior. This is the case because defection requires the difficult task of targeting the opposing agent with the beam whereas peacefully collecting apples is almost independent of the opposing agent's behavior," they write.

"In Wolfpack, cooperation behavior is more complex and requires a larger network size because the agents need to coordinate their hunting behaviors to collect the team reward, whereas the lone-wolf behavior does not require coordination with the other agent and hence requires less network capacity," they write.
