A famous 2015 study showed that Google DeepMind’s AI could learn to play Atari video games such as Video Pinball at human level, but it notoriously failed to find a path to the first key in the 1980s video game Montezuma’s Revenge because of the game’s complexity.

In the new method developed at RMIT University in Melbourne, Australia, computers set up to play Montezuma’s Revenge autonomously learnt from their mistakes and identified sub-goals 10 times faster than Google DeepMind’s system as they worked towards finishing the game.

Associate Professor Fabio Zambetta from RMIT University unveils the new approach this Friday at the 33rd AAAI Conference on Artificial Intelligence in the United States.

The method, developed in collaboration with RMIT’s Professor John Thangarajah and Michael Dann, combines “carrot-and-stick” reinforcement learning with an intrinsic motivation approach that rewards the AI for being curious and exploring its environment.
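As a rough illustration of how a curiosity reward can sit alongside the game's own carrot-and-stick reward, here is a minimal Python sketch. It uses a simple count-based novelty bonus, which is a common stand-in and not necessarily the technique the RMIT team used; the class name `CuriosityBonus`, the `beta` weight and the state labels are all hypothetical.

```python
import math
from collections import defaultdict


class CuriosityBonus:
    """Adds a novelty bonus to the game's own reward (illustrative only)."""

    def __init__(self, beta=0.1):
        # beta is an assumed weight for the curiosity term
        self.beta = beta
        # how many times each state has been seen so far
        self.visits = defaultdict(int)

    def shaped_reward(self, state, extrinsic_reward):
        # Rarely seen states earn a larger bonus; the bonus shrinks
        # as the same state is visited again and again.
        self.visits[state] += 1
        bonus = self.beta / math.sqrt(self.visits[state])
        return extrinsic_reward + bonus


bonus = CuriosityBonus(beta=1.0)
first_visit = bonus.shaped_reward("room_1_ladder", 0.0)
second_visit = bonus.shaped_reward("room_1_ladder", 0.0)
```

In this sketch the agent is paid a little for reaching somewhere new even when the game itself awards no points, which is the general idea behind rewarding exploration in a game where points are rare.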

“Truly intelligent AI needs to be able to learn to complete tasks autonomously in ambiguous environments,” Zambetta says.

“We’ve shown that the right kind of algorithms can improve results using a smarter approach rather than purely brute forcing a problem end-to-end on very powerful computers.

“Our results show how much closer we’re getting to autonomous AI and could be a key line of inquiry if we want to keep making substantial progress in this field.”

Zambetta’s method rewards the system for autonomously exploring useful sub-goals such as ‘climb that ladder’ or ‘jump over that pit’, which may not be obvious to a computer, within the context of completing a larger mission.

Other state-of-the-art systems have either required human input to identify these sub-goals or chosen what to do next at random.
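To make the contrast with random choice concrete, the Python sketch below picks the next sub-goal by preferring whichever one has been attempted least. This is an assumption made purely for illustration and is not taken from the RMIT work; the function `pick_sub_goal` and the sub-goal names are hypothetical.

```python
from collections import Counter


def pick_sub_goal(candidates, attempts):
    """Prefer the least-attempted sub-goal; ties go to the earliest in the list."""
    return min(candidates, key=lambda goal: attempts[goal])


# Hypothetical sub-goals, named after the examples quoted in the article
candidates = ["climb_ladder", "jump_over_pit", "pick_up_key"]
attempts = Counter()  # missing entries count as zero attempts

order = []
for _ in range(4):
    goal = pick_sub_goal(candidates, attempts)
    attempts[goal] += 1
    order.append(goal)
```

A purely random chooser could repeat the same sub-goal many times before trying another, whereas this rule cycles through every candidate before revisiting any of them.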

“Not only did our algorithms autonomously identify relevant tasks roughly 10 times faster than Google DeepMind while playing Montezuma’s Revenge, they also exhibited relatively human-like behaviour while doing so,” Zambetta says.

“For example, before you can get to the second screen of the game you need to identify sub-tasks such as climbing ladders, jumping over an enemy and then finally picking up a key, roughly in that order.

“This would eventually happen randomly after a huge amount of time but to happen so naturally in our testing shows some sort of intent.

“This makes ours the first fully autonomous sub-goal-oriented agent to be truly competitive with state-of-the-art agents on these games.”