Uber AI Lab has created a buzz in the machine learning community with the publication of a paper introducing a new reinforcement learning algorithm called Go-Explore. The algorithm is designed to overcome the challenge of effective exploration in reinforcement learning and improve performance on hard-exploration tasks.

“Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma’s Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. Go-Explore opens up many new research directions into improving it and weaving its insights into current RL algorithms. It may also enable progress on previously unsolvable hard-exploration problems in many domains, especially those that harness a simulator during training (e.g. robotics).” (arXiv)

Synced invited Northeastern University Professor and machine learning researcher Rose Yu to share her thoughts on Go-Explore.

How would you describe Go-Explore?

Go-Explore is an algorithm developed by Uber AI Lab for addressing the effective exploration problem in reinforcement learning.

How effective is this technology?

Go-Explore solved Atari’s Montezuma’s Revenge, a classic example of a difficult exploration problem, scoring almost four times the previous state of the art.

What impact might this bring to the AI community?

The tech addresses the fundamental challenge of effective exploration in reinforcement learning (RL), which could accelerate progress of RL in complex real-world domains such as robotics.

Can you identify any bottlenecks?

In order to maintain a memory of previously visited novel states, the algorithm represents a state as a downsampled 8x11 grayscale image, combined with game-specific knowledge. This representation would not generalize easily to other domains.
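A minimal sketch of such a cell representation, assuming a standard 210x160 RGB Atari frame: the 8x11 target size follows the description above, while the block-mean pooling and the reduction to eight grey levels here are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

def frame_to_cell(frame, width=11, height=8, depth=8):
    """Map an RGB game frame to a coarse, hashable cell key.

    `frame` is an HxWx3 uint8 array. The 8x11 target size matches the
    downsampling described above; the block-mean pooling and the
    quantization to `depth` grey levels are illustrative assumptions.
    """
    gray = frame.mean(axis=2)                      # collapse colour channels
    h, w = gray.shape
    ys = np.linspace(0, h, height + 1, dtype=int)  # row block boundaries
    xs = np.linspace(0, w, width + 1, dtype=int)   # column block boundaries
    small = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            small[i, j] = gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    cell = (small / 256.0 * depth).astype(np.uint8)  # quantize to `depth` levels
    return cell.tobytes()                            # hashable key for an archive
```

Two frames that map to the same key are treated as the same "cell", so the set of remembered states stays small enough to enumerate, at the cost of the domain-specific tuning noted above.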

Go-Explore also assumes a deterministic environment is available during training, which allows it to reset the simulator to arbitrary previously visited states. Most previous work does not make such an assumption.
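Under that determinism assumption, the go-then-explore loop can be sketched on a toy restorable environment. Everything below is illustrative: the environment, its `clone_state`/`restore_state` hooks, and the function names are assumptions, not the paper's interface; a real run would use an emulator's own state-saving facilities.

```python
import random

class ChainEnv:
    """Toy deterministic environment: walk on the integer line,
    reward 1.0 on reaching position 10. Stands in for a simulator
    whose exact state can be saved and restored (an assumption)."""
    def __init__(self):
        self.pos = 0
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):                   # action is -1 or +1
        self.pos += action
        done = self.pos == 10
        return self.pos, (1.0 if done else 0.0), done
    def clone_state(self):
        return self.pos
    def restore_state(self, state):
        self.pos = state

def explore_phase(env, n_iters=500, explore_steps=5, seed=0):
    """Go-Explore-style exploration loop: keep an archive of cells,
    repeatedly return ("go") to an archived cell by restoring the
    simulator, then take random actions ("explore")."""
    rng = random.Random(seed)
    obs = env.reset()
    archive = {obs: (env.clone_state(), 0.0)}  # cell -> (sim state, score)
    for _ in range(n_iters):
        cell = rng.choice(list(archive))       # go: pick an archived cell
        state, score = archive[cell]
        env.restore_state(state)               # deterministic reset
        for _ in range(explore_steps):         # explore: random actions
            obs, reward, done = env.step(rng.choice((-1, 1)))
            score += reward
            if obs not in archive or score > archive[obs][1]:
                archive[obs] = (env.clone_state(), score)
            if done:
                break
    return archive
```

Because the cells here are just integer positions, the toy sidesteps the state-representation problem discussed above; on Atari the archive would instead be keyed on a downsampled frame.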

Could you predict any potential future trends related to this tech?

Using imitation learning to robustify policies learned in a deterministic environment.

Many RL applications in the future will make use of simulators, in which case exploiting simulator properties such as deterministic environments might be plausible.

The paper Go-Explore: a New Approach for Hard-Exploration Problems is on arXiv.

About Prof. Rose Yu

Rose Yu is an Assistant Professor at Northeastern University's Khoury College of Computer Sciences. Previously, she was a postdoctoral researcher in the Department of Computing and Mathematical Sciences at Caltech. She earned her PhD in Computer Science at the University of Southern California and was a visiting researcher at Stanford University. Her research focuses on machine learning for large-scale spatiotemporal data. She is generally interested in optimization, deep learning, and reinforcement/imitation learning. Yu has over a dozen publications in leading machine learning and data mining conferences and several patents. She received the USC Best Dissertation Award, the Annenberg Fellowship, and was named one of MIT’s Rising Stars in EECS.

Synced Insight Partner Program

The Synced Insight Partner Program is an invitation-only program that brings together influential organizations, companies, academic experts, and industry leaders to share professional experiences and insights through interviews, public speaking engagements, and more. Synced invites all industry experts, professionals, analysts, and others working in AI technologies and machine learning to participate.

Simply apply for the Synced Insight Partner Program and let us know about yourself and your focus in AI. We will respond once your application is approved.