Model-free reinforcement learning can learn effective strategies for complex tasks such as Atari games, but it usually requires a large amount of interaction with the environment, which adds significant time and cost. Earlier this month, Google Brain researchers introduced SimPLe (Simulated Policy Learning), a complete model-based deep reinforcement learning algorithm built on video prediction models. SimPLe achieves competitive results on a range of Atari games.

“In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with orders of magnitude fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.” (arXiv)

Synced invited Ahmet Salim Bilgin, the Founder of FinBrain Technologies and an Electrical-Electronics Engineer with a strong background in mathematics, statistics, controls, and signal processing, to share his thoughts on SimPLe.

How would you describe SimPLe?

SimPLe is a complete model-based deep RL algorithm that utilizes video prediction techniques and trains a policy to play a game entirely within the learned model.
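The loop described above — collect real experience, fit a model of the environment, then train the policy inside that model — can be sketched with a toy tabular example. This is illustrative code only, not from the paper: SimPLe learns a video prediction network over game frames, whereas this sketch uses a lookup table over a 1-D chain.

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4        # 1-D chain; reward for reaching the last state
ACTIONS = [-1, +1]

def real_step(s, a):
    """Ground-truth environment dynamics (used only for data collection)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# 1) Collect a small batch of real interactions with a random policy.
model = {}                    # learned dynamics: (s, a) -> (s', r)
s = 0
for _ in range(200):
    a = random.choice(ACTIONS)
    s2, r = real_step(s, a)
    model[(s, a)] = (s2, r)   # deterministic env: one sample per pair suffices
    s = 0 if s2 == GOAL else s2

# 2) + 3) Run Q-learning purely inside the learned model,
# with no further access to the real environment.
Q = {(st, a): 0.0 for st in range(N_STATES) for a in ACTIONS}
for _ in range(2000):
    s = random.randrange(N_STATES)
    for _ in range(10):
        a = random.choice(ACTIONS)
        if (s, a) not in model:       # the model has no prediction here
            break
        s2, r = model[(s, a)]
        Q[(s, a)] += 0.5 * (r + 0.9 * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# Greedy policy extracted from the model-trained Q-values.
policy = {st: max(ACTIONS, key=lambda a: Q[(st, a)]) for st in range(N_STATES)}
```

After training, the greedy policy moves right toward the goal from every non-goal state, even though the Q-learning phase never touched the real environment — the same idea SimPLe applies at scale with learned video models.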

Why does SimPLe matter?

SimPLe outperforms model-free algorithms in learning speed on nearly all of the games, and in a few cases by more than an order of magnitude. The best model-free reinforcement learning algorithms require tens or hundreds of millions of time steps — equivalent to several weeks of real-time play. SimPLe obtains competitive results on Atari games with only 100K interactions between the agent and the environment, which corresponds to about two hours of real-time play.
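As a quick sanity check on the "two hours" figure — assuming the standard Atari setup of 60 frames per second with an action repeat of 4, so each agent interaction spans four frames:

```python
# 100K agent-environment interactions, each covering 4 frames (action repeat),
# at Atari's 60 frames per second.
FPS, ACTION_REPEAT = 60, 4
interactions = 100_000
hours = interactions * ACTION_REPEAT / FPS / 3600
print(f"{hours:.2f} hours")  # prints "1.85 hours"
```

That is 400K frames in total, or roughly two hours of real-time play.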

What impact might this research bring to the research community?

Humans possess an intuitive understanding of the physical processes represented in video games, so they can learn how a game works and predict which actions will lead to desirable outcomes. SimPLe uses video prediction to learn how to play a game well, and demonstrates that learned simulators can be used to train policies that remain useful in the original environment. In this respect, SimPLe offers a new alternative to model-free algorithms by incorporating video prediction techniques. This predictive approach can reduce the number of steps required to learn to play Atari games directly from images of the game screen.

Can you address some bottlenecks in SimPLe?

The first of two important bottlenecks is that SimPLe’s final scores on Atari games are lower than those of the best state-of-the-art model-free methods. This is mainly because model-based RL algorithms excel at learning efficiency rather than final performance.

The second important downside is that SimPLe’s performance varies widely between different runs on the same game. The model makes guesses when it extrapolates the game’s behavior under a new policy, and the resulting policy performs well only when those guesses are correct.

Can you predict any potential future developments related to this research?

Model-based reinforcement learning built on stochastic predictive models is a strong alternative to model-free reinforcement learning methods.

Model-based RL algorithms still require better models to be built, as they currently favor learning efficiency over final performance. Future studies may close this gap and improve both performance and robustness.

SimPLe is well suited to stochastic domains because it employs stochastic latent variables. Predictive neural network models could also be applied in other fields such as robotics and autonomous vehicles.

The paper Model-Based Reinforcement Learning for Atari is on arXiv.