Some AI systems achieve goals in challenging environments by drawing on representations of the world informed by past experiences. They generalize these to novel situations, enabling them to complete tasks even in settings they haven’t encountered before. As it turns out, reinforcement learning — a training technique that employs rewards to drive software policies toward goals — is particularly well-suited to learning world models that summarize an agent’s experience, and by extension to facilitating the learning of novel behaviors.

Researchers from Google, Alphabet subsidiary DeepMind, and the University of Toronto sought to exploit this with an agent, Dreamer, designed to internalize a world model and plan ahead to select actions by “imagining” their long-term outcomes. They say that Dreamer not only works for any learning objective but also exceeds existing approaches in data efficiency, computation time, and final performance.

Throughout an AI agent’s lifetime, either interleaved or in parallel, Dreamer learns a latent dynamics model to predict rewards from both actions and observations. In this context, “latent dynamics model” refers to a model that’s learned from image inputs and performs planning to gather new experience. The “latent” bit indicates that it relies on a compact sequence of hidden or latent states, which enables it to learn more abstract representations, such as the positions and velocities of objects. Effectively, information from the input images is integrated into the hidden states using an encoder component, after which the hidden states are projected forward in time to anticipate images and rewards.
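
To make the encode-then-predict idea concrete, here is a minimal sketch in Python. It is not Dreamer's implementation: the dimensions, the random linear maps standing in for trained networks, and the function names `encode`, `step`, and `predict_reward` are all assumptions made for illustration, and image prediction is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, LATENT_DIM = 64, 2, 8  # hypothetical sizes

# Randomly initialized linear maps stand in for learned networks (assumption).
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_state = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
W_act = rng.normal(scale=0.1, size=(LATENT_DIM, ACT_DIM))
w_reward = rng.normal(scale=0.1, size=LATENT_DIM)


def encode(prev_state, image):
    """Fold a flattened image into the hidden state (the encoder step)."""
    return np.tanh(W_state @ prev_state + W_enc @ image)


def step(state, action):
    """Project the hidden state one step forward without seeing an image."""
    return np.tanh(W_state @ state + W_act @ action)


def predict_reward(state):
    """Read a scalar reward prediction off the hidden state."""
    return float(w_reward @ state)


# Integrate a few observed frames into the hidden state.
state = np.zeros(LATENT_DIM)
for _ in range(3):
    state = encode(state, rng.normal(size=OBS_DIM))

# Then predict ahead purely in latent space, with no further images.
predicted_rewards = []
for _ in range(5):
    state = step(state, rng.uniform(-1, 1, size=ACT_DIM))
    predicted_rewards.append(predict_reward(state))
```

The point of the sketch is the split between the two loops: the first consumes images, while the second moves only the compact hidden state forward, which is what keeps planning cheap.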

Image Credit: DeepMind

Dreamer uses a multi-part latent dynamics model that’s somewhat complex in structure. A representation component encodes observations and actions, and a transition component anticipates states without seeing the observations that will cause them. A third component, the reward component, predicts the rewards given the model states, and an action model implements learned policies and aims to predict actions that solve imagined environments. Finally, a value model estimates the expected imagined rewards that the action model achieves, while an observation model provides feedback signals.
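
As a rough illustration of how these pieces could fit together during "imagination," the sketch below rolls an action model and a transition model forward from a starting state, sums the discounted predicted rewards, and lets a value model account for everything beyond the horizon. The sizes, horizon, discount factor, random weights, and the plain discounted sum are all assumptions for illustration; the researchers' actual networks and value estimator may well differ.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM, ACT_DIM = 8, 2   # hypothetical sizes
HORIZON, GAMMA = 15, 0.99    # hypothetical imagination horizon and discount

# Random linear maps stand in for the learned components (assumption).
W_trans = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACT_DIM))
W_actor = rng.normal(scale=0.1, size=(ACT_DIM, LATENT_DIM))
w_reward = rng.normal(scale=0.1, size=LATENT_DIM)
w_value = rng.normal(scale=0.1, size=LATENT_DIM)


def action_model(state):
    """Pick an action from the current model state."""
    return np.tanh(W_actor @ state)


def transition_model(state, action):
    """Anticipate the next state without any observation."""
    return np.tanh(W_trans @ np.concatenate([state, action]))


def reward_model(state):
    """Predict the reward associated with a model state."""
    return float(w_reward @ state)


def value_model(state):
    """Estimate the rewards expected beyond the imagined horizon."""
    return float(w_value @ state)


def imagined_return(start_state, horizon=HORIZON, gamma=GAMMA):
    """Discounted sum of imagined rewards plus a bootstrapped value."""
    state, total = start_state, 0.0
    for t in range(horizon):
        action = action_model(state)
        state = transition_model(state, action)
        total += gamma ** t * reward_model(state)
    return total + gamma ** horizon * value_model(state)


start = rng.normal(size=LATENT_DIM)
print(imagined_return(start))
```

In a trained agent, a quantity like `imagined_return` would be the signal used to improve the action model, since every step in the loop happens inside the learned model rather than the real environment.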

In a series of experiments, the researchers tested Dreamer on 20 visual control tasks within the DeepMind Control Suite, simulation software for evaluating machine learning-driven agents. They first trained it using an Nvidia V100 graphics chip and 10 processor cores for each training run, which they say took 9 hours per 1 million (10^6) environment steps on the control suite. (That’s compared with the 17 hours it took Google’s PlaNet, a Dreamer predecessor, to reach similar performance.)

They report that Dreamer effectively used learned world models to generalize from small amounts of experience, and that its success demonstrates that learning behaviors by latent imagination can outperform top methods. They also say that Dreamer’s value model performs well even for short-term planning, outperforming alternative models on 16 of 20 tasks, with 4 ties.

“Future research on representation learning can likely scale latent imagination to environments of higher visual complexity,” wrote the researchers, who plan to present their work at NeurIPS 2019 in Vancouver this week. The Dreamer project’s code is publicly available on GitHub.