Deep Learning for Robotics

Pieter Abbeel

Pieter opened his invited talk by summarizing some of the key differences between supervised learning and reinforcement learning (RL). In essence, RL is concerned with learning a policy that lets an agent interact with the world in a way that best achieves a goal, such as learning how to walk.

Can we teach this person how to walk from scratch?

Recently, RL has seen many success stories, such as learning to play Atari games from raw pixel inputs, mastering the game of Go at a superhuman level, and teaching simulated characters to walk from scratch. However, one big gap between RL algorithms and humans remains: the time it takes to acquire new, effective policies. In fact, experiments have shown that humans often need only 15 minutes to match the level of state-of-the-art algorithms trained for over 100 hours.

In order to bridge that gap, researchers are exploring ways to generate more task-specific RL algorithms. To accomplish this, Abbeel and others propose we rethink how we learn from an environment: we train a model that looks at an environment or task and then designs a custom RL algorithm that can learn from it more effectively. This approach is called meta-learning, and it has shown very encouraging results. One impressive outcome is better generalization from one task to another: models trained on similar tasks pick up adjacent ones quickly, much like humans do.
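To make the meta-learning idea concrete, here is a minimal Reptile-style sketch in NumPy (one well-known meta-learning algorithm, not Abbeel's specific method): the meta-learner is trained across a family of related toy regression tasks so that a few gradient steps adapt it quickly to a new, unseen task. The linear-task setup and all names are illustrative assumptions, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # A "task" is a random linear function y = a*x + b; tasks share
    # structure, which the meta-learner should learn to exploit.
    a, b = rng.uniform(-2, 2, size=2)
    x = rng.uniform(-1, 1, size=(20, 1))
    X = np.hstack([x, np.ones_like(x)])  # add a bias column
    return X, a * x + b

def sgd_steps(w, X, y, lr=0.1, steps=5):
    # Plain gradient descent on mean-squared error for a linear model.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Reptile-style outer loop: nudge the meta-weights toward the weights
# obtained by adapting to each sampled task.
meta_w = np.zeros((2, 1))
for _ in range(500):
    X, y = sample_task()
    adapted = sgd_steps(meta_w.copy(), X, y)
    meta_w += 0.1 * (adapted - meta_w)

# After meta-training, a handful of inner steps suffice on a new task.
X, y = sample_task()
fast_w = sgd_steps(meta_w.copy(), X, y, steps=5)
loss = float(np.mean((X @ fast_w - y) ** 2))
```

The design choice here is that the meta-learner never sees any single task for long; it only learns an initialization from which adaptation is fast, which is the "learning to learn" flavor the talk described.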

Some of these techniques are starting to be extended to tasks such as one-shot imitation learning, a holy grail of artificial intelligence that would allow an AI to learn a task from a single human demonstration (as opposed to the thousands of examples most methods need today). This would considerably simplify the training of models and open them up to complex tasks. Preliminary results like this paper suggest we are getting close to making that a reality!

Imagination-Augmented Agents for Deep Reinforcement Learning

Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing,

Arthur Guez, Danilo Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, David Silver, Daan Wierstra

Unless you’re offended by the use of “imagination” to describe an artificial neural network, this looks like an exciting way to perform reinforcement learning (RL). The work aims to surmount two key challenges currently facing RL: 1) existing model-free methods for deep RL require a lot of data, and 2) they typically don’t generalize well. Model-based methods generalize better, but are expensive in high-dimensional environments. Seeing these tradeoffs, this team from Google DeepMind chose to take the best of both worlds. First, they train an environment model that learns to predict how the environment evolves. They then train a model-free agent that learns to interpret predictions queried from this environment model, using these “imagined” rollouts as extra context for its decisions.
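The two-path idea can be sketched very roughly as follows. This is a toy NumPy illustration of combining a model-free input path with imagined rollouts from a learned environment model; the dimensions, the fixed linear "model," and all names are placeholder assumptions, not the paper's architecture (which uses convolutional networks and a learned rollout encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4-d states, 3 discrete actions, 2-step imagined rollouts.
STATE_DIM, N_ACTIONS, ROLLOUT_LEN = 4, 3, 2

# Stand-in for a learned environment model: maps (state, action) -> next state.
env_model = rng.normal(size=(STATE_DIM + 1, STATE_DIM)) * 0.1

def imagine_rollout(state, action, length=ROLLOUT_LEN):
    # Query the environment model for a short "imagined" trajectory
    # starting from the current state under a candidate action.
    frames, s = [], state
    for _ in range(length):
        s = np.concatenate([s, [action]]) @ env_model
        frames.append(s)
    return np.concatenate(frames)

def policy_features(state):
    # Model-free path (the raw state) plus model-based path (one imagined
    # rollout per candidate action), concatenated for a policy head.
    imagined = [imagine_rollout(state, a) for a in range(N_ACTIONS)]
    return np.concatenate([state] + imagined)

state = rng.normal(size=STATE_DIM)
feats = policy_features(state)
# feats has STATE_DIM + N_ACTIONS * ROLLOUT_LEN * STATE_DIM entries.
```

The key property this sketch preserves is that the agent is still free to ignore the imagined rollouts if the model is bad, which is exactly what makes the degraded-model result below plausible.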

Impressive work by DeepMind, showing how well their model-free RL performs under noise.

The technique was demonstrated on Sokoban, a puzzle game originally developed in the 1980s in which the player moves boxes to predefined locations. The results were impressive, showing marked improvement over standard model-free techniques. A particularly intriguing result: when they degraded the environment model, the imagination-augmented agent learned more slowly but still converged to nearly identical performance. In contrast, a Monte Carlo search method that randomly explores the environment failed catastrophically, as the sampling couldn’t overcome the low information content of the degraded environment model. It will be exciting to see how this method performs on more complicated, realistic tasks, but it isn’t hard to imagine a successful future.

Unsupervised Learning of Disentangled Representations from Video

Emily Denton, Vighnesh Birodkar

At the end of Day 3, attention around the halls was noticeably flagging from information overload, so it might have been easy to miss one of the more impressive results in generative networks. A new model named DrNet aims to disentangle the parts of a video that stay the same from the parts that change. This information can then be used to generate new videos in which an object remains coherent while its pose and location change over time. The approach uses two parallel networks: one learns the content of the video (for example, what a person looks like), while the other learns the temporal variation in that content (for example, the person's pose). A key finding that makes this work is how to properly penalize the pose network so that it doesn't capture too much information from the content.
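The training signal can be sketched like this. The NumPy snippet below stands in for DrNet's convolutional encoders and decoder with linear maps, and shows two of the losses that encourage disentanglement: reconstructing a frame from another frame's content code, plus a similarity penalty on content codes within a clip. The paper's actual penalty on the pose network is adversarial, which is omitted here; all dimensions and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-ins for DrNet's conv encoders/decoder (illustrative only).
FRAME_DIM, CONTENT_DIM, POSE_DIM = 16, 8, 4
Wc = rng.normal(size=(FRAME_DIM, CONTENT_DIM)) * 0.1            # content encoder
Wp = rng.normal(size=(FRAME_DIM, POSE_DIM)) * 0.1               # pose encoder
Wd = rng.normal(size=(CONTENT_DIM + POSE_DIM, FRAME_DIM)) * 0.1  # decoder

def encode(frame):
    return frame @ Wc, frame @ Wp  # (content code, pose code)

def decode(content, pose):
    return np.concatenate([content, pose]) @ Wd

def drnet_losses(frame_t, frame_tk):
    # Two frames from the same clip: same content, different pose.
    c_t, _ = encode(frame_t)
    c_tk, p_tk = encode(frame_tk)
    # Reconstruct frame t+k from the OTHER frame's content and its own
    # pose: content must be time-invariant, pose must carry the change.
    recon = decode(c_t, p_tk)
    rec_loss = np.mean((recon - frame_tk) ** 2)
    # Similarity penalty: content codes within one clip should agree.
    sim_loss = np.mean((c_t - c_tk) ** 2)
    return rec_loss, sim_loss

f1, f2 = rng.normal(size=FRAME_DIM), rng.normal(size=FRAME_DIM)
rec, sim = drnet_losses(f1, f2)
```

Swapping content and pose codes across frames is what forces the factorization: neither code alone can reconstruct the target frame.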

Videos generated from a few initial frames

The results are impressive, to say the least. Applying this method to the first few frames of a video, they use a standard LSTM over the pose and content representations to generate hundreds of new frames into the future. Although the generated frames aren’t identical to the real ones, the similarity is remarkable, and compared to other methods, DrNet produces clean frames with very little smearing or ghosting. Even better, the model is relatively simple compared to the generative adversarial networks that have been generating a lot of buzz. Perhaps next year we’ll see a little more diversity in the generative network space.
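The generation step can be sketched as well: hold the content code fixed, roll a recurrent network forward over pose codes (feeding its own predictions back in after the observed frames run out), and render each predicted pose with the decoder. Below, a plain tanh recurrent cell in NumPy stands in for the paper's LSTM; the decoder is not shown and all sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
POSE_DIM, HID = 4, 8

# Placeholder recurrent cell standing in for the paper's LSTM.
Wxh = rng.normal(size=(POSE_DIM, HID)) * 0.1
Whh = rng.normal(size=(HID, HID)) * 0.1
Who = rng.normal(size=(HID, POSE_DIM)) * 0.1

def predict_future_poses(seed_poses, n_future):
    # Warm up on observed pose codes, then feed predictions back in.
    # Because the content code stays fixed, the decoder (omitted here)
    # renders a coherent object moving through the predicted poses.
    h = np.zeros(HID)
    p = np.zeros(POSE_DIM)  # unused until the rollout phase begins
    outputs = []
    for t in range(len(seed_poses) + n_future):
        x = seed_poses[t] if t < len(seed_poses) else p
        h = np.tanh(x @ Wxh + h @ Whh)
        p = h @ Who
        if t >= len(seed_poses):
            outputs.append(p)
    return np.stack(outputs)

seeds = [rng.normal(size=POSE_DIM) for _ in range(5)]
future = predict_future_poses(seeds, n_future=10)
```

Predicting in the small pose space rather than in pixel space is plausibly why the generated frames stay so clean over long horizons.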