Deep reinforcement learning — an algorithmic training technique that drives agents to achieve goals through the use of rewards — has shown great promise in the vision-based navigation domain. Researchers at the University of Colorado recently demonstrated a system that helps robots figure out the direction of hiking trails from camera footage, and scientists at ETH Zurich described in a January paper a machine learning framework that aids four-legged robots in getting up from the ground when they trip and fall.

But might such AI perform just as proficiently when applied to a drone rather than to machines planted firmly on the ground? A team at the University of California, Berkeley set out to answer that question.

In a newly published paper on the preprint server arXiv (“Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight”), the team proposes a “hybrid” deep reinforcement learning algorithm that combines data from both a digital simulation and the real world to guide a quadcopter through carpeted corridors.

“In this work, we … aim to devise a transfer learning algorithm where the physical behavior of the vehicle is learned,” the paper’s authors wrote. “In essence, real-world experience is used to learn how to fly, while simulated experience is used to learn how to generalize.”

Why use simulated data? As the researchers note, generalization is strongly dependent on dataset size and diversity. Generally speaking, the greater the quantity and diversity of the data, the better the performance, and real-world data is both time-consuming and expensive to acquire. But there’s a problem with simulated data, and it’s a big one: For flight, it’s of inherently lower quality, because complex physics and air currents are often modeled poorly or not at all.

The researchers’ solution was to use real-world data to learn the dynamics of the system, and simulated data to learn a generalizable perception policy. Their machine learning architecture comprised two parts: a perception subsystem whose visual features were transferred from simulation, and a control subsystem trained on real-world data.
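To make that two-part split concrete, here is a minimal sketch in plain Python. It is not the authors' network, which the paper describes as a deep model trained with reinforcement learning. The function names, the tiny logistic "control head," the random stand-in for simulation-trained perception weights, and the synthetic "real-world" samples are all assumptions made for illustration. The only idea it tries to show is that the perception weights stay frozen while the control part is fit on a small real-world dataset.

```python
import math
import random

def make_perception(n_pixels, n_features, seed=0):
    """Stand-in for perception weights learned in simulation (hypothetical random values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n_pixels)] for _ in range(n_features)]

def perceive(weights, image):
    """Map a flat list of pixel values to visual features using tanh units."""
    return [math.tanh(sum(w * p for w, p in zip(row, image))) for row in weights]

def predict_collision(head, features, action):
    """Control head: logistic model over visual features plus the candidate action."""
    inputs = features + [action, 1.0]  # the trailing 1.0 is a bias input
    z = sum(w * x for w, x in zip(head, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def train_head(head, perception, real_data, lr=0.1, epochs=50):
    """Fit only the head on real-world samples; perception weights are never updated."""
    for _ in range(epochs):
        for image, action, collided in real_data:
            features = perceive(perception, image)
            inputs = features + [action, 1.0]
            error = predict_collision(head, features, action) - collided
            for i, x in enumerate(inputs):
                head[i] -= lr * error * x  # gradient step on the cross-entropy loss
    return head

def mean_loss(head, perception, real_data):
    """Average cross-entropy of the head's collision predictions."""
    total = 0.0
    for image, action, collided in real_data:
        p = predict_collision(head, perceive(perception, image), action)
        p = min(max(p, 1e-9), 1 - 1e-9)
        total -= collided * math.log(p) + (1 - collided) * math.log(1 - p)
    return total / len(real_data)

# Synthetic "real-world" samples (an assumption): steering right (action > 0) collides.
rng = random.Random(1)
real_data = []
for _ in range(40):
    image = [rng.random() for _ in range(4)]
    action = rng.uniform(-1, 1)
    real_data.append((image, action, 1.0 if action > 0 else 0.0))

perception = make_perception(n_pixels=4, n_features=3)
perception_before = [row[:] for row in perception]  # snapshot to confirm it stays frozen
head = [0.0] * (3 + 2)  # 3 visual features + action + bias

loss_before = mean_loss(head, perception, real_data)
train_head(head, perception, real_data)
loss_after = mean_loss(head, perception, real_data)

print("perception unchanged:", perception == perception_before)
print("loss before:", round(loss_before, 3), "loss after:", round(loss_after, 3))
```

In this toy setup, the small real-world dataset only has to teach the head how actions relate to collisions, while everything visual is inherited from the (here simulated) perception weights. That division of labor is the one the researchers describe, though their actual models and training procedure are far more involved.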

To train the simulation policy, the team used Stanford’s Gibson simulator, which contains a large variety of 3D-scanned environments (the researchers gathered data in 16 of them). They modeled a virtual quadcopter with a camera in such a way that actions directly controlled the pose of the camera. In all, they collected 17 million data points in simulation, which they combined with 14,000 data points captured by running the simulation-trained policy in a single hallway on the fifth floor of Cory Hall at UC Berkeley.

With just one hour of real-world data, the team demonstrated that the AI system could guide a 27-gram quadcopter — the Crazyflie 2.0 — through new environments with lighting and geometry it had never encountered before, and help it to avoid collisions. Its only window into the real world was a monocular camera; it communicated with a nearby laptop via a radio-to-USB dongle.

The researchers noted that models trained for collision avoidance and navigation transferred better than task-agnostic policies learned with other approaches, like unsupervised learning and pretraining on large image recognition tasks. Moreover, when the AI system did fail, its failures were often “reasonable”: in 30 percent of trials with curved hallways, for instance, the quadcopter collided with a glass door.

“The main contribution of our [work] is a method for combining large amounts of simulated data with small amounts of real-world experience to train real-world collision avoidance policies for autonomous flight with deep reinforcement learning,” the paper’s authors wrote. “The principle underlying our method is to learn about the physical properties of the vehicle and its dynamics in the real world, while learning visual invariances and patterns from simulation.”