As robots take on a growing share of industrial manufacturing, precise and accurate robot control is becoming more important. Conventional feedback control methods can effectively solve many types of robot control problems by capturing structure with explicit models such as motion equations. It is difficult, however, to achieve sufficient accuracy and robustness for problems in modern industrial manufacturing that involve friction and contact between the robot and its environment, where controllers have to be manually tuned.

In the paper Residual Reinforcement Learning for Robot Control, researchers from Siemens Corporation, the University of California, Berkeley, and Hamburg University of Technology propose residual reinforcement learning, a new approach for solving real-world robot control problems that involve friction and contact.

“Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods, and the residual which is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects.” (arXiv).

Synced invited Kris Hauser, Associate Professor and Director of the Intelligent Motion Lab at Duke University, whose work focuses on robot motion planning and control, to share his thoughts on residual reinforcement learning for robot control.

How would you describe Residual Reinforcement Learning?

Residual reinforcement learning is an approach that uses a hand-engineered controller as the default action in reinforcement learning (RL). This shifts the burden of the RL system to learning the difference (residual) between the optimal control action and the controller’s action.
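The superposition described above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the proportional controller and the linear residual policy are hypothetical stand-ins for, respectively, any conventional feedback controller and a learned neural-network policy trained with RL.

```python
import numpy as np

def hand_engineered_controller(state, target):
    # Stand-in for a conventional feedback controller (e.g. a P-controller
    # driving the state toward a target); this is the "default action".
    kp = 1.0
    return kp * (target - state)

def residual_policy(state, weights):
    # Stand-in for the learned residual: here a tiny linear policy whose
    # weights would be adjusted by the RL algorithm during training.
    return weights @ state

def combined_action(state, target, weights):
    # The final control signal is the superposition of both components,
    # so RL only has to learn the correction to the default action.
    return hand_engineered_controller(state, target) + residual_policy(state, weights)
```

With zero residual weights the combined policy reduces exactly to the hand-engineered controller, which is why the engineered controller acts as a sensible starting point for learning.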

Why does this research matter?

In robotics, it can be difficult to engineer controllers that are able to account for changes from lab conditions to the real world. At the same time, reinforcement learning has been shown to require a considerable amount of data before converging to adequate performance. Residual reinforcement learning proposes to combine the strengths of both approaches, by relying on the engineered controller as a starting point and using RL to correct for the controller’s mistakes.

What impact might this research bring to the research community?

Traditional control approaches such as PID control, LQR, trajectory optimization, or path planning are excellent at generating highly precise robot movements, but during interactions with the external world these approaches can be brittle because hand-engineered manipulation strategies need accurate models of friction and contact that are difficult to obtain. Robots need to be much more adaptive to close the gap from the lab to the real world, and residual reinforcement learning is one mechanism that may help close that gap.
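As a point of reference, a textbook PID controller of the kind mentioned above can be written in a few lines. This is a generic sketch, not tied to the paper; the gains `kp`, `ki`, and `kd` are exactly the quantities that typically must be hand-tuned, which is one source of the brittleness described here.

```python
class PID:
    """Minimal discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0      # accumulated error
        self.prev_error = 0.0    # error at the previous step

    def step(self, error):
        # Standard PID law: proportional + integral + derivative terms.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

A residual RL agent would add its learned correction on top of this controller's output rather than replacing it.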

At the same time, it can take a huge amount of data for RL techniques to perform as well as traditional robot control approaches, and by leveraging a controller as a starting point, residual reinforcement learning can learn to perform a task with less data than RL from scratch. Moreover, for a complex optimal policy and a well-designed controller, the form of the residual policy may be simpler and more amenable to learning in the small sample regime.

Can you identify any bottlenecks in the research?

The main bottlenecks for this research are that it is unclear what problems benefit the most from the proposed technique, how well the hand-engineered controller needs to perform, and which aspects of a problem are best handled through a hand-engineered controller versus RL. Although the proposed technique appears to be a useful trick which could have practical impact, the theoretical insights of this work are not as well developed.

Moreover, this approach is reminiscent of techniques like model-based reinforcement learning and adaptive control that use online learning to improve the dynamics model beyond an initial guess. It is plausible that learning a model would generalize faster than learning a residual, so more analysis will be needed to justify the proposed approach.

Finally, although the authors have made efforts to evaluate their method with a variety of initial conditions, it should be noted that the evaluated task — inserting a foam block between two other blocks — is rather simple. It remains to be seen whether the approach can work well on a more complex, real-world task.

Can you predict any potential future developments related to this research?

A current trend in reinforcement learning is the growing awareness of the challenges in applying RL to robotics problems, and one direction of research addresses these challenges by “easing” the learning problem, i.e., making it simpler for the learning algorithm to find useful patterns in the data. Residual reinforcement learning is a good example of this philosophy.

Another possible direction of research seeks to understand why some robotics problems tend to be challenging to learn compared to, e.g., vision problems. Armed with a deeper understanding, it may be possible to develop model structures that are tailored specifically to overcome the issues of hybrid dynamics of contact, policy discontinuities inherent in nonlinear optimal control, multi-modal uncertainty, and 3D volumetric geometry, amongst others.

The paper Residual Reinforcement Learning for Robot Control is on arXiv.