Deep Reinforcement Learning (DRL) has been called a breakthrough technology. Heralded as a step forward in Artificial Intelligence, it has racked up some very impressive achievements in a short time. It is now the leading algorithm in a number of games, beating top professionals in Poker, Chess, Go, and some competitive video games. Despite all that success, DRL has been slow to take hold in the business world. In this article I lay out the trajectory DRL has followed so far, the obstacles it has encountered along the way, and where it may be going from here.

On past vacations or business trips to unfamiliar places, you may have unwittingly done some reinforcement learning. Imagine your first night, when you ate a bland, overpriced meal at the hotel lobby restaurant. That poor dining experience served as negative reinforcement; you are unlikely to eat at the hotel again. Over the next few days you went out exploring new restaurants and found a few places you loved, returning to some multiple times. Those experiences served as positive reinforcement. Congratulations! You have successfully done reinforcement learning. At first you knew nothing, so you tried things at random. Eventually, you learned which decisions led to good outcomes. By the end of the trip you could reliably meet your goal of finding a good meal.
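The restaurant hunt above can be sketched as a tiny reinforcement learning program — an epsilon-greedy multi-armed bandit. Everything here (the restaurant names, the quality scores) is invented for illustration; real DRL replaces the simple lookup table of value estimates with a deep neural network.

```python
import random

# Hypothetical illustration: choosing a restaurant as a multi-armed bandit.
# Restaurant names and "true quality" probabilities are invented.
restaurants = ["hotel lobby", "taco stand", "noodle bar"]
true_quality = {"hotel lobby": 0.2, "taco stand": 0.9, "noodle bar": 0.6}

estimates = {r: 0.0 for r in restaurants}  # learned value of each choice
counts = {r: 0 for r in restaurants}
epsilon = 0.2  # chance of exploring a random restaurant instead of exploiting

random.seed(0)
for night in range(200):
    if random.random() < epsilon:
        choice = random.choice(restaurants)          # explore: try anything
    else:
        choice = max(estimates, key=estimates.get)   # exploit: best so far
    # Reward: 1 for a good meal, 0 for a bad one (a noisy outcome).
    reward = 1 if random.random() < true_quality[choice] else 0
    counts[choice] += 1
    # Incremental running average of the value estimate for this choice.
    estimates[choice] += (reward - estimates[choice]) / counts[choice]

best = max(estimates, key=estimates.get)
```

After enough nights, the estimates settle toward each restaurant's actual quality and the "best" pick stops being random — the same explore-then-exploit pattern described above.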

The basic trial-and-error concept behind reinforcement learning was described in 1948 by Alan Turing. However, Deep Reinforcement Learning has only recently entered a golden age. In December 2013, researchers combined reinforcement learning with Deep Neural Networks, creating a single algorithm that could learn to play seven completely different Atari games, often scoring better than human gamers. The world took notice, and DRL became a very hot topic among researchers. In October 2015 a DRL algorithm beat the European champion in the game of Go, a feat that surprised many experts who had expected it to be decades away. Around the same time, researchers made progress on robotics tasks by demonstrating an algorithm that could learn to walk by controlling the joints of simple skeletal creatures.

In the years since 2015, DRL algorithms have become more capable, more reliable, and faster to train. However, critics have found DRL’s impact on industry to be underwhelming. Earlier this year Alex Irpan laid out his critical case in his blog post entitled “Deep Reinforcement Learning Doesn’t Work Yet”. He reviewed a number of technical challenges, but closed with a simple observation: “… it’s hard to find cases where deep RL has created practical real world value.”

In recent years a number of companies made interesting demos and published insightful academic papers. Audi demoed a self-parking remote control car that was trained with DRL. Mobileye released papers using DRL for a toy autonomous driving task. OpenAI published a number of interesting papers that made use of a one-armed mobile robot from Fetch Robotics that could pick up and move small blocks. There are many more examples than I can mention here, and while impressive, they don’t yet create the real world value you might expect from a “Top 10 Technology of the Year” (awarded by MIT Technology Review).

Recently however, DRL’s impact has started to live up to its potential. One such example of this is my company’s partnership with Siemens. Together we created an algorithm that helped generate concrete business value by calibrating Siemens CNC manufacturing equipment more than 30x faster than an expert human operator could. While the 30x speed-up is significant, the most important thing to remember is that this process was done autonomously. In the past, a machine that lost calibration would be out of service until an expert operator could be flown on-site. Decoupling the frequency of calibration from the availability of expert operators opens up entirely new business models. To my knowledge this is one of the first examples of DRL transforming the value proposition of a real-world commercial system.

We are now entering the age of profit-making DRL. What took so long for this technology to make the leap from board games to the boardroom? In my view, there have been two limiting factors. First, the ecosystem of products and services to support DRL was lacking. While many popular machine learning techniques can learn from static data sets, DRL requires an interactive process of trial and error. Training is almost always conducted by interacting with simulators, which is much more scalable than collecting data from real-world systems. More importantly, simulators provide the algorithm with the opportunity to make mistakes without causing real damage. Many companies already invest in simulation technology. However, their simulators are generally made to be used by a single human engineer at a desk, rather than by swarms of machine learning algorithms in the cloud.
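That interactive trial-and-error process looks roughly like the loop below. The toy simulator is a stand-in I invented for illustration; real industrial simulators model physics, markets, or machines, but training code talks to them through a similar reset/step interface.

```python
import random

# Toy stand-in for a simulator: an agent at position 0..10 tries to reach 10.
# The dynamics and rewards here are invented purely for illustration.
class ToySimulator:
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                  # action is +1 or -1
        self.pos = max(0, min(10, self.pos + action))
        done = self.pos == 10
        reward = 1.0 if done else -0.01      # small cost for every step taken
        return self.pos, reward, done

def run_episode(env, policy, max_steps=100):
    """One full trial: the agent acts until success or the step limit."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total

random.seed(1)
env = ToySimulator()
random_return = run_episode(env, lambda s: random.choice([-1, 1]))  # untrained
greedy_return = run_episode(env, lambda s: 1)                       # learned
```

A DRL algorithm starts out behaving like the random policy, runs thousands of such episodes in parallel against cloud-hosted simulators, and gradually improves toward something like the second policy — which is exactly why simulators built for a single engineer at a desk become a bottleneck.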

An early leader in addressing this problem is Unity, which is rolling out support for machine learning as a core part of their simulation engine. Similarly, other tools integrate legacy and emerging simulators directly into deep reinforcement learning platforms. Removing a barrier like simulation integration is just one way to help users exploit the massive potential of deep reinforcement learning.

The second limiting factor has been a lack of focus on the role of the subject matter expert — the person who actually has a decision-making problem to optimize. In most cases that person is not a DRL expert, nor should they have to be. For example, Siemens’ CNC machine calibration algorithm was trained by mechanical engineers using a web interface. They didn’t need to dig into the guts of the algorithms or undergo formal DRL education. Instead, it was made possible through a ‘machine teaching’ problem-solving approach, which focuses on which tasks need to be taught and the best way to teach each one, rather than on the details of the algorithms.
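One way to picture machine teaching is as a curriculum the subject matter expert writes down: what to teach, in what order, and when a lesson counts as learned. The sketch below is hypothetical — the lesson names, thresholds, and training stub are my own invention, not Bonsai’s or Siemens’ actual interface — but it captures the idea that the expert specifies lessons, not algorithms.

```python
from dataclasses import dataclass

# Hypothetical machine-teaching curriculum for a calibration-like task.
# Lesson names, difficulties, and thresholds are invented for illustration.
@dataclass
class Lesson:
    name: str
    difficulty: float          # e.g. tolerance the calibration must reach
    success_threshold: float   # fraction of episodes that must succeed

curriculum = [
    Lesson(name="coarse alignment", difficulty=1.0, success_threshold=0.80),
    Lesson(name="fine alignment", difficulty=0.1, success_threshold=0.90),
    Lesson(name="full calibration", difficulty=0.01, success_threshold=0.95),
]

def teach(curriculum, train_lesson):
    """Advance to the next lesson only once the current one is mastered."""
    mastered = []
    for lesson in curriculum:
        while train_lesson(lesson) < lesson.success_threshold:
            pass  # keep training the DRL agent on this lesson
        mastered.append(lesson.name)
    return mastered

# Stand-in trainer that "masters" every lesson immediately, for illustration;
# in practice this is where the DRL training loop would run.
order = teach(curriculum, lambda lesson: 1.0)
```

The expert reasons about difficulty and success criteria in domain terms; the platform decides how the underlying algorithm learns each lesson.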

In hindsight it’s obvious that increasing the usability of DRL for subject matter experts is critical. Yet this has often been discouraged in academic research settings, where it is seen as a shortcoming when DRL algorithms take advantage of what human experts already know about solutions to problems. In academia it is preferable to use algorithms that start with zero knowledge. That mindset makes sense for fundamental computer science research: the more general your work is, the more easily others can build on it. In applied settings, however, striking the right balance of subject matter expertise and state-of-the-art algorithms can result in faster learning and better results.

DRL is no longer just for playing games and writing academic papers. Over the next few years many of the obstacles that previously held back applied DRL will be eliminated. You will see a steady stream of use cases emerge that yield significant business impact. The first DRL killer apps, like machine calibration, are emerging and many more will follow.

About the Author

Andrew Vaziri is a Senior Artificial Intelligence Engineer at Bonsai, where he is building the algorithms behind the world’s first AI platform built on the concept of machine teaching. Andrew has a cross-functional background, having worked as an electrical engineer, controls engineer, embedded systems engineer, and roboticist before specializing in Deep Reinforcement Learning. Prior to joining Bonsai, Andrew worked on projects at various robotics startups, Alphabet’s X, and Apple’s Special Projects Group, where he focused on deep learning and particularly reinforcement learning.

Sign up for the free insideBIGDATA newsletter.