All the biggest labs leading AI research would have you believe that their fancy game-playing software bots will one day be applicable to the real world. The skills from playing Go, Poker or Dota 2 will be transferable to algorithms designing new drugs, controlling robots, teaching computers how to negotiate – you name it.

One startup, Kindred.AI, decided to put some of those claims to the test, particularly with respect to robotics and machine-learning software. “We wanted to know the readiness of the state-of-the-art RL algorithms on real robotic applications,” Rupam Mahmood, lead of AI at Kindred, told The Register.

Reinforcement learning (RL) is a popular AI method that teaches agents how to perform a specific task by rewarding them every time they get closer to the stated goal. For example, in the shooting game Doom, agents earn points for picking up guns and bullets but lose them for getting shot. Over time, the agent gets better at playing Doom, learning to shoot enemies quickly and focus on bagging equipment.
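
To make the reward idea concrete, here is a minimal sketch of tabular Q-learning, one of the simplest RL methods. It is a toy swapped in for illustration, not one of the algorithms or tasks in the study: the five-cell corridor, the reward values, and the names `train_corridor` and `greedy_policy` are all assumptions.

```python
import random

def train_corridor(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 5-cell corridor.

    The agent starts in cell 0 and earns +1 for reaching cell 4;
    every other step costs a small penalty of 0.01.
    """
    rng = random.Random(seed)
    n_states = 5
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):              # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)     # explore: random action
            else:
                best = max(q[state])          # exploit, breaking ties randomly
                action = rng.choice([a for a in range(2) if q[state][a] == best])

            nxt = min(max(state + moves[action], 0), n_states - 1)
            done = nxt == n_states - 1
            reward = 1.0 if done else -0.01

            # Nudge the estimate toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])

            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best-known action for each non-terminal cell."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]

print(greedy_policy(train_corridor()))
```

With these settings the agent should end up preferring "right" in every cell, purely from trial, error and reward. Real robots face the same loop, only with far messier states and actions.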

A crew of researchers assessed four RL algorithms on a range of real robots performing different tasks. They tested Deep Deterministic Policy Gradient (DDPG), developed by DeepMind; soft Q-learning and Trust Region Policy Optimization (TRPO), both built by researchers at the University of California, Berkeley; and OpenAI’s Proximal Policy Optimization (PPO).

They used the UR5, a commercial robotic arm; the Dynamixel MX-64AT, a servo actuator that controls a single motion; and the Create 2, a disc-like mobile robot built on a vacuum cleaner base. Tasks included reaching and tracking objects as well as docking to a charging station.

Testing each algorithm on each robot and task required running more than 450 independent experiments that took over 950 hours. It’s painstaking work, and all the results and code have been published on arXiv and GitHub.

We’ll spare you the nitty-gritty details, but DDPG performed the worst and TRPO was the best. Success boils down to how robust the algorithms were: in other words, how sensitive each one was to changes in its hyperparameters, the settings developers choose before training begins, such as the learning rate. Deep learning systems work well only when these hyperparameters are carefully tuned to help them learn patterns from data.
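
A rough feel for hyperparameter sensitivity can be had from a toy far simpler than anything in the study. The sketch below runs plain gradient descent, not an RL algorithm, on f(x) = x² with four different learning rates; the function name `final_loss` and the values tried are assumptions for illustration only.

```python
def final_loss(learning_rate, steps=50, x0=1.0):
    """Run plain gradient descent on f(x) = x**2 and return the final loss."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x              # derivative of x**2
        x -= learning_rate * grad   # one gradient step
    return x * x

# Same algorithm, same problem; only the hyperparameter changes.
for lr in (0.01, 0.1, 0.5, 1.1):
    print(f"learning_rate={lr:<4} final_loss={final_loss(lr):.3g}")
```

The smallest rate crawls, the middle two converge, and the largest overshoots further on every step and blows up. In a lab you would just pick a good value from the sweep; a robot learning in the field does not get that luxury.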

It's all about those pesky hyperparameters

“Hyperparameter sensitivity doesn’t matter as much in the lab, where you can just try a bunch of values and pick the best one, but if we’re talking about robots that learn out in the world, we need to be comfortable with the hyperparameter choice we use,” Mahmood said.

"An example is running a machine learning model real-time to operate a self-driving car to adapt to new experiences, where we would require algorithms that do not lead to catastrophic failures due to a hyperparameter choice."

For example, light reflecting off a speed sign might obscure it, so a self-driving car might not realize it has to slow down. In fact, hyperparameter choice is so important that in many cases it ends up having a bigger impact than the choice of the algorithm itself.

It also means that the standard method of pre-programming robots using controllers is still more effective than current RL techniques in most tasks. But that doesn’t mean there’s no point to RL at all.


“Naturally, outperforming scripted programs [will] be easier in tasks, where a scripted or engineering solution is not obvious or readily available. For example, learning to grasp and manipulate arbitrary objects in dynamic situations, [would require] scripting to envision and account for numerous plausible situations,” Mahmood told El Reg.

“The scripted programs were developed based on decades of scientific, technological and engineering advancements, whereas the RL algorithms started from tabula rasa, knowing nothing about the task and learned a solution in a couple of hours.”

It’ll be a while yet before RL can catch up. There are also hardware challenges when training robots. The algorithms encourage agents or robots to explore their local environments, making incremental improvements and failing often before they’re able to learn a specific task. This requires massive amounts of computation, and during the experiments the robots frequently overheated, failed, or even encountered sillier problems like tangled cables.

Mahmood is optimistic, however, and believes the turning point will come when RL performs about as well as traditional programs. “At that point, RL will start to become more cost-effective than scripting by human experts. These algorithms are not too far from some use cases in robotics,” he explained.

"In fact, it wouldn't be surprising to see some applications soon based on the current algorithms. One of the main steps we need here is an honest appreciation of the difference between learning in simulated and physical systems and the difficulty of learning with the latter.” ®
