Today, Intel is announcing the release of our Reinforcement Learning Coach, an open-source research framework for training and evaluating reinforcement learning (RL) agents that harnesses multi-core CPU processing to achieve state-of-the-art results. Coach contains multi-threaded implementations of some of today’s leading RL algorithms, combined with various games and robotics environments. It enables efficient training of reinforcement learning agents on a desktop computer, without requiring any additional hardware.



Since the introduction of asynchronous methods for deep reinforcement learning [1] in 2016, many algorithms have been able to achieve better policies faster by running multiple instances in parallel on many CPU cores. So far, these algorithms include A3C [1], DDPG [2], PPO [3], DFP [4] and NAF [5], and we believe that this is only the beginning. Coach includes implementations of these and other state-of-the-art algorithms, and is a good starting point for anyone who wants to use and build on the best techniques available in the field.



How do you use Coach? Start by defining the problem that you would like to solve, or select an existing one. Then choose a set of reinforcement learning algorithms and use them to make progress towards solving your problem. Coach enables easy experimentation with existing algorithms and serves as a sandbox that simplifies the development of new ones. The framework defines a set of APIs and key reinforcement learning components that let the user easily reuse components and build new algorithms on top of existing ones.
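To make the workflow concrete, here is a minimal sketch of the usual pattern: define a problem (an environment), choose an algorithm (an agent), and run a training loop that connects the two. It does not use Coach's actual API. The names `Environment`, `TwoArmedBandit`, `EpsilonGreedyAgent` and `train` are hypothetical and exist only for this illustration.

```python
import random
from typing import Protocol, Tuple


class Environment(Protocol):
    """Hypothetical environment interface (an assumption, not Coach's API)."""
    def reset(self) -> int: ...
    def step(self, action: int) -> Tuple[int, float, bool]: ...


class TwoArmedBandit:
    """Toy problem: arm 1 pays 1.0, arm 0 pays 0.0; every episode lasts one step."""
    def reset(self) -> int:
        return 0  # a single dummy state

    def step(self, action: int) -> Tuple[int, float, bool]:
        reward = 1.0 if action == 1 else 0.0
        return 0, reward, True  # next state, reward, episode finished


class EpsilonGreedyAgent:
    """Tabular action-value agent with epsilon-greedy exploration."""
    def __init__(self, num_actions: int, epsilon: float = 0.2,
                 lr: float = 0.5, seed: int = 0):
        self.q = [0.0] * num_actions
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def learn(self, action: int, reward: float) -> None:
        # Move the value estimate of the chosen action towards the reward.
        self.q[action] += self.lr * (reward - self.q[action])


def train(agent: EpsilonGreedyAgent, env: Environment, episodes: int) -> None:
    """Generic loop: any agent/environment pair with these methods fits."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)
            state, reward, done = env.step(action)
            agent.learn(action, reward)


agent = EpsilonGreedyAgent(num_actions=2)
train(agent, TwoArmedBandit(), episodes=200)
```

Because the loop depends only on the small `reset`/`step` and `act`/`learn` interfaces, swapping in a different problem or a different algorithm leaves `train` untouched. That is the kind of component reuse the framework aims for.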

Coach is integrated with some of the top available environments, such as OpenAI Gym*, Roboschool* and ViZDoom*. It also offers various techniques for visualizing the training process and understanding the underlying mechanisms of the agents. All of the algorithms are implemented using Intel-optimized TensorFlow, and some are also available through our neon™ framework.



The Agents

Coach contains implementations of many agent types and supports a seamless transition from single-threaded to multi-threaded implementations. The agents are implemented in a modular way, so that their building blocks can be reused to construct new and more complex agents. Moreover, Coach lets you write a new agent with a single worker in mind and then switch to a synchronous or asynchronous multi-worker implementation with minimal changes.
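The idea of writing for one worker and scaling to many can be sketched with nothing but Python's standard `threading` module. This is an illustration under simplifying assumptions, not Coach's implementation: the "agent" is a toy gradient-descent worker minimizing (w - 3)^2, only the asynchronous variant is shown, and `SharedParameters`, `worker` and `run` are hypothetical names.

```python
import threading


class SharedParameters:
    """Parameters shared by all workers; a lock keeps each update atomic."""
    def __init__(self, value: float):
        self.value = value
        self.lock = threading.Lock()

    def apply_gradient(self, grad: float, lr: float) -> None:
        with self.lock:
            self.value -= lr * grad


def worker(params: SharedParameters, steps: int, lr: float = 0.05) -> None:
    """Written with a single worker in mind: gradient descent on (w - 3)^2."""
    for _ in range(steps):
        w = params.value            # lock-free read; may be slightly stale
        grad = 2.0 * (w - 3.0)      # gradient of the toy loss at w
        params.apply_gradient(grad, lr)


def run(num_workers: int, steps_per_worker: int) -> float:
    """The same worker function runs unchanged on one thread or many."""
    params = SharedParameters(value=0.0)
    threads = [
        threading.Thread(target=worker, args=(params, steps_per_worker))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params.value


single = run(num_workers=1, steps_per_worker=400)
multi = run(num_workers=4, steps_per_worker=100)
```

Only the `num_workers` argument changes between the two runs, and both end close to the optimum w = 3. Workers may compute gradients from slightly outdated parameters, which is the trade-off asynchronous methods accept in exchange for parallelism.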



Coach implements a variety of agent types introduced in the past few years. This allows users to solve environments with different requirements and different means of interaction with the agent, such as continuous or discrete action spaces, and visual observation spaces or observation spaces that include only raw measurements.
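The difference between discrete and continuous action spaces can be shown with a short sketch. The classes `DiscreteSpace` and `ContinuousSpace` are hypothetical and written only for this example. They mirror the general concept found in RL toolkits, not the API of Coach or of any specific environment library.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiscreteSpace:
    """A finite set of actions {0, ..., n - 1}, e.g. game controller buttons."""
    n: int

    def sample(self, rng: random.Random) -> int:
        return rng.randrange(self.n)

    def contains(self, action: int) -> bool:
        return isinstance(action, int) and 0 <= action < self.n


@dataclass(frozen=True)
class ContinuousSpace:
    """Real-valued action vectors in [low, high]^dim, e.g. joint torques."""
    low: float
    high: float
    dim: int

    def sample(self, rng: random.Random) -> List[float]:
        return [rng.uniform(self.low, self.high) for _ in range(self.dim)]

    def contains(self, action: List[float]) -> bool:
        return (len(action) == self.dim
                and all(self.low <= a <= self.high for a in action))


rng = random.Random(0)
buttons = DiscreteSpace(n=4)                            # pick one of 4 buttons
torques = ContinuousSpace(low=-1.0, high=1.0, dim=3)    # 3 torques in [-1, 1]

button = buttons.sample(rng)   # a single integer
torque = torques.sample(rng)   # a list of three floats
```

An agent for the discrete case can output one value or probability per action, while an agent for the continuous case has to produce a real-valued vector. This is one reason different agent types suit different environments.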