We are happy to announce the release of the latest version of the ML-Agents Toolkit: v0.4. It contains a number of new features that we hope everyone will enjoy.

It includes the option to train your environments directly from the editor, rather than as built executables, which greatly shortens iteration time. In addition, we are introducing a set of challenging new environments, as well as algorithmic improvements that help agents learn tasks that could previously be learned only with great difficulty, or in some cases not at all. You can try out the new release by going to our GitHub release page. There is more exciting news: we are partnering with Udacity to launch an online education program, the Deep Reinforcement Learning Nanodegree. Read on to learn more.

Environments

We include two new environments in this release: Walker and Pyramids. Walker is a physics-based humanoid ragdoll, and Pyramids is a complex sparse-reward environment.

Walker

The first new example environment is called “Walker.” Its agents are humanoid ragdolls. They are completely physics-based, so the agent must learn to control its limbs in a way that allows it to walk forward. It does learn this, with somewhat humorous results. Since the agent’s body has many degrees of freedom, we think this environment can serve as a great benchmark for the Reinforcement Learning algorithms that researchers might develop.

Pyramids

The second new environment is called “Pyramids.” It features the return of our favorite blue cube agent. Rather than collecting bananas or hopping over walls, this time the agent has to reach a golden brick atop a pyramid of other bricks. The trick is that this pyramid appears only once a randomly placed switch has been activated. The agent receives a positive reward only upon reaching the brick, making this a very sparse-reward environment.

Additional environment variations

Additionally, we are providing visual observation and imitation learning versions of many of our existing environments. The visual observation environments, in particular, are designed as a challenge for researchers interested in benchmarking models that use convolutional neural networks (CNNs).

To learn more about our provided example environments, follow this link.

Improved learning with Curiosity

To help agents solve tasks in which rewards are few and far between, we’ve added an optional augmentation to our PPO algorithm. That augmentation is an implementation of the Intrinsic Curiosity Module, as described in this research paper from last year (Pathak et al., 2017, “Curiosity-driven Exploration by Self-supervised Prediction”). In essence, the addition allows the agent to reward itself using an intrinsic reward signal based on how surprised it is by the outcome of its actions. This makes it easier for the agent to solve very sparse-reward environments, such as the Pyramids environment described above, and to solve them more often.
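To make the idea of “surprise” concrete, here is a minimal Python sketch of a curiosity bonus in the spirit of the Intrinsic Curiosity Module. It is not the ML-Agents implementation: the `ForwardModel` class, the `intrinsic_reward` function, and the `eta` scale are hypothetical, the feature vectors are supplied directly (in the paper they are learned through an inverse model), and a linear model stands in for a neural network. The bonus is the forward model's prediction error on the next state's features, so it is large for unfamiliar transitions and shrinks as the model learns them.

```python
# Minimal sketch of a curiosity-style intrinsic reward (after Pathak et al., 2017).
# Assumptions: features are plain lists of floats, actions are one-hot lists,
# and the forward model is a hypothetical linear model, not ML-Agents' network.


class ForwardModel:
    """Hypothetical linear forward model: predicts phi(s_{t+1}) from phi(s_t) and a_t."""

    def __init__(self, feature_dim, action_dim, lr=0.1):
        self.lr = lr
        # One weight row per output feature; inputs are [features..., action one-hot...].
        self.weights = [[0.0] * (feature_dim + action_dim) for _ in range(feature_dim)]

    def predict(self, phi, action):
        x = phi + action  # concatenate features and one-hot action
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def update(self, phi, action, phi_next):
        # One gradient-descent step on 0.5 * (prediction - target)^2 per feature.
        x = phi + action
        predicted = self.predict(phi, action)
        for row, p, target in zip(self.weights, predicted, phi_next):
            err = p - target
            for j, xj in enumerate(x):
                row[j] -= self.lr * err * xj


def intrinsic_reward(predicted, actual, eta=0.01):
    """Curiosity bonus: eta / 2 times the squared error of the predicted next features."""
    return 0.5 * eta * sum((p - a) ** 2 for p, a in zip(predicted, actual))


if __name__ == "__main__":
    model = ForwardModel(feature_dim=2, action_dim=2)
    phi, action, phi_next = [1.0, 0.0], [0.0, 1.0], [0.5, -0.5]
    for step in range(5):
        r_int = intrinsic_reward(model.predict(phi, action), phi_next)
        extrinsic = 0.0  # sparse environment: no external reward yet
        print(step, extrinsic + r_int)  # the bonus shrinks as the transition becomes familiar
        model.update(phi, action, phi_next)
```

Adding the bonus to the extrinsic reward, as in the loop above, gives the agent a learning signal even before it ever reaches the sparse goal; once a region of the environment is well predicted, the bonus fades and exploration moves elsewhere.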

In-Editor training

One feature that has been requested since the announcement of the ML-Agents toolkit is the ability to perform training from within the Unity Editor. We are happy to take the first step toward that goal in this release. It is now possible to simply launch the learn.py script and then press the “play” button in the Editor to perform training. Training no longer requires building an executable, which allows for faster iteration. We think this will save our users a lot of time and narrow the gap between traditional game development workflows and the ML-Agents training process. This is made possible by a revamped communication system. Our improvements to the developer workflow will not stop here, though. This is just the first step toward even closer integration with the Unity Editor, which will roll out throughout 2018.

TensorFlowSharp upgrade

Lastly, we are happy to share that the TensorFlowSharp plugin has been upgraded from 1.4 to 1.7.1. This means that developers and researchers can now use the ML-Agents Toolkit with models built using a near-latest version of TensorFlow, and maintain compatibility between the models they train and the models they embed into Unity projects. We have also improved our documentation on creating Android and iOS executables that use the ML-Agents toolkit. You can check it out here.

Udacity Deep Reinforcement Learning Nanodegree

We are proud to announce that we are partnering with Udacity on a new nanodegree to help students and members of our community who want a deeper understanding of reinforcement learning. This Udacity course uses the ML-Agents toolkit to illustrate and teach the various concepts. If you’ve been using the ML-Agents toolkit, or want to learn the math, algorithms, and theory behind reinforcement learning, sign up.

Feedback

In addition to the features described above, we’ve also improved the performance of PPO, fixed a number of bugs, and improved the quality of the tests provided with the ML-Agents codebase. As always, we welcome any feedback you might have. Feel free to reach out to us on our GitHub issues page, or email us directly at ml-agents@unity3d.com.