I’ve been playing with the new Unity3D Machine Learning system for a few days now and made a little progress. I wanted to share the steps I found to get a newly created agent setup and trained to complete a basic task. In this post, you’ll see how to setup a basic agent with the goal of reaching a randomly chosen number using nothing but reinforced machine learning. We’ll use the new Unity ML Agent system and tensorflow to create and train the agent to complete the task and discuss ways to extend this into a real game AI.

Setup Tensorflow and UnityML

If you don’t have tensorflow setup yet, you’ll need to follow the steps outlined here: https://unity3d.college/2017/10/25/machine-learning-in-unity3d-setting-up-the-environment-tensorflow-for-agentml-on-windows-10/

Video Version

Want to see this all in video? Here ya go:

AgentML Scene Setup

Once you’ve gone through that process, open the Unity project and create a new scene.

The first thing we need is an academy. Create a new gameobject, name it “NumberAcademy“.

Add the “TemplateAcademy” component to the “NumberAcademy“. Our setup doesn’t need the academy to do anything special, so we can start with the basic blank academy provided in the template.

Under the Academy, create another child gameobject. Name it “NumberBrain“.

Add a Brain component to it.

Set the State & Action size variables to 2.

Set the Action Space Type to Discrete. We’ll be using 2 discrete actions (up or down) in our project. We use discrete because these are represented as integers.

Set the State Space type to Continuous. We’ll be tracking two floats, for state, so we use continuous.

Set the Brain Type to “Player”

Add 2 actions. Choose any 2 keys you want (I went with A & B), but set the Values to 0 and 1. The key bound to value 0 decrements the #, the key bound to 1 will increment it.

The NumberDemoAgent Script

Create a new script named NumberDemoAgent.cs

Set the base class to Agent (replace the : MonoBehaviour with : Agent)

Add the following fields:

The currentNumber and targetNumber fields are the most important here. Everything else is just for debugging and visualizing.

Our agent will pick a random targetNumber and try to get the currentNumber to our target using our increment and decrement commands.

Next we need to override the CollectState method like this:

Here, we’re returning our two floats for current and target number as the state of our agent. Notice how this matches up with our 2 state variables on the brain and they’re floats which is why we have it set to continuous state instead of discrete.

For our agent to train, we need to select random target numbers. To do that, we’ll override the AgentReset() method like this:

The final and most important part we need is the AgentStep() method. This is where we take in actions (aka input), perform some tasks (respond to the actions), and reward our agent for successful choices.

The first thing you’ll see is our text update. This is only for debugging / visualizing. It allows us to see the current #, the target, and the # of times we’ve successfully solved the problem (reached the target number).

Next up is the switch where we look at the action and perform our task. In this case, we either respond to action 0 by decrementing the current number, or to action 1 by incrementing it. Any value out of that shouldn’t happen, but if we get one, we just ignore it and return.

Then we move our cube based on the currentNumber (using it for the x offset). This cube again is only for visualizing, it has no impact on the actual logic or training.

We then check the currentNumber against some known limits. Since we choose a random number between -1 & 1, if we reach -1.2 or +1.2, we can consider it a failure as it’s definitely going in the wrong direction. In that case, we set the reward to -1 to denote a failure, then mark done as true so the agent can reset and try again.

Then finally, we check to see if the currentNumber is within 0.01 of the target. If so, we consider that a match, set the reward to 1.0 for a success, and mark it as done. We also increment the solved counter for debugging purposes (it’s nice to see how many times it’s been successful).

Here’s the complete script:

Setup The Agent

With the script ready, we need to create a new gameobject and name it “NumberDemoAgent”.

Attach the NumberDemoAgent script to it and assign the brain.

Next create a Text object and place it where you can see it (ideally big in the middle of the screen).

Assign the text object to the NumberDemoAgent.

Create a Cube and a Sphere and assign them to the NumberDemoAgent as well (these will help you see what’s going on, much easier than reading #s).

Testing in Player Mode

Now press play. You should be able to move the cube left and right with your two hotkeys (remember I went with A & B for the hotkeys).

When you get the box to the sphere, it should increment the solved count and reset. If you go too far the wrong way it should also reset (remember that 1.2 limit).

Training

Once it works in player mode, select the brain and change the “Brain Type” to “External”

Save your scene and build an executable where the scene is the only thing included (with debug mode enabled).

For your output folder, choose the python subdirectory of your ml-agents project (included when you downloaded or cloned the source project). For example, mine is located here: C:\ml-agents\python.

Remember the name you give it, you’ll need that in just a minute.

Anaconda / Jupyter

Launch an anaconda prompt.

Change directory to the python folder you just built into. ex. “cd c:\ml-agents\python”

Enter the command “jupyter notebook” (you may need to hit enter a 2nd time btw)

You should be prompted with a web interface shortly after that looks like this:

Change the highlighted parts to match. On the env_name, don’t just put in “numberdemo”, use the name that you built your executable with. Buffer_size and batch_size you can copy though (it’s important to note that these #’s were only found by testing/trying, even after getting it working, I still barely understand what’s going on with them).

Once you’re done editing the hyperparameters, run the steps in order.

Start with step 1 & 2. (the * disappears and a # appears in the [*] when it’s done.

When you run step 3, you should see a window appear for your game (a small window though). The first time, you’ll probably also get a windows permissions dialog, make sure to allow it.

Once you start step 4… WAIT.. and watch the results come in (first one may take a minute so be patient)

Once it’s saved a couple times, hit the stop button. Then move on to step 5 and run it. This will export your training data to a .bytes file in the “python/models/ppo” subfolder.

Copy the .bytes file (again it’ll be named to match your executable name) and place it in your Unity project somewhere.

Select the brain and set the “Brain Type” to “Internal”.

Assign the .bytes file to the “Graph Model” field.

Save and press play!

Conclusions

This is a pretty simple sample, meant to just help get a basic understanding how how this system works. I’m excited to see where it goes though and build bigger more interesting projects to control game AI and make interesting gameplay / bots.

Reference

Unity ML Agents GitHub – https://github.com/Unity-Technologies/ml-agents

HyperParameters Doc – https://github.com/Unity-Technologies/ml-agents/blob/master/docs/best-practices-ppo.md

Machine Learning Playlist –https://www.youtube.com/watch?v=qxicgknzUG8&list=PLB5_EOMkLx_Ub1A4iHoDUx7vg37sVoL-E