by Changbai Li and Jan Dornig

It’s almost the anniversary of the last game jam we participated in, so just in time, and way too late, we finally wrote up how we tackled our one question: how can machine learning be used as a game mechanic?

A 48-hour time frame and a two-man team that can’t actually write machine learning algorithms from scratch would make this quite a hard sell, but thanks to Unity ML-Agents there is hope.

Unity has made amazing progress in enabling machine learning (ML) experiments in its engine through the Unity ML-Agents toolkit, which provides different ML models that programmers can deploy fairly easily in games. Looking through the examples online, we searched for ways to put the player in touch with the learning properties of the ML models. We wanted the player to:

- Witness meaningful learning in the algorithm
- Have as much impact as possible on the ML performance

Reinforcement Learning vs. Demonstrative Learning

Unity ML-Agents offers different learning models to choose from. Overall, what interested us were these two categories: reinforcement learning (RL) and demonstrative learning (DL).

Reinforcement learning

RL is the method most often associated with games, although in practice games are usually used to test RL algorithms, rather than RL being used to support games.

RL works by rewarding the agents for behaving the way we want them to and punishing them when they don’t. In short: I get points for accomplishing a goal and negative rewards when I fail. Since that’s the logic of a lot of games, the connection is pretty obvious. Through trial and error, the algorithm slowly learns the best strategies to maximize rewards/points/coins/wins. Since games often have these mechanics built in, they can often be used directly. But more difficult game scenarios also call for customized reward systems that help the AI find its way, rather than stumble in the dark for too long.

Apart from the actual math behind it all, this involves setting up the reward-punishment system and, importantly, defining what the model/agents can sense about the environment.
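In ML-Agents this reward shaping lives in the agent’s C# script, but the idea can be sketched in plain Python. Everything below (the function name, the 1D positions, the specific reward values) is our own illustration, not the toolkit’s API:

```python
def step_reward(agent_pos, box_pos, goal_pos, prev_box_to_goal):
    """Hypothetical shaped reward for a box-pushing agent (1D for simplicity)."""
    box_to_goal = abs(goal_pos - box_pos)
    reward = -0.001  # small time penalty: discourages stumbling in the dark
    if box_to_goal < prev_box_to_goal:
        reward += 0.01  # shaping signal: the box moved toward the goal
    if box_to_goal == 0:
        reward += 1.0   # the actual objective: box delivered
    return reward

# Pushing the box closer earns a small positive step reward:
print(step_reward(agent_pos=0, box_pos=2, goal_pos=5, prev_box_to_goal=4))
```

The small per-step penalty and the dense “getting closer” bonus are exactly the kind of customized shaping that keeps an agent from wandering aimlessly before it ever sees the big final reward.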

These are some of the demos available for RL from Unity. They are usually set up so that there is no human interference once the stage and model have been set.

Now, for an actual game, we of course want the player to interact in meaningful ways and enjoy the experience. For that to happen, we would have to look at different ways to create a human-in-the-loop system, or something close to what is called interactive machine learning. These systems are mostly defined by having human agency within them, and to support that agency well, a system needs observable progression, adjustable components, and feedback mechanisms.

Now some things we could imagine:

- Player as Architect: e.g. a platformer game where the player constructs levels of increasing difficulty to teach the AI increasingly complex movements, with a given “end level” that it has to master. A bit like the recent Unity challenge where participants were asked to write an AI that can solve simple puzzles that get harder from one level to the next; but rather than writing the AI directly, you would act as a trainer by preparing challenges for it.
- Player as Judge/Reward System: e.g. have the player provide the rewards and punishments in real time. You might remember training your creature in Molyneux’s Black & White. In these instances, you are the trainer.

These examples hint at possible game designs and mechanics, and are things we would certainly like to try. But within the 48-hour sprint, it seemed unclear whether we could make such a system work, since the interactive parts would have to be implemented from scratch. It was also very unclear whether the AI could learn in a meaningful way from the limited amount of feedback a human can give.

When it comes to judging how well the AI is doing, most Unity examples simply look at how much progress is made within a certain time frame. We would love to see what happens when it’s a human who gives feedback on whether an agent is on the right track. Would this approach perform better? If you know of examples that tried that, let us know in the comments.

Demonstrative learning

To our surprise, Unity ML-Agents also provides demonstrative learning. Instead of training the agents via rewards and punishments, this method is about giving them examples of how to execute the task. If you want the agent to push a box, for example, you can record footage of yourself pushing the box, and the agent will follow suit. While we don’t need to set up a reward-punishment system, we still need to define what the agents can sense about the environment.
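Under the hood, learning from demonstration means the recorded (observation, action) pairs become a supervised dataset, and the agent tries to reproduce the demonstrated action in similar situations. The toy sketch below uses a 1-nearest-neighbour “policy” as our own stand-in for the neural network ML-Agents actually trains; none of these names come from the toolkit:

```python
def record_demo(demos, observation, action):
    """Store one demonstrated (observation, action) pair."""
    demos.append((observation, action))

def act(demos, observation):
    """Imitate: pick the action demonstrated in the most similar situation."""
    nearest = min(
        demos,
        key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], observation)),
    )
    return nearest[1]

demos = []
# Observations here are a toy 2D offset from agent to box.
record_demo(demos, (1.0, 0.0), "push_right")   # box to the right -> push right
record_demo(demos, (-1.0, 0.0), "push_left")   # box to the left  -> push left

print(act(demos, (0.8, 0.1)))  # a new, similar situation -> push_right
```

The real trainer generalizes far better than a lookup table, but the structure is the same: what the player does while demonstrating is literally the training data.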

This training process resembles real-life coaching, where the coach first demonstrates how to do something and then asks the students to repeat the action. This hints at a game mechanic where players take on the role of a coach and teach agents via demonstration, interacting with the training process in a meaningful way. Having an easy-to-understand metaphor is also great, because it means players will understand how to play very quickly, even when they don’t have any experience with machine learning.

One of the most important aspects of using this in a game is the speed at which the algorithm can learn, so we ran some first tests. This was both to ensure technical feasibility and to see how effective the training is, i.e. how long it takes for the demonstrations to affect the agents’ behavior. And it seemed to work.

The agents learned to push the box toward the goal area after just one demonstration.

Game Design

Now that we had a core mechanic that resembles one player training another, our minds naturally turned to sports: two coaches, each getting their team to play better than the other. To win our game, then, you’d simply have to be better at training your team than the other player!

We started out thinking about all the things we could do with our newly trained and therefore smart little underlings. Building on the existing Unity demo where the AI pushes boxes around, the first idea was to use our minions to collectively build a tower. The two players would be judged on the height of their towers. Ideally the whole thing would be built by player and AI together, which might lead to some funny situations where the AI keeps building in really unstable ways, with the player having to help out as an architect.

But a bit into testing, reality came crashing down, and it became increasingly clear that what we had on our hands, given the limited training time, was much closer to mindless AI zombies than a box-pushing Terminator.

With a big part of the game jam time now gone, we focused on what we got from Unity and tried to make the best of it. The mindlessness of the henchmen-in-training was actually a cute aspect of the whole thing, reminiscent of the fun of watching the iconic yellow Minions half helping, half destroying their master with their actions.

At this point we arrived at ZOMBOX: two teams fight over a battlefield with randomly spawning boxes, which they have to push back to their lair to score points. The team with more points wins.

Players control the Zombox Kings, and through them they show the minions how to push crates into the goal properly, while also scoring points themselves.

The system records the player’s actions along with the input of their virtual sensors, which track what the character sees around it. Part of the game mechanic is that players control when their movement is recorded, and therefore used for training the Zomboxes, and when it is not. They can toggle training mode at any time to choreograph their demonstrations carefully. Train them to attack the enemy, or train them to collect boxes; it’s up to the player. At a later stage this might expand to training different groups of minions to perform different actions, but for now we implemented a single group with shared behavior.

As with any good AI system, we added a feature that lets players wipe the agents’ memory clean at any time, in case they want to restart training.

Having players pitted against each other means the challenges they face will be quite dynamic: they depend entirely on how the opponent behaves. Optimal strategies might change throughout the game in response to how the opponent is playing. The player might have to teach the agents a new strategy to defeat the opponent’s strategy; the opponent could then re-train their agents with yet another strategy. The feedback loop could go on forever!

Code

The ML-Agents toolkit is split into two parts: a Python environment (which we managed with Anaconda) that trains the machine learning models, and a Unity SDK that runs the simulations.

mlagents-learn config/online_bc_config.yaml --train --slow

Unity’s ML-Agents toolkit uses “brains” to link the machine learning models and the agents in the game. A brain can record what the player is doing, use it to train the model, and let the model control agents in the game, all at the same time.

To train the neural network with this toolkit, we first have to start the Python training process in a command prompt, then jump back to the Unity Editor and press Play within 30 seconds. Admittedly, this is not a very friendly gaming experience, but with 48 hours and an avant-garde technology, we had to compromise.

In newer versions of the toolkit, Unity added support for running the training against a built executable. Nevertheless, we still have to start the training process manually. We hope this can be further automated so that games like this can be ready for shipping.

As long as online_bc_config.yaml contains the brain’s configuration, and there is an agent in the active game that uses that brain, the toolkit will start training.
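Our config looked roughly like the fragment below. This is a sketch only: the brain names are ours, the values are illustrative, and the exact keys (like the online_bc trainer and brain_to_imitate) depend on the ML-Agents version in use:

```yaml
# One entry per learning brain; each imitates the brain the player controls.
RedTeamLearningBrain:
    trainer: online_bc
    brain_to_imitate: RedTeamPlayerBrain
    batch_size: 64
    max_steps: 5.0e4

BlueTeamLearningBrain:
    trainer: online_bc
    brain_to_imitate: BlueTeamPlayerBrain
    batch_size: 64
    max_steps: 5.0e4
```

Any brain listed here that also exists on an agent in the running scene gets picked up by the trainer, which is what made the two-team setup below possible.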

The Unity examples only train a single brain at a time, but there didn’t seem to be any restriction, so we tried putting in both teams’ brain configurations and… hurrah! Both teams now learn from their respective players. You can have a look at how we set up the training configuration here.

Spiderbox

To allow the agents to gather information about their surroundings, we tagged everything in the scene and had the agents read those tags using raycasting. Much like a spider, each Zombox has 8 “eyes” that look in different directions; unlike a spider, one of the eyes looks directly behind the Zombox. These rays are the core input for the neural network, which learns what action to take based on what it sees.
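The in-game raycasts are Unity C#, but the geometry of the eight eyes is easy to sketch. The specific angles below are our own reconstruction (seven rays spread over the front and sides, plus one at 180° looking straight back), not the exact values from our scene:

```python
import math

# Angles of the eight "eyes" relative to the agent's facing direction, in degrees.
EYE_ANGLES = [-90, -60, -30, 0, 30, 60, 90, 180]

def eye_directions(facing_deg):
    """Unit direction vectors for each ray, given the agent's facing angle."""
    dirs = []
    for a in EYE_ANGLES:
        rad = math.radians(facing_deg + a)
        dirs.append((math.cos(rad), math.sin(rad)))
    return dirs

# Each ray reports the tag of the first object it hits (box, wall, enemy, ...),
# and those hits together form the observation vector fed to the network.
for d in eye_directions(0):
    print(d)
```

Keeping the rays fixed relative to the body means the observations are in the agent’s own frame of reference, so a demonstrated “push the box in front of you” generalizes regardless of which way the Zombox happens to face.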

Game Assets

To keep it simple, we stuck with the overall box theme and expanded it into a pixel/voxel style, making the game assets in MagicaVoxel before exporting them to Unity, plus a couple of basic pixel graphics for the UI.