Gyroscope’s AI v AI with 8 Street Fighter II characters

We’ve spent a lot of time exhibiting at and attending various developer conferences, and last week we attended the Samsung Developer Conference (SDC). One thing we’ve always found is that it is easy to have a boring booth; if people want to know about your product, the internet has made the traditional free t-shirt + product flyer obsolete. For SDC, we knew we didn’t want a boring booth — after all, we had to staff the booth ourselves for two full days! So we did the obvious thing: used Gyroscope’s AI to play and win at Street Fighter II Turbo on the SNES, and then held a tournament between all the characters that Gyroscope learned how to play.

Gyroscope’s AI doesn’t normally play video games, nor did we have an SNES SDK. So, before the conference, we figured out how to extract game information from inside Street Fighter II Turbo, built the Gyroscope SNES SDK, then pitted the Gyroscope AI against in-game bots in thousands of games while we tweaked the AI for this special application. At the conference, we held a Final Four-style single-elimination bracket between the characters. We asked conference attendees to pick which character they thought would win; those who picked correctly entered a raffle for an SNES Classic. Our AI performed admirably, and two attendees walked away with a new SNES Classic!

What follows are the details of the AI and the event. If you want to compete against our AI, either with another AI or as a human, and learn what happens next, sign up!

Building the AI

First, we had to figure out what problem we were actually solving. We cast the problem of playing Street Fighter II as a reinforcement learning problem (one of the problem types that Gyroscope’s AI supports). In a reinforcement learning problem, the AI observes the world, selects an action to take, and receives a reward for that action. The AI’s goal is to maximize its reward over time given what it has observed in the past by taking optimal actions. Before we could start applying our AI, we needed to define the observations (i.e., what the AI “sees”), actions, and rewards for Street Fighter II.
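In code, this observe-act-reward loop is simple. Here is a minimal sketch of the cycle (the environment and agent objects are hypothetical illustrations, not the Gyroscope API):

```python
import random

class RandomAgent:
    """A stand-in agent that picks actions at random. A real agent
    would pick actions that maximize expected long-term reward."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, observation):
        return random.choice(self.actions)

    def learn(self, observation, action, reward):
        pass  # a real agent would update its policy here

def run_episode(env, agent):
    """One episode of the reinforcement learning loop: observe the
    world, take an action, receive a reward, repeat until done."""
    total_reward = 0
    observation, done = env.reset(), False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        agent.learn(observation, action, reward)
        total_reward += reward
    return total_reward
```

Here, each call to `env.step` plays out one slice of the game and reports back what the agent sees, how it was rewarded, and whether the match is over.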

Observations

You can think of these as what the AI “sees” in the environment. When a human looks at the game, they see each character jumping, moving, kicking, and so on. They also see the health meters and the timer. We needed to distill this information into a format the AI can understand, called the “observation space”. In reinforcement learning, there are two common ways to think about the observation space. The traditional approach is to measure specific signals that we, as humans, believe are pertinent to the problem at hand. The modern approach is to give the AI images of the environment after each action and let it determine the important elements in the image. The modern approach is often considered better because it’s more generic and makes fewer assumptions about feature importance. However, it often requires longer training time. Given our time constraints, we chose the traditional approach and defined the observation space by hand.

Specifically, we defined the observation space as:

X and Y coordinates of each player

Health of each player

Whether each player is jumping

Whether each player is crouching

Move ID for each player

Absolute difference in X and Y coordinates between players

Game clock

Example observations we needed from the game

Note that this observation space is huge! There are trillions, if not more, of unique observations.
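As a sketch, one frame’s observation could be packed into a flat numeric vector like this (the class and field names are our illustration, not the actual Gyroscope format):

```python
from dataclasses import dataclass, astuple

@dataclass
class PlayerState:
    x: int            # X coordinate on screen
    y: int            # Y coordinate
    health: int       # remaining health
    jumping: bool     # treated as 0/1 in the vector
    crouching: bool   # treated as 0/1 in the vector
    move_id: int      # ID of the move currently being executed

@dataclass
class Observation:
    p1: PlayerState
    p2: PlayerState
    clock: int        # game clock

    def to_vector(self):
        """Flatten into the numeric vector the AI consumes, including
        the absolute X/Y differences between the two players."""
        return (list(astuple(self.p1)) + list(astuple(self.p2))
                + [abs(self.p1.x - self.p2.x),
                   abs(self.p1.y - self.p2.y),
                   self.clock])
```

Even with only fifteen numbers per frame, the number of distinct vectors the game can produce is astronomically large, which is exactly why the AI needs a model rather than a lookup table.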

Actions

Once the AI observes the environment, it must act. The simplest way to characterize the available actions is by the buttons on a Super Nintendo controller: Up, Down, Left, Right, A, B, X, Y, L, R. A single action, then, is a combination of buttons being pressed. If we consider every possible combination of button presses, that creates 1024 (2¹⁰) possible actions. That is a lot of possible actions! It would take a while for an AI to learn which actions work and which do not, though it would eventually learn. However, any Street Fighter II player knows that not all buttons can be pressed at all times. Further, many moves evolve over sequences of button presses.

Directional controls (from the SNES SF II Turbo instruction manual)

Button controls (from the SNES SF II Turbo instruction manual)

Another way to consider the action space is as the set of moves available (e.g., high kick, throw, uppercut, etc.). The AI could select a move and we would translate that move into button presses. Determining the moves for each character would take a while (lots of googling and playing) and would have to be repeated for each unique character. Again, for the sake of training time, we simplified the action space to the combination of one directional-control press and one button-control press (e.g., “Up + A” or “L”), with each press being optional. This formulation reduced the action space to 35 possible actions. More advanced moves and combos can still evolve over time, but they were left to the AI to discover!
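Enumerating that simplified action space makes the count concrete: one optional directional press combined with one optional button press (a sketch; the button names follow the SNES controller layout described above):

```python
from itertools import product

DIRECTIONS = ["Up", "Down", "Left", "Right"]
BUTTONS = ["A", "B", "X", "Y", "L", "R"]

# Each component is optional (None = not pressed), so the space is
# (4 directions + 1) * (6 buttons + 1) = 35 combinations, including
# the "press nothing" action (None, None).
ACTIONS = list(product([None] + DIRECTIONS, [None] + BUTTONS))
```

A combo like a fireball still emerges as a *sequence* of these atomic actions chosen over consecutive frames; nothing about the encoding hands it to the AI for free.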

Rewards

Finally, once an action is taken, the AI receives a reward. When humans play a game, they have a general perception of how well they’re doing, supported by things like health level and damage dealt. AIs need that perception boiled down to (usually) a single number so they can maximize it. We selected the health gap in each frame as the reward. So, at each observation, the AI receives a reward equal to the health gap between the players. For example, if the AI acts by kicking the opponent for 10 damage, the health gap afterward will be 10 and the AI will be awarded that amount. If the AI then does nothing, at the next observation it will still be awarded another 10 for doing “nothing”. Why? Because it has maintained that health gap. Alternatively, if the AI is kicked and does not block, the health gap decreases. In fact, the gap can be negative, and that’s a sign that things aren’t going well for the AI.
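The reward computation itself is tiny. A sketch (variable names are ours):

```python
def frame_reward(ai_health, opponent_health):
    """Reward at each observation: the health gap between the players.
    Positive when the AI is ahead, negative when it is behind. Because
    the gap is re-awarded every frame, merely *maintaining* a lead
    keeps paying the AI over time."""
    return ai_health - opponent_health
```

For example, landing a kick for 10 damage makes the gap 10, and every subsequent frame in which neither player takes damage pays out that same 10 again.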

Reward for Dhalsim (a game character) during a Street Fighter training session

AI for AI

What we’ve discussed above is the final formulation of the problem we used in the competition. We also tweaked parameters in our AI system. Gyroscope’s proprietary AI is an algorithm of algorithms: it figures out which algorithm to use for each problem. With so much information in hand about the Street Fighter problem, we short-circuited that loop and selected DQN as the reinforcement learning methodology, with several modifications, most notably the absence of an image-based observation space. DQN uses a model to predict which actions are optimal instead of testing and memorizing every possible action given every possible observation — such exhaustive exploration is nearly impossible given the size of the observation space. In another post, we’ll discuss the model in detail along with alternatives and show their effect on the performance of the AI.
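We’ll save the model details for that follow-up post, but the core decision rule in DQN-style agents is easy to sketch: a model scores every action, and the agent usually takes the best one while occasionally exploring. Here `q_values` is a hypothetical stand-in for the learned model:

```python
import random

def select_action(q_values, observation, actions, epsilon=0.1):
    """Epsilon-greedy action selection, as used in DQN training:
    with probability epsilon take a random action (exploration);
    otherwise take the action the model currently scores highest
    (exploitation)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values(observation, a))
```

During training, epsilon typically starts high and decays, so the agent explores the huge observation space early on and exploits what it has learned later.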

The emulator glue

Before we could train the AI, we had to connect it to Street Fighter. Gyroscope is accessible via SDKs for iOS and Unity. We did not (yet!) have an SNES SDK, so we needed to find tools that could help us instrument an SNES game such that we could use our technology to play those games. Fortunately, the tool-assisted speedrun community — the folks who try to win a game as fast as possible, often by going frame by frame through a game looking for bugs that allow them to skip ahead — has built amazing tools for interacting with classic game consoles.

BizHawk, about which we can’t say enough good things.

We needed more than just an emulator; we also needed tools around the emulator core. We found BizHawk, which supports many emulator cores, including SNES cores. BizHawk gave us a number of critical tools:

A Lua language scripting interface that gave us frame-by-frame control of games;

A suite of console memory-watching tools that let one inspect the game memory (either all of it or specific addresses);

The ability to run with no speed throttling and no display showing, thereby maximizing the frame rate of the game;

The BizHawk source code.

For Street Fighter specifically, the Lua interface allowed us to send joypad button presses, read button presses, read memory locations, and control the core emulator. The memory inspector gave us the ability to read the health of our opponents, read the moves the opponent is making, and other data that is required for our observations. Note that we only used signals that a human player has; we didn’t let the AI know anything a human doesn’t know.

Honestly, we can’t say enough good things about BizHawk. Not only is the product first-class, but the source code is extremely clean, readable, and extensible. It was a pleasure to work with this codebase — the source code became critical later, as you’ll see.

Reading the RAM: Finding the observations in SNES WRAM

We knew we’d need to figure out a few critical pieces of data to make our observation space:

The X & Y positions of the players

The health of the players

What move the player was doing (e.g., kind of punch, kick, throw, or special move)

The amount of time left on the fight clock

These are all the things a human knows when playing the game. We made an educated guess that these values were in the SNES RAM somewhere.

BizHawk RAM inspector

The SNES memory layout is well documented, and there’s not a lot of game RAM to look through. We used the BizHawk tools to monitor the change in RAM values between frames in order to find addresses that changed when we took actions like pressing left on a controller. It took us a few hours but we ended up locating all the data locations specified earlier. We were able to create a mapping from RAM to observation that looked like:

public int get_p1_health()
{
    return _currentDomain.PeekByte(0x000530);
}

public int get_p2_health()
{
    return _currentDomain.PeekByte(0x000730);
}

And so on. This code let us access these values between frames and build a data structure of the entire game observation.
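The address hunt itself boils down to diffing RAM snapshots between frames: press a button, then see which bytes changed. A minimal sketch (plain byte lists stand in for BizHawk’s memory domain):

```python
def changed_addresses(before, after):
    """Given two WRAM snapshots (byte sequences captured on consecutive
    frames), return the addresses whose values changed. Pressing Left
    and re-running the diff quickly narrows the candidates for, say,
    the player's X coordinate."""
    return [addr for addr, (b, a) in enumerate(zip(before, after))
            if b != a]
```

Intersecting the changed-address sets across several deliberate actions is what let us converge on stable addresses like the health bytes above within a few hours.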

First try: Write the Gyroscope SDK in Lua

BizHawk embeds a Lua scripting engine in the application and exposes a number of emulator functions to it. So, the first thing we tried was writing our Gyroscope SDK in Lua. We wrote a Lua library for accessing all the memory locations that would later be translated into an observation and for sending joypad presses to the emulator.

But, how to get the data out of Lua and into Gyroscope? The Lua interface doesn’t support any network I/O! Given that our service runs in the cloud, that was a big problem. The only I/O we could do from Lua was file I/O or SQLite I/O.

We wrote some Python code to read a game observation from a file written by Lua and send it to Gyroscope, but it was very hard to synchronize with Lua, and getting the actions (button presses) back to Lua was buggy. Plus, it was super slow, even after we moved the files to a RAM disk. We tried the same thing with SQLite but ran into the same speed problems.

At this point, we decided to move the SDK code from Lua to a native BizHawk tool; these tools are written in C#, the language all of BizHawk is written in. We kept the Python code we had written because it gave us an easy interface to our service (which speaks gRPC) and it provided synchronization between AI players playing each other (making sure they are on the same frame and so on). We called this Python code the EmulatorController.

Got it: Doing it all in C#

BizHawk provides an easy C# interface to implement tools that control various aspects of the game and emulator. We used this interface when porting our Lua code to C# and quickly had a working tool for manipulating Street Fighter in C#.

In C#, we had access to all of the .NET libraries, so we quickly got a socket connection up to our EmulatorController code. For each frame, we grabbed an observation from the game and sent it to the EmulatorController, which consulted the Gyroscope AI and sent the emulator back the action (buttons) that should be pressed in the next frame.
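The per-frame exchange between the C# tool and the Python EmulatorController can be sketched as a length-prefixed JSON protocol (our illustration of one workable wire format, not the exact one we shipped):

```python
import json
import struct

def encode_message(payload):
    """Serialize a dict (an observation or an action) as a 4-byte
    big-endian length prefix followed by UTF-8 JSON, so the receiver
    knows exactly how many bytes belong to each frame's message."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def decode_message(data):
    """Inverse of encode_message; returns (payload, remaining_bytes)
    so back-to-back messages on one socket stream can be split apart."""
    (length,) = struct.unpack(">I", data[:4])
    body, rest = data[4:4 + length], data[4 + length:]
    return json.loads(body.decode("utf-8")), rest
```

The length prefix matters because TCP is a byte stream: without it, an observation sent on one frame could be read merged with the next frame’s action.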

We now had a working method of running Street Fighter II as fast as the host machine could, of sending game observations to Gyroscope, and of getting back actions for which controller buttons to press. We also had the ability to synchronize two AI bots playing against each other. It was time to train!

Putting it all together: Training the AI

At the start of training the AI (here, playing Dhalsim) is picking random buttons.

With observations, actions, and rewards defined, and the AI connected to the SNES, we were ready. We trained our AI against the built-in game bot, training each character for around 8 hours, or ~3,000 matches. Our hypothesis was that a well-trained AI would (1) maximize reward, and (2) as a consequence, have a reasonably high win percentage near the end of training.

Mid-way through 3000 games, Dhalsim is playing aggressively and winning 50% of the time

Because playing Street Fighter is an entirely novel use of our service, we assumed we would have to do some tuning — our AI doesn’t usually optimize for this sort of quick reward, nor control such extensive action spaces. Over the course of two fun weekends, we tried many variations of the observation space, action space, reward function, and DQN parameters until we had an AI with a high win percentage.