The Challenge

Last year I started working on a little text adventure game for a 48-hour game jam called Ludum Dare. I take part in it a few times a year and even did the keynote once. While I was able to build a simple text adventure game engine in a day, I started losing steam when it came to creating the content to make it interesting.

My unfinished Ludum Dare text adventure game “Terminal” built with Pixel Vision 8.

Fast-forward six months and a career change into machine learning, and I became interested in seeing whether I could train a neural network to generate a backstory for my unfinished text adventure game. My goal was to augment the randomly generated room descriptions with a bit of narrative from an AI gone rogue. I based the game on one of my favorite first-person shooters (FPS) from the 90s, Marathon.

Marathon 2 screenshot.

Marathon was created by Bungie, well before it released Halo and Destiny. In Marathon, there were three AIs, and the main one, Durandal, became “rampant,” which is a fancy term for going crazy. Unlike similar FPS games of the time, such as Doom, Marathon had a rich story you could read through terminals scattered throughout the levels. These terminals not only gave you tasks but also deepened the plot as you progressed through the game.

Marathon terminal via MarathonWiki.

Now, there are lots of great examples of neural networks trained to create text-based content such as song lyrics, Shakespeare-style poems, and more. Since Marathon contained so much text in the terminals across its three games, I thought it would be an excellent candidate for an open-source project called textgenrnn, along with some tools I use to automate my deep learning workflow, to see what I could create. Plus, since the AI in the game goes crazy, I hoped that even if the generated text didn’t make sense, it would still fit the theme of the text adventure game I was creating.

An example of garbled text from a crazy AI.

The following is an account of my experimentation, the results, and notes on how to run the project on your own from my git repo here.

Creating the Dataset

Luckily, Marathon’s story is well documented online here. If you haven’t played the game before, it’s worth checking out. You can run a more modern version of it via Aleph One here. Even so, to train a textgenrnn model, I had to create a dataset from scratch. There wasn’t an easy way to automate this, so I went through all of the game terminals from the site and copied them into a text file by hand.

You can train textgenrnn on a small amount of text, but the more you give it, the better it should learn. I tried a few different variations on formatting the game text, but in the end, I had to go in by hand and remove text blocks that negatively impacted the training.

Example of a bad text block that could impact the training.

Even terminal text like this, which looks like random characters strung together, actually contains part of the story. In its current format, though, it would throw off the training: the model analyzes individual words, which requires spaces between them. In the end, to create a cleaner dataset, I decided to skip these kinds of text blocks. Because this cleanup was rather time-consuming, I ended up using only the first game’s story. Here is a link to the source file I trained on in the git project.

Setting Up the Project

I began by cloning textgenrnn from GitHub and opening it up in PyCharm. Once I had the project open, I created a new Python interpreter and installed the requirements defined in the setup.py script. You’ll also want to install TensorFlow, which wasn’t in the list of dependencies. I ended up creating a new requirements.txt.
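Something along these lines should work; the exact versions will depend on your environment, and TensorFlow is the addition that textgenrnn’s own dependency list leaves out:

```
tensorflow
keras>=2.1.5
h5py
scikit-learn
```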

A list of requirements needed to run the project.

The project is well documented and comes with some examples, but I chose to delete them and start with a clean project folder. I removed all the content from the datasets, output, and weights folders since I would generate those with the new dataset. Then I removed the setup.py file since I no longer needed it.

The new project folder after cleaning it up.

The only thing left to do was add my new Marathon terminal text dataset to the appropriate folder and begin creating the scripts to run the training. Before I could do that, I needed to create a config file I could share between my training and text-generating scripts. To do this, I created a config.py script in the root of the project.
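A sketch of what it contains; the variable names are my own, and the values reflect the baseline runs described below:

```python
# config.py - shared values for train.py and generate.py.
# Variable names are illustrative; the settings mirror the baseline
# run described below (2 RNN layers, RNN size 128, word-level training).

MODEL_NAME = 'marathon'          # prefix for the weights/vocab/config files
DATASET_PATH = 'datasets/marathon.txt'
OUTPUT_DIR = 'outputs'

NEW_MODEL = True                 # train from scratch rather than fine-tune
WORD_LEVEL = True                # analyze individual words, not characters
NUM_EPOCHS = 40
GEN_EPOCHS = 10                  # print sample output every 10 epochs
RNN_SIZE = 128
RNN_LAYERS = 2
RNN_BIDIRECTIONAL = False
TRAIN_SIZE = 0.8                 # portion of the dataset used for training

TEMPERATURES = [1.0, 0.5, 0.2]   # sampling temperatures for generation
N_GEN = 10                       # number of texts to generate
MAX_GEN_LENGTH = 300             # cap on the length of each generated text
```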

I stored all the configuration values in their own script file.

While you can train a model or generate text by passing these values in directly, I found it helped to keep them external so I could tweak them more easily between training runs. With the new config file ready, it was time to create the scripts I needed to train the model and generate text.

With all of the configuration values in their own file, I was able to create a simple train.py script to run the actual training. Here is what the code looks like:
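(A minimal sketch, built on the config.py values above.)

```python
# train.py - a minimal sketch of the training script, using the
# shared config.py values defined earlier.
import os

from textgenrnn import textgenrnn

import config

# Point the model name at the outputs folder so the weights, vocab,
# and config files are saved there during training.
textgen = textgenrnn(name=os.path.join(config.OUTPUT_DIR, config.MODEL_NAME))

textgen.train_from_file(
    config.DATASET_PATH,
    new_model=config.NEW_MODEL,
    word_level=config.WORD_LEVEL,
    num_epochs=config.NUM_EPOCHS,
    gen_epochs=config.GEN_EPOCHS,
    rnn_size=config.RNN_SIZE,
    rnn_layers=config.RNN_LAYERS,
    rnn_bidirectional=config.RNN_BIDIRECTIONAL,
    train_size=config.TRAIN_SIZE,
)
```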

This is the code I’m using to train the model.

At a high level, I create a new instance of textgenrnn, set up the path to the output folder, and finally call the train function and supply the arguments it needs. At this point, you can run the script and monitor some of the early output from the terminal window:

Running through the first round of training.

At around the 10th epoch, the text is getting better, but we won’t see the actual results until we generate more substantial amounts of text from the trained model in the outputs folder.

After training, the model and its files are saved to the outputs folder.

To generate more text, I created a generate.py script with the following code:
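(Again a sketch, assuming textgenrnn’s standard naming convention for the saved weights, vocab, and config files.)

```python
# generate.py - a sketch of the generation script. It loads the three
# files produced during training, then writes generated text to a file.
import os

from textgenrnn import textgenrnn

import config

prefix = os.path.join(config.OUTPUT_DIR, config.MODEL_NAME)

textgen = textgenrnn(
    weights_path='{}_weights.hdf5'.format(prefix),
    vocab_path='{}_vocab.json'.format(prefix),
    config_path='{}_config.json'.format(prefix),
)

textgen.generate_to_file(
    os.path.join(config.OUTPUT_DIR, 'generated.txt'),
    temperature=config.TEMPERATURES,   # array of sampling temperatures
    n=config.N_GEN,                    # number of texts to generate
    max_gen_length=config.MAX_GEN_LENGTH,
)
```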

This code allows me to generate text using the trained model.

As you can see, I set up textgenrnn by supplying it with the three files created during training: weights, vocab, and config. From there, we tell textgenrnn to create a new file using an array of temperature values, the number of texts to generate, and the maximum length of each generated text. After running this script, it saves a new text file in the outputs folder with the generated text:

A sample of text generated from the model.

Now I had everything I needed to train the model and validate the quality of the generated text. Let’s take a look at what it produced.

Generated Text Results

At this point, it’s relatively easy to see what kind of results we can expect. Here are a few samples I began experimenting with.

One epoch, an RNN size of 128, and 2 RNN layers produced the following:

vacuum

- with

you

* *

incoming * *

-

<

transfer

,

strength

gives

area of

a

- ,

to

*

public

. oppressive *

~

<

* * * * * * * * jump * * * * * * * * * * * pad * * *

durandal

message

*

At first, it wasn’t creating anything interesting. I expected that with only a single epoch.

Here is what happened when I increased the training to 10 epochs with the same settings as above:

* * * incoming message from * * * * * * incoming message * * * < 39 - < 46 . . . 299 > * * * end of message * * * *

he , , and crist the from closing his , and closing five was the of arms . on the discarding elevators century

* * * end when message * * * *

< security breached - excerpts > f terminal . . 17 . . . 198 > * * * end of message * * *

< security breached - - < 33 . 12 . . . 128 >

for the the of may fourth * * * end message * * *

teleport when it is is is is . . . i am am able that to to to my sensory eyes the you you you your your a " see philosophical may your your a21h . . . . " . . 223 >

Things are starting to get a bit more interesting. The model can generate full sentences; it has learned that messages usually begin and end with some form of “incoming/end message,” and it can create some of the fake IP addresses associated with specific types of terminal messages. It’s also good at opening and closing sections of text with quotes, but the text doesn’t make a whole lot of sense.

Finally, I wanted to see if more epochs would help, so I increased the number to 40 and kept the same settings as a baseline. Here is what it generated:

you have not completed your mission . you may be be not the such here . but i ' ll want to to to friend . if you must into the this to of transmission it would be be a fighter day . . . fighter . held symbolic significance for the . the time that time that had

martian skies

however , due to to the marathon ' s teleporters . . < unauthorized access - alarm 2521 - >

be careful . everything not not not as to , and or , and nearly active to your your maintenance maintenance . .

the pfhor seem to have enslaved a number of other races : : races as of have they been been been been off by compilers for for and alien . the invaders seem to to be interested in the marathon . gheritt s something something one as from about he ' s ' s ' pht , . even hard . . to the was was . rat the the the the the crist shuttle . has , and and at the just than the the after the the

Now we are starting to get complete sentences, which appear to make more sense as paragraphs. The challenge, of course, was that at this point I was stabbing in the dark, trying to understand which values would train a better model. There was a lot of trial and error before I started finding values I felt produced better results.

To help track the training, I added MissingLink.ai’s SDK to the project to monitor my experiments. Full disclosure: I work for MissingLink, and since I had a vested interest in completing my text adventure game, I wanted to see how I could optimize the model during training and identify which values made a difference.

So I created a new missinglink branch and added a few lines of code to the project’s textgenrnn.py script starting with importing the SDK, configuring a project to run the experiment in, and defining the Keras callback:
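(A sketch of the additions, assuming the MissingLink SDK’s Keras integration; the owner and project identifiers below are placeholders.)

```python
# Near the top of textgenrnn.py: import the MissingLink SDK and
# create a Keras callback tied to a project. The IDs below are
# placeholders for the values from your MissingLink dashboard.
import missinglink

missinglink_callback = missinglink.KerasCallback(
    owner_id='your-owner-id',
    project_token='your-project-token',
)
```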

Importing the MissingLink SDK and creating a Keras callback.

Also, I needed to feed the new callback into the model_t.fit_generator() method:
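(Sketched below; existing_callbacks stands in for the list of Keras callbacks textgenrnn already builds internally.)

```python
# Inside textgenrnn's training code: add the MissingLink callback to
# the Keras callbacks the library already supplies. existing_callbacks
# is a stand-in for that list.
model_t.fit_generator(gen,
                      steps_per_epoch=steps_per_epoch,
                      epochs=num_epochs,
                      callbacks=existing_callbacks + [missinglink_callback])
```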

Adding the MissingLink callback.

With these modifications, I was now able to run the experiment and monitor the results on the MissingLink dashboard. I started with my last experiment at 40 epochs as a baseline before moving onto the next round of training.

Monitoring the loss during training over 40 epochs.

After trying a few different settings, I finally settled on 4 RNN layers, bidirectional training, and 100 epochs. I also increased the train size to 0.9, which yielded the following loss:

Monitoring the loss during training over 100 epochs.
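In terms of the config.py sketch from earlier, the final run corresponds to values like these:

```python
# Final settings for the 100-epoch run described above.
NUM_EPOCHS = 100
RNN_LAYERS = 4
RNN_BIDIRECTIONAL = True
TRAIN_SIZE = 0.9
```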

Once I was happy with the improved model, I was able to generate better text. Let’s take a look at the final results.

The End Results

After a lot of experimenting, I was able to come up with some blocks of text that felt coherent enough to pass as the ramblings of a crazy AI.

Example A: From Marathon

Gheritt had a good life, so much time, so much time. He had loved swimming, turning, beating. He had loved the tingle in his hands and feet, his inability to kill his nemesis. Once he had fallen down the stairs, and just for a moment, his hands came to rest on the carpet of the stairs. In that instant, his body had frozen, floating over the stairs, safe from falling, but the moment didn’t last. The ocean crashed about him, his hands torn free from the sandy bottom, his body flipping, falling.

Example B: Generated

gheritt had a good life , so much on that will be killed . i saw the make one in a wicked computer — and we must don ‘ t remember them any time you .

i have received a preliminary report from some members of them . the only computer system . “ it was quite simple . “ a white thought syndrome suffering if the uesc ‘ s ‘ pht even stopped their efforts .

I was surprised at how close this was getting. I think with more data, editing the grammar, and a smart integration into the game, this is probably a viable solution for generating some of the background story content. The goal would be to blend randomly generated text, neural network generated text, and handwritten text into a more seamless experience. With that in mind, I plan on integrating this into the game and mixing it with more task-oriented text that outlines objectives, similar to how Marathon terminals work in the original game.

I’d say this little experiment was a success. I need to spend a little more time building a better dataset that includes both Marathon 2 and 3’s terminals. I also want to see if I can eventually get a coherent story out of the model. There is still tweaking to do between the config file and the dataset. However, even after these early experiments, I think the results are promising. Finally, it’s worth mentioning that textgenrnn is an older project; there have since been further advancements in text generation, such as OpenAI’s GPT-2. I could see myself continuing to experiment with training a neural network to generate text for more of my procedurally generated games.

If you’d like to know more about this project and my work at MissingLink.ai, feel free to leave a comment below or follow me on Twitter and send me a message.