Back in March 2019, I built a hackathon project called AI Dungeon. The project was a classic text adventure game with a twist: the text of the story and the actions presented to you were all generated with machine learning.

The game was popular at the hackathon and with a small group of people online, but overall it was still a few steps away from what I envisioned.

For one thing, players could only choose from the options the game presented them. I wanted a truly open world, where players could write whatever they wanted. For another, the game veered into gibberish too quickly to sustain long play sessions.

Unfortunately, there wasn’t an obvious fix to either of these problems. I built AI Dungeon on the then-biggest-available 355 million parameter version of GPT-2, and even though it was the most powerful model available, it simply wasn’t enough. After spending months tinkering and tweaking, I’d improved the game considerably, but was still running up against the same walls.

In November, however, OpenAI released the full 1.5 billion parameter GPT-2 model, and opened the door to a new version of the game that was much closer to my original idea. One month later, I released AI Dungeon 2, a truly open world text adventure built on the full GPT-2 model.

Within the game, you can do anything. Start a rock band of skeletons? Eat the moon? Install Windows 10? It’s all possible.

The response was fantastic. We hit the top of Hacker News, a few popular gamers posted video playthroughs, and Twitter was full of screenshots of ML-generated adventures. Within a week we had 100,000 players and over half a million playthroughs.

The unexpected downside to this sudden surge of attention, however, was the cost.

How to spend $50,000 on GCP

When I first released AI Dungeon 2, it wasn’t a hosted app. Instead, it was a Google Colab notebook that users would copy and run, which would download the AI Dungeon model and install the game’s interface.

This approach made sense for a couple of reasons. First, Colab is free, which makes it a nice platform for a side project. Second, Google backs each Colab notebook with a free GPU instance, which was necessary for running the 5 GB model.

The first problem we hit was that our model barely fit on the GPU instance. If for some reason your playthrough called for a little extra memory, the whole game would likely crash.

The second — and more existential — problem was the bill.

I’d chosen Colab because of its cost-effectiveness, but what I hadn’t accounted for was data egress charges. With every Colab notebook needing to download a 5 GB model, and with users being in a variety of regions, each download was costing between $0.30 and $0.40.
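As a rough back-of-the-envelope check, the per-download figures above can be turned into an implied per-GB egress rate and a cost for a given number of downloads. This is a minimal sketch using only the 5 GB model size and the $0.30 to $0.40 range quoted here; the function names and the 10,000-download example are illustrative assumptions, and the derived rate is not an official GCP price.

```python
MODEL_SIZE_GB = 5                   # model size quoted in the post
COST_PER_DOWNLOAD = (0.30, 0.40)    # dollars per download, quoted in the post


def implied_rate_per_gb(cost_per_download, size_gb=MODEL_SIZE_GB):
    """Egress price per GB implied by a per-download cost."""
    return cost_per_download / size_gb


def cost_of_downloads(n_downloads, cost_per_download):
    """Total egress cost for a number of model downloads."""
    return n_downloads * cost_per_download


if __name__ == "__main__":
    low, high = COST_PER_DOWNLOAD
    print(f"implied egress rate: ${implied_rate_per_gb(low):.2f} "
          f"to ${implied_rate_per_gb(high):.2f} per GB")
    n = 10_000  # hypothetical number of notebook launches
    print(f"{n} downloads: ${cost_of_downloads(n, low):,.0f} "
          f"to ${cost_of_downloads(n, high):,.0f}")
```

Because every notebook launch triggers a fresh download, the cost grows linearly with the number of launches, which is why a spike in popularity translated directly into a spike in the bill.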

When the daily bill hit $2,000, BYU’s Perception, Control, and Cognition Lab (PCCL) was kind enough to handle the charges. When the cost hit $7,000, they were fine with it. At $15,000, they started to get nervous. At $20,000, we all agreed we needed to do something. At $30,000, they prepared to pull the plug.

By the time all was said and done, the total bill had reached $50,000 in three days.

Deploying GPT-2 at scale — without going broke

Within 12 hours of PCCL shutting down AI Dungeon 2, our community had hacked together a peer-to-peer solution for sharing the model via torrent, which meant the game was back up with no egress charges. (Note: This is just one of the amazing ways our community has sustained and improved AI Dungeon 2. More on that later.)

This, however, was clearly a temporary measure. The vision for AI Dungeon 2 was for it to be a game anyone could play, not just those with the tech savvy to run a Colab notebook. To get there, we needed to release the game as a real app.

To build a full AI Dungeon 2 app, our model needed to be deployed as a backend web service. You can imagine it as a “predict API” that our app can query with user input to generate the next stage of the story. This pattern should be familiar to anyone who has worked with microservices before.
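To make the "predict API" idea concrete, here is a minimal sketch of the request and response contract such a service might expose. Everything in it is an assumption for illustration: the JSON field names (`story`, `action`, `text`), the `handle_predict` function, and the `fake_generate` stub that stands in for the GPT-2 model. It is not the actual AI Dungeon backend.

```python
import json


def fake_generate(prompt: str) -> str:
    """Stand-in for the model; a real service would run GPT-2 inference here."""
    return f"{prompt.strip()} ... and the story continues."


def handle_predict(request_body: str) -> str:
    """Take a JSON request holding the story so far and the player's action,
    and return a JSON response holding the next passage of the story."""
    payload = json.loads(request_body)
    # Combine the running story with the new action to form the model prompt.
    prompt = payload["story"] + "\n> " + payload["action"] + "\n"
    return json.dumps({"text": fake_generate(prompt)})


if __name__ == "__main__":
    request = json.dumps({
        "story": "You stand at the mouth of a cave.",
        "action": "light a torch",
    })
    print(handle_predict(request))
```

The app only needs to know this contract. How the model is loaded, which hardware it runs on, and how many copies are running all stay hidden behind the endpoint.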

The question is, how do you build a microservice out of an ML model?

As it turns out, there’s an open source tool called Cortex that automates it. At a high level, Cortex:

Wraps your model in an API and containerizes it

Deploys your model to the cloud, exposing your API as an HTTP endpoint

Autoscales your instances to handle traffic fluctuations

Instead of rolling our own infrastructure using Flask, Docker, Kubernetes, and a mess of AWS services, we were able to consolidate and automate our infrastructure.

This architecture allowed us to host our model as a backend for the web and mobile apps, opening up the game to players who couldn’t use Colab. However, it also required several optimizations to make it affordable.

First, we needed to configure aggressive autoscaling. We are billed for every minute an instance is running, and serving many parallel users means launching many instances. To make the most of our spend, we need to run the minimum number of instances required at any given moment, and quickly take down any that are no longer needed.
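The core of that scaling decision can be sketched as a small function: given the current load and how many requests one instance can handle at once, compute the smallest instance count that covers it. This is a simplified illustration, not Cortex's actual autoscaler; the parameter names and the numbers in the example are hypothetical.

```python
import math


def desired_instances(in_flight_requests: int,
                      requests_per_instance: int,
                      min_instances: int = 1,
                      max_instances: int = 1000) -> int:
    """Smallest instance count that can serve the current load,
    clamped to the configured minimum and maximum."""
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    needed = math.ceil(in_flight_requests / requests_per_instance)
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    # Hypothetical load levels, assuming each instance handles 4 requests at once.
    for load in (0, 9, 10_000):
        print(load, desired_instances(load, 4, max_instances=500))
```

Rounding up keeps every request served, while the clamp keeps one instance warm when traffic drops to zero and caps spend during a spike. A production autoscaler would also smooth the signal over a time window so that brief bursts do not cause instances to start and stop repeatedly.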

Second, we needed to select the optimal instance types. That means figuring out exactly how big our instance needs to be to host the model effectively, and making use of spot instances — unused instances that cloud providers sell at a steep discount.
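One way to frame the instance choice is as a filter and a minimum: discard every instance type that cannot hold the model in GPU memory, then take the cheapest of what remains, preferring spot pricing when it is acceptable. The sketch below assumes a made-up catalog; the instance names, memory sizes, prices, and the 8 GB requirement are all hypothetical and do not reflect any real cloud provider's offerings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    gpu_memory_gb: float
    on_demand_hourly: float  # dollars per hour (made-up figure)
    spot_hourly: float       # dollars per hour (made-up figure)


def cheapest_fit(candidates: List[InstanceType],
                 required_gpu_memory_gb: float,
                 use_spot: bool = True) -> Optional[InstanceType]:
    """Return the cheapest instance with enough GPU memory, or None if none fit."""
    fits = [c for c in candidates if c.gpu_memory_gb >= required_gpu_memory_gb]
    if not fits:
        return None
    if use_spot:
        return min(fits, key=lambda c: c.spot_hourly)
    return min(fits, key=lambda c: c.on_demand_hourly)


# Hypothetical catalog for illustration only.
CATALOG = [
    InstanceType("small-gpu", 4, 0.50, 0.15),
    InstanceType("medium-gpu", 16, 1.00, 0.30),
    InstanceType("large-gpu", 32, 3.00, 0.90),
]

if __name__ == "__main__":
    # Assume the 5 GB model needs some headroom, say 8 GB in total.
    print(cheapest_fit(CATALOG, required_gpu_memory_gb=8))
```

Spot instances can be reclaimed by the provider at short notice, so this choice only works when the serving layer can replace lost instances quickly.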

After some tinkering, we were able to make our Cortex deployments roughly 90% more cost-effective than our previous Colab setup. Within two weeks, our server count had peaked at 715, and we’d supported over 100,000 players. Six weeks later, we’ve passed 1,000,000 users and 6,000,000 unique stories told.

Scaling AI Dungeon has been a community effort

At each stage of development, the community has been key to unlocking our next stage of scale.

The most obvious example is the people who play AI Dungeon 2. Without them, there wouldn’t be any scale to speak of. Beyond just our players, however, we’ve had help from community members like:

BYU PCCL, which paid our initial GCP bill

The users who brought AI Dungeon 2 back online via torrent within 12 hours of it shutting down

Braydon Batungbacal, who volunteered to build the iOS and Android apps

The Patreon supporters who continue to fund AI Dungeon’s development

Open source projects like Cortex that have worked to support AI Dungeon

As we continue to develop AI Dungeon — and potentially, a bigger platform for other ML-driven games — our community will no doubt continue to be a driving force behind the decisions we make and our ability to execute them.

Thank you to everyone who’s been involved so far, and stay tuned for what’s coming next.