Brainstorming a GPT-2 Roguelike: Part 1

Feb 17 2020

AI Dungeon harnesses GPT-2 to generate novel experiences in the framework of a text-adventure game. Since the output of GPT-2 has no logical coherency beyond a few sentences, the play session quickly drifts in unpredictable directions. This is why the game is fun, but it also means there aren't clear goals, indications of progress, or win conditions.

Can we build a superstructure atop the output of GPT-2 to enable such features? In this post I will explore some of the terrain.

Let's start with something well-constrained. We'll use a block of text generated by GPT-2 to lay out the room of a roguelike game. It's a 2D space with setting and entities represented as ASCII characters. Our win condition is simple: if you can reach the exit tile, you win.

Now, how the hell do you build that?

Perhaps we could parse the GPT-2 output and extract the nouns. We detect the word "teacup" as a noun and imbue it with damage points. Of course the game has no idea that a teacup would be a terribly ineffective weapon.

How can we distinguish between nouns like "teacup" and "forest"? Clearly these have completely different purposes in-game. Teacup may be a weapon, but forest must be a setting.

Let's try an example. My handwritten prompt is in bold, while GPT-2 has generated the rest:

The dark forest is full of shadows and hate. Orcs patrol the narrow paths, eager for blood. Rusty weapons of past adventurers litter the ground. With a loud and vicious laugh, the elven rogue now casts a spell that hurls shards of broken chunks at the nearest enemy. It did not take long to find a scrap of bronze in the rubble. It could be all that was left.

Already we see GPT-2 taking things in a different direction.

We can pick out the nouns by hand, but if we're building a game we need to offload this task to the computer. Let's use the CMU Link Parser API to find the nouns.

We end up with this list:

Forest

Shadows

Hate

Orcs

Paths

Blood

Weapons

Adventurers

Ground

Laugh

Rogue

Spell

Shards

Chunks

Enemy

Scrap

Bronze

Rubble

Building our roguelike room from this list could go in all sorts of directions. In fact, there's no way we'd end up with anything like the original description (not that it was particularly coherent to begin with).

Now we need to categorize each noun. To keep things super simple, we'll only extract two kinds of nouns to build our roguelike room:

Enemies, which the player must defeat or dodge on their way to the exit.

Artifacts such as weapons, armor, or consumables.

What’s our method for categorizing nouns? By hand? Surely not. There are tens of thousands of nouns in the English language, if not hundreds of thousands.

Luckily for us, linguistics has already mapped this territory. When we want to know a noun's category, we're looking for its hypernym (a noun can have several of these).

There's even an API! The WordNet database can provide us with the hypernyms of a given noun.

Here's what we get:

Forest: group, object

Shadows: state, location, cognition, feeling, communication, attribute, person

Hate: feeling

Orcs: (not found)

Paths: act, artifact, location, object

Blood: body, group, attribute, person

Weapons: artifact, communication

Adventurers: person

Ground: object, motive, substance, relation, location, cognition, artifact

Laugh: communication

Rogue: person

Spell: state, time, communication

Shards: artifact

Chunks: group, quantity

Enemy: group, person

Scrap: object, substance, artifact, act

Bronze: substance, artifact

Rubble: substance

As a quick aside, notice that:

WordNet doesn't include "orc". It's an open question as to what happens when GPT-2 produces a word that the WordNet database doesn't contain. For now, we'll hard-code orc to have a hypernym of "person". (WordNet does have "goblin" though, with hypernym "person".)

"Forest" doesn't map to "location". "Jungle" does, though.

How we use WordNet's hypernyms will be hugely influential on the resulting game. For example, do we discard the "feeling" hypernym outright, or would it be more interesting to consider such nouns as enemies, spawning a "hate" entity for the player to defeat?
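
To make that choice concrete, here is a minimal sketch that treats it as a policy table mapping a hypernym to an in-game role. The names `STRICT`, `WITH_FEELINGS`, `TAGS`, and `role_for` are invented for illustration, and `TAGS` is a hand-copied subset of the results above rather than a live WordNet query.

```python
# Two hypothetical policies: which hypernyms become which in-game role.
STRICT = {"person": "enemy", "artifact": "item"}
WITH_FEELINGS = {**STRICT, "feeling": "enemy"}

# Hand-copied subset of the lookup results above (not a live WordNet query).
TAGS = {
    "hate": ["feeling"],
    "rogue": ["person"],
    "shard": ["artifact"],
    "rubble": ["substance"],
}


def role_for(noun, policy):
    """Return the role for the first hypernym the policy knows, or None to discard."""
    for tag in TAGS.get(noun, []):
        if tag in policy:
            return policy[tag]
    return None


# Print each noun's role under both policies, side by side.
for noun in TAGS:
    print(noun, role_for(noun, STRICT), role_for(noun, WITH_FEELINGS))
```

Under the first policy "hate" is discarded; under the second it becomes an enemy. Swapping the table is the only change, which keeps this kind of experiment cheap.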

We'll start by extracting two hypernyms, "person" and "artifact":

Person: shadow, orc, blood, adventurer, rogue, enemy

Artifact: path, weapon, ground, shard, scrap, bronze
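
The extraction step can be sketched as a filter over the lookup results. The table below is copied by hand from the results above, with the nouns singularized by hand (an assumption; the post doesn't say how that was done). `HYPERNYMS`, `OVERRIDES`, and `nouns_with` are hypothetical names, and the empty entry for "orc" stands in for "(not found)".

```python
# Lookup results from above, singularized by hand. An empty list means "(not found)".
HYPERNYMS = {
    "forest": ["group", "object"],
    "shadow": ["state", "location", "cognition", "feeling",
               "communication", "attribute", "person"],
    "hate": ["feeling"],
    "orc": [],
    "path": ["act", "artifact", "location", "object"],
    "blood": ["body", "group", "attribute", "person"],
    "weapon": ["artifact", "communication"],
    "adventurer": ["person"],
    "ground": ["object", "motive", "substance", "relation",
               "location", "cognition", "artifact"],
    "laugh": ["communication"],
    "rogue": ["person"],
    "spell": ["state", "time", "communication"],
    "shard": ["artifact"],
    "chunk": ["group", "quantity"],
    "enemy": ["group", "person"],
    "scrap": ["object", "substance", "artifact", "act"],
    "bronze": ["substance", "artifact"],
    "rubble": ["substance"],
}

# Hard-coded fallbacks for words the database lacks.
OVERRIDES = {"orc": ["person"]}


def nouns_with(hypernym):
    """Return every noun tagged with the given hypernym, in source order."""
    matches = []
    for noun, tags in HYPERNYMS.items():
        # Fall back to the override table only when the lookup came back empty.
        if hypernym in (tags or OVERRIDES.get(noun, [])):
            matches.append(noun)
    return matches


persons = nouns_with("person")
artifacts = nouns_with("artifact")
print("Person:", ", ".join(persons))
print("Artifact:", ", ".join(artifacts))
```

No noun lands in both lists for this text, but nothing in the filter prevents it, so a real game would need a tie-break rule.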

We have enough here to get the job done.

We'll use some number of Persons to populate the room with enemies. If you want to add more complexity, you could make some of them neutral agents like merchants or mercenaries.

Artifacts will be scattered across the room. Each can be picked up by the player and placed in their inventory. Artifacts can be mapped to weapons, consumables, or wearables.
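
As a rough sketch of that population step, the code below scatters a few enemies and artifacts over a walled ASCII room with a player start and an exit tile. Everything about the layout is an assumption made for illustration: the room size, the glyphs (`@`, `>`, `#`, `.`), the spawn counts, and the `build_room` function itself.

```python
import random

# Lists from the post; the glyph scheme and room size are illustrative assumptions.
PERSONS = ["shadow", "orc", "blood", "adventurer", "rogue", "enemy"]
ARTIFACTS = ["path", "weapon", "ground", "shard", "scrap", "bronze"]


def build_room(persons, artifacts, width=12, height=6,
               n_enemies=3, n_items=3, seed=None):
    """Return (grid, entities): an ASCII room and a map from (x, y) to (kind, name)."""
    rng = random.Random(seed)
    # Floor tiles enclosed by a wall border.
    grid = [
        ["#" if x in (0, width - 1) or y in (0, height - 1) else "."
         for x in range(width)]
        for y in range(height)
    ]
    grid[1][1] = "@"                    # player start
    grid[height - 2][width - 2] = ">"   # exit tile: reaching it wins the room
    # Every remaining floor tile is a candidate spawn point.
    free = [(x, y)
            for y in range(1, height - 1)
            for x in range(1, width - 1)
            if grid[y][x] == "."]
    rng.shuffle(free)
    spawns = [("enemy", n)
              for n in rng.sample(persons, min(n_enemies, len(persons)))]
    spawns += [("artifact", n)
               for n in rng.sample(artifacts, min(n_items, len(artifacts)))]
    entities = {}
    for kind, name in spawns:
        x, y = free.pop()
        # Enemies get an uppercase initial, artifacts a lowercase one. Glyphs can
        # collide (shard/scrap), so the entities dict is the source of truth.
        grid[y][x] = name[0].upper() if kind == "enemy" else name[0]
        entities[(x, y)] = (kind, name)
    return grid, entities


grid, entities = build_room(PERSONS, ARTIFACTS, seed=7)
print("\n".join("".join(row) for row in grid))
```

The sketch assumes the room has more free floor tiles than spawns, and it makes no attempt to guarantee that the exit is reachable; with an open floor and no interior walls it always is.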

I'll stop here for now. There's a ton more to cover, and we could go in a few different directions:

Exploring the generation logic that takes the input of categorized nouns and produces a rendered, playable roguelike room. This includes challenges such as rendering different environments based on the "location" hypernym.

Returning to our source text. Can we further plumb its grammatical structure to produce higher quality input for our game generation logic?

Can we use GPT-2's text generation in other interesting ways? Right now we're only using it to produce novel rooms.

I will explore these topics in future posts.