I decided some time ago to create a text-based Skyrim. That sounds overly ambitious at first, but as I developed the story and the game’s mechanics, I discovered its basic elements: a sword & sorcery game in a living, simulated world that is presented as a Choose Your Own Adventure (CYOA) book.

The idea has already brought a short game into the world, Insignificant Little Vermin, which I submitted to this year’s IFCOMP. In this article, I’ll walk you through the process of building that game and talk about what I learned from watching people play it (on Twitch).

Why Skyrim?

Don’t get too hung up on the fact that I’m referring to Skyrim here. Skyrim, for me, is just a sufficiently well-known fantasy open-world videogame that we can use as an example.

Sufficiently well-known because I want people to immediately have an image in mind. Fantasy because the combat style in fantasy games is especially well-suited for our needs (more on that later). Open-world because these types of games are a great form of escapism, and therefore a good fit for our needs (more on that later). Videogame because I put emphasis on the play/flow aspect more than on the interactive story (again, more on that later).

The important thing is that we simulate a world that the player can explore. This world is inhabited by actors (monsters, NPCs) and entities (swords, doors, chests, etc.) that the player can interact with. In Skyrim, the world is rendered in 3D, and the player gives low-level commands such as “move forward” or “use weapon in left hand”.

The goal of the simulation is to entertain the player. That is to say, I’m not using the word “simulation” to suggest that Skyrim tries to “imitate real-world processes” as closely as possible. To me, every videogame that has a steady update loop is a simulation of some sort — even if it’s a 2D simulation of an Italian plumber’s journey to save a princess.

First try, a deliberately naive start

The most naive way of porting Skyrim to a text adventure would be to take the world simulation as it exists in the 3D game and play it one frame at a time, describing what’s going on and asking the player for low level input. It would look something like this:

Outside of Whiterun
[…] You stand on coordinates {351.0, 211.9}, facing NNE. Your sword is 12% midswing. You see a bandit on coordinates {351.1, 210.8}, facing SSW. His axe is 78% midswing and it missed.

  push up arrow
> push down arrow
  push left arrow
  push right arrow

You take a step back to coordinates {354.9, 212.5}. […]

As you can imagine, the gameplay and story experience here would absolutely suck. And no matter how you tweak the output, the input, or the length of each frame represented in text, it would still suck.

Second try

One way to solve this is by increasing the level of abstraction. We still use Skyrim’s world simulation, but only on the highest level — the map, distances between objects and places, the location of actors, NPCs, where to spawn monsters, etc. That gives us something like this:

Outside of Whiterun
You arrive at Whiterun from the south. There’s a solitary bandit waiting for you, axe in hand and ready to fight.

> kill bandit
  run

You raise your sword and charge the bandit. <Insert interesting description of a fight.> That last swing connects with the bandit’s throat and he falls to the ground. As you kneel next to him and try taking his gold, an arrow slams into the ground less than an inch from your knee. You jump up and catch sight of an archer on a knoll nearby.

> kill archer
  search bandit
  run

<Insert another interesting fight description.>

Now, the gameplay here is a bit better, and this kind of experience could even be fun. The player could go to the different places in the simulated world, talk to people, quest for them, and experience the Skyrim storyline.

But let’s face it, combat is a big part of what makes Skyrim fun. So “kill bandit, kill archer” doesn’t quite cut it. At best, with really interesting descriptions of the fights, you’ll have a longer version of something like Conan Kill Everything. More likely, though, the player will get bored after the first few fights and will quit.

That brings us to an important question: What makes combat in Skyrim fun, anyway? We do the same things over and over again and each battle is largely similar. How do we not get bored with them after the first dungeon?

The answer is twofold:

You have a high level of agency during the fights. At any time, you can advance, retreat, attack, defend, change weapons, jump, crouch, flank, climb a rock, use magic, etc. This means you can get better at fighting as a player. You have to fight. Therefore, you can experience, well, fun (as it’s defined in books like Theory of Fun for Game Design).

Fights are highly unpredictable. You never know how each combat will turn out. You can do battle with the same group of enemies three times in a row and the course of the fight will be different each time. This means you get variable rewards, an important ingredient for getting you “hooked” on the game.

Skyrim can throw similar combat at you over and over again, and not only does it not become boring, it’s great entertainment! This is not despite the repetition; it is because of the cocktail of repetition, agency, and unpredictability.

Therein lies the fundamental challenge of trying to marry videogames and text.

Good videogames always have some repetition. All gaming from Pong through Tetris to Portal to Skyrim is based on repeatedly throwing a similar problem at you, the player, so that you can get better at solving it.

Good text cannot have too much repetition. Repetitive text is boring. Try writing up everything that happens in the first five minutes of some Super Mario gameplay with prose that is fun to read. It’s impossible.

Text games normally solve this in two ways: they either stay away from repetition as much as possible (which makes the game more intellectual — all situations and solutions are unique) or they break up repetitive gameplay into a bunch of minigames. For great examples of the former approach, see almost any of the top traditional interactive fiction. For an example of the latter, see the brilliant Sorcery! series.

Third try

Since we already understand that Skyrim’s gameplay is largely centered around combat, let’s try the minigame approach (and not the more intellectual, non-repetitive one).

Outside of Whiterun
You arrive at Whiterun from the south. There’s a solitary bandit waiting for you, axe in hand and ready to fight.

> kill bandit
  run

<A graphical combat minigame.> The bandit keels over and dies.

The minigame can have some text in it, but the mechanics are not tied to the text. In Sorcery!, the combat minigame consists of a series of decisions about how hard to hit (on a scale of total defense to all-out attack). But there are no limits to the design of the minigames we can use. It can be a simple card game, a match 3 puzzle, etc.

This works. I, for one, would buy a mobile version of Skyrim done in the same fashion as Sorcery!.

But I also think we can do better. Much of my enjoyment of Skyrim stems from the fact that it’s an open world where anything can happen almost anywhere. There’s no switching between “exploration mode” and “combat mode”. An inn where you’re talking to an NPC can become a point of tactical cover moments later. You can maneuver around a patrol and hit them with something nasty at long range from the relative safety of a cliff. And so on.

None of this translates well into the text-with-minigame solution. So let’s go further.

Fourth try

We’re going to tweak the level of abstraction yet again. Going frame-by-frame in our naive start was obviously the wrong move. And going with “kill bandit” obviously made the level of abstraction too high, no matter whether the fight was described in text or represented through a minigame.

Let’s descend just a little bit from “kill bandit” into a tactics-based approach. Something like this:

Outside of Whiterun
You arrive at Whiterun from the south. There’s a solitary bandit waiting for you, axe in hand and ready to fight.

> kill bandit
  run

How exactly do you want to go about killing the bandit?

  … with sword
> … with bow and arrow

You quickly string the bow and let it fly. The arrow whirls just past the bandit’s ear. <Rest of fight description.>

This approach creates more options than just “kill”, but after a while it’s not much fun. Each player gravitates toward a certain style of play (stealth, long range, melee, magic, etc.) and therefore there’s really not much choice involved. For an archer, the best tactic will almost always be “bow and arrow” or “sneak around”. Another player will see very different combat sequences, but that doesn’t necessarily make the experience fun.

So we’ll need to offer players meaningful choices on a lower level.

Fifth try

Alright, fine. Let’s go with action-by-action level combat, D&D-style.

Outside of Whiterun
You arrive at Whiterun from the south. There’s a solitary bandit waiting for you, axe in hand and ready to fight. You unsheathe your sword and approach.

> thrust sword
  swing sword

Your blade moves fast but misses, just left of the bandit’s chest. The bandit swings the axe and lightly cleaves your leather jerkin.

  thrust sword
> swing sword

The sword cuts the bandit’s thigh and he yells in pain. He tries to strike at you from above, but the axe goes wide. <Rest of fight.>

This starts to look like the player has actual agency. There are many possible moves that fit different playing styles (a swordsman can thrust, swing, tackle, decapitate; an archer can target different parts of the body) and those moves can be combined in a myriad of different ways. Players can develop their own tactics!

But there are issues with this approach.

The prose ends up being very mechanical. It’s not as bad as our first naive approach, but it’s still something a real author would never write. “You do X, opponent does Y.” Repeat. That gets boring quickly.

There is no easy way to deal with reactions. A strike with a sword either hits or misses. Sure, the receiving side can anticipate the strike and can be defending at that time (“assume defensive stance” or “raise shield”), but players generally hate doing that. It means trading your opportunity to strike (awesome) for maybe taking less damage (meh). It might be the rational choice in some situations, but it definitely doesn’t feel like an adventure.

So now it looks like we’re screwed. We covered the whole spectrum of abstraction and nothing quite makes the game we want work.

This is where I found myself after tinkering with the game for 5 years. I almost gave up, thinking that I had proved, for myself at least, that there is no way to build a Skyrim-like open-world game in text. Not even in theory.

I decided to take a break from this “Skyrim in text” project. Instead, I started building something completely different. The game was still text-based, but gameplay was more procedural and psychological. It was about survival in a group.

And during that project, I realized that text stories are fractal. Unlike in videogames (which need a steady update method), in text we don’t actually need to stick to a single level of abstraction.

Sixth and final try

Let me show you what I mean:

Outside of Whiterun
You arrive at Whiterun from the south. There’s a solitary bandit waiting for you, axe in hand and ready to fight.

> charge at the bandit
  use bow and arrow
  run

You raise your sword and dash at the bandit. He takes a few quick steps back and when you come in range, he swiftly swings at you.

  jump back
> dodge
  block

You duck and the bandit’s sword swings an inch above your head. This briefly exposes the bandit’s side, which has no armor.

> thrust sword
  punch flank
  ignore

<Rest of fight.>

Notice how the “charge” command is a tactical one — it’s on a higher level of abstraction than slash or thrust. Then the bandit’s first swing is on the action-by-action level. But then, before the swing even finishes, we go one level deeper, and the player can react to that situation by dodging (or blocking or jumping back). And since that succeeds, we stay at that low level, and the player can insert a counter-thrust during the bandit’s dodged swing.

Now the fight includes many choices on all levels, from tactical positioning to split-second actions. Repetition is then more fun to the player, because the same action can be performed in vastly different contexts.

The resulting prose reads like something a human would write. It’s not just two people exchanging damage, it’s an actual swordfight with dynamic potential outcomes every step of the way. And gameplay flows through different levels of abstraction, like a book would.
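To make the idea of shifting levels a bit more concrete, here is a minimal Python sketch of the fight above as a tree of choices, where picking one option can open a new set of options at a different level of abstraction. The `Choice` class, its fields, and the `play` function are names I made up for illustration; this is not how Insignificant Little Vermin is actually implemented, and the last outcome line is invented filler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Choice:
    """One option offered to the player (hypothetical structure)."""
    command: str   # text of the option, e.g. "dodge"
    level: str     # "tactical", "action" or "reaction"
    outcome: str   # prose shown after the option is picked
    # Options that open up as a consequence of this one, possibly
    # at a different level of abstraction.
    follow_ups: List["Choice"] = field(default_factory=list)


def play(options: List[Choice], picks: List[str]) -> List[str]:
    """Follow the player's picks down the tree and collect the prose."""
    story = []
    for pick in picks:
        chosen = next((c for c in options if c.command == pick), None)
        if chosen is None:  # unknown command: stop the walk
            break
        story.append(f"[{chosen.level}] {chosen.outcome}")
        options = chosen.follow_ups
    return story


thrust = Choice("thrust sword", "reaction",
                "You drive the blade at the bandit's unarmored side.")
dodge = Choice("dodge", "reaction",
               "You duck and the swing passes an inch above your head.",
               follow_ups=[thrust, Choice("ignore", "reaction", "You step back.")])
charge = Choice("charge at the bandit", "tactical",
                "You raise your sword and dash at the bandit. He swings at you.",
                follow_ups=[dodge, Choice("block", "reaction", "You block.")])

story = play([charge, Choice("run", "tactical", "You run.")],
             ["charge at the bandit", "dodge", "thrust sword"])
print("\n".join(story))
```

The point of the sketch is only that nothing forces every node to sit at the same level: a tactical choice can lead straight into a split-second reaction, and back again.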

Caveats

So there you have it: Insignificant Little Vermin, my submission to IFCOMP, is working with the concept of fractal stories in an open-world simulation to produce a relatively fluid, readable text adventure. Some players have even assumed the text is not procedurally generated in parts where I know it is. More importantly, though, players tend to have a lot of fun.

It’s not all fireworks and applause, though, of course. The short game made it clear to me what the limitations of such an approach are, and just how much more work there is to be done before I accomplish my vision of a completely open world rendered in text.

First of all, this game is really just the first stab in this direction. I discovered many of the insights above while developing Vermin. There is no playbook, no best practices, no industry events about fractal stories or “Skyrims in text”. There is, thankfully, an abundance of literature about interactive fiction, though most of it focuses on a very different aspect of fiction than what I’m after.

For one, there’s much less content in Vermin than I’d like it to have. It’s nowhere near an AAA game (like Skyrim) in scope. It currently only offers limited RPG progression (player character doesn’t learn new stuff, for example), and the combat could use more variability. Also, there is no mapping.

That’s all quite easily solved by putting more work in on this project, though.

What I find more interesting are the limitations that won’t go away so easily. The ones that stem from the nature of the system, and from text in general.

The need for clarity and (surface) simplicity

In text, it is very easy to give too much information. A completely chaotic engagement is fine when it develops in 3D graphics, but unreadable when it’s rendered in prose. A swordfight with 10 actors must be broken down or abstracted — a possible solution being something like Level of Detail (LoD) or Limelight, where the actions of actors who are distant from the player are either ignored or described only in broad strokes.
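As a rough illustration of the Level of Detail idea, here is a small Python sketch that picks how much to say about each event based on how far it is from the player. The `Event` class, the distance thresholds, and the example sentences are all assumptions made for this example, not something taken from the game.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Something that happened this turn (hypothetical structure)."""
    distance: float  # distance from the player, in arbitrary units
    detail: str      # blow-by-blow sentence, used up close
    summary: str     # broad-strokes sentence, used further away


def describe(events: List[Event], near: float = 3.0, far: float = 10.0) -> List[str]:
    """Full detail up close, a summary at mid range, silence beyond `far`."""
    lines = []
    for event in events:
        if event.distance <= near:
            lines.append(event.detail)
        elif event.distance <= far:
            lines.append(event.summary)
        # Anything further away is simulated but never narrated.
    return lines


events = [
    Event(1.0, "The orc's axe glances off your shield.", "You trade blows with an orc."),
    Event(6.0, "The goblin stabs the guard in the shoulder.", "Nearby, a guard struggles with a goblin."),
    Event(40.0, "An archer nocks an arrow on the far wall.", "Archers man the far wall."),
]
lines = describe(events)
print("\n".join(lines))
```

The simulation still runs for every actor; only the narration is thinned out.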

In text, it’s hard (and boring) to explain spatial relations. Read any sword and sorcery novel and you’ll see how Conan/Fafhrd/Elric always swings at the pirate/swordsman/lich but never “forward and to the left”. Position is always described only in the most general way. No matter how clear you think you are, you’ll never beat 2D or 3D graphics in terms of quickly letting the player know where everyone is and what they’re doing.

In text, you have to think about names, categories, and pronouns, too. It actually helps to have a variety of genders at play at any given time, because then it’s easier to use pronouns to let the player know who we’re focusing on. For example, when you have a scene with a male thief, a female warrior, and a spider, your natural language generation algorithm will get by with “he”, “she” and “it” for the most part. Which is great. Contrast that with a scene that contains 3 male orc warriors and you’ll probably have a lot of sentences like “The orc on the left makes a move.” Which is boring and difficult to read.
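The pronoun trick can be sketched in a few lines of Python: use a pronoun only when it points to exactly one actor in the scene, and fall back to a longer description otherwise. The `Actor` class and `refer_to` function are hypothetical names for this example, and a real generator would also have to check that the actor was mentioned recently enough for the pronoun to make sense.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Actor:
    name: str     # fallback description, e.g. "the orc on the left"
    pronoun: str  # "he", "she" or "it"


def refer_to(actor: Actor, scene: List[Actor]) -> str:
    """Return the pronoun if it is unambiguous in this scene, else the name."""
    counts = Counter(a.pronoun for a in scene)
    if counts[actor.pronoun] == 1:
        return actor.pronoun
    return actor.name


thief = Actor("the thief", "he")
warrior = Actor("the warrior", "she")
spider = Actor("the spider", "it")
mixed_scene = [thief, warrior, spider]

orcs = [
    Actor("the orc on the left", "he"),
    Actor("the orc in the middle", "he"),
    Actor("the orc on the right", "he"),
]

print(refer_to(warrior, mixed_scene))  # pronoun is unambiguous here
print(refer_to(orcs[0], orcs))         # three "he"s, so the name is used
```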

In text, it’s harder to work with a myriad of characters’ states. You can’t expect the player to remember that, for example, the orc has his arm extended to the left. Fortunately, people have imagination (and love to use it!), so your underlying simulation can be complex — you just need to make sure to only foreground the most important things, and let the reader get the rest from subtext.

The need for a casual feel

In text, you can’t have your mechanics too stats-based or competitive. If you go that path, sooner or later, your game becomes a spreadsheet with some text sprinkled on top to provide a narrative. Not that this is a bad thing — I like these kinds of games (see Seedship). It’s just that this is not what we’re going for with fractal stories.

Procedural content isn’t a silver bullet

In text, combat (and procedural content in general) works much better when intermixed with something else. Context, character, and narrative development are part of what makes a game fun. In Vermin, all fights have at least some flavor text (opponents talking to each other) and so do the sections in which you’re roaming the world. You can’t just defer to procedural generation; you should still have a lot of content (even if that content is, in the end, used procedurally). This follows the general rule, though, that procedural content is not for “lazy developers”. It’s not there to free you from creating content. It’s there to let content react to the player. You still need a lot of it.

Writing is hard

It’s easy to come up with “You hit orc for 15 HP”. It’s much harder to come up with many versions of “your blade misses the orc”.
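One simple way to stretch a handful of hand-written variants is to shuffle them and only reuse one after all the others have been shown. The Python sketch below assumes a hypothetical `Phrasebook` class and three made-up miss sentences; it says nothing about how many variants a real game needs, which is the hard part.

```python
import random
from typing import List, Optional

MISS_TEMPLATES = [
    "Your blade misses the {target}.",
    "The {target} sidesteps and your sword cuts only air.",
    "You swing too wide and the {target} slips out of reach.",
]


class Phrasebook:
    """Hands out templates in shuffled order, reshuffling when it runs out."""

    def __init__(self, templates: List[str], rng: Optional[random.Random] = None):
        self._templates = list(templates)
        self._rng = rng or random.Random()
        self._unused: List[str] = []

    def next(self, **names: str) -> str:
        if not self._unused:
            # Start a new cycle. A repeat is still possible right at the
            # boundary between two cycles; this sketch does not prevent it.
            self._unused = self._templates[:]
            self._rng.shuffle(self._unused)
        return self._unused.pop().format(**names)


book = Phrasebook(MISS_TEMPLATES, rng=random.Random(7))
misses = [book.next(target="orc") for _ in range(3)]
print("\n".join(misses))
```

Within one cycle every miss reads differently, but the variety is still bounded by how many sentences someone sat down and wrote.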

Natural language generation is hard

It’s easier than modern 3D programming (at least in terms of person-hours), but not by much. Human language is rich and full of pitfalls. Fortunately, we can choose a subset of natural language that is quite easy to do (specifically, present-tense action). Still, for something like this to work, you can’t expect someone who is primarily a writer to successfully program the game. This is not the next Twine or Inform. This is for a development team, not a single author.

Not for just any genre

Lastly, not all types of content are a good fit for fractal stories. Or, at the very least, some genres are much better suited for this approach. Sword & sorcery is great because it’s:

A lot of action described straightforwardly, in present tense.
Immediate, visceral conflict.
Reactive (counter a slash by blocking it).
Physical exploration.

Contrast this with something like Portal, which features actions that are hard to imagine occurring in spaces that are hard to describe, involving lots of skill on the player’s end. But we don’t need to go as far as 3D puzzlers to find subject matter that won’t work. Even other combat-oriented open-world games are significantly harder to “port to text” once they involve relatively realistic modern-day shooting instead of fantasy combat. Maybe it’s just my lack of imagination (and almost no experience with literature involving gun fights), but I can’t see a good combat mechanic that would work with the immediacy of firearms.

To end on a high note

So I’ve spent quite a lot of time here explaining the problems of fractal stories. Let’s end on a high note. I want to explain why I’m so enthusiastic about this.

The form factor that Insignificant Little Vermin ended up with is a great fit for casual play, especially on mobile. It’s not taxing in terms of skill. It can be consumed in short bites of gameplay, or as a whole. It provides an element of chance. It lets you read a story that is unique to you, and lets you explore a world at your own pace. It doesn’t require you to squint to see what’s going on on the screen — it’s just text and a few static paintings.

In short, it’s something that you can read & play while waiting on a bus.

I really, earnestly believe that this could be transformational for many indie game developers (some of whom are not even thinking of themselves as game developers yet). As I said before, this is not a system for a single writer-author-developer. Compared to Twine, the barrier to entry is very high. But compared to even the simplest 2D platformer, it’s relatively low.

You can move your developers from graphics programming and 60fps performance optimization to core mechanics, simulation, and natural language generation. Remember, you don’t have a 16ms frame budget anymore, so your simulation and AI can take their time.

Your artists can spend less time filling in the details of the environment (grass here, wall texture there) and more time imagining interesting worlds and characters. They are also unrestricted by capabilities of the rendering engine. If they can describe it, they can use it. An army of 500 thousand dwarves streaming down a valley? Sure. A Ringworld-type structure where you can see terrain millions of kilometers away? Go for it. A Lovecraftian/Escheresque/non-Euclidean architecture? No problem.

Designers are less bound to a single level of detail. Instead, they can let the player be both a general and a soldier in one game, and it will never be a problem. There’s so much less you need to implement with text than there is with graphics. Adding any gameplay mechanic is suddenly much cheaper.

A small team can create something that meets Skyrim in scale.

I can’t wait to see where this approach to game development goes. I’ll be updating this blog and my Twitter account with my progress. If you want less noise, you can subscribe to the mailing list that I created to keep you updated on key moments.

And if you missed the link, here’s Insignificant Little Vermin and here’s its submission at IFCOMP.