Through the years, I've spent what might seem to some people an inordinate amount of time cleaning up and preserving ancient software. My Retrocomputing Museum page archives any number of computer languages and games that might seem utterly obsolete.

I preserve this material because I think there are very good reasons to care about it. Sometimes these old designs reveal unexpected artistry, surprising approaches that can help us break free of assumptions and limits we didn't know we were carrying.

But just as important, cultures understand themselves through their history and their artifacts, and this is no less true of programming cultures than of any other kind. If you're a computer hacker, great works of heirloom software are your heritage as surely as Old Master paintings are a visual artist's; knowing about them enriches you and helps solidify your relationship to your craft.

For exactly re-creating historical computing experiences, not much can beat running the original binary executables on a software emulator for their host hardware. There are small but flourishing groups of re-creationists who do that sort of thing for dozens of different historical computers.

But that's not what I'm here to write about today, because I don't find that kind of museumization very interesting. It doesn't typically yield deep insight into the old code, nor into the thinking of its designers. For that—to have the experience parallel to appreciating an Old Master painting fully—you need not just a running program but source code you can read.

Therefore, I've always been more interested in forward-porting heirloom source code so it can be run and studied in modern environments. I don't necessarily even consider it vital to retain the original language of implementation; the important goals, in my view, are 1) to preserve the original design in a way that makes it possible to study that design as a work of craft and art, and 2) to replicate as nearly as possible the UI of the original so casual explorers not interested in dipping into source code can at least get a feel for the experiences had by its original users.

Now I'll get specific and talk about Colossal Cave Adventure.

This game, still known as ADVENT to many of its fans because it was written on an operating system that supported only monocase filenames at most six characters long, is one of the great early classics of software. Written in 1976–77, it was the very first text adventure game. It's also the direct ancestor of every rogue-like dungeon simulation, and through those the indirect ancestor of a pretty large percentage of the games being written even today.

If you're of a certain age, the following opening sequence will bring back some fond memories:

```
Welcome to Adventure!! Would you like instructions?

> n

You are standing at the end of a road before a small brick building. Around you is a forest. A small stream flows out of the building and down a gully.

> in

You are inside a building, a well house for a large spring.

There are some keys on the ground here.

There is a shiny brass lamp nearby.

There is food here.

There is a bottle of water here.

>
```

From this beginning, the game develops with a wry, quirky, humorous and somewhat surrealistic style—a mode that strongly influenced the folk culture of computer hackers that would later evolve into today's Open Source movement.

For a work of art that was the first of its genre, ADVENT's style seems in retrospect startlingly mature. The authors weren't fumbling for an idiom that later, surer artists would greatly improve; instead, they achieved a consistent (and, at the time, unique) style that pretty much everyone who followed them in text adventures would closely emulate. That style was not much improved upon, even as the technology of game engines advanced by leaps and bounds and the range of subjects greatly widened.

ADVENT was artistically innovative, and its architecture was ahead of its time as well. Though the possibility had been glimpsed in research languages (notably LISP) as much as a decade earlier, ADVENT is one of the earliest surviving programs organized as a complex, declaratively specified data structure walked by a much simpler state machine. This is a design style that is underutilized even today.

The continuing relevance of ADVENT's actual concrete source code, on the other hand, is quite a different matter. The implementation aged much more rapidly—and badly—than the architecture, the game or its prose.

ADVENT was originally written under TOPS-10, a long-defunct operating system for the DEC PDP-10 mainframe. The source for the original version still exists (you can find it and other related resources at the Interactive Fiction Archive), but it tends to defeat attempts to appreciate it as a work of programming art because it's written in an archaic dialect of FORTRAN with (by actual count) more than 350 gotos in its 2.4KLOC of source code.

Preserving that original FORTRAN is, therefore, good for establishing provenance (as historians think about these things) but doesn't do a whole lot for what I've suggested as the cultural purposes of keeping these artifacts around. For that, a faithful translation into a more modern language would be far more useful.

As it happens, Don Woods' 1977 version of ADVENT was translated into C less than two years after it was written. You can still play it—and read the code—as part of the BSD Games package. Alas, while that translation is serviceable for building and running the program, it's not so great for reading. It is less impenetrable than the FORTRAN, but was not moved fully to idiomatic C and reads a bit strangely to a modern eye. (To be fair to the translators, the C language was still in its childhood in 1977, and its modern idioms weren't all that well developed yet.)

Thus, there are still a forbidding number of gotos in the BSD translation. Lots of information is passed around through shared globals in a way that was typical in FORTRAN but was questionable style in C even then. The BSD C code is full of mystery constants inherited from the ancestral FORTRAN source. And there is a serious comprehensibility problem around the custom text database that both the original FORTRAN and BSD C versions used—a problem I'll return to later in this article.

Through the late 1970s and early 1980s a lot of people wrote extensions of ADVENT, adding more rooms and treasures. The history of those variants is complicated and difficult to track. Almost lost in the hubbub was that the original authors—Will Crowther and Don Woods—continued to revise their game themselves. The last mainline version—the last release by Don Woods—was Adventure 2.5 in 1995.

I found Adventure 2.5 in the Interactive Fiction Archive in late 2016. Two things caught my attention about it. First, I had not previously known that Crowther and Woods themselves had shipped a version so extended from the famous original. Second—and unlike the early BSD port—there was nothing resembling what we'd expect in a modern source release to go with the bare code and the Makefile. No manual page. No licensing statement.

Furthermore, the 2.5 code was deeply ugly. It was C, but in worse shape than the BSD port. The comments actually included an apology from Don Woods explaining that it had been mechanically lifted from FORTRAN by a homebrew translator of his own devising—and apologizing for the bad style.

Nevertheless, I saw a possibility, and I wrote to Don asking his permission to ship a cleaned-up version under a true open-source license. The reply was some time in coming, but Don not only granted permission (speaking for both himself and Will Crowther) but also actively encouraged me to do this thing.

Now a reminder about what I think the goals of heritage preservation ought to be: I felt it was essential that the cleaned-up version should at no point break functional compatibility with what we got from Woods and Crowther. Therefore, the very first thing I did after getting the heirloom source to build clean was add the ability for it to capture command logs for regression testing.

When you do a restoration like this, it's not enough merely to make a best effort to preserve original behavior. You ought to be able to prove you have done so. Best practice, then, is to start by building a really comprehensive set of regression tests. And that's what I did.

What we did, I should say. The project quickly attracted collaborators, most notably Jason Ninneman. The first of Jason's several good ideas was to use coverage-analysis tools to identify gaps in the test suite. Later, Petr Vorpaev, Peje Nilsson and Aaron Traas joined in. Within about a month of starting, we could show more than 95% test coverage. And, of course, we ran the newest version of the test suite retrospectively against the earliest version of the code we could get to read the logs.
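The capture-and-replay idea behind such a suite can be sketched in a few lines of Python. This is not the project's actual test harness: the game is modeled here as a hypothetical callable that takes one command and returns the text printed in response, whereas the real program is a C binary driven over standard input and output. A "golden" transcript is recorded once from the trusted build, and every later build must reproduce it exactly.

```python
import io
from typing import Callable, Iterable, List

# Hypothetical game interface: one command in, the game's printed response out.
Game = Callable[[str], str]


def record_session(game: Game, commands: Iterable[str]) -> str:
    """Play a command log and return a transcript of inputs and outputs."""
    out = io.StringIO()
    for cmd in commands:
        out.write("> " + cmd + "\n")
        out.write(game(cmd))
    return out.getvalue()


def check_regression(game: Game, commands: List[str], expected: str) -> bool:
    """Replay a captured command log against a fresh game and compare."""
    return record_session(game, commands) == expected


def make_toy_game() -> Game:
    """A two-room toy standing in for the real game (illustration only)."""
    state = {"room": "road"}

    def step(cmd: str) -> str:
        if state["room"] == "road" and cmd == "in":
            state["room"] = "building"
            return "You are inside a building.\n"
        return "Nothing happens.\n"

    return step


# Record the golden transcript once, then replay it against each new build.
GOLDEN = record_session(make_toy_game(), ["n", "in"])
```

Because each replay starts from a fresh game, any behavioral drift shows up as a transcript mismatch; pairing this with a coverage tool tells you which code paths the logs never exercise.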

That kind of really good test coverage frees your hands. It allowed us to make rapid progress on the other prime goal, which was to turn the obfuscated source we started with into a readable work of art that fully revealed the design intentions and astonishing cleverness of the original.

So, all the cryptic magic numbers had to go. The goto-laden spaghetti code had to be restructured into something Don Woods in 2017 wouldn't feel he needed to apologize for. In general, what we aimed to transform the source code into was something we could believe Crowther and Woods—two of the most brilliant hackers of their time—would have written in 1977 if they had then had the tools and best practices of 2017 at their fingertips.

Our most (ahem) adventurous move was to scrap the custom text-database format that Crowther and Woods had used to describe the vocabulary of the game and the topology of Colossal Cave.

This, the "complex, declaratively specified data structure" I mentioned earlier, was the single cleverest feature of the design, and it went all the way back to Crowther's very first version. The dungeon's topology is expressed by a kind of pseudo-code broadly resembling the microcode found underneath a lot of processor architectures. Movement consists of dispatching to the sequence of opcodes corresponding to the current room and figuring out which one to fire, depending not only on the motion verb the user entered but also on conditionals in the pseudo-code that can test for the presence or absence of objects and their state.
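To make the dispatch idea concrete, here is a simplified Python sketch. The `World` and `Rule` types, the `move()` function, and the room and message names in the table are all hypothetical. The sketch also simplifies what the original does when a condition fails: here it just keeps scanning for the next rule that matches.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # ("goto", room name) or ("speak", message name)


@dataclass
class World:
    """Hypothetical game state: current room plus per-object state (prop)."""
    location: str
    prop: Dict[str, int] = field(default_factory=dict)


Condition = Callable[[World], bool]


@dataclass
class Rule:
    verbs: List[str]
    action: Action
    cond: Optional[Condition] = None  # None means unconditional


def move(world: World, table: Dict[str, List[Rule]], verb: str) -> Optional[Action]:
    """Fire the first rule for the current room that lists `verb` and whose
    condition holds; return the action taken, or None if nothing matched."""
    for rule in table.get(world.location, []):
        if verb in rule.verbs and (rule.cond is None or rule.cond(world)):
            kind, target = rule.action
            if kind == "goto":
                world.location = target
            return rule.action
    return None


# Hypothetical table: the grate only lets you down once it is unlocked.
TABLE: Dict[str, List[Rule]] = {
    "LOC_START": [Rule(["ENTER", "EAST"], ("goto", "LOC_BUILDING"))],
    "LOC_GRATE": [
        Rule(["DOWN"], ("goto", "LOC_BELOWGRATE"),
             cond=lambda w: w.prop.get("GRATE", 0) == 1),
        Rule(["DOWN"], ("speak", "GRATE_LOCKED")),
    ],
}
```

Even at this scale the shape of the design shows: everything about the cave lives in `TABLE`, and `move()` stays a few lines long no matter how many rooms are added.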

Good luck grokking that from the 2.5 code we started with, though. Here are the travel rules for the first two locations as they originally appeared in adventure.text, ten opcodes in all:

```
3
1 2 2 44 29
1 3 3 12 19 43
1 4 5 13 14 46 30
1 145 6 45
1 8 63
2 1 12 43
2 5 44
2 164 45
2 157 46 6
2 580 30
```
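How do those numbers become rules? The Python sketch below assumes the encoding described in the comments of the original data file: the leading 3 is the section number, and each following line gives a source location, an encoded destination Y, and a list of motion-verb numbers. With M = Y / 1000 and N = Y mod 1000, N up to 300 is a destination location, N from 301 to 500 selects special-case code, and N above 500 means "print message N - 500 and stay put," while M carries a movement condition that this sketch ignores. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Travel rows from the section shown above, without the section-number line.
ROWS = """\
1 2 2 44 29
1 3 3 12 19 43
1 4 5 13 14 46 30
1 145 6 45
1 8 63
2 1 12 43
2 5 44
2 164 45
2 157 46 6
2 580 30"""

Rule = Tuple[List[int], Tuple[str, int]]  # (motion numbers, decoded action)


def decode_destination(y: int) -> Tuple[str, int]:
    """Split an encoded destination; the condition code y // 1000 is ignored."""
    n = y % 1000
    if n <= 300:
        return ("goto", n)           # move to location n
    if n <= 500:
        return ("special", n - 300)  # jump to special-case code
    return ("speak", n - 500)        # print message n - 500, stay put


def parse_travel(text: str) -> Dict[int, List[Rule]]:
    """Group the decoded rules by source location, preserving their order."""
    table: Dict[int, List[Rule]] = {}
    for line in text.splitlines():
        src, dest, *verbs = (int(tok) for tok in line.split())
        table.setdefault(src, []).append((verbs, decode_destination(dest)))
    return table
```

Under that reading, the last row, `2 580 30`, decodes to "on motion 30, print message 80," which lines up with the `{verbs: [DOWN], action: [speak, WHICH_WAY]}` entry in the YAML version.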

Here's how those rules look, transformed to the YAML markup that our restoration, Open Adventure, now uses:

```yaml
- LOC_START:
    travel: [
      {verbs: [ROAD, WEST, UPWAR], action: [goto, LOC_HILL]},
      {verbs: [ENTER, BUILD, INWAR, EAST], action: [goto, LOC_BUILDING]},
      {verbs: [DOWNS, GULLY, STREA, SOUTH, DOWN], action: [goto, LOC_VALLEY]},
      {verbs: [FORES, NORTH], action: [goto, LOC_FOREST1]},
      {verbs: [DEPRE], action: [goto, LOC_GRATE]},
    ]
- LOC_HILL:
    travel: [
      {verbs: [BUILD, EAST], action: [goto, LOC_START]},
      {verbs: [WEST], action: [goto, LOC_ROADEND]},
      {verbs: [NORTH], action: [goto, LOC_FOREST20]},
      {verbs: [SOUTH, FORES], action: [goto, LOC_FOREST13]},
      {verbs: [DOWN], action: [speak, WHICH_WAY]},
    ]
```

The concept of using a Python helper to compile a declarative markup like this to C source code to be linked to the rest of the game was maybe just barely thinkable when Adventure 2.5 was written. YAML didn't exist at all until six years later.
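Here is a hedged sketch of what such a compile step might look like. It takes the YAML already parsed into Python lists and dicts (for instance by PyYAML's `yaml.safe_load`) and emits a C initializer. The `travelop_t` type, the field order, the `emit_travel_c` name, and the choice to emit one entry per verb are assumptions for illustration, not the project's actual generator or data layout.

```python
from typing import Dict, List


def emit_travel_c(locations: List[Dict[str, dict]]) -> str:
    """Emit a C array initializer for a hypothetical `travelop_t` type.

    `locations` is a list of one-key dicts, shaped like the YAML above
    after parsing: [{"LOC_HILL": {"travel": [...]}}, ...].
    """
    lines = ["static const travelop_t travel[] = {"]
    for entry in locations:
        [(name, body)] = entry.items()
        for rule in body.get("travel", []):
            kind, arg = rule["action"]
            for verb in rule["verbs"]:
                # One row per verb: {location, verb, action kind, argument}
                lines.append(f"    {{{name}, {verb}, {kind.upper()}, {arg}}},")
    lines.append("};")
    return "\n".join(lines)
```

Generating the table at build time keeps the readable markup as the single source of truth while the compiled game still gets a plain static array.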

But...designer's intent. That's much easier to see in the YAML version than in what it replaced. Therefore, given the purpose of heirloom restoration, YAML is better. Rather like stripping darkened varnish from a Rembrandt—the bright colors beneath may startle if you're used to the obscuring overlayer and think of it as definitive, but they are the truth of the work.

With our choices about what we could change so constrained, you might think the restoration was drudge work, but it wasn't like that at all. It was more like polishing a rough diamond—gradually seeing brilliance emerge from beneath an unprepossessing surface. The grottiness was largely—though not entirely—a consequence of the limitations of the tools Crowther and Woods had at hand. When we cleaned that up, we found genius with only a tiny sprinkling of bugs.

My dev team fixed those bugs, of course. We're hackers; that means we consider heirloom software a living heritage to be improved, not an idol to be worshiped. We certainly didn't think, for example, that Don Woods intended use of the verb "extinguish" on an oil-filled unlit urn to make the oil in it vanish.

Petr Vorpaev, reviewing a draft of this article, observed "Sometimes, we stripped bits of genius off, too. Because it was genius that was used to work around limitations that aren't there any more." He's thinking of a very odd feature of the 2.5 code—it worked around the absence of a string type in old FORTRAN by representing strings in a six-bit-per-character encoding packing five characters into a 32-bit word. That is, of course, a crazy thing to do in C, and we targeted it for removal early.
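For readers who have never met this trick, here is a sketch of six-bit packing in Python. The 44-symbol alphabet and the function names are assumptions for illustration; the actual 2.5 character table is not reproduced here. Five 6-bit codes take 30 bits, so each group of five characters fits in one 32-bit word with two bits to spare.

```python
import string
from typing import List

# Hypothetical character set: code 0 is padding. A 6-bit code has room for
# 64 symbols; this sketch uses only 44. Unsupported characters raise ValueError.
ALPHABET = "\0" + string.ascii_uppercase + string.digits + " .,'-!?"
CHARS_PER_WORD = 5  # 5 characters x 6 bits = 30 bits per 32-bit word
BITS = 6


def pack(text: str) -> List[int]:
    """Pack text into integers holding five 6-bit codes each, first char highest."""
    words = []
    for i in range(0, len(text), CHARS_PER_WORD):
        chunk = text[i:i + CHARS_PER_WORD].ljust(CHARS_PER_WORD, "\0")
        word = 0
        for ch in chunk:
            word = (word << BITS) | ALPHABET.index(ch)
        words.append(word)
    return words


def unpack(words: List[int]) -> str:
    """Reverse pack(), dropping padding codes."""
    out = []
    for word in words:
        for shift in range((CHARS_PER_WORD - 1) * BITS, -1, -BITS):
            code = (word >> shift) & 0x3F
            if code:
                out.append(ALPHABET[code])
    return "".join(out)
```

Every string operation has to go through a layer like this; in C, an ordinary NUL-terminated `char` array makes it unnecessary.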

We added some minor features as well. For example, Open Adventure allows some command abbreviations that are standard in text-adventure games today but weren't supported in the original ADVENT. By default, our version issues the > command prompt, which has also been in common use for decades. And you can edit your command input with Emacs keystrokes.

But, and this is crucial, all the new features are suppressed by an "oldstyle" option. If you choose that, you get a user experience that even a subject-matter expert would find difficult or impossible to distinguish from the 1995 and 1976–1977 originals.
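One way to structure such a switch is to route every new behavior through a single flag. The Python sketch below is hypothetical: the `Settings` type, the function names, and the abbreviation table are illustrative assumptions, not Open Adventure's actual code or its real list of abbreviations.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical modern abbreviations; the real game's list may differ.
ABBREVIATIONS: Dict[str, str] = {"l": "look", "i": "inventory"}


@dataclass
class Settings:
    oldstyle: bool = False  # True: reproduce the original UI exactly


def prompt(settings: Settings) -> str:
    """The modern '> ' prompt appears only outside oldstyle mode."""
    return "" if settings.oldstyle else "> "


def normalize(command: str, settings: Settings) -> str:
    """Lowercase the command, expanding abbreviations unless in oldstyle mode."""
    word = command.strip().lower()
    if settings.oldstyle:
        return word
    return ABBREVIATIONS.get(word, word)
```

Keeping each check behind the same flag makes the fidelity promise easy to audit: every new feature has exactly one branch that oldstyle mode turns off.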

Some of you might nevertheless be furrowing your brows at this point, wondering "YAML? Emacs keystrokes? Even as options? Yikes...can this really still be Colossal Cave Adventure?"

That's a question with a long pedigree. Philosophers speak of the "Ship of Theseus" thought experiment; if Theseus leaves Athens, and on his long voyage each plank and line and spar of the ship is gradually replaced, until not a fragment of the original wood remains when he returns to Athens, is it still the same ship?

The answer is, as any student of General Semantics could tell you, "What do you mean by 'same'?" Identity is not a well-defined predicate; it changes according to what kind of predictive problem you are using language to tackle. Same arrangement of bits in the source? Same UI? Same behaviors at some level deeper than UI?

There really isn't one right answer. Those of you predisposed to answer "same" might argue "Hey, it passes the same regression tests." Only, maybe it doesn't now. Remember, we fixed some bugs. On the other hand...if the ship of Theseus is still "the same" after being entirely rebuilt, does it cease to be if we learn that the replacement for one of its parts doesn't replicate a hidden flaw in the original? Or if a few improvements have been added during the voyage that weren't in the original plans?

As a matter of fact, Adventure already has come through one entire language translation—FORTRAN to C—with its "identity" (in the way hackers and other people usually think of these things) intact. I think I could translate it to, say, Go tomorrow, and it would still be the same game, even if it's nowhere near the same arrangement of bits.

Furthermore, I can show you the ship's log. If you go to the project repository, you can view each and every small transformation of the code between Adventure 2.5 and the Open Adventure tip version.

There is probably not a lot of work still to be done on this particular project, as long as our objectives are limited to performing a high-quality restoration of Colossal Cave Adventure. As they almost certainly will be; if we wanted to do something substantially new in this kind of game, the smart way to do it would not be to code custom C, but to use a language dedicated to implementing such games, like Muddle (aka MDL) or Adventure Definition Language.

I hope some larger lessons are apparent. Although I do think Colossal Cave Adventure is interesting as an individual case in itself, I really wrote this article to suggest constructive ways to think about the general issues around restoring heirloom software—why you might want to do it, what challenges and rewards you'll find, and what the best practices are.

Here are the best practices I can identify:

The goals to hold in mind are 1) making the design intent of the original code available for study, and 2) preserving the oldstyle-mode UI well enough to fool an original user.

Build your regression-test suite first. You want to be able to demonstrate that your restoration is faithful, not just assert it.

Use coverage tools to verify that your regression tests are good enough to constitute a demonstration.

Once you have your tests, don't sweat changing tools, languages, implementation tactics or documentation formats. Those are ephemera; good design is what endures.

Always have an oldstyle option. Gain the freedom to improve by making, and keeping, the promise of fidelity to the original behavior in oldstyle mode.

Do fix bugs. This may conflict with the objective of perfect regression testing, but you're an engineer, not an embalmer. Work around that conflict as you need to.

Show your work. Your product is not just the restored software but the repository from which it ships. The history in that repository needs to be a continuing demonstration of good judgment and sensitivity to the original design intent of the code.

Document what you change, including the bug fixes. It is good practice to include maintainer's notes describing your restoration process in detail.

When in doubt about whether to add a feature, be neither over-eager to put your mark on the code nor a slave to its past. Instead, ask "What's in good taste?"

And while you're doing all this, don't forget to have fun. The greatest heirloom works, like Colossal Cave Adventure, were more often than not written in a spirit of high-level playfulness. You'll be truer to their intent if you approach restoring them with the same spirit.