Assembling the history of Unix


The moment when an antique operating system that has not run in decades boots and presents a command prompt is thrilling for Warren Toomey. He compares it to restoring an old Model-T. "An old car looks pretty, but at the end of the day its purpose is to drive you somewhere. I love being able to turn the engine over and actually get it to do its job."

Toomey, an Australian university lecturer, founded the Unix Heritage Society to reconstruct the early history of the Unix operating system. Recently this historical code has become much more accessible: we can now browse it in an instant on GitHub, thanks to the efforts of a computer science professor at the Athens University of Economics and Business named Diomidis Spinellis. The 50th anniversary of the invention of Unix will be in 2019; the painstaking work of Toomey and Spinellis makes it possible for us to appreciate Unix's epic story.

The Unix Heritage Society

Around 1993, while he was a researcher at the University of New South Wales, Toomey began asking on mailing lists and newsgroups for old Unix versions with the intent to run them on a PDP-11 simulator. He began a group called the PDP-11 Unix Preservation Society, whose mission grew to encompass all old Unix releases; the group was renamed the Unix Heritage Society in 2000. "I think the title is a bit grandiose," he said in an interview with LWN. "It's not really a society, just me and the mailing list."

Toomey's project faced two obstacles; the first was simply to locate enough parts of each old Unix version to assemble a complete copy. He haunted the newsgroups and mailing lists of old Unix hackers, and he heard rumors of people who knew where to get historical artifacts. Most of his requests went unanswered. He recalls spending five or six years repeatedly asking for specific files, until eventually someone would respond, "Oh, actually, I have it." By chance, Toomey discovered in his own university's computer room a dozen tapes with backups of the 6th and 7th Editions of Unix. The backups weren't bootable—there wasn't even a complete backup of either edition—but the discovery accelerated his project nevertheless.

His second obstacle was the long shadow of AT&T's original copyright. AT&T and other corporations allowed individuals to own copies of Unix, but not to share them. Toomey had found his university's copy of a System V source license, but this only provided a small bit of legal cover to ask strangers to share their vintage files with him. Occasionally, one of Toomey's inside informants might give him a 15-year-old copy of some file, saying, "Just don't tell anyone where you got it."

Whenever Toomey acquired what seemed to be a complete version of Unix, he had to get it up and running, without any documentation to guide him. "You've got an artifact," he said. "It might be a binary or source code and there's no Makefile, you've got no idea what was the right sequence of things to do to build it."

Last year, for example, Toomey and his friends from the Unix Heritage Society resuscitated the first version of Unix for the PDP-7, written in mid-1970. The primary source was a dot-matrix printout containing PDP-7 assembly code, badly printed with notes and corrections scribbled on it. The members of the society converted the blurred copy to digital text with an OCR program, but they knew there were transcription errors that they'd have to backtrack and fix. Undaunted, they proceeded to the next stage: they learned the syntax of PDP-7 assembly code and wrote an assembler to convert the badly scanned text to machine code.

Now, with a set of executable binaries, the team had to store them in a filesystem, and here they hit a circular dependency. They didn't know the binary format of the filesystem for that version of the Unix kernel. The kernel itself implemented this filesystem, but they had to get the kernel to boot in order to use it for that purpose. Toomey decided to use a PDP-7 simulator to reverse-engineer the basic layout of a bootable disk image, and wrote a tool to create such an image containing the executables that he and his friends had assembled. "It's chicken-and-egg, but you work in stages," he said. "You get one little bit working and you use that to leverage up the next bit."

Unix's two inventors have helped him along the way. "Ken Thompson is minimalist in his communication," Toomey said. When the Unix Heritage Society brought up a PDP-11 version of Unix, he sent Thompson a series of emails about it, to which Thompson responded with single-word messages: "Amazing," or, "Incredible." Toomey said that while Dennis Ritchie was alive, he enthusiastically supported the project. "I really miss him an awful lot."

The Unix History Repository on GitHub

It's valuable to preserve snapshots of old-fashioned systems, but these snapshots don't fit modern programmers' methods for exploring the history of an evolving code base. Today, we read history with tools like Git. Spinellis has imported over 44 years of Unix code history into Git and published the repository on GitHub. The project builds on Toomey's accomplishments, but Spinellis wants more than just the code: he is building a moment-by-moment history of its evolution, and line-by-line attribution of each author's contributions.

Unix was developed without any version control at first. When development moved to the University of California at Berkeley in the late 1970s, coders began tracking certain files in an early version control system called SCCS, but even then it was not used for all files. Spinellis reconstructed as much history as he could by importing entire snapshots of early Unix versions into Git as if they were single commits. He researched primary sources like publications, technical reports, man pages, or names written in comments in the source code to attribute particular parts of the code to their authors.

Since publishing the repository on GitHub, Spinellis has continued to refine it periodically. He recently discovered an author unacknowledged in the Git logs whose contributions he wants to add. This March, the copyright holders for Unix Research Editions 8, 9, and 10 granted permission to distribute those versions, so that history can now be integrated into the repository. Additionally, Spinellis points out that he only followed one Unix variant to its conclusion: FreeBSD. Other variants like NetBSD and OpenBSD are just as old and interesting; their stories could be added to the repository as distinct branches.

But why?

Both Spinellis and Toomey enjoy reading old Unix code to see how much power the early programmers could jam into a tiny memory footprint. For example, the PDP-7 Unix that Toomey recovered last year is a minimalist masterpiece. It is a recognizable Unix system, including the fork() and exec() system calls, multiple user accounts, file permissions, and a directory structure, all implemented in only 4000 words of memory.

"But it's really not the source code that's important," said Toomey. "It's the ideas that are embodied in it." AT&T's efforts to protect the Unix code were irrelevant, he said, because the real value lies in concepts like connecting small utilities together with pipes and implementing the system in a portable programming language.