Origins and Development of TOPS-20

by Dan Murphy

Copyright (c) 1989, 1996 - Dan Murphy

ORIGINS

Hence, to understand the factors which influenced the design of TOPS-20, we have to look back to the late 1960s and the environment in which TENEX was created. Some of the ideas that went into TENEX, and the process which ultimately brought it into DEC, had roots going back to the early 1960's. In this section, we will trace these aspects of TOPS-20's history.

I was an undergraduate at MIT from 1961 to 1965. After receiving my degree in 1965, I went to work for BBN doing various system programming activities and, in particular, supporting the LISP system then in use for AI research. I was involved in the design of TENEX and implemented a number of its kernel modules from 1968 to 1972 and then came to DEC along with TENEX in 1973.

DEC bought TENEX in 1973, believing that it represented a good base for a new, state-of-the-art commercial system. However, as with other "overnight sensations", this was just the final step in a flirtation between DEC and BBN that had gone on over many years. In 1961, BBN was one of the first sites external to DEC to install and use a PDP-1 (DEC's first computer), and did some early experiments in timesharing using this machine and some locally designed modifications. This is generally recognized as the first practical timesharing system. During the subsequent years, many discussion took place between the two companies about the best way to design and build computers.

Paging Development in the 1960'S

One such area of discussion was paging and memory management. BBN was pursuing a number of AI research projects in LISP, and these typically required a large amount of random access memory. Few computers of that day (and none that BBN could afford) had the size of memory that was required, so various alternative techniques had been sought to work around the limitation[2]. Paging and "virtual memory" was one in which there was a great deal of interest. There was a LISP system running on the PDP-1 at BBN, and I had converted it to use a software paging system that I had developed earlier at MIT.

This system used a high speed drum as backing store for main memory, and divided both main memory and the drum into fixed size units (pages) that could be copied back and forth as necessary. The LISP system itself used 18-bit pointers and was built as if there were an 18-bit (262 kword) memory on the machine. However, with software paging, each reference to the target of one of these pointers was changed to a subroutine call (or in-line sequence). The subroutine would take the high order few bits of the pointer as an index into an array (the page table) which would contain the actual current location of the page. If the page were then in main memory, the page number bits from the table would replace those which had been used as the index into the table, and the reference would be completed. If the desired page were not in main memory, a call would be made to a "page manager" routine to rearrange things as necessary. That meant selecting a page in main memory to be moved (copied) back to the drum, and then reading into this page the desired page from the drum. Finally, the table would be adjusted to reflect these changes, and the reference sequence begun again.

In actual use, this system was fairly effective. The reference patterns of the typical LISP programs that we were running were such that the system was not dominated by waits for drum IO. That is, most pointer references were to pages then in main memory, and only a small fraction were to pages which had to be read in from the disk*.

[* - Several studies, some by us at BBN, had been done to investigate memory reference behaviour and gather statistics from various programs[3].]

However, the actual number of references was sufficiently high that a great deal of time was spent in the software address translation sequence, and we realized that, ultimately, this translation must be done in hardware if a truly effective paged virtual memory system were to be built.

Early interest in the PDP-6 and first 36-bit product cancellation.

The PDP-6 had been announced in 1964 and was immediately the machine of choice for LISP programming. A single word holding two addresses (a CAR and CDR in LISP terms) seemed designed especially for LISP (this wasn't mere coincidence). The big (for the time) address space of 2_^18 words offered the promise of fast, efficient execution of large LISP programs. Although the basic instruction set didn't include a CONS instruction (the LISP primitive that builds lists), one early PDP-6 had a special modification installed to provide that operation. This was the machine that was installed at the AI Lab of Stanford University, headed by one of the inventors of LISP, John McCarthy.

The group at BBN had every intention of acquiring a PDP-6 to improve our LISP capabilities, but of course, we wanted it to have hardware paging so that we could carry over the paging techniques from our PDP-1 LISP. Several discussions were held between BBN and DEC, including Win Hindle and Alan Kotok on the DEC side. BBN was lobbying for paging to be included in a subsequent PDP-6 model, or possibly as an add-on, but these discussion came to an end when DEC announced that there would be no further PDP-6 models -- that is, that DEC was going out of the 36-bit business. This (1966) was the first of several such occasions.

A Detour and Learning Experience

Taking DEC at its word, BBN turned its attention to selecting another machine, hopefully one with an expanding future, that we could use to support our various research projects. The result of this selection process was the purchase of an SDS-940. SDS was Scientific Data Systems in El Segundo, California, later acquired by Xerox and known as XDS. The SDS-940 was a 24-bit word machine with a modest virtual address space, but it did have hardware paging capabilities. Also, SDS was touting as a follow-on, the Sigma-7 then under development, which would be larger, faster, and do all the things we could ever possibly want. If it did, we never found out because by the time it came out, the DEC KA-10 had been announced (or at least leaked), and we happily switched back to the architecture we really wanted.

It was also a factor that the Sigma-7 suffered a bit of schedule slippage. Yes, this actually happened despite firm assurances from SDS that no such thing was possible. SDS had even published several ads in trade magazines touting their forthcoming system as "the best timesharing system not yet available". Another memorable ad read "they said we must be crazy to publish a firm software development schedule, but here it is." Apparently, "they" were right because that firm schedule went the way of most "firm" software development schedules of the period.

The former ad also inspired someone at DEC to create a counter ad for TOPS-10 which read "announcing the best timesharing system very much available."

BBN did use the SDS-940 for a couple of years however, and we ran on it an operating system developed by a group at U.C. Berkeley. That was significant because a number of features in the Berkeley timesharing system were later modified and adopted into TENEX.

The 36-bit machine is born again.

When BBN found out the DEC had secretly gone back into the 36-bit business (or at least was developing a new machine of that architecture), we once again lobbied for the inclusion of a hardware paging system. And once again, this was to no avail. One advancement in the KA-10 was the dual protection and relocation registers (the PDP-6 had only a single pair). This allowed programs to be divided into two segments: a reentrant, read-only portion, and a read-write data area. That, however, was as far as DEC was willing to go at that time in advancing the state of the art of operating system memory managment support.

Another factor was DEC's firm intent to keep down the price of this new large machine. DEC was doing well with small machines (PDP-5 and PDP-8 in particular), but the PDP-6 had suffered numerous engineering and reliability problems, was hard to sell, and, some say, almost wrecked the company. The KA-10 had a design constraint that a minimal machine could be configured and sold for under $100,000. This configuration was limited to 16 kwords of main memory and had no solid-state registers for the 16 accumulators (all AC references went to main memory at considerable cost in speed). To support this, a "tiny" build of TOPS-10 was designed which would fit in 8 kwords, and various utilities were similarly squeezed: the 1K PIP, etc. Whether the $99,999 entry price was a good marketing ploy is hard to say. In any case, none of that configuration were ever sold.

Then too, paging and "virtual memory" were still rather new concepts at that time. Significant commercial systems had yet to adopt these techniques, and the idea of pretending to have more memory that you really had was viewed very skeptically in many quarters.

Paging Requirement Leads to New Operating System

Undaunted by DEC's refusal to see the wisdom of our approach, BBN nonetheless planned the purchase of several KA10's and set about figuring out how to turn them into the system we really wanted. In addition to the 36-bit architecture we favored, the KA10 system had retained much of the hardware modularity of the PDP-6 including in particular the memory bus with independent memory units. This led us to conceive of the idea of putting a mapping device between the processor and the memories which would perform the address translation as we wanted it done.

Another decision we had to make was whether or not to attempt to modify TOPS-10 to support demand paging. The alternative was to build a new system from scratch, an ambitious undertaking even in those days. Obviously, we decided to build a new system -- the system which was later named TENEX. This decision was justified largely on the grounds that major surgery would be required to adopt TOPS-10 as we desired, and even then we probably wouldn't be able to solve all the problems. In retrospect, this view was probably justified since, although TOPS-10 development continued for nearly 20 years after we started TENEX, TOPS-10 was never modified to have all of the virtual memory features of TENEX/TOPS-20. TOPS-10 did ultimately support paging and virtual memory, but not the various other features.

Beyond that, there were a number of other features not related to paging that we wanted in an operating system. This further tilted the decision toward implementation of a new system. On the other hand, we had no desire or ability to implement new versions of the many language compilers and utilities then available under TOPS-10, so a method of running these images "as is" was needed. We decided it would be possible to emulate the operating system calls of TOPS-10 with a separate module which would translate the requests into equivalent native services. This is described in more detail later. The plan then, would be to implement a new system from the operating system interface level down plus particular major systems such as LISP that were key to our research work.

Virtual memory through paging was the fundamental need that led to the decision to build TENEX. Had TOPS-10 or another system been available which supported only that requirement, we probably would have used it and not undertaken the construction of a new kernel. The other desires we had for our system would not have been sufficient to warrant such an effort, and of course, we did not foresee all of the capabilities that it would eventually have.

ORIGIN OF MAJOR FEATURES

TENEX development was motivated largely by the needs of various research programs at BBN, Artificial Intelligence in particular. These needs were shared by a number of other labs supported by the Advanced Research Projects Agency (ARPA) of the DOD, and in the period from 1970 to 72, a number of them installed TENEX and made it the base for their research programs. As noted before, a large virtual memory was foremost among these needs because of the popularity and size of the LISP programs for speech recognition, image processing, natural language processing, and other AI projects.

During this same period, the ARPA network was being developed and came on line. TENEX was one of the first systems to be connected to the ARPANET and to have OS support for the network as a general system capability. This further increased its popularity.

Like most other software systems, TENEX was an amalgam of ideas from a number of sources. Most of the features which came to be prized in TOPS-20 had roots in other systems. Some were taken largely unchanged, some were modified in ways that proved to be critical, and others served merely as the germ of an idea hardly recognizable in the final system. Among those that we can trace to one or more previous systems are: "escape recognition", virtual memory structure, process structure and its uses, and timesharing process scheduling techniques.

Three system most directly affected the design of TENEX -- the MULTICS system at MIT, the DEC TOPS-10 system, and the Berkeley timesharing system for the SDS 940 computer. MULTICS was the newest and most "state of the art" system of that time, and it incorporated the latest ideas in operating system structure. In fact, it was popular in some circles to say that, with the implementation of MULTICS, "the operating system problem had been solved". Several members of the MIT faculty and staff who had worked on MULTICS provided valuable review and comment on the emerging TENEX design.

Many of the paging concepts came from our own previous work, the PDP-1 LISP system in particular. Other ideas had recently appeared in the operating system literature, including the "working set" model of program behaviour by Peter J. Denning[4]. MULTICS had developed the concept of segmentation and of file-process mapping, that is, using the virtual address space as a "window" on data which permanently resides in a file.

The Berkeley 940 system had a multi-process structure, with processes that were relatively inexpensive to create. That in turn allowed such things as a system command language interpreter which ran in unprivileged mode and a debugger which did not share address space with the program under test and therefore was not subject to being destroyed by a runaway program.

Virtual Memory and Process Address Space

These two concepts, virtual memory and multiple processes, were fundamental to the design of TENEX, and ultimately were key to the power and flexibility of TOPS-20. The two abstractions also worked well together. The concept of a process included a virtual address space of 262 kwords (18-bit addresses) -- the maximum then possible under the 36-bit instruction set design. Extended addressing was still many years away, and the possibility that more than 262k might ever be needed rarely occured to anyone.

We merely wanted the full virtual address space to be available to every process, with no need for the program itself to be aware of demand paging or to "go virtual". We believed that mechanisms could be built into the paging hardware and operating system to track and determine process working sets, and it would do as good a job at managing memory efficiently as if the program explicitly provided information (which might be wrong). It might happen, of course, that a particular program had too large a working set to run efficiently in the available physical memory, but the system would still do the best it could and the program would make some progress at the cost of considerable drum thrashing. In any event, the implementation of virtual memory was to be transparent to user programs.

Page Reference History Information

To support transparent demand-paged virtual memory, we had learned that we had to maintain history information about the behaviour of programs and the use of pages. This led to the design and inclusion of the "core status table" in the hardware paging interface. Other tables (i.e. page tables) would hold information that provided the virtual-to-physical address translation; the core status table was specified to hold information related to the dynamic use and movement of pages. For each physical page in use for paging, the core status table would record: (1) a time-stamp of the last reference, and (2) an identification of the processes that had referenced the page. The latter was included because we expected that there would often be many processes sharing use of a given page. The need for sharing was represented in several other aspects of the paging design as well[5].

Shared Pages and File-to-Process Mapping

The perceived need for sharing pages of memory and the segmentation ideas we had seen in Multics led to the design of the file-to-process mapping capabilities. In the paging hardware, this was supported by two types of page table entries, "shared" and "indirect", in addition to the "private" type that would contain a direct virtual-to-physical mapping. These allowed the physical address of a shared page to be kept in one place even though the page was used in many places, including in the address space of processes that were swapped out. The "indirect" pointer eventually allowed another capability that we didn't initially foresee -- the ability to make two or more process address spaces equivalent, not only with regard to the data contained therein, but with regard to the mapping as well*.

[* - This allows, for example, a Unix-type "fork" operation to create the child address space very cheaply whether or not it is later replaced by an EXEC.]

In software, we wanted to achieve some of the benefits of Multics "segmentation", but the PDP-10 address space wasn't large enough to use the Multics approach directly. Consequently, we decided to extend the idea of named memory down to the page level and to allow mapping to be done on a page-by-page basis. The result of this was the fundamental design of the TENEX/TOPS-20 virtual memory architecture, specifically:

A process address space contains no real storage, it is merely a set of 512 windows (mappings) to storage.

The only real storage resides in files. A page of real storage is identified by the combination of: file name + VPN (virtual page number).

A system call maps pages of real storage into process windows and unmaps them. The same page may be mapped into many processes (and into multiple windows of the same process), and all windows see and share the same physical storage.

Copy-on-write pages

We knew that most sharing would be of read-only data, especially of program text and constant data, but that any program might contain initialized variables which would eventually be written. Beyond that, almost any normally read-only data might be modified under some circumstances, e.g. insertion of a break point for debugging. In the LISP system, vast amounts of program were represented as LISP structures, and these structures could be edited and modified by the user.

Hence, it wasn't acceptable to enforce read-only references on the bulk of this storage, nor would it be efficient to copy it each time a process used or referenced it. That would defeat much of the sharing objective. The solution we devised was the "copy-on-write" page. This meant that the page would remain shared so long as only read or execute references were made to it, but if a process tried to write it, a copy would be made and the process map adjusted to contain the copy. This would happen transparently to the process. True read-only enforcement was available of course, and the actual file contents (e.g. of a system program executable file) were appropriately protected against modification, but the user could map any such pages copy-on-write, and a write reference would then merely result in a modified copy being used.

Design Lessons Learned

Multics may be said to have contributed more than just the ideas for virtual memory organization and other specific capabilities. During the design of TENEX, we invited some of the Multics designers and implementors to review our progress and decisions. As is often the case, we had fallen into the trap of trying to do too much in a number of areas and had produced some designs that were quite convoluted and complex. Several people from Multics beat us up on those occasions, saying "this is too complicated -- simplify it! Throw this out! Get rid of that!" We took much of that advice (and could probably have taken more), and I credit these reviews with making the core capabilities significantly easier to understand and program, as well as to implement and debug.

This was one of the earliest occasions when I really began to appreciate the fact that a complex, complicated design for some facility in a software or hardware product is usually not an indication of great skill and maturity on the part of the designer. Rather, it is typically evidence of lack of maturity, lack of insight, lack of understanding of the costs of complexity, and failure to see the problem in its larger context.

The time from when the first word of TENEX code was written to when the system was up and serving regular users was about six months -- remarkable even for that era. The system was far from complete at that point, but BBN internal users were making effective use of it.

One of the things that contributed to such speedy development was our concern for debugging support. DDT was the standard symbolic debugger on the PDP-10, and it had been evolved from predecessor versions on the PDP-6, PDP-1, and TX-0 (an experimental machine at MIT that preceded the PDP-1). DDT had a lot of power for its time, and even today remains the choice of some experienced users despite the availability of screen- and window-oriented debugging systems. DDT ran in both exec (kernel) mode and user mode under TOPS-10, and we supported both on TENEX as well.

In addition, we provided a way for DDT to run under timesharing while looking at the system code and data structures. This allowed several developers to be debugging system code at the same time, at least to a limited extent. Even breakpoints could be handled, provided that no other process would execute the code while the breakpoint was in place. In all modes, DDT had the full set of symbols available to it, so we were able to debug without constant referral to listings and typing of numeric values.

In user mode, DDT could be added to any program even if it were already running. Previous systems required that debugging be anticipated by linking the debugger with the program. In TENEX, DDT was merely mapped when needed at a high unused address in the address space well out of the way of most programs. An even more powerful version of DDT, called IDDT (for invisible) ran in a separate process from the program being debugged. Thus, it was impervious to memory corruption by the program under test, and it could intercept any and all illegal actions. Unfortunately, this version of DDT diverged from the mainstream DEC DDT, and IDDT was never brought up to date and made the normal debugger, even under TOPS-20. Nonetheless, the easy and extensive use of DDT surely made development more effective, both for user and system programmers.

Another area where TENEX, and later TOPS-20, excelled was in the user interface, described in detail below. A key reason that we were able to extend the user interface so significantly is that the code which implemented it ran an an ordinary user program. Hence, it wasn't constrained by size or limited by functions which could not be performed in kernel mode. Beyond that, debugging it was much simpler -- it didn't require stand-alone use of the machine as did kernel debugging. All of this allowed a great deal of experimentation and evolution based on tuning and feedback from real use and real users.

User-oriented Design Philosophy

A piece of system design "philosophy" had emerged at BBN and among some of the ARPA research sites that was to have a large impact on the overall feel of TENEX and, ultimately, TOPS-20. At the time, we called this "human engineering" -- we wanted the system to be easy to learn and easy to use, and we wanted the system to take care of as many grungy details as possible so that the programmer or user did not have to deal with them. Beyond that, we were willing to spend real machine cycles and real (or at least virtual) memory to make this happen.

This philosophy led initially to the human interface features of the EXEC, including "escape recognition", the question-mark help facility, optional subcommands, and "noise" words. Few people now argue against the need to provide effective human interfaces, but at that time there were many detractors who felt that it was a waste of cycles to do such things as command recognition. These kinds of things, they said, would "slow the system down" and prevent "useful work" from getting done. Other contemporary systems used short, often one-letter, commands and command arguments, provided no on-line help, and did not give any response to the user other than the "answer" (if any) to the command that had been entered.

Computer use by non-experts

Many such systems fell into the "by experts, for experts" category. That is, they were built by experts and intended to be used by other experts. An "expert" would obviously not need frivolous features like noise words or command recognition. Experts would know, not only all the commands, but all the legal abbreviations as well. Experts may have read the manual once, but would always remember everything needed to interact with the system. So went the implicit assumptions about users -- either you were an expert, or you were an inferior being not really entitled to use the Computer anyway.

The TENEX team took a different view. It was clear that the power of computers was increasing every year and so one should expect the computer to do more and more interesting things. Most people would agree that a major purpose of computer automation is to relieve people of boring, repetitive tasks; we believed that purpose extended to computer system development as well, and that it was wrong to require a person to learn some new set of boring, arcane tasks in order to use the computer. The machine should adapt to man, not man to the machine.

The view was probably reinforced by the artificial intelligence research being done in the environment where TENEX was designed. In such areas as speech recognition, pattern recognition, and natural language comprehension, massive computation power was being applied to make computers interact in ways that would be more human. These were long-term efforts, but we wanted our computer systems to be more human-oriented in the sort term as well.

One of the ideas to come out of the AI research was that the computer should "do what I mean", even if I didn't say it quite right or maybe made a spelling error. Warren Teitelman implemented extensive capabilities in LISP to correct errors automatically or with minor user intervention, and this came to be called the DWIM package. With this kind of research effort going on, we were sensitive to the restrictions and obtuseness of most existing systems.

Hence, we took (now obvious) decisions like, (1) identifying users by alphanumeric name rather than number; (2) allowing file names to be long enough to represent meaningful (to the human) information; (3) typing or displaying positive confirmation of actions requested by the user; (4) allowing for a range of individual preferences among users in regard to style of command input and output.

Interactions Judged on Human-Human basis

We specifically looked for ways of interaction that would seem natural when judged in a human-to-human perspective. On that basis, many typical computer interface behaviours are quite unfriendly. Terse, cryptic, and/or abrupt responses; inflexible and obscure input syntax; use of words for meanings that are counter-intuitive -- these things are all judged as rude or worse if seen in a human-to-human interaction.

This desire to make the computer do the grunge work motivated design in areas other as well. Demand paging and virtual memory were seen as ways to reduce the burden and difficulty of writing large programs. With a large virtual memory, the programmer would not have to be concerned with segmenting and overlaying his program. Rather, he could leave memory management to the system and concentrate on the particulars of the task at hand. Other features such as process structure, system utility calls, and standardized file parsing were intended to ease the job of system programming by providing standard, powerful functions to the programmer.

Escape Recognition, Noise Words, Help

One of the most favored features among TOPS-20 users, and one most identified with TOPS-20 itself, is "escape recognition". With this, the user can often get the system to, in effect, type most of a command or symbolic name. The feature is more easily used than described; nonetheless, a brief description follows to aid in understanding the development of it.

A Brief Description of Recognition and Help

Typing the escape key says to the system, "if you know what I mean from what I've typed up to this point, type whatever comes next just as if I had typed it". What is displayed on the screen or typescript looks just as if the user typed it, but of course, the system types it much faster. For example, if the user types DIR and escape, the system will continue the line to make it read DIRECTORY.

TOPS-20 also accepts just the abbreviation DIR (without escape), and the expert user who wants to enter the command in abbreviated form can do so without delay. For the novice user, typing escape serves several purposes:

Confirms that the input entered up to that point is legal. Conversely, if the user had made an error, he finds out about it immediately rather than after investing the additional and ultimately wasted effort to type the rest of the command.

Confirms for the user that what the system now understands is (or isn't) what the user means. For example, if the user types DEL, the system completes the word DELETE. If the user had been thinking of a command DELAY, he would know immediately that the system had not understood what he meant.

Typing escape also makes the system respond with any "noise" words that may be part of the command. A noise word is not syntactically or semantically necessary for the command but serves to make it more readable for the user and to suggest what follows. Typing DIR and escape actually causes the display to show:

DIRECTORY (OF FILE)

This prompts the user that files are being dealt with in this command, and that a file may be given as the next input. In a command with several parameters, this kind of interaction may take place several times. It has been clearly shown in this and other environments that frequent interaction and feedback such as this is of great benefit in giving the user confidence that he is going down the right path and that the computer is not waiting to spring some terrible trap if he says something wrong. While it may take somewhat longer to enter a command this way than if it were entered by an expert using the shortest abbreviations, that cost is small compared to the penalty of entering a wrong command. A wrong command means at least that the time spent typing the command line has been wasted. If it results in some erroneous action (as opposed to no action) being taken, the cost may be much greater.

This is a key underlying reason that the TOPS-20 interface is perceived as friendly: it significantly reduces the number of large negative feedback events which occur to the user, and instead provides many more small but positive (i.e. successful) interactions. This positive reinforcement would be considered quite obvious if viewed in human-to-human interaction terms, but through most of the history of computers, we have ignored the need of the human user to have the computer be a positive and encouraging member of the dialog.

Typing escape is only a request. If your input so far is ambiguous, the system merely signals (with a bell or beep) and waits again for more input. Also, the escape recognition is available for symbolic names (e.g. files) as well as command verbs. This means that a user may use long, descriptive file names in order to help keep track of what the files contain, yet not have to type these long names on every reference. For example, if my directory contains:

BIG_PROGRAM_FILE_SOURCE VERY_LONG_MANUAL_TEXT