by Robert W. Oliver II

This week’s featured GitHub repository is MS-DOS, published by Microsoft in 1981.

Pure Retro Joy

When I learned that Microsoft’s MS-DOS source code was one of the most popular repositories on GitHub, I was nothing short of ecstatic.

If you’ve followed any of my blog posts here, you’ll know I’m quite a fan of retro computing. I’ve covered various topics like MS-DOS video game development and put together a simple 16-bit operating system named Retrokern. When talking with other developers who are retro-enthusiasts, we often speak with passion about the challenge of creating fully functional software and games in the most confining of spaces. It was this intense scarcity of hardware resources that encouraged ingenuity and spawned an entire golden era of PC computing.

In 2014, Microsoft released the source code to MS-DOS 1.25 and 2.0. These versions, released in 1982 and 1983 respectively, were licensed to both IBM (branded as PC-DOS) and IBM PC clone manufacturers, or OEMs (original equipment manufacturers). In the version 1.x and 2.x era, releases of DOS were tailored to each manufacturer because BIOS implementations varied dramatically. Thanks to the hardware abstraction in IO.SYS (sometimes named IBMBIO.COM), Microsoft needed to modify only this relatively thin layer to allow its operating system to run on a wide variety of 8086-based machines.

Exploring the Source

In our trip down 640k memory lane, we’ll be exploring the 2.0 version. Not only is it the most recent source listed, but it also bears a stronger resemblance to later DOS versions.

Before we get into specific segments, let’s discuss the programming language of MS-DOS 2.0: x86 assembly. Nearly all systems-level and game programming in the early DOS days was done in assembly. For an operating system, some assembly is required, especially in the boot loader. Since CPU cycles and bytes of memory were incredibly expensive in the 1980s, the entire MS-DOS kernel and associated utilities were written in assembly.

DOS Boot

When a BIOS-enabled PC (pretty much any PC made in the 1980s, 1990s, and 2000s) boots, the BIOS runs various self-checks and sets up its own interrupt vectors (function call tables for software interrupts), then loads the first sector of the boot drive and transfers control to it. That single 512-byte sector leaves the operating system very little room for the code that loads the rest of the system from disk and transfers execution to it. This is handled primarily in the SYSINIT.ASM file.

In the early days of DOS, this feat was not overly challenging, but as hardware and filesystems became more complex, it was clear the PC world needed a better boot loader. Thus, the UEFI (Unified Extensible Firmware Interface) standard was born, used by both Macs and PCs. Despite its support for huge hard drives, x86_64 processors, and hardware never dreamed of in the 1980s, many UEFI implementations include a legacy BIOS compatibility layer that emulates the same bootstrapping behavior an operating system like MS-DOS 2.0 would expect.

The MS-DOS Kernel

The hardware abstraction and input/output layer, called IO.SYS, paired with the MS-DOS kernel, MSDOS.SYS, makes up the core of the system. Many of the files in the source tree combine to produce these two files. Together they provide various routines that are available via software interrupts, chiefly int 0x21 (also written 21h, the “h” standing for hex).

This software interrupt allows DOS programs to allocate memory in a “safe” way (I use quotes because DOS memory allocation was far from perfect), access the filesystem, and display text on the screen. These function calls are far more portable than BIOS calls since BIOS specifications were rarely entirely compatible. DOS kernel functions provided a safe way to accomplish most of a program’s needs.
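To make the calling convention concrete, here is a minimal Python sketch of how a dispatcher keyed on the AH register could route int 0x21 requests. It is a model under stated assumptions rather than DOS code: the `Registers` and `Console` classes, the `int21` function, and the handler names are all hypothetical. Only the function numbers (0x02 for character output, 0x30 for get version) correspond to real DOS services.

```python
from dataclasses import dataclass, field


@dataclass
class Registers:
    """Hypothetical stand-in for the few 8086 registers this model needs."""
    ah: int = 0
    al: int = 0
    dl: int = 0


@dataclass
class Console:
    """Collects characters that a real system would send to the display."""
    output: list = field(default_factory=list)


def char_output(regs, console):
    # Function 02h: write the character held in DL.
    console.output.append(chr(regs.dl))


def get_version(regs, console):
    # Function 30h: return the major version in AL and the minor in AH.
    regs.al, regs.ah = 2, 0


# The caller selects a service by loading its number into AH.
SERVICES = {0x02: char_output, 0x30: get_version}


def int21(regs, console):
    """Route a simulated int 0x21 request to the handler selected by AH."""
    handler = SERVICES.get(regs.ah)
    if handler is None:
        # Simplification: this model rejects anything it does not implement.
        raise ValueError(f"unsupported function {regs.ah:#04x}")
    handler(regs, console)
```

Calling `int21(Registers(ah=0x02, dl=ord("A")), console)` appends "A" to `console.output`, which mirrors how a DOS program loads AH and DL and then issues the interrupt.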

When speed was critical, however, programs would often ignore DOS and use the hardware directly. This was especially true in the case of text display. Command line utilities were fine with function 0x09 of int 0x21 to print text (terminated by a $ sign), but complex applications or games would often write to the screen directly through the color text buffer at segment 0xB800.
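The two approaches can be contrasted with a small Python model. It assumes the standard 80x25 color text mode, where each screen cell takes two bytes (a character followed by an attribute byte) and segment 0xB800 corresponds to physical address 0xB8000. The function names and the `bytearray` standing in for video memory are illustrative only.

```python
COLUMNS, ROWS = 80, 25
VIDEO_SEGMENT = 0xB800  # color text buffer; monochrome adapters used 0xB000


def physical_address(segment, offset):
    """Real-mode address arithmetic: segment * 16 + offset, in 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def cell_offset(row, col):
    """Byte offset of a screen cell inside the text buffer."""
    return (row * COLUMNS + col) * 2


def poke_text(buffer, row, col, text, attribute=0x07):
    """Write text straight into the buffer, as a game or fast app would."""
    offset = cell_offset(row, col)
    for code in text.encode("ascii"):
        buffer[offset] = code           # character byte
        buffer[offset + 1] = attribute  # 0x07 is light grey on black
        offset += 2


def read_dollar_string(memory, offset):
    """Return the text function 09h would print: everything before '$'."""
    end = memory.index(b"$", offset)
    return bytes(memory[offset:end]).decode("ascii")
```

A full screen is `COLUMNS * ROWS * 2 = 4000` bytes, so `poke_text(bytearray(4000), 0, 0, "Hi")` fills the first four bytes with `H`, the attribute, `i`, and the attribute again.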

The Command Interpreter

COMMAND.ASM provides the source code for the part of DOS most users interacted with: the command interpreter. It was often known by its binary name, COMMAND.COM. Once the kernel was loaded, the command interpreter would be started. The program was unusual in that it had three distinct parts: the init, transient, and resident sections.

The init portion of COMMAND.COM loaded the rest of the portions, processed the AUTOEXEC.BAT file (a script launched at each boot with startup commands), then transferred control to the transient portion.

The transient portion handled the user input loop. It would display the command prompt, wait for user input, then process those commands. The transient portion remained in memory during program execution but was considered volatile. Programs that used considerable amounts of memory would overwrite this section. It wasn’t needed during program execution, so this was perfectly acceptable.

The resident portion was always present. It included the code necessary for starting and terminating programs as well as handling their exceptions (including the CTRL+C user-generated exception). It also included code that would reload the transient portion from the COMMAND.COM file.

In later DOS versions, the command interpreter could be substituted with another program. Popular alternatives were 4DOS, released by JP Software, and NDOS by Norton. These alternatives provided additional functionality and extra quality-of-life functions for the DOS user.

DOS Utilities

MS-DOS was more than just a boot loader and command interpreter. It shipped with various utilities for formatting disks, managing files, and debugging software. Let’s explore several interesting programs that were included in MS-DOS 2.0.

COPY

Interestingly enough, the copy utility, used for copying files from one location to another, wasn’t a separate program but rather part of the transient portion of the command interpreter. If a program wanted to copy a file, it either had to do it by itself or spawn another command interpreter and run its copy command for that purpose.

The source code for copy can be found in v2.0/source/COPY.ASM. This is included in the larger COMMAND.ASM file. As with most assembly code, the various routines inside the copy function are split apart by labels (a name followed by a colon). The assembler translated these labels into addresses that JMP (jump) and CALL instructions could target.

An interesting artifact lies on line 120:

mov [MELCOPY],al ; Not a Mel Hallerman copy

Mel Hallerman was an IBM employee who is credited with writing some of the utilities included in MS-DOS 2.0. Unfortunately, I could find no code documentation or external reference source that provided a clue as to why the routine was given this name.

DIR

The file allocation table (FAT) was an on-disk table that recorded which clusters belonged to each file, while directory entries stored each file’s name, size, and starting cluster. Rather than make the user browse these binary structures, the “dir” command embedded in COMMAND.COM interprets them and displays the result in an easy-to-digest fashion.

The source code for this command can be found in DIR.ASM. Like COPY.ASM, it is assembled as part of COMMAND.ASM for inclusion in the transient portion of the command interpreter.
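As a rough illustration of the binary data a directory listing has to interpret, the following Python sketch decodes one 32-byte FAT directory entry. The field layout (8-byte name, 3-byte extension, attribute byte, packed time and date, starting cluster, file size) is the standard FAT layout. The function name and the returned dictionary are assumptions made for this example and are not taken from DIR.ASM.

```python
import struct
from datetime import date

# name, extension, attributes, reserved, time, date, start cluster, size
ENTRY_FORMAT = "<8s3sB10sHHHI"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 32 bytes per entry


def parse_directory_entry(raw):
    """Decode one raw directory entry into readable fields."""
    name, ext, attributes, _reserved, _time, packed_date, cluster, size = (
        struct.unpack(ENTRY_FORMAT, raw)
    )
    base = name.decode("ascii").rstrip()
    suffix = ext.decode("ascii").rstrip()
    return {
        "filename": f"{base}.{suffix}" if suffix else base,
        "attributes": attributes,
        "start_cluster": cluster,
        "size": size,
        # Date is packed as: 7 bits years since 1980, 4 bits month, 5 bits day.
        "modified": date(
            1980 + (packed_date >> 9),
            (packed_date >> 5) & 0x0F,
            packed_date & 0x1F,
        ),
    }
```

Names and extensions are padded with spaces on disk, which is why the sketch strips trailing whitespace before joining them with a dot.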

CHKDSK

MS-DOS 2.0 introduced subdirectories. Files no longer had to reside in the root directory, and users could create a file and folder system to suit their needs. While this was a great achievement for Microsoft, the directory system was not perfect. When it failed, and it sometimes did during power outages and hard lockups, CHKDSK was usually the first rescue tool deployed.

The code for CHKDSK is, appropriately enough, found in CHKDSK.ASM. It’s an impressive piece of code that can rescue data in many cases of a corrupted FAT.

Oddly enough, until DOS 3.x, the DIR command didn’t show the user how much free disk space remained on the disk. The CHKDSK utility was the most common way to retrieve this information.
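One way a tool could arrive at a free-space figure is to count the unused entries in the FAT. The Python sketch below assumes the 12-bit FAT layout used on disks of this era, where two entries are packed into three bytes and data clusters are numbered from 2. The helper names are hypothetical, and this is not a transcription of what CHKDSK.ASM does.

```python
def fat12_entry(fat, cluster):
    """Read the 12-bit FAT entry for a cluster."""
    offset = cluster + cluster // 2          # each entry is 1.5 bytes
    pair = fat[offset] | (fat[offset + 1] << 8)
    return pair >> 4 if cluster & 1 else pair & 0x0FFF


def fat12_set(fat, cluster, value):
    """Write a 12-bit FAT entry without disturbing its neighbour."""
    offset = cluster + cluster // 2
    pair = fat[offset] | (fat[offset + 1] << 8)
    if cluster & 1:
        pair = (pair & 0x000F) | ((value & 0x0FFF) << 4)
    else:
        pair = (pair & 0xF000) | (value & 0x0FFF)
    fat[offset] = pair & 0xFF
    fat[offset + 1] = pair >> 8


def free_bytes(fat, data_clusters, bytes_per_cluster):
    """Count clusters whose FAT entry is 0 (unused) and convert to bytes."""
    free = sum(
        1
        for cluster in range(2, data_clusters + 2)
        if fat12_entry(fat, cluster) == 0
    )
    return free * bytes_per_cluster
```

With four data clusters of 512 bytes, two of them allocated to a file, `free_bytes` reports 1024 bytes available.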

EDLIN

MS-DOS 5 and later included EDIT, a menu-driven text-based editor that was simple to use. But before this, MS-DOS users often relied on EDLIN.

Edlin was a line editor written by Tim Paterson for 86-DOS, the precursor to MS-DOS 1.0. I specify line editor because rather than accept free-form text like most other editors, the user could only input one line at a time. To display already written text, the user could use the “L” command, or prefix this with numbers to indicate the lines to display. Inserting text among what was already entered was a chore: users needed to relist the text and use the “I” command (preceded by the line number) to begin inserting text.

While Edlin wasn’t known for its intuitive interface, it was quite powerful in certain circumstances, providing a great search and replace feature and enabling users to delete multiple lines with just a few keystrokes.
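To give a feel for the line-oriented workflow, here is a toy Python model of three Edlin-style commands: list (L), insert (I), and delete (D), each optionally prefixed by line numbers. It is a simplified sketch, not a reimplementation. The `LineBuffer` class is hypothetical, the listing format is invented, and unlike real Edlin it keeps no notion of a current line (an "I" without a number simply appends).

```python
import re

# Optional "first" and ",last" line numbers followed by a command letter.
COMMAND = re.compile(r"^(\d+)?(?:,(\d+))?([LID])$", re.IGNORECASE)


class LineBuffer:
    def __init__(self, lines=None):
        self.lines = list(lines or [])

    def run(self, command, new_lines=()):
        """Apply one command; listing commands return the lines shown."""
        match = COMMAND.match(command.strip())
        if match is None:
            raise ValueError(f"unrecognised command: {command!r}")
        first, last, op = match.groups()
        op = op.upper()
        if op == "L":
            start = max(int(first), 1) if first else 1
            end = min(int(last), len(self.lines)) if last else len(self.lines)
            return [f"{n}: {self.lines[n - 1]}" for n in range(start, end + 1)]
        if op == "D":
            if first is None:
                raise ValueError("this model requires a line number for D")
            start = int(first)
            end = int(last) if last else start
            del self.lines[start - 1:end]
            return []
        # "I": insert the new lines before the given line number.
        index = int(first) - 1 if first else len(self.lines)
        self.lines[index:index] = list(new_lines)
        return []
```

In this model, `run("2,3D")` removes two lines in one short command, which is the kind of economy that made the editor tolerable once the commands were memorised.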

DEBUG

Unless you purchased an assembler or compiler, writing software for DOS was difficult. Since no significant development utilities were included in the base package, amateur programmers were faced with a chicken-or-the-egg problem. Without a way to encode assembly commands into the binary code that the x86 processor uses, they were stuck with writing batch files in EDLIN.

I personally experienced this on my first DOS computer. I was a teenager with no job, so shelling out money for development software was not an option. Until I could acquire better tools, I often wrote assembly language programs in DEBUG.

The process of entering programs into DEBUG was tedious, but, true to its name, it allowed for instant debugging of said code. DEBUG was only able to write .COM files, meaning my programs couldn’t be larger than 64k, but that was more than enough to suit my programming needs until I later acquired more professional tools.
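The size limit follows from how a .COM file is laid out: it is a raw memory image loaded at offset 0x100 of a single 64 KB segment, directly after the 256-byte program segment prefix. The Python sketch below builds the bytes of a tiny program of the kind one might enter by hand, using the well-known encodings for `mov ah, 09h`, `mov dx, imm16`, `int 21h`, and `int 20h`. The function name is hypothetical, and the size check ignores the stack, which also shares the segment in practice.

```python
PSP_SIZE = 0x100        # program segment prefix placed before the image
SEGMENT_SIZE = 0x10000  # one 64 KB real-mode segment
MAX_COM_SIZE = SEGMENT_SIZE - PSP_SIZE

CODE_LENGTH = 9         # bytes of machine code emitted below


def build_hello_com(message):
    """Return the bytes of a .COM image that prints message and exits."""
    if "$" in message:
        raise ValueError("'$' terminates the string and cannot appear in it")
    text = message.encode("ascii") + b"$"
    string_offset = PSP_SIZE + CODE_LENGTH   # where the text lands in memory
    code = bytes([
        0xB4, 0x09,                                      # mov ah, 09h
        0xBA, string_offset & 0xFF, string_offset >> 8,  # mov dx, offset
        0xCD, 0x21,                                      # int 21h (print)
        0xCD, 0x20,                                      # int 20h (terminate)
    ])
    image = code + text
    if len(image) > MAX_COM_SIZE:
        raise ValueError("image does not fit in a single segment")
    return image
```

Writing the returned bytes to a file with a .COM extension should yield a program that a DOS system or emulator can run, though that step is left untested here.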

DEBUG was also excellent for disassembling and patching code. If the file was a .COM program, you could patch problematic code and write the file back to disk, providing a primitive yet effective hex editor.

The source code for DEBUG.COM can be found in DEBUG.ASM.

Further Exploration

The code is adequately documented, but for some commands, like FORMAT, a text file is provided with an in-depth discussion of the program’s operation and source code structure. I found these files quite useful in my exploration of this ancient treasure.

The Future of DOS

Believe it or not, new DOS software is still being written. And across the interwebs you’ll find reports of DOS computers still being used in various commercial applications. George R.R. Martin uses WordStar, a DOS-based word processor, for his Game of Thrones book series. YouTuber and Linux personality Bryan Lunduke underwent a 30-day 1989 computing challenge and, last I heard, still uses a DOS spreadsheet program. A good friend of mine uses the distraction-free environment of DOS to get some serious work done.

I won’t pretend that DOS is the operating system of the future. That would be, at best, incredibly naïve. But the fact that this nearly forty-year-old operating system is still kicking in various incarnations, including the popular FreeDOS clone, is astounding.

I encourage you to marinate on this thought for a moment — we still boot computers and write software for an operating system written nearly four decades ago when Jimmy Carter was president.