A Personal History of Compilation Speed, Part 2

(Read Part 1 if you missed it.)

My experience with IBM Pascal, on an original model dual-floppy IBM PC, went like this:

I wrote a small "Hello World!" type of program, saved it, and fired up the compiler. It churned away for a bit, writing out some intermediate files, then paused and asked for the disc containing Pass 2. More huffing and puffing, and I swapped back the previous disc and ran the linker. Quite often the compiler halted with "Out of Memory!" at some point during this endeavor.

Now this would have been a smoother process with more memory and a hard drive, but I came to recognize that a compiler was a Very Important Program, and the authors clearly knew it. Did it matter if it took minutes to convert a simple program to a machine language executable? Just that it could be done at all was impressive indeed.

I didn't know it at the time, but there was a standard structure for compilers that had built up over the years, one that wasn't designed with compilation speed as a priority. Often each pass was a separate program, so they didn't all have to be loaded into memory at the same time. And those seemingly artificial divisions discussed in compiler textbooks really were separate passes: lexical analysis, parsing, manipulation of an abstract intermediate language, conversion to a lower-level intermediate language, peephole optimization, generation of assembly code. Even that last step could be literal, writing out assembly language source code to be converted to machine language by a separate tool. And linking, there's always linking.

This was all before I discovered Turbo Pascal.

On one of those cheap, floppy-only, 8088 PC clones from the late 1980s, the compilation speed of Turbo Pascal was already below the "it hardly matters" threshold. Incremental builds were in the second or two range. Full rebuilds were about as fast as saying the name of each file in the project aloud. And zero link time. Again, this was on an 8MHz 8088. By the mid-1990s, Borland was citing build times of hundreds of thousands of lines of source per minute.

The last time I remember seeing this in an ad, after Turbo Pascal had become part of Delphi, the number was homing in on a million lines per minute. Projects were compiled before your finger was off of the build key. It was often impossible to tell the difference between a full rebuild of the entire project and compiling a single file. Compilation time was effectively zero.

Borland's other languages with "Turbo" in the name--like Turbo C--weren't even remotely close to the compilation speeds of Turbo Pascal. Even Turbo Assembler was slower, thanks in part to the usual step of having to run a linker. So what made Turbo Pascal so fast?

Real modules. A large percentage of time in C compilers is spent reading and parsing header files. Even a short school assignment may pull in tens of thousands of lines of headers. That's why most C compilers support precompiled headers, though they're often touchy and take effort to set up. Turbo Pascal put all the information about exported functions and variables and constants into a compiled module, so it could be quickly loaded, with no character-by-character parsing needed.
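To make the contrast with header parsing concrete, here is a minimal sketch of a precompiled interface: a symbol table written out as length-prefixed binary records and read back with a few fixed-size reads. The record layout, the `kind` codes, and the function names are all invented for illustration; this is not Turbo Pascal's actual unit file format.

```python
import struct

# Hypothetical record layout (an assumption, not the real unit format):
#   u16 symbol count, then per symbol:
#   u8 name length, name bytes, u8 kind, u32 offset


def save_interface(symbols):
    """Pack {name: (kind, offset)} into a compact binary blob."""
    parts = [struct.pack("<H", len(symbols))]
    for name, (kind, offset) in symbols.items():
        raw = name.encode("ascii")
        parts.append(struct.pack("<B", len(raw)))
        parts.append(raw)
        parts.append(struct.pack("<BI", kind, offset))
    return b"".join(parts)


def load_interface(blob):
    """Rebuild the symbol table without tokenizing any source text."""
    (count,) = struct.unpack_from("<H", blob, 0)
    pos = 2
    symbols = {}
    for _ in range(count):
        name_len = blob[pos]
        pos += 1
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        kind, offset = struct.unpack_from("<BI", blob, pos)
        pos += 5  # one byte for kind plus four for offset
        symbols[name] = (kind, offset)
    return symbols
```

The loader never looks at individual characters of source code; it only copies names and unpacks integers, which is the property the paragraph above describes.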

Integrated build system. The standard makefile system goes like this: first the "make" executable loads, then it reads and parses a file of rules, then for each source file that is out of date, the compiler is started up. That's not a trivial effort, firing up a huge multi-megabyte executable just to compile one file. The Turbo Pascal system was much simpler: look at the list of module dependencies for the current module; if they're all up to date, compile and exit; if not, then recursively apply this process to each module it depends on. An entire project could be built from scratch without running any external programs.
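The recursive rule is short enough to sketch. This is an illustration under stated assumptions, not Borland's code: the `deps` map, the `stale` set (standing in for a timestamp comparison), and the `build` function are hypothetical names, the dependency graph is assumed to be acyclic, and a module is also recompiled when one of its dependencies was just rebuilt.

```python
def build(module, deps, stale, order, seen=None):
    """Bring `module` up to date; return True if it was recompiled.

    deps  -- hypothetical map: module name -> list of modules it uses
    stale -- set of modules whose source is newer than their compiled form
    order -- list that records each "compile" in the order it happens
    Assumes the dependency graph has no cycles.
    """
    if seen is None:
        seen = {}
    if module in seen:  # already handled during this build
        return seen[module]

    dependency_rebuilt = False
    for dep in deps.get(module, []):
        # Visit every dependency; do not stop at the first rebuilt one.
        if build(dep, deps, stale, order, seen):
            dependency_rebuilt = True

    needs_compile = module in stale or dependency_rebuilt
    if needs_compile:
        order.append(module)  # stand-in for calling the in-process compiler
    seen[module] = needs_compile
    return needs_compile
```

Everything happens inside one process: there is no rules file to parse and no compiler executable to launch per source file.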

Minimal linker. Have you ever looked at the specs for an object file format? "Complicated" and "bulky" are two terms that come to mind. Turbo Pascal used a custom object file with a minimal design. The "linker" wasn't doing anywhere near the work of standard linkers. The result was that the link step was invisible; you didn't even notice it.

Single pass compiler with combined parsing and code generation. No separate lexer, no separate parser, no abstract syntax tree. All of these were integrated into a single step, made possible by the straightforward syntax of Pascal (and by not having a preprocessor with macros). If you're curious, you can read more about the technique.
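As a rough illustration of combining parsing and code generation, here is a tiny one-pass compiler for arithmetic expressions. It is a sketch, not Turbo Pascal's implementation: it targets an invented stack machine (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV`) rather than native code, and the class and function names are hypothetical. Characters are scanned on demand, and each instruction is emitted the moment its construct is recognized, so no token list or syntax tree is ever built.

```python
class OnePass:
    """Recursive-descent parser that emits code while it parses."""

    def __init__(self, src):
        self.src = src
        self.pos = 0
        self.code = []  # emitted instructions for the invented stack machine

    def peek(self):
        # Skip whitespace and return the next character ("" at end of input).
        while self.pos < len(self.src) and self.src[self.pos].isspace():
            self.pos += 1
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def expression(self):
        self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            self.term()
            self.code.append("ADD" if op == "+" else "SUB")

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            self.factor()
            self.code.append("MUL" if op == "*" else "DIV")

    def factor(self):
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
        elif ch.isdigit():
            start = self.pos
            while self.pos < len(self.src) and self.src[self.pos].isdigit():
                self.pos += 1
            self.code.append("PUSH " + self.src[start:self.pos])
        else:
            raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")


def compile_expr(src):
    compiler = OnePass(src)
    compiler.expression()
    if compiler.peek() != "":
        raise SyntaxError("unexpected trailing input")
    return compiler.code
```

For example, `compile_expr("1 + 2 * (3 - 4)")` returns `["PUSH 1", "PUSH 2", "PUSH 3", "PUSH 4", "SUB", "MUL", "ADD"]`. The trade-off is the one discussed next: with no intermediate representation to analyze, there is little room for optimization.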

Yes, there was a drawback to instantaneous compile times. Fewer optimizations were done, and almost always the resultant code was slower than the C equivalent. But it didn't matter. Removing the gap between the steps of writing and running code was worth more than some amount of additional runtime performance. I used to hit the build key every so often, even while typing, just to check for syntax errors. And zero compilation time eventually became standard, with the rise of interpreted languages like Perl, Ruby, and Python.

August 22, 2009
