A Personal History of Compilation Speed, Part 1

The first compiled language I used was the Assembler Editor cartridge for the Atari 8-bit computers. Really, it had the awful name "Assembler Editor." I expect some pedantic folks want to interject that an assembler is not a compiler. At one time I would have made that argument myself. But there was a very clear divide between editing 6502 code and running it, a divide that took time to cross, when the textual source was converted into machine-runnable form. Contrast that to Atari BASIC, the only language I knew previously, which didn't feature a human-initiated conversion step and the inevitable time it took.

Conceptually, the Assembler Editor was a clever design. Source code was entered line by line, even using line numbers, just like BASIC. The assembler could compile the source straight from memory and create object code in memory, with no disk access to speak of. The debugger was right there, too, resident in memory, setting the stage for what looked like an efficient and tightly integrated development system.

Except for whatever reason, the assembler was impressively slow, and it got disproportionately slower as program size increased. A linear look-up in the symbol table? Some kind of N-squared algorithm buried in there? Who knows, but I remember waiting over seven minutes for a few hundred lines of code to assemble. Atari knew this was a problem, because there was a note in the manual about it only being suitable for small projects. They offered the friendly advice of purchasing a more expensive product, the Atari Macro Assembler (which was a standalone assembler, not an integrated environment).

Instead I upgraded to MAC/65, a third party alternative that followed the formula set by the Assembler Editor: cartridge-based for fast booting, BASIC-like editor and assembler and debugger all loaded into memory at once. MAC/65 was popular among assembly coders primarily on its reputation for quick assembly times. And quick it was.

Almost certainly the slowness of the Assembler Editor was because of a bad design decision, one not present in MAC/65. But MAC/65 went one step further: source code was parsed and tokenized after each line was entered. For example, take this simple statement:

LDA #19 ; draw all bonus items

It takes a good amount of work, especially on a sub-2MHz processor, to pick that apart. "LDA" needs to be scanned and looked-up somewhere. "19" needs to be converted to binary. The MAC/65 approach was to do much of this at edit-time, storing the tokenized representation in memory instead of the raw text.

In the above example, the tokenized version could be reduced to a byte indicating "load accumulator immediate," plus the binary value 19 (stored as a byte, not as two ASCII characters), and then a token indicating the rest of the line was a comment and could be ignored at assembly time. When the user viewed the source code, it had to be converted from the tokenized form back into text. This had the side-effect of enforcing a single standard for indentation style, whether or not there was a space after the comment semicolon, and so on.

When my Atari 8-bit days ended, and I moved to newer systems, I noticed two definite paths in assembler design. There were the traditional, lumbering assemblers that ran as standalone applications, which almost always required a final linking step. These were usually slow and awkward, seemingly designed as back-ends to high-level language compilers, not meant to be used directly by programmers. And then there were the lightning-fast assemblers, often integrated with editors and debuggers in the tradition of the Assembler Editor and MAC/65. For dedicated assembly programmers during the Amiga and Atari ST years, those were clearly the way to go.

By that time, except when there was no alternative, I was using compilers for higher-level languages. And I was wondering if the "slow, lumbering" and "lightning fast" split applied to those development systems as well.

Part 2

permalink August 2, 2009

previously