The design and implementation of programming languages fascinate me. A programming language enables a text file both to communicate a problem’s solution accurately to another person and to control a computer’s execution. The influence a language’s feature set has on the way a programmer thinks about problems and their solutions is an endlessly captivating idea. The world of programming languages is amazing.

I’m particularly interested in implementing higher-level languages in lower-level languages. I think I was attracted to Math in school because the construction of new ideas based on a small set of axioms seems like good, clear thinking. Building up to a high-level programming language from below has the same appeal for me.

This article may be the first of many recording the scratching of an itch that apparently needs scratching. I cannot seem to ignore it even if I try. A book on Scheme or implementing interpreters or compilers always seems to find its way into my hands. I want to implement Scheme from Scratch: an interpreter, a compiler, a JIT, etc. I want to see how it all happens. If you do too then you may enjoy following along.

Prerequisites

This exploration is not for novice programmers. It is a tutorial introduction for novice language implementors and enthusiasts, written from the perspective of a novice language implementor and enthusiast.

You have experience programming on a UNIX or UNIX-like system. Any of Linux, BSD, or Mac OS X is fine. You have used common developer tools like gcc and make. Your copy of K&R 2e is a little dog-eared but cherished. You might even have a copy of APUE 2e.

You have some prior exposure to Intel 386 Assembly language. You don’t need to be an Assembly master. Just some general knowledge about registers, the stack and heap, and some basic instructions. Jeff Duntemann’s Assembly Language Step By Step, now in its third edition, is a light and enjoyable introduction providing a sufficient level of Assembly knowledge.

If someone even mentions SICP, your head fills with thoughts of master magicians casting alluring spells. SICP 2e has spent a lot of time on your bedside table and you’ve done many of the exercises when you really should have been doing other things (like I should be doing now). SICP showed you the implementation of Scheme in Scheme. This chicken-and-egg paradox bothers you and you wonder “How do I implement Scheme from Scratch?”

“Scheme”

There are many versions of “Scheme”. The Original Lambda Papers show various ancestors. I’ve seen threads mentioned in those papers. SICP implements several versions of Scheme. The main language used in SICP doesn’t have macros. SICP shows the implementation of a lazy Scheme. There is the official IEEE Scheme standard and the de facto RnRS standards: R4RS, R5RS, R6RS.

It isn’t totally clear what makes a Scheme “Scheme”. I’m not going to worry too much about exactly what a language must have to be enough of a Scheme to earn the name. I know it needs at least s-expressions; garbage collection; symbols; first-class, lexically scoped procedures; and proper tail calls.
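To make the first items on that list a little more concrete, here is a minimal sketch in C of how s-expressions and symbols might be represented and written back out. Everything in it is an assumption made for illustration: the names (object, make_fixnum, make_symbol, cons, write_object), the tagged-union layout, and the choice to copy symbol names rather than intern them are all hypothetical, not the design this series commits to. It says nothing about garbage collection, procedures, or tail calls; allocated objects are simply never freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object model: every Scheme value is a tagged object. */
typedef enum { T_NIL, T_FIXNUM, T_SYMBOL, T_PAIR } object_type;

typedef struct object {
    object_type type;
    union {
        long fixnum;
        char *symbol;                         /* copied name, not interned */
        struct { struct object *car, *cdr; } pair;
    } data;
} object;

/* The single empty-list object, written as (). */
object nil_object = { T_NIL, { 0 } };
#define NIL (&nil_object)

/* Allocate an object; with no garbage collector it is never freed. */
object *alloc_object(object_type type) {
    object *obj = malloc(sizeof *obj);
    if (obj == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    obj->type = type;
    return obj;
}

object *make_fixnum(long value) {
    object *obj = alloc_object(T_FIXNUM);
    obj->data.fixnum = value;
    return obj;
}

object *make_symbol(const char *name) {
    object *obj = alloc_object(T_SYMBOL);
    obj->data.symbol = malloc(strlen(name) + 1);
    if (obj->data.symbol == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    strcpy(obj->data.symbol, name);
    return obj;
}

object *cons(object *car, object *cdr) {
    object *obj = alloc_object(T_PAIR);
    obj->data.pair.car = car;
    obj->data.pair.cdr = cdr;
    return obj;
}

object *car(const object *obj) {
    assert(obj->type == T_PAIR);
    return obj->data.pair.car;
}

object *cdr(const object *obj) {
    assert(obj->type == T_PAIR);
    return obj->data.pair.cdr;
}

void write_object(FILE *out, const object *obj);

/* Write the inside of a list: elements separated by spaces, with
   " . " before a final cdr that is neither a pair nor the empty list. */
void write_pair(FILE *out, const object *pair) {
    write_object(out, pair->data.pair.car);
    const object *rest = pair->data.pair.cdr;
    if (rest->type == T_PAIR) {
        fputc(' ', out);
        write_pair(out, rest);
    } else if (rest->type != T_NIL) {
        fputs(" . ", out);
        write_object(out, rest);
    }
}

/* Write the external (s-expression) representation of any object. */
void write_object(FILE *out, const object *obj) {
    switch (obj->type) {
    case T_NIL:    fputs("()", out); break;
    case T_FIXNUM: fprintf(out, "%ld", obj->data.fixnum); break;
    case T_SYMBOL: fputs(obj->data.symbol, out); break;
    case T_PAIR:
        fputc('(', out);
        write_pair(out, obj);
        fputc(')', out);
        break;
    }
}
```

With these definitions, calling write_object on cons(make_fixnum(1), cons(make_symbol("two"), NIL)) writes a two-element list, and a pair whose cdr is not a list is written in dotted form. A tagged union is only one possible choice; it keeps the sketch short at the cost of a heap allocation for every integer.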

“from Scratch”

What does it mean to implement “from Scratch”? Upon what foundation will we build Scheme? Do we have a working physical computer, an operating system, an assembler, a linker, a C compiler, a Scheme interpreter or compiler, another high-level language?

At least to me, “from Scratch” implies building up towards Scheme from some lower level. If a system already has another high-level language then, philosophically, why not just write programs in that language? We could, after all. So implementing Scheme with the assistance of another high-level language is out.

Does the system already have Scheme? No. If it did, we’d use it and you wouldn’t need to implement Scheme. (Or if the system does have Scheme, at least we’d pretend it didn’t so we could enjoy implementing it ourselves.)

I will assume we have a working computer with a 32-bit Intel 386-style processor and a UNIX-like operating system with an assembler and linker. These are all very interesting low-level details, but building up towards a Scheme system from a level where one of them is missing is out of scope for this discussion.

Perhaps the grey area is whether the system has a C compiler and whether we want to use it to build up to Scheme. Practically every UNIX-like system has a C compiler. The operating system and tools are probably written mostly in C. Many high-level languages, like Scheme, are implemented in C. Working in C is definitely working at a lower level than working in Scheme. A testament to the low level of C is that it is often referred to as “portable Assembly”. Given these things, we can take a C compiler, gcc, as one of our available tools.

Scheme from Scratch Implementations

There are many possible ways to implement Scheme from Scratch:

Bootstrap Scheme: A quick and dirty Scheme interpreter implemented in C. This should be less than 1000 lines to implement. Not many features and slow execution but enough of an interpreter to bootstrap us into the world of Scheme.

Compiling Scheme: A Scheme to i386 Assembly compiler written in Scheme. First executed with Bootstrap Scheme but later self compiling. See An Incremental Approach to Compiler Construction.

Compile-to-C Scheme: A Scheme to C compiler written in Scheme. Provides great portability because C is the intermediate language on the way to machine code.

Byte Code VM Scheme: An embeddable Scheme virtual machine based on a byte-code compiler and interpreter. Written in ANSI C and very portable. See Chibi Scheme, for example.

JIT Engine Scheme: An embeddable Scheme engine based on a just-in-time compiler for i386 and with sophisticated garbage collection. I’m not sure if this will be written in C or Compiling Scheme.

Platform Scheme: Compile Scheme to target a platform like JVM, .NET’s CLR, Parrot, or LLVM. By targeting an existing platform, a lot of education and fun would be missed. It certainly wouldn’t count as “from Scratch”. It is necessary to write the target platform also.

You can see that it is a big itch. I don’t know how long it will take to scratch or what will come of it all, but it will surely be fun implementing Scheme from Scratch.

Next article: Integers