29 May 2017

For a decent while now, I’ve been working on self-hosting the Epoch 64-bit compiler. This involves getting the compiler to a point where it is robust enough to actually compile itself. In order to do this, I’m using a modified 32-bit compiler which generates 64-bit binaries. Once a working 64-bit compiler is emitted, I can feed that compiler back into itself, thus completing the head-trip ouroboros that is self-hosting or "bootstrapping" a compiler.

At the moment, the compiler can successfully lex, parse, type-check, and partially code-gen itself. In practical terms, this means that the front-end of the compiler is working fine, but the back-end - the set of systems responsible for turning code into machine language and emitting a working executable - remains incomplete. For a slightly different perspective, I’m generating LLVM IR for most of the compiler at this point.

The bits that are left are corner cases in the code generation engine: intrinsic functions that need to be wired up, special semantics to implement, and so on. Right now, I’m working on a corner case involving the nothing concept. nothing is an Epoch idiom for expressing the idea that there is no data; unlike traditional null, though, nothing is its own type. If something has some other type, it cannot be nothing - again, unlike null. The usefulness of this may seem questionable, but the distinction makes it possible to avoid entire classes of runtime bugs, because you can never "forget" to write code that handles nothing - the compiler enforces this for you!
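To make the distinction concrete, here is a rough sketch in Python rather than Epoch. The names Nothing, OptionalInt, and describe are made up for illustration, and Python only checks this at runtime (or with an external type checker), whereas the Epoch compiler enforces it at compile time.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Nothing:
    """Hypothetical stand-in for Epoch's nothing: a type of its own, not a null value."""


# A sum type: the value is either an int or Nothing, and the signature says so.
OptionalInt = Union[int, Nothing]


def describe(value: OptionalInt) -> str:
    # The Nothing case is visible in the type, so it has to be handled explicitly.
    if isinstance(value, Nothing):
        return "no data"
    return f"got {value}"


def double(value: int) -> int:
    # A plain int parameter can never be Nothing, so no check is needed here.
    return value * 2
```

The point is that a plain int and an "int or nothing" are different types, so the missing-data case cannot sneak into code that was written for plain ints.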

Anyways, the trick with nothing is that you can pass a literal nothing to a function as an argument, to signify that you have no semantically valid data to pass in. This is handled correctly by the parser and type checker, but falls down in code generation because we can’t actually omit the parameter from the function call.

What happens is the code generator creates a function with, say, 3 parameters. If the second argument is nothing at a call site, we still have to pass something over to the function, from LLVM’s perspective. So we generate a dummy argument that essentially translates the nothing semantics into null semantics - something LLVM can recognize.
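The placeholder idea can be sketched roughly like this. This is a toy model in Python, not the actual Epoch code generator: NOTHING_LITERAL, NULL_PLACEHOLDER, and lower_call are hypothetical names, and real lowering would emit LLVM values instead of strings.

```python
# Hypothetical marker for a literal nothing appearing in the argument list.
NOTHING_LITERAL = object()

# Hypothetical stand-in for whatever null-like value the back-end emits.
NULL_PLACEHOLDER = "null"


def lower_call(callee, args):
    """Lower a call so the argument count always matches the parameter count.

    A literal nothing cannot simply be dropped, so it is swapped for a
    placeholder that the back-end understands.
    """
    lowered = [NULL_PLACEHOLDER if arg is NOTHING_LITERAL else arg for arg in args]
    return (callee, lowered)


# A 3-parameter function called with nothing in the second position.
call = lower_call("f", [1, NOTHING_LITERAL, 3])
```

The lowered call still carries three arguments, which is what the generated function signature expects.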

Now things get complicated.

If we have an algebraic sum type that includes the type nothing, and we pass a sum-typed variable into a function which expects concrete types, the code goes through a process called type dispatching. This process basically matches an overload of a function with the runtime types of the arguments passed in. Think of it like virtual dispatch with no objects involved. (Strictly speaking, type dispatch in Epoch is multiple dispatch rather than the single dispatch seen in more popular languages.)
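For a feel of what multiple dispatch means here, the following is a minimal sketch in Python. It is only an analogy: the OVERLOADS table, the overload decorator, and the combine functions are invented for illustration, and Epoch's real dispatcher is generated by the compiler instead of being a dictionary lookup.

```python
class Nothing:
    """Hypothetical stand-in for Epoch's nothing type."""


# Overload table keyed by the tuple of parameter types.
OVERLOADS = {}


def overload(*types):
    """Register a function as the overload for exactly these argument types."""
    def register(fn):
        OVERLOADS[types] = fn
        return fn
    return register


@overload(int, int)
def combine_int_int(a, b):
    return a + b


@overload(int, Nothing)
def combine_int_nothing(a, b):
    return a


def dispatch(*args):
    """Pick an overload from the runtime types of all arguments, not just the first."""
    key = tuple(type(arg) for arg in args)
    try:
        fn = OVERLOADS[key]
    except KeyError:
        raise TypeError(f"no overload for {key}") from None
    return fn(*args)
```

Calling dispatch(2, 3) selects the (int, int) overload, while dispatch(2, Nothing()) selects the (int, Nothing) one; the choice depends on every argument's runtime type.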

To facilitate all this, the compiler inserts annotations into the code, so that it can deduce what set of overloads to choose from when the runtime dispatcher is invoked. Some of these annotations survive at runtime - analogs of virtual-table pointers in C++.

Annotations are passed as hidden parameters on the stack when invoking a function. And at last we reach the real wrinkle: a nothing annotation can come from two distinct places, either the construction of a sum-typed variable which allows nothing as a base type, or a literal nothing passed to a function call.
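A toy model of the two sources, in Python with invented names (Tag, construct_sum, literal_nothing_argument), might look like the sketch below. It assumes an annotation can be modeled as a tag paired with a payload, which is a simplification of whatever the compiler actually passes on the stack.

```python
from enum import Enum


class Tag(Enum):
    """Hypothetical runtime type annotations."""
    INTEGER = 1
    NOTHING = 2


def construct_sum(value):
    """Source 1: constructing a sum-typed value (integer or nothing).

    None is used here only as Python's way of saying "no payload supplied".
    """
    if value is None:
        return (Tag.NOTHING, 0)
    return (Tag.INTEGER, value)


def literal_nothing_argument():
    """Source 2: a literal nothing written directly at a call site."""
    return (Tag.NOTHING, 0)


from_constructor = construct_sum(None)
from_literal = literal_nothing_argument()
```

In this model both paths end up producing the same annotated pair, so anything looking only at the result cannot tell which path it came from.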

The headache is that, to LLVM, both uses look like a function call. There is special-case logic that fixes up the annotations for sum-typed constructors. Unfortunately, that logic collides with the logic needed to fix up annotations for general function calls, because LLVM doesn’t know the difference.

It’s an eminently solvable problem, but it’s a headache. Hopefully once this bug is gone there won’t be too many more to swat before I can start code-generating working 64-bit compilers.