A Tiny Lisp for AltOS

I took a bit of a diversion over the last week or so when I wondered how small a lisp interpreter I could write, and whether I could fit that into one of the processors that AltOS runs on. It turns out, you can write a tiny lisp interpreter that fits in about 25kB of ram with a 3kB heap for dynamic data.

I decided to target our ChaosKey boards; they're tiny, and I've got a lot of them. That processor offers 28kB of usable flash space (after the 4kB boot loader) and 6kB of ram with the processor running at a steaming 48MHz.

I'm not at all sure this is useful, but I always enjoy doing language implementations, and this one presented some 'interesting' challenges:

Limited RAM. I don't have space to do a classic stop/copy collector.

Limited stack. A simple lisp implementation uses the C stack for all recursion in execution and memory collection. I don't have enough ram for that.

Iterative Compacting Allocator

I'm betting someone has built one of these before, but I couldn't find one, so I wrote my own.

The basic strategy is to walk the heap to find a subset of the active objects which are allocated sequentially in memory with only unused storage between them. These objects are then compacted in-place, and then the heap is walked again to update all references to the moved objects. Then, the process is restarted to find another subset and move them.

By looking for these subsets starting at the bottom of the heap, and working upwards towards the top, the whole heap can be compacted into a contiguous chunk at the bottom of memory.

Allocation involves moving a pointer along at the top of active memory; when it gets to the top of the heap, collect and see if there's space now.

As always, the hardest part was to make sure all active memory was tied down. The second hardest part was to make sure that all active pointers were updated after any allocation, in case a collect moved the underlying object. That was just bookkeeping, but did consume much of the development time.

One additional trick was to terminate the recursion during heap walking by flagging active cons cell locations in a global bitmap and then walking that separately, iterating until that bitmap is empty. Nested lambdas form another recursion which should probably get the same approach, but I haven't done that yet.

An unexpected "benefit" of the tiny heap is that the collector gets called a lot, so any referencing bugs will have a good chance of being uncovered in even a short program execution.

ROM-able Lisp

Instead of implementing all of the language in C, I wanted to be able to implement various pieces in Lisp itself. Because of the complex nature of the evaluation process, adding things like 'let' or even 'defun' turn out to be dramatically simpler in Lisp. However, I didn't want to consume bunches of precious RAM to hold these basic functions.

What I did was to create two heaps, one in ROM and the other in RAM. References are be tagged as to which heap they're in.

16-bit Values

Lisp programs use a pile of references. Using a full 32 bits for each one would mean having a lot less effective storage. So, instead, I use an offset from the base of the heap. The top bit of the offset is used to distinguish between the ROM heap and the RAM heap.

I needed a place to store type information, so I settled on using the bottom two bits of the references. This allows for four direct type values. One of these values is used to indicate an indirect type, where the type is stored in the first byte of the object. The direct types are:

Value Type 0 Cons cell 1 14-bit int 2 String 3 Other

With 2 tag bits, the allocator needs to work in 32-bit units as the references couldn't point to individual bytes. Finally, I wanted 0 to be nil, so I add four to the offsets within the heaps.

The result is that the ROM and RAM heaps can each cover up to 32k - 4 bytes.

Note that ints are not stored in the heap; instead they are immediate values stored in 14 bits, providing a range of -8192 to 8191. One can imagine wanting more range in ints at some point.

Heap-based Evaluator

A simple lisp implementation uses the fact that eval is re-entrant and do the operation on the C stack:

val eval(val exprs) { val vals; while (exprs) { vals = append(vals, eval(car(exprs))); exprs = exprs->cdr; } return execute (car(vals), cdr(vals)); }

This makes things really simple and provides for a clean framework for implementing various bits of lisp, including control flow and macros. However, it rapidly consumes all of the available memory for a stack, while also requiring separate bookkeeping for the in-use memory in each frame.

I replaced this design with one which keeps the lisp stack on the heap, and then performs eval with a state machine with all state stored in global variables so that the memory manager can reference them directly.

Each eval operation is performed in a separate 'stack' context, which holds the entire eval state except for the current value, which lives in a separate global variable and is used to pass values out of one stack frame and into another. When the last stack context is finished, the evaluation terminates and the value is returned to the caller.

There are nine states in the state machine, each of which is implemented in a separate function, making the state machine a simple matter of pulling the current state from the top of the stack and invoking the associated function:

while (ao_lisp_stack) { if (!(*evals[ao_lisp_stack->state])() || ao_lisp_exception) { ao_lisp_stack_clear(); return AO_LISP_NIL; } } return ao_lisp_v;

Because there's no C recursion involved, catching exceptions is a simple matter of one test at this level.

Primitives like progn, while, cond and eval all take special magic in the state machine to handle; getting all of that working took several attempts before I found the simple loop shown above.

Lexical Scoping

The last time I did a lisp interpreter, I implemented dynamic scoping. Atoms were all global and had values associated directly with them. Evaluating a lambda started by saving all of the existing global values for the parameter atoms and then binding the new values. When finished, the previous values would be restored. This is almost correct, but provides surprising results for things like:

> (setq baz 1) > (def foo (lambda (bar) (+ baz bar))) > (def bletch (lambda (baz) (foo baz))) > (bletch 2) 4

The value that foo gets for 'baz' is 2 instead of 1 under dynamic scoping, which most people find surprising. This time, I was determined to use lexical scoping, and it turned out to be surprisingly easy.

The first trick was to separate the atoms from their 'value'; each atom can have a different value in different lexical scopes. So, each lexical scope gets a 'frame' object, those contain the value for each atom defined in that scope. There's a global scope which holds all of the globally defined values (like baz, foo and bletch above). Each frame points to its enclosing scope, so you can search upwards to find the right value.

The second trick was to realize that the lexical scope of a lambda is the scope in which the lambda itself is evaluated, and that the evaluation of a lambda expression results in a 'function' object, which contains the lambda and its enclosing scope:

> (def foo (lambda (bar bletch) ((lambda (baz) (+ baz bar)) bletch))) > (foo 2 3) 5

In this case, the inner lambda in foo can 'see' the value of bar from the enclosing lambda. More subtly, even if the inner lambda were executed multiple times, it would see the same baz, and could even change it. This can be used to implement all kinds of craziness, including generators:

> (defun make-inc (add) ((lambda (base) (lambda () (progn (setq base (+ base add)) base))) 0) ) > (setq plus2 (make-inc 2)) > (plus2) 2 > (plus2) 4

The current implementation of each frame is a simple array of atom/value pairs, with a reference to the parent frame to form the full scope. There are dramatically faster implementations of this same concept, but the goal here was small and simple.

A Tiny Allocator Optimization

With eval consuming heap space for stacks, frames and argument lists, the interpreter was spending a lot of time in the collector. As a simple optimization, I added some free lists for stack frames and cons cells.

Stack frames are never referenced when they're finished, so they can always go on the free list. Cons cells used to construct argument lists for functions are usually free.

Builtin functions have a bit which indicates whether they might hold on to a reference to the argument list. Interpreted lambdas can't get the list while nlambdas, lexprs and macros do.

Each lambda execution creates a new frame, and while it would be possible to discover if that frame 'escapes' the lambda, I decided to not attempt to cache free ones yet.

Save and Restore

To make the lisp interpreter more useful in tiny computers, I added the ability to save and restore the entire heap to flash. This requires leaving enough space in the flash to preserve the heap, further constraining the amount of flash available for the application.

Code

All of this code is in the 'lisp' branch of my AltOS repository:

AltOS

The lisp interpreter is independent from the rest of AltOS and could be re-purposed for another embedded operating system. It runs fine on ChaosKey hardware, and also on the STM32F042 Nucleo-32 board

There's also a test framework which runs on Linux, and is how I developed almost all of the code. That's in the src/test directory in the above repository, and is called 'ao_lisp_test'.

Towers of Hanoi

Here's an implementation of the classic recursive Towers of Hanoi game; it shows most of the current features of the language.