Let's Run Lisp on a Microcontroller

I have been inexplicably amazed by Lisp since the first day I saw what Lisp programs look like:

  (define (factorial n)
    (if (= n 1)
        1
        (* n (factorial (- n 1)))))

My feelings remain, although that's probably because I've never had a chance to use Lisp for real, so I don't know enough to hate it. Anyway, I occasionally keep thinking about weird stuff I could try Lisp for. So, the idea to run Lisp on an MCU came to me a long time ago, when I was working at my previous job at a company that develops things with MCUs. However, that would require implementing everything in C, and that's just not trivial enough to do merely for fun. So I didn't. Time has passed, and now I'm happy to work at Cesanta, where we develop the Mongoose IoT platform. Part of that platform is the JavaScript engine v7, whose primary targets are microcontrollers. And recently I realized that my old weird idea to run Lisp on an MCU has become slightly more realistic: all the heavy lifting, such as garbage collection, string handling, etc., is already done, and all we need is to implement Lisp in JavaScript! And run it on an MCU. What a wonderfully silly weekend project!

Caution

This is not going to be practical, at all. Instead, this is absolutely insane stuff, which I nonetheless find fun to implement (and that's the only reason I've done it). For reference, here are a few links to (presumably) much more practical related projects, which however I haven't used for real yet:

Chicken Scheme : a practical and portable Scheme → C compiler. Generated C code runs on Linux, OS X, Windows, etc;

ESP-Lisp : a small fast Lisp interpreter for the ESP8266, as an alternative to Lua on the NodeMCU;

uLisp : a version of the Lisp programming language designed for the ATmega-based Arduino boards;

Tiny Lisp Computer : this article describes a self-contained computer with its own display and keyboard, based on an ATmega328, that you can program in Lisp;

Ferret : a hard real-time Clojure for Lisp machines;

XS : Lisp on Lego MindStorms;

TinyScheme : a lightweight Scheme interpreter that implements as large a subset of R5RS as was possible without getting very large and complicated;

WISP : a homoiconic JavaScript dialect with Clojure syntax, s-expressions and macros (the project is abandoned though);

microscheme : a functional programming language for the Arduino.

Thanks to all of you who provided those links in comments! If you're aware of some other related project which deserves to be mentioned here, please let me know.

Initial setup

UPD: Since the time of writing, things have changed somewhat: JavaScript is no longer enabled by default in the firmware I'm going to talk about below, and there are no wizards to walk you through the setup process; instead, there is a command line tool, miot, which is the swiss army knife of the Mongoose IoT platform. I haven't updated this section yet since it's not the primary focus of the article; you might want to just skim it and go straight to Picking Scheme implementation.

We don't actually have to install any heavy build environment or anything like that: we can edit our code, build it and deploy it to the devices from a web-based IDE. We just need to preflash our device(s) once with the Mongoose firmware, and register them at the cloud; the process looks pretty much like Next → Next → Next → Finish.

Mongoose IoT supports two hardware platforms so far: ESP8266 and Texas Instruments CC3200. I'll use ESP8266 in this text; namely, the NodeMCU board. So first of all, let's download the latest release of the Mongoose Flashing Tool (MFT). Now, connect your NodeMCU to the computer, and run MFT; you'll see something like this:

Select the port to which your NodeMCU is attached (in the screenshot above, it's ttyUSB0), and click Next. The wizard will ask which firmware you want to flash; it usually makes sense to use Development Snapshot. The firmware will be downloaded and flashed, and the wizard will communicate with your newly flashed device. It will walk you through the process of connecting to wi-fi and registering the device at the cloud; in the end, you'll click “Add your device to the cloud”, which will cause your browser to pop up with the cloud page opened. You'll need to log in, and finally your device will be added to your account. Cool!

Now, it's time to create a project. You can name it hello or whatever.
After it's added, click on it, and you'll see the IDE. Now we can put in some JavaScript and flash the device: let's edit the file app.js, which is executed at boot. Initially, it looks like:

  // Device logic goes here
  console.log('Hello from JS! Running ' + Sys.ro_vars.fw_version + ' firmware');

If you want, you can put something else here; then make sure you have the correct device selected (well, probably you have just one device), and hit the “Run” button. You'll see the build log, then the deployment log, and finally your JavaScript code comes into play. The default code will obviously cause this to appear:

  [23:10:57] Hello from JS! Running head firmware

Yay! It works. So, now we can add new files to the filesystem, and evaluate them from app.js by adding File.eval("myfile.js"); there. Yeah, no require() just yet.

JavaScript REPL

Before we actually move on to Lisp, a final note on how to get into the JavaScript REPL, which might come in handy: you just need to run MFT (the Mongoose Flashing Tool, which you installed recently) with the --advanced key, and you'll see a picture like this:

You don't need to flash anything, just pick the correct port and hit “Connect”. You'll see the prompt:

  --- connected
  undefined
  [31148/530] $

Alternatively, you can use picocom instead of MFT:

  $ picocom /dev/ttyUSB0 -b 115200 --imap lfcrlf --omap crcrlf

In the input field below, you can enter arbitrary JS, e.g. for (var i = 0; i < 10; i++) { console.log(i); }, or whatever. And since we're working in a highly memory-constrained environment, the prompt always shows how much memory is available. In the example above, we have 31148 bytes of free system heap memory, and 530 bytes of “JS heap”: memory which was already allocated from the system heap, but can still be used for JS objects and properties. You can get the same values from your scripts by evaluating GC.stat().sysfree and GC.stat().jsfree.

By default, only stdout goes to the console; stderr goes to another UART. If you don't have a separate UART-to-USB adapter attached, it's useful to have stderr on the console as well; for that, you can add the following to app.js:

  if (Sys.conf.debug.stderr_uart != 0) {
    Sys.conf.debug.stderr_uart = 0;
    Sys.conf.save();
  }

The call Sys.conf.save() will actually reboot the device, so make sure you don't call it unconditionally, otherwise you'll end up in a boot loop. Okay, really enough about JavaScript, let's move on to Lisp!

Picking Scheme implementation

There are many Lisp dialects, the most popular being Common Lisp and Scheme, but for this particular use case I'd definitely prefer Scheme: unlike CL, it's elegant and small. Its (quite old) specification is just 50 pages, unlike the 1000+ pages of Common Lisp. And, well, I'm not really happy about my programs being full of defun: it sounds too much like “defunction” (okay, that's a minor one).

Initially I was hoping to just grab some existing Scheme implementation and burn it into the MCU. There is a large list of existing implementations, but it turns out that it's actually hard to get what I need: a lightweight and simple, possibly incomplete, but correct implementation, without heavy dependencies. Tail-call optimization isn't a strict requirement, but a very desirable feature: an MCU doesn't have lots of RAM available.

So I checked out some lightweight ones from the list above, one of them being JSLisp by Joe Ganley. It is clearly lightweight and it is even able to evaluate some of my expressions, but it's dynamically scoped, unlike Scheme or Common Lisp, which are statically (or lexically) scoped. I really don't like the idea of dynamic scoping, since it makes programs harder to reason about; and honestly, I'm not aware of any dynamically-scoped language in more or less wide use today.

Next, I picked GoldenScheme, which is just 8KB. Even though the source code is indeed small, it takes too much RAM: e.g. for each symbol, it creates an object:

  { type: "symb", name: name, parent: parent }

Well, no surprise: who on earth would care about the memory consumption of a JavaScript program? But I have to. Each property is a structure which contains a name, a value, and some attributes (enumerable, writable, configurable, to name a few), plus a link to the next property. All in all, in v7, each property takes a minimum of 24 bytes (for curious minds, here's the structure definition).
Plus, if the name or value isn't a primitive (or is a primitive string longer than 5 bytes), additional memory is obviously needed. We just can't spend that much on every single symbol. Of course I tried it anyway, and defining just the simple function (define (fact x) (if (< x 2) x (* (fact (- x 1)) x))) caused it to consume almost 5KB of RAM! I believe these 5KB are not just symbols, but I didn't bother to figure out what exactly this memory is used for. Given that there's only about 30KB of free RAM in total, it's not really an option.

I peeked at a couple of other implementations, including the bulky ones, but none satisfied me. They are either too bulky, or have dependencies such as jQuery, etc. So I decided to come up with my own simple implementation, which would at least have what I need.

Meet DFScheme

Well, at this point I realized that all of this would not fit into a single weekend, but it was too late to give up. And anyway, why not have more fun? You can find the sources (library and tests) on GitHub: https://github.com/dimonomid/dfscheme. So far, DFScheme is a very basic Scheme implementation, which however already supports tail-call optimization, for both direct and indirect tail calls to the same function. We'll talk about the implementation a bit later.

Scope

The Scheme scope is implemented on top of JavaScript objects: when we add a new scope, we just create a new object with its prototype set to the existing scope. When we set an item on a scope, we have to walk the whole prototype chain of the scope manually (because if we don't, JavaScript will always define the property on the top scope, hiding the existing property, instead of modifying it). Getting a value from a scope, on the other hand, comes for free: JavaScript does all the work for us.
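A minimal sketch of that approach in plain JavaScript (the helper names are mine, not DFScheme's actual internals):

```javascript
// Each new scope is an object whose prototype is the enclosing scope.
function newScope(parent) {
  return Object.create(parent || null);
}

// Lookup comes for free: JS walks the prototype chain for us.
function lookup(scope, name) {
  return scope[name];
}

// set! must find the scope that actually owns the variable; a plain
// assignment would instead shadow it in the innermost scope.
function setVar(scope, name, value) {
  var s = scope;
  while (s !== null) {
    if (Object.prototype.hasOwnProperty.call(s, name)) {
      s[name] = value;
      return;
    }
    s = Object.getPrototypeOf(s);
  }
  throw new Error('unbound variable: ' + name);
}

var global = newScope(null);
global.x = 1;
var inner = newScope(global);
setVar(inner, 'x', 42); // modifies the binding in `global`, not a copy in `inner`
```

After setVar, both lookup(inner, 'x') and global.x are 42, which is exactly the set! semantics Scheme needs.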

Representing data types

As mentioned above, one of the main concerns was a low memory footprint, so of course the implementation is tailored to consume as little memory as possible.

Symbols, numbers, booleans, strings

For simplicity, all of these are implemented on top of JavaScript strings. Keep in mind that, in v7, strings of length less than or equal to 5 bytes occupy the same minimal amount of space as e.g. a number or a boolean value: 8 bytes. The rules are simple:

If a string is either #t or #f , it's a boolean value;

If a string starts and ends with a quote ( " ), it's a string value;

If a string can be converted to a number successfully, it's a number value;

Otherwise, it's a symbol.
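The four rules above can be sketched as a single classification function (the name typeOf is mine, not DFScheme's):

```javascript
// Every Scheme value is a JS string; its type is decided by inspecting it.
function typeOf(s) {
  // Rule 1: the two boolean literals.
  if (s === '#t' || s === '#f') return 'boolean';
  // Rule 2: quoted on both ends means a string value.
  if (s.length >= 2 && s.charAt(0) === '"' && s.charAt(s.length - 1) === '"') {
    return 'string';
  }
  // Rule 3: anything that converts to a number cleanly is a number.
  if (s !== '' && !isNaN(Number(s))) return 'number';
  // Rule 4: everything else is a symbol.
  return 'symbol';
}
```

So typeOf('#t') is 'boolean', typeOf('"hi"') is 'string', typeOf('3.14') is 'number', and typeOf('fact') is 'symbol'.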

Null

The null value is JavaScript's undefined. Even though there is a null value in JavaScript, here I intentionally swapped their roles, because it allows us to save a bit of memory: we can leave a property out entirely, and it will be considered undefined when we try to get it. The reason will become clearer once we consider the next data type, cons cells:

Cons Cells (Pairs)

Initially I tried to come up with some way to fake lists. I considered using arrays instead of implementing lists properly on top of cons cells. That would definitely consume less memory: each list item would then take just one property, whereas each cons cell has to be a separate object with two properties, car and cdr. That's more than twice as much. However, I eventually had to give up on this: even though I can fake most of the behaviour, I can't fake everything. Consider:

  (define x '(1 2 3))
  (eq? (cdr x) (cdr x))

Obviously the result should be #t, because (cdr x) in both cases refers to exactly the same cons cell: (2 . (3 . ())). But if lists are implemented on top of arrays, then cdr would actually have to create a copy of the array, so the eq? predicate would return false. And, of course, the useless copying might be a very expensive operation. So, I had to implement cons cells as objects with two properties: car and cdr.

As you know, in Lisp, a properly formed list ends with a pair whose cdr is null (i.e. ()). Since nulls are implemented as JS undefined, we can simply not define cdr, and () will be assumed automatically. Let's save at least a few of those lovely bytes.
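Here is a sketch of such cons cells in plain JavaScript (helper names are illustrative, not DFScheme's actual API):

```javascript
// A cons cell is an object with car and cdr properties. The empty list ()
// is JS undefined, so the last cell of a proper list simply omits its cdr
// property, saving one property's worth of memory.
function cons(car, cdr) {
  var cell = { car: car };
  if (cdr !== undefined) cell.cdr = cdr; // omit cdr when it's ()
  return cell;
}

function car(c) { return c.car; }
function cdr(c) { return c.cdr; }

// '(1 2 3) becomes nested cells: (1 . (2 . (3 . ())))
var x = cons(1, cons(2, cons(3)));
```

With this representation, (eq? (cdr x) (cdr x)) holds because cdr(x) === cdr(x) returns the very same object both times, which an array-based fake could not guarantee without copying; and cdr of the last cell is simply undefined, i.e. ().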

Lambdas

There's not much we can do to make lambdas consume less memory, so a lambda is just an object with args and exprs properties, which are lists, plus a scope property, which obviously is the lambda's scope.

Tail-call optimization

Tail-call optimization is implemented in a simple way: there is a stack of call frames, each one being an object with the following properties:

func : a reference to the lambda object or JS function which is being executed in this call frame;

tail : if set to true, it means we're evaluating the last expression of the function.

So, when some function is about to be called, we check all call frames from top to bottom. There are three possible cases for each frame:

tail is not set to true. It means the call can't be optimized, so we call the function as usual;

tail is set to true, and func is not equal to the function we're going to call. In this case, go to the next call frame;

tail is set to true, and func is equal to the function we're about to call. We've found the call to eliminate!

Once we've found the call to eliminate, we throw an object containing the index of the call frame we should return to, and an array with the function arguments. And, obviously, the evaluation of a function is wrapped in a try-catch block: if we catch a tail-call elimination object, we check whether its call frame index is equal to the current frame's index. If it's not, we just rethrow the object further. If it is, we drop all the extra call frames from the stack, replace the function arguments with the ones from the thrown object, and evaluate the same function again. This simple approach eliminates both direct and indirect tail calls.
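The mechanism can be sketched in plain JavaScript. This is a simplified, self-contained model (all names are mine, not DFScheme's actual internals): every frame here is marked as a tail frame, and the "interpreter" is reduced to a single apply loop.

```javascript
// The object thrown to unwind back to a reusable call frame.
function TailCall(frameIdx, args) {
  this.frameIdx = frameIdx;
  this.args = args;
}

var frames = []; // the stack of call frames

// Used in tail position instead of a direct recursive call: scan the
// frame stack for a frame already running `func` in tail position.
function tailApply(func, args) {
  for (var i = frames.length - 1; i >= 0; i--) {
    if (frames[i].tail && frames[i].func === func) {
      throw new TailCall(i, args); // unwind to that frame
    }
  }
  return apply(func, args); // no frame to reuse: a regular call
}

function apply(func, args) {
  frames.push({ func: func, tail: true });
  var myIdx = frames.length - 1;
  for (;;) {
    try {
      var result = func.apply(null, args);
      frames.pop();
      return result;
    } catch (e) {
      if (e instanceof TailCall && e.frameIdx === myIdx) {
        frames.length = myIdx + 1; // drop the extra call frames
        args = e.args;             // and loop: re-evaluate with new args
      } else {
        throw e; // meant for an outer frame (or a genuine error)
      }
    }
  }
}

// Accumulator-style factorial whose recursive call is in tail position.
function factIter(product, counter, n) {
  if (counter > n) return product;
  return tailApply(factIter, [product * counter, counter + 1, n]);
}
```

The crucial property is that the throw unwinds the native JS stack back to the frame being reused, so apply(factIter, [1, 1, 50000]) runs to completion with constant stack depth, no matter how many tail calls are made.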

Let's try it!

So the Scheme interpreter works, the tests pass, and we can finally put it on an MCU. In our project in the IDE, let's add a new file DFScheme.js, and paste the whole contents of DFScheme.js there. Hit flash, wait until the build and deploy process is done, and then we can run ./MFT --advanced and play with the JS REPL. First of all, let's try to evaluate the Scheme interpreter library:

  [32460/3068] $ File.eval("DFScheme.js");
  undefined
  [32068/2968] $ var lisp = new DFScheme();
  undefined
  [24736/268] $

Ok, cool, at least it didn't complain; now, let's finally evaluate some Lisp on our MCU!

  [23672/532] $ lisp.exec("(+ 1 2)")
  "3"

Ah! It actually works! However, it took about a second to calculate that. Well, not particularly fast. And what about calculating some factorial?

  [23456/532] $ lisp.exec("(define (fact x) (if (< x 2) x (* (fact (- x 1)) x)))");
  "fact"
  [20516/2092] $ lisp.exec("(fact 4)")

Evaluating the expression (fact 4) causes it to wait for several seconds, and then…

  system_restart_local
  WDT reset, info: exccause=4 epc1=0x40250531 epc2=0x00000000 epc3=0x00000000 vaddr=0x40298140 depc=0x00000000
  Dumping core
  --- BEGIN CORE DUMP ---
  {"arch": "ESP8266", "cause": 100, "REGS": {"addr": 1073653492, "data": "mwQlQFD6/z8KAAAA6Ab/PwoOAABAdv8/6Q0AAOsNAAAOAAAAIxcAAFyr/j8AgP//7QclQFyr/j84AAAAEs8kQDEFJUAgAAAAAAAAAAAAAAAAAAAAMAAAAA=="}, "DRAM": {"addr": 1073643520, "data": "AAAAAAAAAAAAAAAAAQEBAQABAAABAAAAcBkAACmV9Lp4AAAAAAAAAKEAAABspv4/F3wpQAEAAAB4hilAU9QpQFjUKUBz1ClAAAAAAAAAAAAAAAAANAgAYA......

It takes too long, and the WDT (watchdog timer) resets the application (by the way, yes, we've implemented a GDB server stub for the ESP, so that we can save core dumps, examine stack traces, etc).
Yeah, we're clearly using Mongoose IoT for something it was not designed for, but… oh well. Let's turn off the watchdog timer:

  Sys.conf.sys.wdt_timeout = 0;
  Sys.conf.save();

By the way, after the reboot we have to evaluate DFScheme.js again, because it's gone. We don't want to type that every time, so let's just add it to our app.js. Additionally, let's create a file my.scm which will contain our Lisp code. All in all, the app.js and my.scm files look as follows:

app.js

  'use strict';

  // redirect stderr to UART0
  if (Sys.conf.debug.stderr_uart != 0) {
    Sys.conf.debug.stderr_uart = 0;
    Sys.conf.save();
  }

  // turn off WDT
  if (Sys.conf.sys.wdt_timeout != 0) {
    Sys.conf.sys.wdt_timeout = 0;
    Sys.conf.save();
  }

  // init DFScheme instance
  File.eval("DFScheme.js");
  lisp = new DFScheme();
  lisp.exec(File.read("my.scm"));

my.scm

  (define fact
    (lambda (x)
      (if (< x 2)
          x
          (* (fact (- x 1)) x))))

Hit “Run” in the IDE, and after deployment is done, we can try calculating the factorial again:

  [21664/1932] $ lisp.exec("(fact 4)")
  "24"

It worked this time.

HTTP endpoint

It's kinda annoying that we have to open the tty connection and type all this JavaScript every time: lisp.exec("…");. Wouldn't it be better if we could use curl instead? Mongoose Firmware supports a subset of the Node HTTP API; here, we're going to create a server which responds to the URI /lisp, and listens on port 8080:

  var server = Http.createServer(function(req, res) {
    print(JSON.stringify(req));
    if (req.url == '/lisp') {
      var val;
      try {
        val = lisp.exec(req.body);
        res.writeHead(200, {'Content-Type': 'text/plain'});
      } catch (e) {
        val = e.toString();
        res.writeHead(400, {'Content-Type': 'text/plain'});
      }
      res.write(val);
      res.end('\n');
    } else {
      res.end('Not sure what you mean, try /lisp\n');
    }
  });
  server.listen('8080');

Now, knowing the IP of our NodeMCU (it is printed to the console when the device boots), we can issue a curl request as follows:

  $ curl '10.42.0.50:8080/lisp' -d "(* 123 45)"
  5535
  $ curl '10.42.0.50:8080/lisp' -d "(fact 4)"
  24

It's relatively fast to evaluate (* 123 45), but it took about 10 seconds to calculate (fact 4). Yeah, not fast at all… And it's particularly sad to sit in front of the NodeMCU, waiting for the result, without any feedback during all this time. Let's make it at least cheer us up a bit by blinking LEDs during the evaluation.

LED blinking

Disclaimer: I'm a software engineer, and I work hard to be good at it, but I know pretty much nothing about hardware. So, when it comes to hardware, I suck absolutely and completely. It just so happens that I work in hardware-related fields, but there are other people on the team who work with the hardware; I'm just doing the software parts (and I'm trying to do that well). Of course I managed to get LEDs blinking by attaching them to the MCU's pins in a naive way, but not much more than that. Sorry if I attached the LEDs in a wrong way. :)

I've added a few callbacks to the Lisp interpreter. At the very least, we want to get notified when it starts and finishes executing the whole script given to lisp.exec():

cbExec ;

cbExecDone .

Plus, a callback which is called every time some expression gets evaluated: cbEval . And I attached a couple of LEDs to my NodeMCU, to GPIO4 and GPIO5:

Here are the callbacks:

  var gpion = -1;

  function cbOn() {
    print(GC.stat().sysfree);
    if (gpion >= 0) {
      GPIO.write(gpion, false);
    }
    switch (gpion) {
      case 4:
        gpion = 5;
        break;
      case 5:
      default:
        gpion = 4;
        break;
    }
    GPIO.write(gpion, true);
  }

  function cbOff() {
    if (gpion >= 0) {
      GPIO.write(gpion, false);
    }
    gpion = -1;
  }

And now, we should create the instance of the interpreter as follows:

  lisp = new DFScheme({
    cbExec: cbOn,
    cbEval: cbOn,
    cbExecDone: cbOff,
  });

And here's how it looks! There's also a slightly longer video on YouTube. Watching it blink during the calculation before producing a result gives me that strange “good old” feeling that the machine is thinking. Although I'm afraid that real hardware Lisp machines, even the very old ones, were faster than that. Strangely enough, I failed to find out how fast they really were. Maybe you know?

Exploiting tail-call

Apart from being very slow, our “Lisp machine” has extremely little memory available. And given the current implementation of the fact lambda, it runs out of memory (and crashes) even if we try to calculate (fact 10), because there is a deferred operation (namely, the multiplication) whose operands need to be stored on the stack at each step. So, let's reimplement it so that the tail calls can be eliminated. We need to introduce an additional inner lambda, iter, for that:

my.scm

  (define (fact n)
    (define (iter product counter)
      (if (< n counter)
          product
          (iter (* counter product) (+ counter 1))))
    (iter 1 1))

We can now verify that the tail-call optimization works: it is now able to calculate even (fact 30)! Although…

  $ time curl '10.42.0.50:8080/lisp' -d "(fact 30)"
  26525285979
  curl '10.42.0.50:8080/lisp' -d "(fact 30)"  0.01s user 0.01s system 0% cpu 1:24.89 total

Yes, it took 1 minute 25 seconds.
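For illustration, here is the same transformation in plain JavaScript (a sketch of mine; note that most JS engines don't eliminate tail calls themselves — the point is the shape of the recursion that lets DFScheme eliminate them):

```javascript
// Naive version: the multiplication happens AFTER the recursive call
// returns, so every step must keep a frame alive to hold the pending "* x".
function factNaive(x) {
  if (x < 2) return x;
  return factNaive(x - 1) * x; // deferred operation
}

// Accumulator version: all the work is done BEFORE recursing, so the
// recursive call is the very last thing that happens and the frame can
// be reused.
function fact(n) {
  function iter(product, counter) {
    if (counter > n) return product;
    return iter(counter * product, counter + 1); // a true tail call
  }
  return iter(1, 1);
}
```

Both compute the same values; only the accumulator version gives the interpreter a tail call to eliminate.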

Conclusion