It can be difficult to disassociate the idea that dynamically typed programming languages are tied to byte-code interpreters (e.g. YARV Ruby, CPython, V8, Zend Engine, etc.). But for many languages, a compiled implementation also exists. Cython, Chicken Scheme and SBCL are good examples.

In this post I will briefly describe how I built a compiler for my Scheme implementation using artifacts from the interpreter. In doing this, I learned a simple (not novel) technique for compiling dynamic languages. I'll introduce the Javascript to C++/V8 compiler I am developing using this technique.

BSDScheme

For the past year I've developed a Scheme implementation, BSDScheme. I started with an AST-interpreter (as opposed to a byte-code compiler and VM). A more detailed blog post on the first few steps writing BSDScheme can be found here.

As I built up support for the various objects and operations in the language, I had a sizeable base of D code for the BSDScheme runtime. This included an object representation for primitive types (and support for converting to and from types in D) as well as basic Scheme operations ( + , - , car , cdr , etc.).

When the time came to implement a compiler backend, I only needed to do codegen since the parser already existed. Furthermore, the fundamental bits had already been written: object representation and much of the standard library. So I wrote the simplest compiler I could think of by targeting D and the objects/functions I had already written to support the interpreter.

Take, for example, the equals function in the standard library:

Value equals(Value arguments, void** rest) { auto tuple = valueToList(arguments); auto left = tuple[0]; auto right = car(tuple[1]); bool b; switch (tagOfValue(left)) { case ValueTag.Integer: b = valueIsInteger(right) && valueToInteger(left) == valueToInteger(right); break; case ValueTag.Char: b = valueIsChar(right) && valueToChar(left) == valueToChar(right); break; case ValueTag.String: b = valueIsString(right) && valueToString(left) == valueToString(right); break; case ValueTag.Symbol: b = valueIsSymbol(right) && valueToSymbol(left) == valueToSymbol(right); break; case ValueTag.Function: b = valueIsFunction(right) && valueToFunction(left)[1] == valueToFunction(right)[1]; break; case ValueTag.Bool: b = valueIsBool(right) && valueToBool(left) == valueToBool(right); break; default: b = false; } return makeBoolValue(b); }

So long as my compiler generated code that used the Value object to represent Scheme data, I already had an equals function and large swaths of a Scheme standard library that I could share between the compiler and interpreter.

Ultimately I only needed to implement a few control structures to support compiling a large subset of what I supported in the interpreter. The key aspects here include: function definitions (in D), function calls (D function calls), if/else (if/else in D) and so on.

To give a concrete example of a whole program compiled, this Scheme program:

(define (exp base pow) (if (= pow 0) 1 (* base (exp base (- pow 1))))) (define (main) (display (exp 2 16)) (newline))

when run through the BSDScheme compiler would become:

import std.stdio; import lex; import common; import parse; import utility; import value; import buffer; Value exp(Value arguments, void** ctx) { Value[] tmp = listToVector(arguments); Value base = tmp[0]; Value pow = tmp[1]; Value equals_result = equals(vectorToList([pow, makeIntegerValue(0)]), null); Value if_result; if (truthy(equals_result)) { makeIntegerValue(1); if_result = makeIntegerValue(1); } else { Value minus_result = minus(vectorToList([pow, makeIntegerValue(1)]), null); Value exp_result = exp(vectorToList([base, minus_result]), null); Value times_result = times(vectorToList([base, exp_result]), null); if_result = times_result; }; return if_result; } Value BSDScheme_main(Value arguments, void** ctx) { Value exp_result = exp(vectorToList([makeIntegerValue(2), makeIntegerValue(16)]), null); Value display_result = display(vectorToList([exp_result]), null); Value newline_result = newline(vectorToList([]), null); return newline_result; } void main() { BSDScheme_main(nilValue, cast(void**)0); }

Where every imported function had already been written for the interpreter. I had only to translate a few lines to D and import/call these existing libraries. Now I had a small binary of compiled Scheme.

It was at this point I realized I was using the same technique used by Cython to compile Python code.

...the Cython project has approached this problem by means of a source code compiler that translates Python code to equivalent C code. This code is executed within the CPython runtime environment, but at the speed of compiled C and with the ability to call directly into C libraries. http://docs.cython.org/en/latest/src/quickstart/overview.html

jsc

I played with many PL-research-y languages over the years and wanted to do build something a little more practical. So I took what I learned writing the BSDScheme compiler and decided to write a Javascript compiler. Specifically, it would target the easiest backend I could imagine: C++ using the V8 C++ library and generating a Node addon.

There already existed well-trodden guides/means of writing Node addons in C++ so I spent some time trying to hand-compile simple Javascript programs to C++ and V8. A string in Javascript would become a v8::String type in C++. A number in Javascript would become v8::Number in C++ and so forth.

I decided to write this compiler in Rust given its roots in (and my familiarity with) ML and Python. I found a Javascript parser by Dave Herman and after a few lazy weeks finally got a "Hello world!" program compiling. Getting my first program to compile has by far been the hardest part of building jsc.

Let's look at a concrete example of a recursive fibonacci program (example/recursion.js in the repo):

function fib(i) { if (i <= 1) { return i; } return fib(i - 1) + fib(i - 2); } function main() { console.log(fib(20)); }

Let's add a call to main() at the end and time this with Node to get a baseline:

$ time node example/recursion.js 6765 node example/recursion.js 0.06s user 0.02s system 97% cpu 0.083 total

Now let's install jsc to compare. We'll need Rust, Cargo, Node and Node-GYP.

$ git clone https:/github.com/eatonphil/jsc $ cd jsc $ make && make install $ jsc example/recursion.js

jsc produces a Javascript entrypoint that imports our addon (build/recursion.js):

require("./build/Release/recursion").jsc_main();

And it produces a C++ file that represents the entire program (build/recursion.cc):

#include <string> #include <node.h> using v8::Boolean; using v8::Context; using v8::Exception; using v8::Function; using v8::FunctionTemplate; using v8::FunctionCallbackInfo; using v8::Isolate; using v8::Local; using v8::Null; using v8::Number; using v8::Object; using v8::String; using v8::False; using v8::True; using v8::Value; void fib(const FunctionCallbackInfo<Value>& args) { Isolate* isolate = args.GetIsolate(); Local<Value> i = args[0]; tail_recurse_1: Local<Context> ctx_2 = isolate->GetCurrentContext(); Local<Object> global_3 = ctx_2->Global(); Local<Function> Boolean_4 = Local<Function>::Cast(global_3->Get(String::NewFromUtf8(isolate, "Boolean"))); String::Utf8Value utf8value_tmp_5(i); std::string string_tmp_6(*utf8value_tmp_5); String::Utf8Value utf8value_tmp_7(Number::New(isolate, 1)); std::string string_tmp_8(*utf8value_tmp_7); Local<Value> argv_9[] = { (i->IsNumber() || Number::New(isolate, 1)->IsNumber()) ? Boolean::New(isolate, i->ToNumber(isolate)->Value() <= Number::New(isolate, 1)->ToNumber(isolate)->Value()) : ((i->IsString() || Number::New(isolate, 1)->IsString()) ? Boolean::New(isolate, string_tmp_6 <= string_tmp_8) : (False(isolate))) }; Local<Value> result_10 = Boolean_4->Call(Null(isolate), 1, argv_9); if (result_10->ToBoolean()->Value()) { args.GetReturnValue().Set(i); return; return; } Local<Value> arg_11 = (i->IsNumber() || Number::New(isolate, 1)->IsNumber()) ? (Number::New(isolate, i->ToNumber(isolate)->Value() - Number::New(isolate, 1)->ToNumber(isolate)->Value())) : Local<Number>::Cast(Null(isolate)); Local<FunctionTemplate> ftpl_13 = FunctionTemplate::New(isolate, fib); Local<Function> fn_12 = ftpl_13->GetFunction(); fn_12->SetName(String::NewFromUtf8(isolate, "fib")); Local<Value> argv_14[] = { arg_11 }; Local<Value> result_15 = fn_12->Call(Null(isolate), 1, argv_14); Local<Value> arg_16 = (i->IsNumber() || Number::New(isolate, 2)->IsNumber()) ? (Number::New(isolate, i->ToNumber(isolate)->Value() - Number::New(isolate, 2)->ToNumber(isolate)->Value())) : Local<Number>::Cast(Null(isolate)); Local<FunctionTemplate> ftpl_18 = FunctionTemplate::New(isolate, fib); Local<Function> fn_17 = ftpl_18->GetFunction(); fn_17->SetName(String::NewFromUtf8(isolate, "fib")); Local<Value> argv_19[] = { arg_16 }; Local<Value> result_20 = fn_17->Call(Null(isolate), 1, argv_19); args.GetReturnValue().Set((result_15->IsString() || result_20->IsString()) ? Local<Value>::Cast(String::Concat(result_15->ToString(), result_20->ToString())) : Local<Value>::Cast((result_15->IsNumber() || result_20->IsNumber()) ? (Number::New(isolate, result_15->ToNumber(isolate)->Value() + result_20->ToNumber(isolate)->Value())) : Local<Number>::Cast(Null(isolate)))); return; } void jsc_main(const FunctionCallbackInfo<Value>& args) { Isolate* isolate = args.GetIsolate(); tail_recurse_21: Local<Value> arg_22 = Number::New(isolate, 20); Local<FunctionTemplate> ftpl_24 = FunctionTemplate::New(isolate, fib); Local<Function> fn_23 = ftpl_24->GetFunction(); fn_23->SetName(String::NewFromUtf8(isolate, "fib")); Local<Value> argv_25[] = { arg_22 }; Local<Value> result_26 = fn_23->Call(Null(isolate), 1, argv_25); Local<Value> arg_27 = result_26; Local<Function> fn_28 = Local<Function>::Cast(Local<Object>::Cast(isolate->GetCurrentContext()->Global()->Get(String::NewFromUtf8(isolate, "console")))->Get(String::NewFromUtf8(isolate, "log"))); Local<Value> argv_29[] = { arg_27 }; Local<Value> result_30 = fn_28->Call(Null(isolate), 1, argv_29); result_30; } void Init(Local<Object> exports) { NODE_SET_METHOD(exports, "jsc_main", jsc_main); } NODE_MODULE(NODE_GYP_MODULE_NAME, Init)

Let's time this version:

$ time node build/recursion.js 6765 node build/recursion.js 0.16s user 0.03s system 107% cpu 0.175 total

jsc, over twice as slow, is already falling behind Node. :)

As I incremented the number passed to my fibonacci function the compiled program time to completion get exponentially worse. Node stayed the same. I decided to try tail-call optimization to decrease the performance distance between Node and jsc.

I implemented tail-call optimization for the interpreter in BSDScheme by putting all functions in a loop that would break if tail-call elimination was not to happen. It took me a week to implement this and I never put it in place for the compiler. This time around I was able to add basic tail call elimination to jsc in two hours. It is done by label s and goto s instead of a tail call when applicable.

Here is a tail-call optimized version of the same program (example/tco.js):

function fib(n, a, b) { if (n == 0) { return a; } if (n == 1) { return b; } return fib(n - 1, b, a + b); } function main() { console.log(fib(50, 0, 1)); }

We add a call to main() again for Node and time it:

$ time node example/tco.js 12586269025 node example/tco.js 0.06s user 0.02s system 96% cpu 0.080 total

And compile it with jsc and time it:

$ jsc example/tco.js $ time node build/tco.js 12586269025 node build/tco.js 0.07s user 0.02s system 95% cpu 0.087 total

Well that's not bad at all. :)

Next steps with jsc

jsc has very limited support for... everything. Today I added almost all primitive numeric operations + equality/inequality operations + unit tests. jsc does not yet support nested functions, callbacks, or closures. It supports while loops but not yet for loops. And I'm not sure if it supports else if . It does not support arrays or objects let alone constructors and prototypes. Adding support for these is low-hanging fruit.

After the low-hanging fruit, more interesting projects for jsc include: