Mar 27, 2012

this is the fourth entry in an n-part series explaining the compilation techniques of Clojure. translations: [日本語]

When ClojureScript was first announced there was much gnashing of teeth over the fact that it provided neither eval nor runtime macros. In response, I did tackle the matter of eval, but code speaks louder than words, so I present Himera, a ClojureScript compilation web service.

I have a deployment of Himera on Heroku (shown below; caveat emptor) if you’d like to play with it. Additionally, the Himera source code is available on GitHub.

What Himera is

Himera1 (Russian: Химера, pronounced Hee-mera with a trill) is an experiment in slicing and dicing the typical REPL model of Lisp computation, providing a modularized web service for ClojureScript compilation.

REPL

Lisps, and Clojure is no exception, provide a unique programming experience via the REPL. Expressed as source, the canonical REPL is simply:

(loop (print (eval (read))))

You’ve probably seen this contrivance, but what exactly does it mean? The diagram below is a graphical representation of the same idea:

That is, a REPL is a composition of three repeating functions: read, eval, and print. The read step takes a string (or maybe an input buffer) and produces a Lisp data structure representing the program in hand2. This data structure is then fed into the eval function and executed as a program. Finally, the result of the evaluation step (another Lisp data structure) is printed to the user.
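As a minimal sketch of that composition in Clojure itself, one turn of the loop can be written with the built-in read-string, eval, and pr-str. The helper name rep is mine and is not part of any library.

```clojure
;; One turn of a REPL: read a string, evaluate the form, print the result.
;; `rep` is a hypothetical helper name used only for illustration.
(defn rep [source]
  (-> source
      read-string   ; string -> Clojure data structure
      eval          ; data structure -> result value
      pr-str))      ; result value -> printable string

(rep "(+ 1 2)")
```

Wrapping rep in a loop that pulls strings from some input device gives back the familiar single-process REPL.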

This model of the REPL is highly simplistic, but it is representative of most cases. However, because of a lot of historical baggage, the typical conception of this model is often limited to that of a single process, as in the image below:

But this is an outmoded ideal, whether it be at a console REPL:

… a browser:

… or a phone:

It simply does not need to be configured in such a way. The very nature of Lisp and its furcated architecture allows many different ways to arrange the components of a REPL.

An exploded view of the REPL

Before I can talk about various ways to slice up the REPL into bits and pieces I should mention that the canonical image above is way too simplistic. Instead, the ClojureScript compiler is modularized along much finer dimensions than the Lispy trinity. Observe the following:

The constituent parts of the ClojureScript anatomy are as follows:

Input

Some input device reads a string of characters and feeds it into the reader as a true string datatype or some input buffer.

Reader

The reader consumes the string from the input device and transforms it into a Clojure data structure. In other words, the raw string:

"(vector :thx (-> 1138 - str))"

is converted into a Clojure persistent list data structure of three elements: 1) the symbol vector, 2) the keyword :thx, and 3) another persistent list of four elements: the symbol ->, the number 1138, the symbol -, and the symbol str. The source view of this data structure is described in Clojure as:

(list 'vector :thx (list '-> 1138 '- 'str))

The result of the Reader is always a Clojure data structure, Java instance, or an error.
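A short way to check this at a JVM Clojure REPL, assuming Clojure's own read-string stands in for the reader that the ClojureScript compiler uses:

```clojure
;; read-string turns source text into plain Clojure data.
(def form (read-string "(vector :thx (-> 1138 - str))"))

(list? form)            ; the outer form is a list
(count form)            ; its elements: vector, :thx, and the inner list
(symbol? (first form))  ; `vector` is read as a symbol, not resolved to a function
(last form)             ; the inner list holding ->, 1138, -, and str

;; The data read from the string equals the hand-built structure.
(= form (list 'vector :thx (list '-> 1138 '- 'str)))
```

Nothing has been evaluated at this point; the reader only builds data.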

Macro expansion (macro-xp)

The raw Clojure data structures produced by the Reader are then processed for macro-expansion to some fixed point (i.e. they are expanded until the input equals the output). In the case of the structure listed above, the macro -> would be expanded into the following:

(vector :thx (str (- 1138)))
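The expansion can be reproduced with clojure.walk/macroexpand-all from JVM Clojure. This is only a sketch: it uses Clojure's expander as a stand-in for the compiler's own macro-expansion step.

```clojure
(require '[clojure.walk :as walk])

;; macroexpand-all expands macros at every level of the form,
;; not just at the head of the outermost list.
(def expanded
  (walk/macroexpand-all '(vector :thx (-> 1138 - str))))

;; The result is still plain data, now free of the -> macro.
(= expanded '(vector :thx (str (- 1138))))

;; Expanding an already expanded form changes nothing: the fixed point.
(= expanded (walk/macroexpand-all expanded))
```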

This is where the idea of code as data, in Clojure and in Lisp generally, diverges from the syntactic representation.

Analyzer

The analysis phase of ClojureScript compilation builds an abstract syntax tree (AST) that represents the program itself, divorced from syntactic matters. That is, the tree structure defines logical groupings along branches, binding contexts along tree depth, etc. This is where code as data, in Clojure and in Lisp generally, diverges from its parse form. The analysis phase also marks the end of the first phase in ClojureScript’s 2-phase compilation process.
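To make the idea of an AST built from maps concrete, here is a toy analyzer. It is not the ClojureScript analyzer: the node keys (:op, :form, :f, :args) are illustrative choices for this sketch, and it ignores environments, special forms, and bindings entirely.

```clojure
;; A toy analyzer, NOT the ClojureScript one. It recognizes only
;; invocations, symbols, and constants.
(defn analyze [form]
  (cond
    (seq? form)    {:op   :invoke
                    :form form
                    :f    (analyze (first form))
                    :args (mapv analyze (rest form))}
    (symbol? form) {:op :var, :form form}
    :else          {:op :constant, :form form}))

(def ast (analyze '(vector :thx (str (- 1138)))))

(:op ast)                 ; the root node is an invocation
(get-in ast [:f :form])   ; of the symbol `vector`
(map :op (:args ast))     ; with a constant and a nested invocation as arguments
```

Because each node is a plain map, later passes can annotate or rewrite the tree with ordinary functions such as assoc and update-in.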

Emitter

This is where ClojureScript’s AST is walked and transformed into JavaScript. This is the beginning and end of the second phase of ClojureScript’s 2-phase compilation process. This is also where you would typically deploy your ClojureScript application. However, in the context of a REPL layout, two more elements are missing.
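The walk itself can be sketched over a small hand-written tree. The node shape (:op, :form, :f, :args) and the runtime function name negate are assumptions of this example; the real emitter handles many more node types and emits namespaced calls.

```clojure
(require '[clojure.string :as string])

;; A toy emitter: recursively turns each node into a JavaScript source string.
(defn emit [{:keys [op form f args]}]
  (case op
    :constant (pr-str form)
    :var      (str form)
    :invoke   (str (emit f)
                   "("
                   (string/join ", " (map emit args))
                   ")")))

;; Hand-written AST for the Clojure form (str (- 1138)); `negate` stands in
;; for a hypothetical runtime function implementing unary minus.
(def ast
  {:op   :invoke
   :f    {:op :var, :form 'str}
   :args [{:op   :invoke
           :f    {:op :var, :form 'negate}
           :args [{:op :constant, :form 1138}]}]})

(emit ast)
```

The result is an ordinary string of JavaScript, which is why it can be shipped anywhere a string can go.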

Eval (or runtime)

The JavaScript that is produced by the ClojureScript compiler is evaluated. The original code (vector :thx (str (- 1138))) under examination above would produce a result that, in Clojure, would look like the following:

[:thx "-1138"]

However, it would be JavaScript and therefore an instance of cljs.core.Vector containing two strings.3

Print

The result of the JavaScript is “printed” via the appropriate means.

Taking this exploded view of the ClojureScript compiler to heart, imagine how the traditional REPL model might look different under various operational constraints. Below I will illustrate a few.

The Browser-connected REPL

Because ClojureScript has neither runtime evaluation nor compilation elements, the Clojure/core team had to devise a way to provide the agile development experience that Clojure programmers were accustomed to. The initial release of ClojureScript packaged Rhino and used it as the evaluation engine of the emitted JavaScript. However, this was less than optimal for numerous reasons outside of the scope of this post. Eventually, it was decided that the evaluation engine should instead be that of a browser, as shown below:

That is, the read, compilation, and emission steps all live in Clojure and the evaluation phase lives in the browser. As you may have noticed, the print phase exists a bit in both, but the details of that are not important for the purposes of this post. This scenario turns out to be extremely powerful for a number of reasons, the most obvious being that it’s nice to evaluate live code against the environment in which a large percentage of the production code is likely to execute. Further, connecting to the browser in such a way allows one to build up a browser-based app live and experiment in realtime with different code paths. It’s this very scenario that makes the M.O. of ClojureScript One so compelling. For the future it would be spectacular to see the browser-connected REPL target multiple browsers at once, vetting returned results via quorum. Smarter people than me are thinking through just such a scenario as you read this.

Himera – ClojureScript compilation as a web service

Himera rearranges the modularized ClojureScript compiler yet again as illustrated in the following image:

Like the browser-connected REPL, the reading, macro expansion, analysis, and emission phases are separate from the evaluation phase. However, instead of eval being the external service, the entire read and compilation phase is the service. This arrangement allows something like a browser to act as the evaluation engine itself, collocated with the input device (although this collocation is not necessary, and I plan to allow a separation soon). Himera ships with a Browser-embedded REPL that, minus some minor bugs and immaturity, provides an in-browser development experience.4 There is more work to do, but the possibilities of this model have already been explored by Chris Granger with his cljs-live project:

Can you imagine if that was your development environment?

We’ve only scratched the surface.

Emitter.next

But the compilation service need not return JavaScript. In fact, given that the ClojureScript compiler is very tiny (~1500 lines, ~350 being emission only), one could write an entirely different emitter that returns any language at all — say for example Gambit Scheme or maybe even Python.
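A rough illustration of why swapping the emitter is cheap: the sketch below walks one toy AST and produces either JavaScript-style or Scheme-style call syntax. The node shape, the target keywords :js and :scheme, and the function name negate are all assumptions made up for this example, not ClojureScript's.

```clojure
(require '[clojure.string :as string])

;; One tree walk, two output syntaxes. Only the :invoke case differs
;; between targets in this toy version.
(defn emit [target {:keys [op form f args]}]
  (case op
    :constant (pr-str form)
    :var      (str form)
    :invoke   (let [callee  (emit target f)
                    emitted (map #(emit target %) args)]
                (case target
                  :js     (str callee "(" (string/join ", " emitted) ")")
                  :scheme (str "(" (string/join " " (cons callee emitted)) ")")))))

;; Hand-written AST for the Clojure form (str (- 1138)).
(def ast
  {:op   :invoke
   :f    {:op :var, :form 'str}
   :args [{:op   :invoke
           :f    {:op :var, :form 'negate}
           :args [{:op :constant, :form 1138}]}]})

(emit :js ast)
(emit :scheme ast)
```

Everything upstream of the emitter (reading, macro expansion, analysis) is untouched by the choice of target.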

Compiler as service is the new black.

An AST service

But what if your emitter was not collocated with your compilation service? That’s fine. Himera also provides a service that takes ClojureScript code and returns an AST in the form of embedded Clojure maps (although JSON could be returned without too much effort):

You could then interpose other AST processing services along the way to modify and/or annotate the AST with additional processing. Ambrose Bonnaire-Sergeant is exploring a typed Clojure variant using this very technique.

A la carte compilation

This slicing and dicing of the Clojure compiler is probably not new. I suspect that these techniques have been known and used at various times in the history of the language; the core Lisp philosophy begs for it. The modularized compiler is extremely flexible, allowing interposition, enhancement, and replacement at any junction along the path from string to evaluation/execution result. Himera is an exploration of this model of programming, providing compilation as a web service, but it’s certainly not limited to this model.

The possibilities are endless.

:F

thanks to Craig Andera and Chris Redinger for moral support, content suggestions and reviewing a draft of this post. a special thanks to Jen Myers for her amazing design of the Himera REPL page. once you’ve worked with great designers there is no going back to the dark ages of programmer-styled apps.