Compilers for Free

Programs that manipulate other programs are powerful, interesting and fun. I’ll use Ruby to take you on a tour of how interpreters and compilers work, introduce the ideas behind a technique called partial evaluation, and explain a surprising computer science result which allows compilers to be generated automatically. This article is adapted from a talk I gave at RubyConf 2013, which itself was based on my notes for an unrealised chapter of Understanding Computation (interpreters and compilers are explored in more detail in the sample chapter). The video of the talk is below, and the slides are also available.

Introduction I’m fascinated by programming, and by metaprogramming in particular. When Ruby programmers talk about metaprogramming they usually mean programs that write programs, because Ruby has various features — like BasicObject#instance_eval , Module#define_method and BasicObject#method_missing — which make it possible to write programs that grow new functionality at run time. But there’s another aspect of metaprogramming that I find even more interesting: programs that manipulate representations of programs. These are programs which ingest data that itself represents a program, and then do something with that representation, like analyse it, evaluate it, translate it, or transform it. This is the world of interpreters, compilers and static analysers, and I think it’s fascinating. It’s the purest and most self-referential kind of software that we can write. By showing you around this world I’d like to make you look at programs differently. I don’t have a complicated technical point to make; I just want to tell you a cool story, show you that programs which manipulate other programs are interesting, and hopefully inspire you to find out some more about them.

Execution We’ll begin by considering something easy and familiar: executing programs. Say we have a program that we want to run, as well as a machine to run it on. To execute the program, the first step is to put the program into the machine. Then we have some inputs for that program — command-line arguments, or configuration files, or standard input, or whatever — so we feed all of those inputs into the machine too. In principle, the program executes on the machine, reads the inputs, and produces some output: In practice, though, this simple picture is only possible if the program is written in a language that the machine can execute directly. For a bare-metal physical machine that means the program has to be written in low-level machine code. (If the program’s written in a higher-level language then we may be able to run it on a virtual machine that understands that language natively, e.g. Java bytecode on the JVM.) But if our program’s written in a language that’s unfamiliar to the machine, we can’t execute it directly; we need an interpreter or compiler.

Interpretation An interpreter works roughly like this: it reads in the source code of the program we want to execute; then

it builds an abstract syntax tree (AST) by parsing that source code; and finally

it evaluates the program by walking over the abstract syntax tree and performing the instructions that it finds. Let’s investigate each of those steps by building an interpreter for a toy language called Simple (“simple imperative language”). It’s very straightforward and looks a little bit like Ruby, but it’s not exactly the same. Here are some example Simple programs: a = 19 + 23 x = 2; y = x * 3 if (a < 10) { a = 0; b = 0 } else { b = 10 } x = 1; while (x < 5) { x = x * 3 } Unlike Ruby, Simple makes a distinction between expressions and statements. An expression evaluates to a value; for example, the expression 19 + 23 evaluates to 42 , and the expression x * 3 evaluates to three times whatever the current value of the variable x is. On the other hand, evaluating a statement doesn’t produce a value, but it has the side effect of updating the current values of variables; evaluating the assignment statement a = 19 + 23 updates the variable a to have the value 42 . As well as assignment, Simple has a few other kinds of statement: sequencing (evaluating one statement before evaluating another, i.e. the semicolon), conditionals ( if (…) { … } else { … } ), and loops ( while (…) { … } ). These kinds of statement don’t directly affect the values of variables, but they can all contain nested assignment statements which do. There’s a Ruby library called Treetop which makes it easy to build parsers, so we’re going to quickly use Treetop to build a parser for Simple. (We won’t go into any of the gory details here — have a look at the Treetop documentation if you’d like to know more.) To use Treetop we have to write a file called a grammar, which looks like this: grammar Simple rule statement sequence end rule sequence first:sequenced_statement '; ' second:sequence / sequenced_statement end rule sequenced_statement while / assign / if end rule while 'while (' condition:expression ') { ' body:statement ' }' end rule assign name:[a-z]+ ' = ' expression end rule if 'if (' condition:expression ') { ' consequence:statement ' } else { ' alternative:statement ' }' end rule expression less_than end rule less_than left:add ' < ' right:less_than / add end rule add left:multiply ' + ' right:add / multiply end rule multiply left:term ' * ' right:multiply / term end rule term number / boolean / variable end rule number [0-9]+ end rule boolean ('true' / 'false') ![a-z] end rule variable [a-z]+ end end This grammar contains rules which explain what each kind of Simple statement and expression looks like. The sequence rule means “a sequence is two statements separated by a semicolon”, the while rule means “a loop begins with ‘ while ( ’, followed by a condition, followed by the loop’s body surrounded by curly braces”, the assignment rule means “an assignment looks like a variable name followed by an equals sign and an expression”, and so on. The number , boolean and variable rules explain that numbers are sequences of digits, booleans are written as “ true ” and “ false ”, and variable names are sequences of letters. Once we’ve written a grammar, we can use Treetop to load it and generate a parser. The parser gets generated as a Ruby class, so we can instantiate that class and ask it to parse a string. Assuming that the above grammar has been saved into a file called simple.treetop , here’s how we do that in IRB: >> Treetop.load('simple.treetop') => SimpleParser >> SimpleParser.new.parse('x = 2; y = x * 3') => SyntaxNode+Sequence1+Sequence0 offset=0, "x = 2; y = x * 3" (first,second): SyntaxNode+Assign1+Assign0 offset=0, "x = 2" (name,expression): SyntaxNode offset=0, "x": SyntaxNode offset=0, "x" SyntaxNode offset=1, " = " SyntaxNode+Number0 offset=4, "2": SyntaxNode offset=4, "2" SyntaxNode offset=5, "; " SyntaxNode+Assign1+Assign0 offset=7, "y = x * 3" (name,expression): SyntaxNode offset=7, "y": SyntaxNode offset=7, "y" SyntaxNode offset=8, " = " SyntaxNode+Multiply1+Multiply0 offset=11, "x * 3" (left,right): SyntaxNode+Variable0 offset=11, "x": SyntaxNode offset=11, "x" SyntaxNode offset=12, " * " SyntaxNode+Number0 offset=15, "3": SyntaxNode offset=15, "3" The parser produces a data structure called a concrete syntax tree, or a parse tree, which contains a lot of information about how the parsed string breaks down into all its component parts. There’s a lot of fiddly detail in the concrete syntax tree that we’re not very interested in. In fact we don’t want a concrete syntax tree at all, we want an abstract syntax tree, which is a simpler and more useful representation of the structure of the program. To make an abstract syntax tree, we first need to declare some classes that we’re going to use to build the tree’s nodes. That’s easy to do with Struct : Number = Struct.new :value Boolean = Struct.new :value Variable = Struct.new :name Add = Struct.new :left, :right Multiply = Struct.new :left, :right LessThan = Struct.new :left, :right Assign = Struct.new :name, :expression If = Struct.new :condition, :consequence, :alternative Sequence = Struct.new :first, :second While = Struct.new :condition, :body Here we’re defining a new class for each different kind of expression and statement: simple expressions like Number and Boolean just have a value , binary expressions like Add and Multiply have a left and right subexpression, Assign statements have got the name of a variable and the expression to assign to it, and so on. Treetop allows us to add functionality to the nodes of a concrete syntax tree by putting Ruby method definitions inside a grammar file’s rules. We can use this feature to give each kind of concrete syntax tree node a method that converts it into the corresponding kind of AST node; let’s call that method #to_ast . Here’s how we define #to_ast for numbers, booleans and variables within the Simple grammar: rule number [0-9]+ { def to_ast Number.new(text_value.to_i) end } end rule boolean ('true' / 'false') ![a-z] { def to_ast Boolean.new(text_value == 'true') end } end rule variable [a-z]+ { def to_ast Variable.new(text_value.to_sym) end } end When Treetop uses the number rule to build a node for the concrete syntax tree, that node will now come with an implementation of #to_ast which converts its textual content into an integer (using String#to_i ) and then returns that integer wrapped in an instance of the Number class. The #to_ast implementations in the boolean and variable rules contain similar code to build an instance of the appropriate AST class containing the appropriate value. Here are the #to_ast implementations for the three kinds of binary expression: rule less_than left:add ' < ' right:less_than { def to_ast LessThan.new(left.to_ast, right.to_ast) end } / add end rule add left:multiply ' + ' right:add { def to_ast Add.new(left.to_ast, right.to_ast) end } / multiply end rule multiply left:term ' * ' right:multiply { def to_ast Multiply.new(left.to_ast, right.to_ast) end } / term end In all three cases, #to_ast recursively converts the concrete syntax trees representing the left and right subexpressions into ASTs by calling #to_ast on them, and then combines them into one overall AST node of the appropriate class. Statements work similarly: rule sequence first:sequenced_statement '; ' second:sequence { def to_ast Sequence.new(first.to_ast, second.to_ast) end } / sequenced_statement end rule while 'while (' condition:expression ') { ' body:statement ' }' { def to_ast While.new(condition.to_ast, body.to_ast) end } end rule assign name:[a-z]+ ' = ' expression { def to_ast Assign.new(name.text_value.to_sym, expression.to_ast) end } end rule if 'if (' condition:expression ') { ' consequence:statement ' } else { ' alternative:statement ' }' { def to_ast If.new(condition.to_ast, consequence.to_ast, alternative.to_ast) end } end Again, #to_ast simply converts each subexpression or substatement into an AST and glues the results together with the right kind of AST class. Once all the definitions of #to_ast are in place, we can use Treetop to build a concrete syntax tree and then call #to_ast on the root node to recursively convert the whole thing into an AST: >> SimpleParser.new.parse('x = 2; y = x * 3').to_ast => #<struct Sequence first=#<struct Assign name=:x, expression=#<struct Number value=2> >, second=#<struct Assign name=:y, expression=#<struct Multiply left=#<struct Variable name=:x>, right=#<struct Number value=3> > > > Whereas Treetop’s detailed concrete syntax tree is designed for Treetop’s purposes, the AST is a simpler data structure built out of instances of classes that we designed ourselves. The AST for x = 2; y = x * 3 has this structure: This tells us that the statement x = 2; y = x * 3 is a Sequence of two Assign ments: the first Assign ment is assigning the Number 2 to x , and the second Assign ment is assigning the result of Multiply ing the Variable x and the Number 3 to y . Now that we’ve got this AST, we can evaluate it by recursively walking over it. On each AST node class we’ll define a method called #evaluate which performs the operations represented by that node and any of its subtrees; once that’s done, we can call #evaluate on the root node of the AST to evaluate the whole program. Here’s what #evaluate looks like for Number , Boolean and Variable nodes: class Number def evaluate(environment) value end end class Boolean def evaluate(environment) value end end class Variable def evaluate(environment) environment[name] end end The environment argument is a hash which contains the current value of each variable. An environment of { x: 7, y: 11 } , for example, means that there are two variables, x and y , with the values 7 and 11 respectively. We’ll assume that every Simple program is evaluated within some (possibly empty) initial environment; later we’ll see how statements can change the current environment. When we’re evaluating a Number or Boolean we don’t care what’s in the environment, because we always just return the value that was stored inside that node when it was built. If we’re evaluating a Variable , though, we have to look in the environment and retrieve the current value of the variable with that name. So if we make a new Number AST node and evaluate it in an empty environment, we get the original Ruby number back: >> Number.new(3).evaluate({}) => 3 The same goes for Boolean s: >> Boolean.new(false).evaluate({}) => false If we evaluate a Variable called y in an environment that gives y the value 11 , the result will be 11 , whereas if we evaluate it in an environment that gives y a different value we’ll get that value instead: >> Variable.new(:y).evaluate({ x: 7, y: 11 }) => 11 >> Variable.new(:y).evaluate({ x: 7, y: true }) => true Here are the definitions of #evaluate for binary expressions: class Add def evaluate(environment) left.evaluate(environment) + right.evaluate(environment) end end class Multiply def evaluate(environment) left.evaluate(environment) * right.evaluate(environment) end end class LessThan def evaluate(environment) left.evaluate(environment) < right.evaluate(environment) end end These implementations recursively evaluate their left and right subexpressions and then perform the appropriate operation on the results. If it’s an Add node, the results are added; if it’s a Multiply they’re multiplied together; if it’s a LessThan , the results are compared using Ruby’s < operator. So, when we evaluate the Multiply expression x * y in an environment that gives x the value 2 and y the value 3 , we get the result 6 : >> Multiply.new(Variable.new(:x), Variable.new(:y)). evaluate({ x: 2, y: 3 }) => 6 If we evaluate the LessThan expression x < y in the same environment, we get the result true : >> LessThan.new(Variable.new(:x), Variable.new(:y)). evaluate({ x: 2, y: 3 }) => true For statements, #evaluate works slightly differently. Because evaluating a Simple statement updates the environment rather than returning a value, we define #evaluate to return a new environment hash that reflects any changes that the statement has made to the current values of variables. The only kind of statement which directly affects variables is Assign : class Assign def evaluate(environment) environment.merge({ name => expression.evaluate(environment) }) end end To evaluate an Assign statement we first evaluate the right-hand side of the assignment to get a value, and then use Hash#merge to create a modified environment which associates that value with the appropriate variable name. If we evaluate x = 2 in an empty environment, we get back a new environment which says the value of x is 2 : >> Assign.new(:x, Number.new(2)).evaluate({}) => {:x=>2} Similarly, if we evaluate y = x * 3 in an environment which says x is 2 , then the subexpression x * 3 will evaluate to 6 , resulting in an environment which says that x and y have the values 2 and 6 respectively: >> Assign.new(:y, Multiply.new(Variable.new(:x), Number.new(3))). evaluate({ x: 2 }) => {:x=>2, :y=>6} We can evaluate a Sequence of two statements by evaluating the first statement to get an intermediate environment, then evaluating the second statement in that intermediate environment to get a final environment: class Sequence def evaluate(environment) second.evaluate(first.evaluate(environment)) end end In a Sequence of two assignments to the same variable, for example, the second assignment will win: >> Sequence.new( Assign.new(:x, Number.new(1)), Assign.new(:x, Number.new(2)) ).evaluate({}) => {:x=>2} Finally, we need definitions of #evaluate for If and While statements: class If def evaluate(environment) if condition.evaluate(environment) consequence.evaluate(environment) else alternative.evaluate(environment) end end end class While def evaluate(environment) if condition.evaluate(environment) evaluate(body.evaluate(environment)) else environment end end end If evaluates one of its two substatements depending on the value of its condition, and While uses recursion to repeatedly evaluate the loop body as long as its condition evaluates to true . (See “The Meaning of Programs” for a fuller explanation of how this works.) Now that we’ve implemented #evaluate for each kind of expression and statement, we can parse any Simple program and evaluate the resulting AST. Let’s see what happens when we evaluate x = 2; y = x * 3 in an empty environment: >> SimpleParser.new.parse('x = 2; y = x * 3'). to_ast.evaluate({}) => {:x=>2, :y=>6} As expected, the final result is that x and y have the values 2 and 6 . Let’s try a more sophisticated program: >> SimpleParser.new.parse('x = 1; while (x < 5) { x = x * 3 }'). to_ast.evaluate({}) => {:x=>9} This program initialises x with 1 , then repeatedly multiplies it by 3 until its value becomes greater than 5 . When evaluation finishes, x has the value 9 , because nine is the smallest power of three that’s greater than or equal to five. That’s all we really need to make an interpreter: a parser to generate a concrete syntax tree, a mechanism for turning a concrete syntax tree into an AST, and a way of walking over an AST to evaluate it. An interpreter gives us a way to perform single-stage execution. We can supply a source program and some input to that program (the initial environment, in Simple’s case), and the interpreter will evaluate the program with the input to produce some output: This whole process happens at run time — a single stage of program execution.

Compilation Now we’ve seen how interpreters work. What about compilers? A compiler turns source code into an AST and walks over it, just like an interpreter does. But rather than performing the instructions from the AST, a compiler generates target code by emitting instructions as it finds them. Let’s illustrate this by writing a Simple-to-JavaScript compiler. Instead of defining #evaluate on the nodes of the AST, we’ll define a method called #to_javascript which converts each kind of expression and statement into a string of JavaScript source code. Here’s #to_javascript for Number s, Boolean s and Variable s: require 'json' class Number def to_javascript "function (e) { return #{JSON.dump(value)}; }" end end class Boolean def to_javascript "function (e) { return #{JSON.dump(value)}; }" end end class Variable def to_javascript "function (e) { return e[#{JSON.dump(name)}]; }" end end In each case, the generated JavaScript is a function which takes an environment argument, e , and returns a value. For a Number expression this return value is the original number: >> Number.new(3).to_javascript => "function (e) { return 3; }" We can run this compiled code in a JavaScript environment (e.g. a browser console or Node.js REPL) by calling the generated function with a JavaScript object containing the current values of variables: > program = function (e) { return 3; } [Function] > program({ x: 7, y: 11 }) 3 (Using an explicit object to store the current environment, rather than relying on native JavaScript variables, is just a design choice; it simplifies the implementation and avoids difficulties with JavaScript variable scope.) Similarly, a compiled Boolean is a function that returns constant true or false : >> Boolean.new(false).to_javascript => "function (e) { return false; }" A Simple Variable becomes a JavaScript function that looks up the appropriate variable in the environment and returns its value: >> Variable.new(:y).to_javascript => "function (e) { return e[\"y\"]; }" Of course, this function will return different values depending on the contents of the environment: > program = function (e) { return e["y"]; } [Function] > program({ x: 7, y: 11 }) 11 > program({ x: 7, y: true }) true Here are the implementations of #to_javascript for binary expressions: class Add def to_javascript "function (e) { return #{left.to_javascript}(e) + #{right.to_javascript}(e); }" end end class Multiply def to_javascript "function (e) { return #{left.to_javascript}(e) * #{right.to_javascript}(e); }" end end class LessThan def to_javascript "function (e) { return #{left.to_javascript}(e) < #{right.to_javascript}(e); }" end end To compile a binary expression we convert its left and right subexpressions to JavaScript function s and then wrap them in some more JavaScript code. When it’s eventually executed, this extra code will call both subexpression function s to get their values, and then use addition, multiplication or comparison to combine those values and return a final result. For example, let’s try compiling x * y : >> Multiply.new(Variable.new(:x), Variable.new(:y)).to_javascript => "function (e) { return function (e) { return e[\"x\"]; }(e) * function (e) { return e[\"y\"]; }(e); }" Here’s the compiled code (reformatted for clarity) running in a JavaScript environment: > program = function (e) { return function (e) { return e["x"]; }(e) * function (e) { return e["y"]; }(e); } [Function] > program({ x: 2, y: 3 }) 6 Below are the implementations of #to_javascript for statements. Each of them generates the JavaScript version of its respective Simple statement — an If turns into a JavaScript conditional, a While turns into a JavaScript while loop, and so on — and wraps it in code that returns the updated environment. class Assign def to_javascript "function (e) { e[#{JSON.dump(name)}] = #{expression.to_javascript}(e); return e; }" end end class If def to_javascript "function (e) { if (#{condition.to_javascript}(e))" + " { return #{consequence.to_javascript}(e); }" + " else { return #{alternative.to_javascript}(e); }" + ' }' end end class Sequence def to_javascript "function (e) { return #{second.to_javascript}(#{first.to_javascript}(e)); }" end end class While def to_javascript 'function (e) {' + " while (#{condition.to_javascript}(e)) { e = #{body.to_javascript}(e); }" + ' return e;' + ' }' end end Earlier we tried interpreting the program x = 1; while (x < 5) { x = x * 3 } , but now we can try compiling it into JavaScript: >> SimpleParser.new.parse('x = 1; while (x < 5) { x = x * 3 }'). to_ast.to_javascript => "function (e) { return function (e) { while (function (e) { return function (e) { return e[\"x\"]; }(e) < function (e) { return 5; }(e); }(e)) { e = function (e) { e[\"x\"] = function (e) { return function (e) { return e[\"x\"]; }(e) * function (e) { return 3; }(e); }(e); return e; }(e); } return e; }(function (e) { e[\"x\"] = function (e) { return 1; }(e); return e; }(e)); }" Admittedly this compiled version is much longer than the original, but we have at least turned a Simple program into JavaScript. And if we run the compiled program in a JavaScript environment, we get the same result as when we interpreted it — x ends up having the value 9 : > program = function (e) { return function (e) { while (function (e) { return function (e) { return e["x"]; }(e) < function (e) { return 5; }(e); }(e)) { e = function (e) { e["x"] = function (e) { return function (e) { return e["x"]; }(e) * function (e) { return 3; }(e); }(e); return e; }(e); } return e; }(function (e) { e["x"] = function (e) { return 1; }(e); return e; }(e)); } [Function] > program({}) { x: 9 } This is an extremely stupid compiler — much smarter ones are possible with more effort — but it would in principle allow us to execute Simple programs on a machine that only understood JavaScript. A compiler gives us a way to perform two-stage execution. First we supply just a source program, and the compiler will run and generate a target program as output. And then, later on, we can take that target program, provide it with some input, and run it to get the final output: Compilation achieves the same overall result as interpretation — a non-native source program and input are eventually turned into output — but it splits the execution process into two separate stages. The first stage is compile time, when the target program is generated; the second stage is run time, when the input is read and the output is generated. The good news about compilation is that a good compiler can generate a target program which runs faster than its interpreted counterpart. Splitting the computation into two stages removes the interpretive overhead from run time: all of the work of parsing the program, building an AST and walking over it is done at compile time, so it’s already out of the way by the time the target program runs. (Incidentally, compilation has other performance benefits too; for example, a compiler can use clever data structures and optimisations to make the target program more efficient, especially if it knows how to exploit the architecture of the underlying machine.) The bad news about compilation is that it’s just more difficult than interpretation. There are a few reasons for that: A compiler writer has to think about two stages of execution instead of one. When we write an interpreter, we only need to think about the execution of the interpreter itself and how it’s going to behave at run time in response to the contents of the source program’s AST. But to write a compiler we have to think about both compile time and run time: we must keep track of what the compiler itself is going to do when it generates the target program, and also what the target program will do later when it’s executed.

A compiler writer has to think about two programming languages instead of one. An interpreter is written in a single language, but a compiler contains code written in both the compiler implementation language and the target language — our Simple compiler is written in Ruby interspersed with fragments of JavaScript.

compiler is written in Ruby interspersed with fragments of JavaScript. Compiling dynamic languages into efficient target programs is inherently challenging. Languages like Ruby and JavaScript have features which allow the program itself to change and grow during execution, so the static source code of a program doesn’t necessarily tell the compiler everything it wants to know about what might happen at run time. In short: writing an interpreter is easier than writing a compiler, but interpreted programs are slower than compiled ones. Interpreters only have one execution stage, only use one programming language, and can be as dynamic as necessary — if the program changes when it runs, that’s fine, because the AST can be updated and the interpreter will just keep working — but that ease and flexibility carries a run time performance penalty. Ideally we would only need to write interpreters for our programming languages, but we want our programs to run as fast as possible, so we usually end up writing compilers wherever possible.

Partial evaluation Programmers use interpreters and compilers all the time, but there’s another kind of program-manipulating program that’s less well known: partial evaluators. A partial evaluator is a hybrid of an interpreter and a compiler: Whereas an interpreter executes a program right now, and a compiler generates a program now so that we can execute it later, a partial evaluator executes some of the code now and leaves the rest of it for execution later. A partial evaluator reads a program (called the subject program) along with some of the inputs for that program, and evaluates only the parts of the program that depend on the inputs provided. Once all these parts have been evaluated, the remaining program (called the residual program) is emitted as output. In this way, partial evaluation lets us extract part of the process of program execution and do it earlier, timeshifting it into the past. Instead of providing all of a program’s inputs at once, we can provide some of them now and save the rest for later. So, instead of executing the subject program all at once, we can feed it and some of its input into a partial evaluator. The partial evaluator will execute some of the subject program to produce a residual program; later, we can execute the residual program, feeding it the rest of the input to get the final output: The overall effect of partial evaluation is to split a single-stage execution into two stages by timeshifting some of the work from the future — when we intend to run the program — to the present, so that we can get it out of the way earlier. A partial evaluator works by turning the subject program into an AST, just like an interpreter or compiler, and reading some of the program’s inputs. It analyses the AST by walking over it to find all the places where the provided inputs are used; each of these program fragments is then evaluated and replaced with new code containing the result. Once partial evaluation is complete, the AST is turned back into text and emitted as the residual program. That’s too much machinery for us to actually build here, but let’s walk through a manual example to get a feel for what it looks like. Imagine that some Ruby program contains a method, #power , which calculates x to the power of n : def power(n, x) if n.zero? 1 else x * power(n - 1, x) end end Now let’s play the part of a Ruby partial evaluator, and assume that we’ve been given enough of the program’s input to know that #power will be called with 5 as its first argument, n . We can easily generate a new version of the method — let’s call it #power_5 — where the n argument has been removed, and all occurrences of n have been replaced with 5 (this is called constant propagation): def power_5(x) if 5.zero? 1 else x * power( 5 - 1 , x) end end There are now two expressions inside the method — 5.zero? and 5 - 1 — that we could potentially evaluate. Let’s choose to evaluate 5.zero? and replace it with the result, false : def power_5(x) if false 1 else x * power(5 - 1, x) end end This change makes it possible to evaluate the whole conditional — if false … else … end always evaluates to the else clause (sparse conditional constant propagation): def power_5(x) x * power(5 - 1, x) end Now we can evaluate 5 - 1 and replace it with 4 (constant folding): def power_5(x) x * power( 4 , x) end We could stop there, but because we know the definition of #power , there’s an opportunity to replace the power(4, x) call with the body of #power itself (inline expansion), with the argument 4 replacing all occurrences of the parameter n : def power_5(x) x * if 4.zero? 1 else x * power(4 - 1, x) end end Now we can just repeat what we did before. We know that 4.zero? evaluates to false : def power_5(x) x * if false 1 else x * power(4 - 1, x) end end We know that if false … else … end evaluates to the else branch: def power_5(x) x * x * power(4 - 1, x) end We know that 4 - 1 is 3 : def power_5(x) x * x * power( 3 , x) end And so on. By repeatedly inlining calls to #power and evaluating constant expressions where possible, we keep adding more multiplications until we get a #power call with 0 as its first argument: def power_5(x) x * x * x * x * x * power(0, x) end Let’s expand that call: def power_5(x) x * x * x * x * x * if 0.zero? 1 else x * power(0 - 1, x) end end We know that 0.zero? is true : def power_5(x) x * x * x * x * x * if true 1 else x * power(0 - 1, x) end end For the first time we have a conditional that says if true instead of if false . This evaluates to the if branch rather than the else branch: def power_5(x) x * x * x * x * x * 1 end We’ve ended up with a definition of #power_5 that multiplies x by itself five times and then multiplies by 1 . Multiplying a number by 1 doesn’t change it, so let’s eliminate that last multiplication and clean up the formatting to get the finished method: def power_5(x) x * x * x * x * x end Here’s what that whole process looks like: #power_5 is the residual program generated by the partial evaluation of #power with respect to the input 5 . There’s no new code in it; #power_5 ’s body is made up from bits of #power , just rearranged and stuck together in new ways. The advantage of #power_5 is that it should be faster than #power in the case where n is 5 . #power_5 just multiplies x by itself five times, whereas #power executes several recursive method calls, zero checks, conditionals and subtractions. Partial evaluation has done all of that work already, and left behind a residual program which only contains the remaining work that depends on the (unknown) value of x . So #power_5 is an improved version of #power , if we know that n is going to be 5 . Note that this isn’t the same thing as partial application. We could have used partial application to implement #power_5 differently: def power_5(x) power(5, x) end This time we’ve turned #power into #power_5 by simply fixing one of its arguments to a constant value. Of course, this implementation works fine: >> power_5(2) => 32 But this version of #power_5 isn’t any more efficient than #power — if anything it’ll be slightly slower, because it introduces an extra method call. Or, instead of defining a new method, we could turn #power into a proc with Method#to_proc , use Proc#curry to turn it into a curried proc, and call the curried proc with #power ’s first argument (i.e. 5 ) to partially apply it: >> power_5 = method(:power).to_proc.curry.call(5) => #<Proc (lambda)> Later, we can call the partially applied proc with #power ’s second argument to actually execute the method: >> power_5.call(2) => 32 So, partial application is a bit similar to partial evaluation, but it’s a technique that’s applied inside a program: when we’re writing code, partial application gives us a way to define a lower-arity version of a method by fixing one or more of its arguments. By contrast, partial evaluation is a source-level program transformation that’s applied from outside of a program, generating a faster version of that program by specialising it for particular input values.

Applications Partial evaluation has many useful applications. The ability to specialise a program for a fixed input can provide the best of both worlds: we can write a general and flexible program that supports many different inputs, and then, when we actually know how we want to use that program, we can use a partial evaluator to automatically generate a specialised version that’s had much of its generality removed to improve its performance. For example, a web server like nginx or Apache reads in a configuration file when it starts up. The contents of that file affect the handling of every HTTP request, so the web server must spend some of its execution time checking its configuration data to decide what to do. If we used a partial evaluator to specialise a web server with respect to a particular config file, we’d get a new version that only does what that file says; the overhead of parsing a config file during startup, and checking its data during every request, would no longer be part of the program. Another classic example is ray tracing. To make a movie where the camera moves through a three-dimensional scene, we might end up using a ray tracer to render thousands of individual frames. But if the scene’s the same for every frame, we could use a partial evaluator to specialise the ray tracer with respect to our particular scene description, and we’d get a new ray tracer that could only render that scene. Then, each time we used the specialised ray tracer to render a frame, we’d avoid the overhead of reading the scene file, setting up the data structures needed to represent its geometry, and making camera-independent decisions about how rays of light behave in the scene. A more practical (although more questionable) example is the OpenGL pipeline. In OS X, parts of the OpenGL pipeline are written in LLVM intermediate language, including functionality which is implemented directly in hardware on some GPUs. By shipping software implementations of features that any particular GPU may or may not have, Apple makes it possible for LLVM to partially evaluate away any unnecessary code when it’s run on specific hardware, with any hardware-unsupported features being provided by the residual software.

The Futamura projections At the beginning I said I’d tell you a cool story, so here it is. In 1971, Yoshihiko Futamura realised something interesting while he was working at Hitachi Central Research Laboratory. He was thinking about how partial evaluation allows us to process some of a subject program’s input earlier, leaving us with a residual program that we can run later. Specifically, Futamura was thinking about partial evaluation in the context of interpreters. He realised that an interpreter is just a computer program that we provide with inputs and then execute to produce output. One of its inputs happens to be source code, but otherwise an interpreter is like any other program that turns inputs into output. What would happen, he wondered, if we used partial evaluation to do some of an interpreter’s work earlier, and do the rest of it later by running the residual program? If we use partial evaluation to specialise an interpreter with respect to one of its inputs — a particular source program — we get a residual program out. Then, at some later time, we can run that residual program with the remaining input to get the final output: The overall effect is the same as running the interpreter directly against the source program and its input, but now that execution has been split into two stages. Futamura noticed that this residual program, which reads in the final input and produces output, is what we’d usually call a target program — a version of the source program that can be executed directly on the underlying machine. That means that the partial evaluator has read in a source program and produced a target program; in other words, it’s acted as a compiler. That’s pretty amazing. How does it work? Let’s work through a quick example. Here’s the top level of a Simple interpreter written in Ruby: source, environment = read_source, read_environment Treetop.load('simple.treetop') ast = SimpleParser.new.parse(source).to_ast puts ast.evaluate(environment) This interpreter reads in the source program and its initial environment from somewhere (we don’t care how). It loads the Treetop grammar, instantiates the parser, and uses it to turn the source into an AST. Then it evaluates the AST and prints out the environment so we can see the final values of all the variables. The definitions of #evaluate on each kind of AST node aren’t shown here — they’re the same as before. Let’s imagine feeding this interpreter into a Ruby partial evaluator with a particular Simple source program, x = 2; y = x * 3 , as input. That means #read_source (however it works) will return the string 'x = 2; y = x * 3' , so we can replace the read_source call with that value: source, environment = 'x = 2; y = x * 3' , read_environment Treetop.load('simple.treetop') ast = SimpleParser.new.parse(source).to_ast puts ast.evaluate(environment) Now we can do some copy propagation; the source variable is only used once, so we can eliminate it entirely and replace it with its value: environment = read_environment Treetop.load('simple.treetop') ast = SimpleParser.new.parse( 'x = 2; y = x * 3' ).to_ast puts ast.evaluate(environment) This gives us all the data necessary to built the AST; we already have the Treetop grammar, and now we know what string we’re going to parse with it. Let’s evaluate the expression that creates the AST and replace it with the AST itself: environment = read_environment ast = Sequence.new( Assign.new( :x, Number.new(2) ), Assign.new( :y, Multiply.new( Variable.new(:x), Number.new(3) ) ) ) puts ast.evaluate(environment) At this point we’ve reduced the general Simple interpreter to a program which reads in an environment, constructs a literal AST, and calls #evaluate on it. What does calling #evaluate on this specific AST do? Well, we already have the definitions of #evaluate for each kind of AST node, so we can walk over the tree and partially evaluate each occurrence of #evaluate , just as we did for #power . The AST contains all the data we need, so we can gradually boil down all of these #evaluate implementations to a couple of lines of Ruby: For the Number and Variable leaves we know what the values of value and name are, so we can propagate those constants into the definitions of #evaluate for those nodes. For the Multiply and Assign nodes that evaluate their subexpressions, we can inline the recursive #evaluate calls by looking at the appropriate method definitions. And for the Sequence node, we can inline the calls to first.evaluate and second.evaluate , using a temporary variable to hold the value of the intermediate environment. That glosses over a tremendous amount of detail, but the important thing is that we have all of the code and data we need to build a partially-evaluated definition of #evaluate for the root node of the AST: def evaluate(environment) environment = environment.merge({ :x => 2 }) environment.merge({ :y => environment[:x] * 3 }) end Let’s go back to the main interpreter code: environment = read_environment ast = Sequence.new( Assign.new( :x, Number.new(2) ), Assign.new( :y, Multiply.new( Variable.new(:x), Number.new(3) ) ) ) puts ast.evaluate(environment) Now that we’ve generated the body of #evaluate for the AST’s root node, we can partially evaluate ast.evaluate(environment) by expanding the #evaluate call: environment = read_environment environment = environment.merge({ :x => 2 }) puts environment.merge({ :y => environment[:x] * 3 }) This is a Ruby program that reproduces the behaviour of the original Simple program: it sets x to 2 in the environment , then sets y to the value of x multiplied by 3 . So, in a sense, we’ve compiled the Simple program into Ruby by partially evaluating an interpreter — there’s some extra machinery involving the environment, but what we’ve ended up with is essentially a Ruby version of x = 2; y = x * 3 . As with the #power example, we didn’t have to write any new Ruby code; the residual program is just pieces of the interpreter rearranged and stuck together in such a way that it does the same thing as the Simple program we started with. This is called the first Futamura projection: if we partially evaluate an interpreter with respect to some source code, we get a target program. Futamura was happy when he realised this. Then he thought about it some more, and realised that when we feed an interpreter and a source program into the partial evaluator to produce a target program, this is just another instance of running a program with multiple inputs. What would happen if we used partial evaluation to do some of the partial evaluator’s work earlier, and do the rest of it later by running the residual program? If we use partial evaluation to specialise the partial evaluator with respect to one of its inputs — the interpreter — we get a residual program out. Then, at some later time, we can run that residual program with the source program as input to get a target program. And lastly, we can run the target program with the remaining input to get the final output: The overall effect is the same as running the interpreter directly against the source program and its input, but now that execution has been split into three stages. Futamura noticed that this residual program, which reads in the source program and produces a target program, is what we’d usually call a compiler. That means that the partial evaluator has read in an interpreter and produced a compiler; in other words, it’s acted as a compiler generator. This is called the second Futamura projection: if we partially evaluate a partial evaluator with respect to an interpreter, we get a compiler. Futamura was happy when he realised this. Then he thought about it some more, and realised that when we feed a partial evaluator and an interpreter into the partial evaluator to produce a compiler, this is just another instance of running a program with multiple inputs. What would happen if we used partial evaluation to do some of the partial evaluator’s work earlier, and do the rest of it later by running the residual program? If we use partial evaluation to specialise the partial evaluator with respect to one of its inputs — the partial evaluator — we get a residual program out. Then, at some later time, we can run that residual program with the interpreter as input to get a compiler. Later still, we can run the compiler with the source program as input to get a target program. And lastly, we can run the target program with the remaining input to get the final output: The overall effect is the same as running the interpreter directly against the source program and its input, but now that execution has been split into four stages. Futamura noticed that this residual program, which reads in the interpreter and produces a compiler, is what we’d usually call a compiler generator. That means that the partial evaluator has produced a compiler generator; in other words, it’s acted as a compiler generator generator. This is the third Futamura projection: if we partially evaluate a partial evaluator with respect to a partial evaluator, we get a compiler generator. Thankfully we can’t take it any further than that, because if we repeated the process we’d still be partially evaluating the partial evaluator with respect to itself. There are only three Futamura projections.

Conclusion The Futamura projections are fun and surprising, but their existence doesn’t make compiler authors redundant. Partial evaluation is a fully general technique that works with any computer program; when applied to an interpreter it can eliminate the overhead of parsing source code and manipulating an AST, but it doesn’t spontaneously invent the kind of data structures and optimisations that we expect from a production-quality compiler. We still need clever people to hand-craft compilers to make our programs go as fast as possible. If you’d like to learn more about partial evaluation, there’s a great free book called “Partial Evaluation and Automatic Program Generation” that explains it in detail. The JIT in LLVM, and parts of the PyPy toolchain — namely RPython and the VM and the JIT that underlie it — use related techniques to execute programs efficiently. Those projects are directly relevant to Ruby because Rubinius sits on top of LLVM, and there’s a Ruby implementation written in Python, called Topaz, that sits on top of the PyPy toolchain. Partial evaluation aside, Rubinius and JRuby are high-quality compilers with interesting and accessible implementations. If you’re interested in programs that manipulate programs, you may enjoy cracking open Rubinius or JRuby, having a look at how they work, and (in the spirit of Matz’s RubyConf 2013 keynote) getting involved in their development.