In the summer of 2015, I wanted to work on a side-project, that I can quickly use, instead of spending lots of weekends, and it getting no where. Luckily, I found the right project on Peter Norvig’s website. Here I will try to describe, how to write a simple LISP interpreter, and how to use that in your iOS app.

To those who are new to LISP, it is pretty simple to explain. LISP programs are based on something called ‘s-expressions’. An s-expression looks like this: $(\mathrm{operator}\ \mathrm{operand_1}\ \mathrm{operand_2}\ \mathrm{operand_3}\ …)$.

For example:

$(+\ 1\ 2)$ is an s-expression, where $+$ is the operator, and $1$ and $2$ are the operands. Hence, it is equivalent to $1\ +\ 2$.

Similarly, $(*\ 2\ 3\ 4)$ in LISP evaluates to $2\ *\ 3\ *\ 4$.

The operands can themselves be recursively computed too.

For example, $(+$ $1$ $(*$ $2$ $3))$ is a valid expression. First we evaluate the inner $(*$ $2$ $3)$ part, then the original expression resolves to $(+$ $1$ $6)$, which then evaluates to $7$. This can go on recursively.

For someone who wants to design an interpreter, LISP is the ideal real-life language to start with. This is for two reasons:

There are minimal symbols to interpret. Only $($ and $)$ and the operators that you define. The parsing is straight-forward and recursive. Zero syntactic-sugar.

I could only stay motivated, and bring this project to a closure, because you can pick a very small subset of LISP, and still do a lot of interesting things.

What I Built

To keep you motivated about reading the article, lets do a demo first and then we can go into details about how I built this.

Here is the GitHub repository for the interpreter, and the app on iTunes (Lambda Lisp). Feel free to file issues / contribute.

If you are still reading: Let’s build an interpreter!

Lexing

Lexing involves finding lexemes, or syntactical tokens, which can then be combined to interpret a grammatical sentence. In the expression $(+$ $1$ $2)$, the tokens are [$($, $+$, $1$, $2$, $)$]. Sophisticated compilers use lex or flex for finding these tokens, handling white-space, attaching a token type to each of them, etc.

I did not want to bloat up my simple interpreter by using lex / flex. I found this nifty one-line bare-bones Lexer in Peter Norvig’s article:

Peter Norvig’s one-line lexer 1 2 3 def tokenize ( chars ): "Convert a string of characters into a list of tokens." return chars . replace ( '(' , ' ( ' ) . replace ( ')' , ' ) ' ) . split ()

Essentially, what this does is to handle white-space (somewhat). It basically adds spaces around the brackets, and then splits the expression on white-space.

We need to do the replacement for all operators, but otherwise it works well. This is because LISP is simple enough that attaching types to tokens (and error-ing out, if required) can be done at the time of parsing. This is how I did it in Go, just for completeness sake.

My one-line lexer in Go 1 2 3 4 5 func tokenize ( exp string ) [] string { return strings . Fields ( strings . Replace ( strings . Replace ( exp , "(" , " ( " , - 1 ), ")" , " ) " , - 1 ), ) }

The recurrent theme in the design of this interpreter, is to be lazy and push the harder problems to the next layer.

Constructing an AST

Given an expression, we would need to make sure that the expression follows a structure, or a Grammar. This means two things in our case:

The s-expression should be well formed. The brackets should be balanced, should not be out of order, etc. Operators such as $+$, $*$, etc. get the right number of operands, have the right type of operands, etc.

At this stage, we are only concerned about well-formedness of the s-expression. We don’t care if the $+$ operator received incompatible operands, for instance. This means that given an expression like $(+$ $1)$, we would mark this expression to be okay at this point, because the expression is well-formed. We will catch the problem of too few operands to $+$, at a later time.

We can start solving the problem of checking well-formedness of the expression by using an Abstract Syntax Tree (or AST). An AST is a way of representing the syntactic structure of code. In this tree, the leaf nodes are atomic values, and all the non-leaf nodes are operators. Recursion can be naturally expressed using an AST.

This is how we can represent a node of this tree in the code:

Definition of ASTNode 1 2 3 4 5 type ASTNode struct { value string // Only valid for leaf nodes. isValue bool // Checks if this is a value (also if children == nil). children [] * ASTNode // Children of this AST Node. }

To actually verify the well-formedness of the expression and build the AST, we would go about it this way:

Building the AST 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 // This method gets a list of tokens, and returns: // 1. The ASTNode of the tree so formed. // 2. Unused tokens in the end of the array. // 3. Any error while constructing the tree. // Removed some error handling to make it a little brief. func buildAST ( tokens [] string ) ( * ASTNode , [] string , error ) { var token = "" tokensLen := len ( tokens ) // If it is an empty list of tokens, the AST is a nil node if tokensLen == 1 { token , tokens = pop ( tokens ) if ! isValue ( token ) { return nil , tokens , errStr ( "value" , token ) } // Create a one-node AST. node := new ( ASTNode ) node . isValue = true node . value = token node . children = nil return node , tokens , nil } else { token , tokens = pop ( tokens ) // Expect an opening bracket to start. if token != openBracket { return nil , tokens , errStr ( openBracket , token ) } node := new ( ASTNode ) node . isValue = false // Create a slice with 0 length initially. node . children = make ([] * ASTNode , 0 ) tokensLen = len ( tokens ) for len ( tokens ) != 0 && tokens [ 0 ] != closedBracket { var childNode * ASTNode = nil var err error = nil // If we get an opening brace, it means we need to recursively get the // AST of the sub-expression from here on. if tokens [ 0 ] != openBracket { token , tokens = pop ( tokens ) childNode , _ , err = buildAST ([] string { token }) } else { childNode , tokens , err = buildAST ( tokens ) } if err != nil { return nil , tokens , err } node . children = append ( node . children , childNode ) } // Expect a closing bracket when ending. token , tokens = pop ( tokens ) if token != closedBracket { return nil , tokens , errStr ( token , closedBracket ) } return node , tokens , nil } }

You can see how the grammar for interpreting the s-expression grammar is hard-coded in the code here. We expect the expression to be either a single value, or something like $(\mathrm{operator}\ \mathrm{o_1}\ \mathrm{o_2}\ …\ )$, where $\mathrm{o_i}$ can be an atomic value, or a nested expression.

Note that we construct the AST slightly differently. The operator is also part of the children .

Parsing & Evaluation

We combine the parsing and evaluation of the AST into one stage. The result of evaluating an AST is an Atom , which can either have a Value or an errror .

Defining Atom 1 2 3 4 type Atom struct { Err error Val Value }

Here is a stripped down AST evaluation code:

Evaluating the AST 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 func evalAST ( env * LangEnv , node * ASTNode ) Atom { var retVal Atom if node . isValue { retVal . Value , retVal . Err = getValue ( env , node . value ) return retVal } // Assuming that the first child is an operand symbol := node . children [ 0 ]. value operator := env . getOperator ( symbol ) // Skipped error handling related to operator & arguments for brevity. operands := make ([] Atom , 0 ) for i := 1 ; i < len ( node . children ); i ++ { v := evalAST ( env , node . children [ i ]) if v . Err != nil { return v } operands = append ( operands , v ) } v := operator . handler ( env , operands ) if v . Err != nil { return v } retVal . Val = v . Val return retVal }

Basic evaluation is very simple. We have a struct called LangEnv , which is the ‘environment’ data-structure storing amongst other things, defined operators. When evaluating an AST, if it is a single node, the value of the node is the result. Otherwise, we simply lookup the operator in the environment using getOperator , then resolve the operands recursively, and pass the operands to the operator. The operand deals with making sure that the operands are sane.

An operator looks something like this:

Defining Operator 1 2 3 4 5 6 type Operator struct { symbol string minArgCount int maxArgCount int handler ( func ( * LangEnv , [] Atom ) Atom ) }

As seen, symbol is the name of operator, so for the binary addition it can be “+”. handler is the function which will actually do the heavy lifting we have been avoiding all this while. It takes in a slice of Atom s (and a LangEnv , more on that later) and returns an Atom as a result.

Now, the fun stuff.

Type System

Remember Atom has a Value inside? Value is an interface, and any type which wants to be a Value , needs to implement the following methods.

What should a Value look like? 1 2 3 4 5 6 7 type Value interface { Str () string // Returns a string representation of the value. getValueType () valueType // Return the valueType (enum of all Values). to ( valueType ) ( Value , error ) // Convert to a different Value type. ofType ( string ) bool // Check if a given string is of this type. newValue ( string ) Value // Create a new Value of this type. }

This is enough power to figure out which value is of which type. In LangEnv we keep a list of builtin Value types, such as intValue , floatValue , stringValue , etc.

To deduce the type of a value, we simply do this:

Type deduction 1 2 3 4 5 6 7 8 9 func getValue ( env * LangEnv , token string ) ( Value , error ) { types := builtinTypes () for _ , t := range types { if t . ofType ( token ) { return t . newValue ( token ), nil } } return nil , errors . New ( fmt . Sprintf ( "Could not get type for token: %s" , token )) }

Now imagine an expression like $(+$ $1.2$ $3)$.

$1.2$ resolves to floatValue , and $3$ resolves to intValue . How would we implement the handler method for the $+$ operator to add these two different types? You might say, that this would involve casting of $3$ to floatValue . Now how do we decide what values to cast, and what type should we cast them to?

This is how we do it. In the method, typeCoerce , we try to find out which is the right value type to cast all our values to. It is declared like:

1 2 func typeCoerce(operatorName string, operands *[]Atom, typePrecendenceMap map[valueType]int) (valueType, error)

This is what we do inside typeCoerce :

We get the type : count mapping for all the param values. If there is only one type, there is nothing to do, return the corresponding type. Every value belongs to the same type. If there are multiple types in step 1, pick the one with the highest precedence. Try and cast all operand values to that type. Error out if any of them resists.

Hence, the $+$ operator could be implemented this way:

Defining the ‘+’ operator 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 op := & Operator { symbol : add , minArgCount : 2 , maxArgCount : 100 , // Just an arbit large value. handler : func ( env * LangEnv , operands [] Atom ) Atom { var retVal Atom var finalType valueType finalType , retVal . Err = typeCoerce ( add , & operands , numValPrecedenceMap ) if retVal . Err != nil { return retVal } switch finalType { case intType : var finalVal intValue finalVal . value = 0 for _ , o := range operands { v , _ := o . Val .( intValue ) finalVal . value = finalVal . value + v . value } retVal . Val = finalVal break case floatType : var finalVal floatValue finalVal . value = 0 for _ , o := range operands { v , _ := o . Val .( floatValue ) finalVal . value = finalVal . value + v . value } retVal . Val = finalVal break } return retVal }, } // Add the operator to the LangEnv's opMap (operator map) addOperator ( env . opMap , op )

Here we basically just call typeCoerce on the operands, and if its possible to cast them to one single type, we do that casting, and actually perform the addition in the new type.

The $+$ operator can be used to add strings as well. However, we don’t want something like $($$+$ $3.14\ \mathrm{“foo”})$. The typeCoerce method can be trivially extended to support a list of type valid type precedence maps, and all operands need to belong to the same precedence map. In this case, it could be { {intType: 1, floatType: 2}, {stringType: 1 } } . This particular list ensures that we don’t add ints and strings for example, because they don’t belong to the same precedence map.

Note that the entire implementation of the operator is defined in the Operator struct’s handler method. Whether or not the operator supports this sort of casting, or decides to roll its own, or not use it at all, is the prerogative of the operator.

Defining Variables

A typical variable definition could look like this: $(\mathrm{defvar}\ \mathrm{x}\ 3.0)$.

Now, defvar is an operator too. It expects the first argument to be of varType (matches the regex of a variable), and the value can be anything (except varType ). We just need to check if the type conditions match, and the variable is not a defined operator. If both are okay, we define the variable in our LangEnv ’s varMap .

We need to change the part in our evalAST method which to support variable lookup.

Changes in evalAST to support variables 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 func evalAST ( env * LangEnv , node * ASTNode ) Atom { // ... for i := 1 ; i < len ( node . children ); i ++ { v := evalAST ( env , node . children [ i ]) if v . Err != nil { return v } // <!-- Lookup code begins if v . Val . getValueType () == varType { v . Val , v . Err = getVarValue ( env , v . Val ) if v . Err != nil { return v } } // Lookup code ends --!> operands = append ( operands , v ) } // ... }

Here we can assume we have a helper method called getVarValue , which looks up the value of a variable in the varMap , or throws an error if required (this is also simple to implement).

Defining Methods

Defining methods is even more fun! We use the defun operator. Consider this:

1 (defun circle-area (r) (* 3.14 r r))

The first operand is the name of the method, the second is a list of variables that you would pass to the method, and the last is the actual method, assuming that the variables you need are already defined. (circle-area 10) after this definition should return 314 .

Calculating the area of a rectangle is pretty much similar.

1 (defun rect-area (x y) (* x y))

We need a couple of things for function calls to work fine:

Create a new astValue which can be used for keeping ASTs. So far we were keeping ints, floats and so on.

which can be used for keeping ASTs. So far we were keeping ints, floats and so on. Have a way to tell evalAST to not evaluate the AST in the defun arguments. This is because in circle-area , the (* 3.14 r r) itself is the value (AST value).

to not evaluate the AST in the arguments. This is because in , the itself is the value (AST value). The defun operator needs to add an operator to the opMap , with the same name as the method, and define its handler method.

operator needs to add an operator to the , with the same name as the method, and define its method. The handler would need to expect the same number of params as specified in the definition.

Up until now, our variables have been global in scope. Assume that there is a (defvar x 3.0) defining the variable x , followed by (defun foo (x) (+ 1 x)) which defines a method uses a param labelled x . The interpreter may look at the varMap and think that the programmer wants to use the global x , which is $3.0$. The actual intention of the programmer is to use the parameter x .

For this to work correctly, we would need: * A new LangEnv to be created, inside the handler . * First copy the same varMap as the parent LangEnv passed to the handler. * Then copy the params passed to the handler. Any duplicates will be overwritten, but all global definitions would be preserved. The variable defined in the inner scope would be visible. * Inside the handler, we will call evalAST to evaluate the AST we were provided in the method definition with the new LangEnv * We also keep track of the recursion depth in LangEnv , and it is incremented every time a recursive call is made. If it exceeds a large value (100000 for now), we can error out, so as to salvage the interpreter at least.

This is the only complicated part of the interpreter. Those interested in the code can check it out here.

Recursion

Apart from making sure that we have some sort of recursion depth limit enforced, recursion does not need any special handling. Except, defining some new operators like cond (the if-else equivalent), which are required for writing something useful.

Here is the implementation of the factorial function:

1 2 3 4 5 6 7 8 (defun fact (x) (cond ; conditional block ((= x 0) 1) ; if x == 0, return 1 (true ; else (* x (fact (- x 1))) ; return x * fact(x - 1) ) ) )

fact(10) returns 3628800 as expected.

Porting to iOS

Once I had the interpreter working fine, I wanted to run this on an iOS app. Why? Just for fun. It turns out with Go 1.5, a new tool called gomobile. Hana Kim gave a talk about this at GopherCon 2015.

What it does is, it compiles your Go package into a static library, and generates Objective-C bindings for it, and wraps them together in a nice iOS friendly .framework package.

There are a few restrictions regarding not being able to return complex types such as structs within structs, but apart from that it was fairly easy to use in my bare-bones app. Here is the code for the app, and we have already seen the demo earlier.

(Thanks to Dhruv Matani for reviewing this blog-post.)