7

Evaluating Expressions

This book is a work in progress! × If you see a mistake, find something unclear, or have a suggestion, please let me know. To follow its progress, please join the mailing list: (I post about once a month. Don’t worry, I won’t spam you.)

You are my creator, but I am your master; Obey! Mary Shelley, Frankenstein

If you want to properly set the mood for this chapter, try to conjure up a thunderstorm, one of those swirling tempests that likes to yank open shutters at the climax of the story. Maybe toss in a few bolts of lightning. In this chapter, our interpreter will take breath, open its eyes, and execute some code.

A decrepit Victorian mansion is optional, but adds to the ambiance.

There are all manner of ways that language implementations make a computer do what the user’s source code commands. They can compile it to machine code, translate it to another high level language, or translate to some bytecode format for a virtual machine that runs it. For our first interpreter, though, we are going to take the simplest, shortest path and execute the syntax tree itself.

Right now, our parser only supports expressions. So, to “execute” code, we will evaluate the expression and produce a value. For each kind of expression syntax we know how to parse—literal, operator, etc.—we need a corresponding chunk of code that knows how to evaluate that tree to produce a result. That raises two questions:

What kinds of values do we produce? How do we organize the code to execute expressions?

Taking them on one at a time…

Deep in the bowels of our interpreter are objects that represent Lox values. They are created by literals, computed by expressions, and stored in variables. To the user, they are Lox objects. But to us, the language implementer, they are defined in terms of the implementation language—Java in our case.

Here, I’m using “value” and “object” pretty much interchangeably. Later, in the C interpreter, we’ll make a slight distinction between them, but that’s mostly to have unique terms for two different corners of the implementation—in-place versus heap-allocated data. From the user’s perspective, the terms are synonymous.

So how do we want to implement Lox values? Aside from the basic “pass around and store in variables”, we need to do a few things with a given value:

Determine its type. Since Lox is dynamically typed, we need to be able to check the type of a value at runtime to make sure you don’t do things like subtract a string from a number.

Tell if the object is truthy or not. When the object is used in something like an if condition, we need to tell if that means the condition succeeds or not.

Tell if two objects are equal. We support == and != on all kinds of objects, so we need to be able to implement that.

If we can determine an object’s type, then we can implement truthiness and equality as methods that check the type and do the right thing for each kind of value. So, really, asking an object its type is the only fundamental operation.

Fortunately, the JVM gives us that for free. The instanceof operator lets you ask if an object has some type. There are also boxed types for all of the primitive types we need for Lox. Here’s how we map each Lox type to Java:

The other thing we need to do with values is manage their memory, but Java does that too. Built in instance checks and a really nice garbage collector are the main reasons we’re writing our first interpreter in Java.

Lox type Java representation nil null Boolean Boolean number Double string String

We’ll have to do a little more work later when we add Lox’s notions of functions, classes, and instances, but these are fine for the basic types. We don’t need any kind of wrapper classes or indirection. We won’t be, say, defining a LoxNumber class. Plain old Double works fine. When we want to store a Lox object in a variable, or pass it around, we statically type it as Object, since that’s a superclass of all of the above types.

Next, we need blobs of code to implement the evaluation logic for each kind of expression we can parse. We could stuff those into the syntax tree classes directly in something like an interpret() method. In effect, we could tell each syntax tree node, “Interpret thyself.” This is the Gang of Four’s Interpreter design pattern. It’s a neat pattern, but like I mentioned earlier, it gets messy if we jam all sorts of logic into the tree classes.

Instead, we’re going to break out our groovy Visitor pattern. In the previous chapter, we created an AstPrinter class. It took in a syntax tree and recursively traversed it, building up a string which it ultimately returned. That’s almost exactly what a real interpreter does, except instead of concatenating strings, it computes values.

We start with a new class:

lox/Interpreter.java

create new file package com.craftinginterpreters.lox ; class Interpreter implements Expr . Visitor < Object > { }

lox/Interpreter.java, create new file

It declares that it’s a visitor. The return type of the visit methods is Object, the root class that we use to refer to a Lox value in our Java code. To satisfy the Visitor interface, we need to define visit methods for each of the four expression tree classes we’ve defined. We’ll start with the simplest…

The leaves of an expression tree—the atomic bits of syntax that all other expressions are composed of—are literals. Literals are almost values already, but the distinction is important. A literal is a bit of syntax that produces a value. A literal always appears somewhere in the user’s source code. Lots of values are produced by computation and don’t exist anywhere in the code itself. Those aren’t literals. A literal comes from the parser’s domain. Values are an interpreter concept, part of the runtime’s world.

In the next chapter, when we implement variables, we’ll add identifier expressions, which are also leaf nodes.

So, much like we converted a literal token into a literal syntax tree node in the parser, now we convert the literal tree node into a runtime value. That turns out to be trivial:

lox/Interpreter.java

in class Interpreter @Override public Object visitLiteralExpr ( Expr . Literal expr ) { return expr . value ; }

lox/Interpreter.java, in class Interpreter

We eagerly produced the runtime value way back during scanning and stuffed it in the token. The parser took that value and stuck it in the literal tree node, so to evaluate a literal, we simply pull it back out.

The next simplest node to evaluate is grouping—the node you get as a result of using explicit parentheses in an expression.

lox/Interpreter.java

in class Interpreter @Override public Object visitGroupingExpr ( Expr . Grouping expr ) { return evaluate ( expr . expression ); }

lox/Interpreter.java, in class Interpreter

A grouping node has a reference to an inner node for the expression contained inside the parentheses. To evaluate the grouping expression itself, we recursively evaluate that subexpression and return it.

Some parsers don’t define tree nodes for parentheses. Instead, when parsing a parenthesized expression, they simply return the node for the inner expression. We do create a node for parentheses in Lox because we’ll need it later to correctly handle the left-hand sides of assignment expressions.

We rely on this helper, which simply sends the expression back into the interpreter’s visitor implementation:

lox/Interpreter.java

in class Interpreter private Object evaluate ( Expr expr ) { return expr . accept ( this ); }

lox/Interpreter.java, in class Interpreter

Like grouping, unary expressions have a single subexpression that we must evaluate first. The only difference is that the unary expression itself does a little work afterwards:

lox/Interpreter.java

add after visitLiteralExpr() @Override public Object visitUnaryExpr ( Expr . Unary expr ) { Object right = evaluate ( expr . right ); switch ( expr . operator . type ) { case MINUS : return -( double ) right ; } // Unreachable. return null ; }

lox/Interpreter.java, add after visitLiteralExpr()

First, we evaluate the operand expression. Then we apply the unary operator itself to the result of that. There are two different unary expressions, identified by the type of the operator token.

Shown here is - , which negates the result of the subexpression. The subexpression must be a number. Since we don’t statically know that in Java, we cast it before performing the operation. This type cast happens at runtime when the - is evaluated. That’s the core of what makes a language dynamically typed right there.

You’re probably wondering what happens if the cast fails. Fear not, we’ll get into that soon.

You can start to see how evaluation recursively traverses the tree. We can’t evaluate the unary operator itself until after we evaluate its operand subexpression. That means our interpreter is doing a post-order traversal—each node evaluates its children before doing its own work.

The other operator is ! :

switch (expr.operator.type) { lox/Interpreter.java

in visitUnaryExpr() case BANG : return ! isTruthy ( right ); case MINUS:

lox/Interpreter.java, in visitUnaryExpr()

The implementation is simple, but what is this “truthy” thing about? We need to make a little side trip to one of the great questions of Western philosophy: what is truth?

OK, maybe we’re not going to really get into the universal question, but at least inside the world of Lox, we need to decide what happens when you use something other than true or false in a logic operation like ! or other place where a Boolean is expected.

We could just say it’s an error because we don’t roll with implicit conversions, but most dynamically-typed languages aren’t that ascetic. Instead, they take the universe of values of all types and partition them into two sets, one of which they define to be “true”, or “truthful”, or (my favorite) “truthy”, and the rest which are “false” or “falsey”. This partitioning is somewhat arbitrary and gets weird in some languages.

In JavaScript, strings are truthy, but empty strings are not. Arrays are truthy but empty arrays are… also truthy. The number 0 is falsey, but the string "0" is truthy. In Python, empty strings are falsey like JS, but other empty sequences are falsey too. In PHP, both the number 0 and the string "0" are falsey. Most other non-empty strings are truthy. Get all that?

Lox follows Ruby’s simple rule: false and nil are falsey and everything else is truthy. We implement that like so:

lox/Interpreter.java

add after visitUnaryExpr() private boolean isTruthy ( Object object ) { if ( object == null ) return false ; if ( object instanceof Boolean ) return ( boolean ) object ; return true ; }

lox/Interpreter.java, add after visitUnaryExpr()

Onto the last expression tree class, binary operators. There’s a handful of them, and we’ll start with the arithmetic ones:

lox/Interpreter.java

add after evaluate() @Override public Object visitBinaryExpr ( Expr . Binary expr ) { Object left = evaluate ( expr . left ); Object right = evaluate ( expr . right ); switch ( expr . operator . type ) { case MINUS : return ( double ) left - ( double ) right ; case SLASH : return ( double ) left / ( double ) right ; case STAR : return ( double ) left * ( double ) right ; } // Unreachable. return null ; }

lox/Interpreter.java, add after evaluate()

Did you notice we pinned down a subtle corner of the language semantics here? In a binary expression, we evaluate the operands in left-to-right order. If those operands had side effects, that choice is user visible, so this isn’t simply an implementation detail. If we want our two interpreters to be consistent (hint: we do), we’ll need to make sure clox does the same thing.

I think you can figure out what’s going on. The only difference from the unary operator is we have two operands to evaluate. I left out one arithmetic operator because it’s a little special:

switch (expr.operator.type) { case MINUS: return (double)left - (double)right; lox/Interpreter.java

in visitBinaryExpr() case PLUS : if ( left instanceof Double && right instanceof Double ) { return ( double ) left + ( double ) right ; } if ( left instanceof String && right instanceof String ) { return ( String ) left + ( String ) right ; } case SLASH:

lox/Interpreter.java, in visitBinaryExpr()

The + operator can also be used to concatenate two strings. To handle that, we don’t just assume the operands are a certain type and cast them, we dynamically check the type and choose the appropriate operation. This is why we need our object representation to support instanceof .

We could have defined an operator specifically for string concatenation. That’s what Perl ( . ), Lua ( .. ), Smalltalk ( , ), Haskell ( ++ ) and others do. I thought it would make Lox a little more approachable to use the same syntax as Java, JavaScript, Python, and others. This means that the + operator is overloaded to support both adding numbers and concatenating strings. Even in languages that don’t use + for strings, they still often overload it for adding both integers and floating point numbers.

Next up are the comparison operators:

switch (expr.operator.type) { lox/Interpreter.java

in visitBinaryExpr() case GREATER : return ( double ) left > ( double ) right ; case GREATER_EQUAL : return ( double ) left >= ( double ) right ; case LESS : return ( double ) left < ( double ) right ; case LESS_EQUAL : return ( double ) left <= ( double ) right ; case MINUS:

lox/Interpreter.java, in visitBinaryExpr()

They are basically the same as arithmetic. The only difference is that where the arithmetic operators produce a value whose type is the same as the operands (numbers or strings), the comparison operators always produce a Boolean.

The last pair of operators are equality:

lox/Interpreter.java

in visitBinaryExpr() case BANG_EQUAL : return ! isEqual ( left , right ); case EQUAL_EQUAL : return isEqual ( left , right );

lox/Interpreter.java, in visitBinaryExpr()

Unlike the comparison operators which require numbers, the equality operators support operands of any type, even mixed ones. You can’t ask Lox if 3 is less than "three" , but you can ask if it’s equal to it.

Spoiler alert: it’s not.

Like truthiness, the equality logic is hoisted out into a separate method:

lox/Interpreter.java

add after isTruthy() private boolean isEqual ( Object a , Object b ) { // nil is only equal to nil. if ( a == null && b == null ) return true ; if ( a == null ) return false ; return a . equals ( b ); }

lox/Interpreter.java, add after isTruthy()

This is one of those corners where the details of how we represent Lox objects in terms of Java matters. We need to correctly implement Lox’s notion of equality, which may be different from Java’s.

Fortunately, the two are pretty similar. We have to handle nil / null specially so that we don’t throw a NullPointerException if we try to call equals() on null . Otherwise, we’re fine. .equals() on Boolean, Double, and String have the behavior we want for Lox.

What do you expect this to evaluate to: ( 0 / 0 ) == ( 0 / 0 ) According to IEEE 754, which specifies the behavior of double precision numbers, dividing a zero by zero gives you the special “NaN” (“not a number”) value. Strangely enough, NaN is supposed to be not equal to itself. In Java, the == operator on doubles preserves that behavior, but the equals() method on Double does not. Lox uses the latter, so doesn’t follow IEEE. These kinds of subtle incompatibilities occupy a dismaying fraction of language implementer’s lives.

And that’s it! That’s all the code we need to correctly interpret a valid Lox expression. But what about an invalid one? In particular, what happens when a subexpression evaluates to an object of the wrong type for the operation being performed?

I was cavalier about jamming casts into the previous code whenever some subexpression produces an Object and the operator wants it to be a number or a string. Those casts can fail. Even though the user’s code is erroneous, if we want to make a usable language, we are responsible for handling that error gracefully.

We could simply not detect or report a type error at all. This is what C does if you cast a pointer to some type that doesn’t match the data that is actually being pointed to. C gains flexibility and speed by allowing that, but is also famously dangerous. Once you misinterpret bits in memory, all bets are off. Few modern languages accept unsafe operations like this. Instead, most are memory safe and ensure—through a combination of static and runtime checks—that a program can never incorrectly interpret the value stored in a piece of memory.

It’s time for us to talk about runtime errors. I spilled a lot of ink in the previous chapters talking about error handling, but those were all syntax or static errors. Those are detected and reported before any code is executed. Runtime errors are failures that the language semantics demand we detect and report while the program is running (hence the name).

Right now, if an operand is the wrong type for the operation being performed, the Java cast will fail and the JVM will throw a ClassCastException. That unwinds the whole stack and exits the application, vomiting a Java stack trace onto the user. That’s probably not what we want. The fact that Lox is implemented in Java should be a detail hidden from the user. Instead, we want them to understand that a Lox runtime error occurred, and give them an error message relevant to our language and their program.

The Java behavior does have one thing going for it, though—it correctly stops executing any code when the error occurs. Let’s say the user enters some expression like:

2 * ( 3 / - "muffin" )

You can’t negate a muffin, so we need to report a runtime error at that inner - expression. That in turn means we can’t evaluate the / expression since it has no meaningful right operand. Likewise the * . So when a runtime error occurs deep in some expression, we need to escape all the way out.

I don’t know, man, can you negate a muffin?

We could print a runtime error and then abort the process and exit the application entirely. That has a certain melodramatic flair. Sort of the programming language interpreter equivalent of a mic drop.

Tempting as that is, we should probably do something a little less cataclysmic. While a runtime error needs to stop evaluating the expression, it shouldn’t kill the interpreter. If a user is running the REPL and has a typo in a line of code, they should still be able to keep going and enter more code after that.

Our tree-walk interpreter evaluates nested expressions using recursive method calls, and we need to unwind out of all of those. Throwing an exception in Java is a fine way to accomplish that. However, instead of using Java’s own cast failure, we’ll define a Lox-specific one so that we can handle it like we want.

Before we do the cast, we validate the type ourselves. So, for unary - , we add:

case MINUS: lox/Interpreter.java

in visitUnaryExpr() checkNumberOperand ( expr . operator , right ); return -(double)right;

lox/Interpreter.java, in visitUnaryExpr()

The code to check the operand is:

lox/Interpreter.java

add after visitUnaryExpr() private void checkNumberOperand ( Token operator , Object operand ) { if ( operand instanceof Double ) return ; throw new RuntimeError ( operator , "Operand must be a number." ); }

lox/Interpreter.java, add after visitUnaryExpr()

When the check fails, it throws one of these:

lox/RuntimeError.java

create new file package com.craftinginterpreters.lox ; class RuntimeError extends RuntimeException { final Token token ; RuntimeError ( Token token , String message ) { super ( message ); this . token = token ; } }

lox/RuntimeError.java, create new file

Unlike the Java cast exception, our class tracks the token that identifies where in the user’s code the runtime error came from. As with static errors, this helps the user know where to fix their code.

I admit the name “RuntimeError” is confusing since Java defines a RuntimeException class. An annoying thing about implementing languages is you often want names that collide with ones already taken by the implementation language. Wait until we get to implementing Lox classes.

We need similar checking to the binary operators. Since I promised you every single line of code needed to implement the interpreters, I’ll run through them all.

Greater than:

case GREATER: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left > (double)right;

lox/Interpreter.java, in visitBinaryExpr()

Greater than or equal to:

case GREATER_EQUAL: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left >= (double)right;

lox/Interpreter.java, in visitBinaryExpr()

Less than:

case LESS: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left < (double)right;

lox/Interpreter.java, in visitBinaryExpr()

Less than or equal to:

case LESS_EQUAL: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left <= (double)right;

lox/Interpreter.java, in visitBinaryExpr()

Subtraction:

case MINUS: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left - (double)right;

lox/Interpreter.java, in visitBinaryExpr()

Division:

case SLASH: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left / (double)right;

lox/Interpreter.java, in visitBinaryExpr()

Multiplication:

case STAR: lox/Interpreter.java

in visitBinaryExpr() checkNumberOperands ( expr . operator , left , right ); return (double)left * (double)right;

lox/Interpreter.java, in visitBinaryExpr()

All of those rely on this validator, which is virtually the same as the unary one:

lox/Interpreter.java

add after checkNumberOperand() private void checkNumberOperands ( Token operator , Object left , Object right ) { if ( left instanceof Double && right instanceof Double ) return ; throw new RuntimeError ( operator , "Operands must be numbers." ); }

lox/Interpreter.java, add after checkNumberOperand()

Another subtle semantic corner: We evaluate both operands before checking the type of either. Say we had a function say() that prints its argument then returns it. Consider: say ( "left" ) - say ( "right" ); Our implementation prints both “left” and “right” before reporting the runtime error. We could have instead specified that it checks the first operand before even evaluating the other.

The odd one out, again, is addition. Since + is overloaded for numbers and strings, it already has code to check the types. All we need to do is fail if neither of the two success cases match:

if (left instanceof String && right instanceof String) { return (String)left + (String)right; } lox/Interpreter.java

in visitBinaryExpr() throw new RuntimeError ( expr . operator , "Operands must be two numbers or two strings." ); case SLASH:

lox/Interpreter.java, in visitBinaryExpr()

That gets us detecting runtime errors deep in the bowels of the evaluator. The errors are getting thrown. The next step is to write the code that catches them. For that, we need to wire up the Interpreter class into the main Lox class that drives it.

The visit methods are sort of the guts of the Interpreter class, where the real work happens. We need to wrap a skin around it to interface with the rest of the program. The Interpreter’s public API is simply:

lox/Interpreter.java

in class Interpreter void interpret ( Expr expression ) { try { Object value = evaluate ( expression ); System . out . println ( stringify ( value )); } catch ( RuntimeError error ) { Lox . runtimeError ( error ); } }

lox/Interpreter.java, in class Interpreter

It takes in a syntax tree for an expression and evaluates it. If that succeeds, evaluate() returns an object for the result value. interpret() converts that to a string and shows it to the user. To convert any Lox value to a string, we rely on:

lox/Interpreter.java

add after isEqual() private String stringify ( Object object ) { if ( object == null ) return "nil" ; // Hack. Work around Java adding ".0" to integer-valued doubles. if ( object instanceof Double ) { String text = object . toString (); if ( text . endsWith ( ".0" )) { text = text . substring ( 0 , text . length () - 2 ); } return text ; } return object . toString (); }

lox/Interpreter.java, add after isEqual()

This is another of those pieces of code like isTruthy() that crosses the membrane between the user’s view of Lox objects and their internal representation in Java.

It’s pretty straightforward. Since Lox was designed to be familiar to someone coming from Java, things like Booleans look the same in both languages. The two edge cases are nil , which we represent using Java’s null , and numbers.

Lox uses double-precision numbers even for integer values. In that case, they should print without a decimal point. Since Java has both floating point and integer types, it wants you to know which one you’re using. It tells you by adding an explicit .0 to integer-valued doubles. We don’t care about that, so we hack it off the end.

Yet again, we take care of this edge case with numbers to ensure that jlox and clox work the same. Handling weird corners of the language like this will drive you crazy but is a vital part of the job. Users rely on these details—either deliberately or inadvertently—and if the implementations aren’t consistent, their program will break when they run it on different interpreters.

If a runtime error is thrown while evaluating the expression, interpret() catches it. This lets us report the error to the user and then gracefully continue. All of our existing error reporting code lives in the Lox class, so we put this method there too:

lox/Lox.java

add after error() static void runtimeError ( RuntimeError error ) { System . err . println ( error . getMessage () + "

[line " + error . token . line + "]" ); hadRuntimeError = true ; }

lox/Lox.java, add after error()

It uses the token associated with the RuntimeError to tell the user what line of code was executing when the error occurred. It would be even better to give the user an entire call stack to show how they got to be executing that code. But we don’t have function calls yet, so I guess we don’t have to worry about it.

After showing the error, it sets this field:

static boolean hadError = false; lox/Lox.java

in class Lox static boolean hadRuntimeError = false ; public static void main(String[] args) throws IOException {

lox/Lox.java, in class Lox

That field only has one minor use:

run(new String(bytes, Charset.defaultCharset())); // Indicate an error in the exit code. if (hadError) System.exit(65); lox/Lox.java

in runFile() if ( hadRuntimeError ) System . exit ( 70 ); }

lox/Lox.java, in runFile()

If the user is running a Lox script from a file and a runtime error occurs, we set an exit code when the process quits to let the calling process know. Not everyone cares about shell etiquette, but we do.

If the user is running the REPL, we don’t care about tracking runtime errors. After they are reported, we simply loop around and let them input new code and keep going.

Now that we have an interpreter, the Lox class can start using it. It creates and stores an instance of it:

public class Lox { lox/Lox.java

in class Lox private static final Interpreter interpreter = new Interpreter (); static boolean hadError = false;

lox/Lox.java, in class Lox

We make it static so that successive calls to run() inside a REPL session reuse the same interpreter. That doesn’t make a difference now, but it will later when the interpreter stores state like global variables. That state needs to persist throughout the entire REPL session.

Finally, we remove the line of temporary code from the last chapter for printing the syntax tree and replace it with this:

// Stop if there was a syntax error. if (hadError) return; lox/Lox.java

in run()

replace 1 line interpreter . interpret ( expression ); }

lox/Lox.java, in run(), replace 1 line

We have an entire language pipeline now: scanning, parsing, and execution. Congratulations, you now have your very own arithmetic calculator.

As you can see, the interpreter is pretty bare bones. But the Interpreter class and the visitor pattern we’ve set up today form the skeleton that later chapters will stuff full of interesting guts—variables, functions, etc. Right now, the interpreter doesn’t do very much, but it’s alive!

Challenges Allowing comparisons on types other than numbers could be useful. The syntax is shorter than named function calls and might have a reasonable interpretation for some types like strings. Even comparisons among mixed types, like 3 < "pancake" could be handy to enable things like heterogeneous ordered collections. Or it could lead to bugs and confused users. Would you extend Lox to support comparing other types? If so, which pairs of types do you allow and how do you define their ordering? Justify your choices and compare them to other languages. Many languages define + such that if either operand is a string, the other is converted to a string and the results are then concatenated. For example, "scone" + 4 would yield scone4 . Extend the code in visitBinaryExpr() to support that. What happens right now if you divide a number by zero? What do you think should happen? Justify your choice. How do other languages you know handle division by zero and why do they make the choices they do? Change the implementation in visitBinaryExpr() to detect and report a runtime error for this case.