Plan to throw one away

How I had to write a JavaScript engine in two weeks.

This is a software development story from early in my career, some time in the late 1990s. The company in question went bust many years ago, so it’s probably safe to tell the tale now.1

✴

When I had interviewed for the job, the interviewer had explained that I was needed to work on their JavaScript engine. The company was a web development tools startup, and they had spotted that it would be a good idea to have JavaScript code running on the server as well as on the client, so that you could share your business logic and validation code between the client-side and server-side parts of your application. There are now lots of products in this space but back then was probably too soon for this to be a success.

Anyway, the interviewer told me that the JavaScript engine was nearly done, but there were one or two features that were missing, and it was running a bit slowly, so I might have to do a bit of optimization. Sure, I said, that sounded like it would be fun. Oh, and it had to be done for the beta release in two weeks’ time, so I would have to get my finger out.

So, on my first day in the office I needed to make a plan for what was left to be done. Exactly which features were missing? Was there a list? No, there was no list. Had any testing been done (with only two weeks to go, surely there should have been some)? No, there had been no testing. Could I talk to the programmer who had written the engine? No, he was the CTO of the company, and he was very very busy preparing for the release. So there was nothing to be done except check out the source code for the engine from CVS, compile it, and sit down with the first edition of David Flanagan’s excellent JavaScript: The Definitive Guide and work through the features. Sure enough, “one or two” things were missing. Global objects and functions like Math and parseInt weren’t implemented. There didn’t seem to be any string methods. Arrays were missing. Object prototypes didn’t work. At least the core syntax was working, but it did seem very slow. Very slow. I mean, interpreted code can easily be a hundred times slower than compiled code, but it shouldn’t take several seconds to evaluate something like for (i=0; i<1000; ++i); .

Clearly the most important thing was to figure out what was going on inside the interpreter. So I started to read the code. And all became clear:

it was a hand-coded recursive descent parser (fair enough, but not the easiest thing to maintain, because the syntax and the parser logic were not separated, and because it was very long-winded);

with lexical analysis inlined in the parser (not a complete disaster, but it led to lots of duplicated code);

with lexical analysis using string matching rather than finite state machines (this was inefficient because attempted matches could fail many times before a token was finally accepted);

with evaluation interleaved with the parsing (very dodgy: it meant that programs would start running even if there were syntax errors later in the code);

and no data structure representing the parsed code (disaster!).

The failure to store a represention of the parsed code was disastrous because it meant that in order to evaluate an expression or statement for a second time, the engine had to parse it a second time. And if a loop executed a thousand times, then the engine had to parse the loop body (and termination condition) a thousand times. No wonder it was slow!

But how was I going to fix this disaster of an engine in the two weeks I had available? Normally under this kind of deadline it’s imperative that you work with what you’ve got, because if you try to rewrite from scratch then there’s a risk that you’ll end up with nothing. But sometimes the code is so bad that there’s nothing for it but to throw everything away and start again. Even if it’s the CTO who wrote it.2

Now, if there was one thing that I had been taught in my computer science degree, it was how to quickly implement a programming language. You take two very powerful tools:

Lex (which takes a description of the language’s source code tokens in the form of regular expression patterns, and outputs a finite-state-machine based lexical analyzer in C); and

Yacc (which takes a description of the syntax of a language in Backus–Naur-like format, and outputs an LALR parser)

and use these to build your parser. Then you use the syntax-directed translation technique to turn the output of the parser directly into stack-oriented byte code without having to represent the parse tree as a data structure. (This works neatly for context-free language syntaxes because each expression in the parse tree corresponds to pushing a value onto the stack in the byte code.) Finally, you need an engine to evaluate the byte code, but you can get away with only a handful of byte code instructions, and so the actual engine can be very small.

I didn’t tell anyone what I was doing, because I suspected that if I did, then someone would try to stop me. Luckily everyone was so busy in the run-up to the release that they had no time to ask how their new hire was coming along with a last-minute rewrite of one of the critical components of the product, and in about a week I had got my new interpreter roughly up to par with the interpreter I was throwing away (up to par as far as features were concerned—it was of course thousands of times faster). Then it was a case of implementing as many of the JavaScript types, objects, functions and methods as I could in the remaining time. I found the first edition of the ECMAScript standard enormously useful at this point. It seems to have be written as a guide to language implementers, setting out features in a logical order, starting with the object representation. The semantics of operations are defined in terms of “internal properties” on objects like “[[Get]]” and “[[Put]]”, which of course as an implementer you turn directly into Get and Put methods on the classes you’re using for your object representations. Here’s a typical section from the standard:

8.6.2.1 [[Get]](P)

When the [[Get]] method of O is called with property name P, the following steps are taken: If O doesn’t have a property with name P, go to step 4. Get the value of the property. Return Result(2). If the [[Prototype]] of O is null, return undefined. Call the [[Get]] method of [[Prototype]] with property name P. Return Result(5).

which you’ll see is trivial to turn into an implementation. I’ve heard complaints about this standard from people who tried to learn how to program in JavaScript by reading it. Sorry, but it wasn’t written for you, it was written for me! You should have bought David Flanagan’s book instead.

In the course of implementing the language semantics, I realized that the original implementation (the one that I had thrown away) was hopelessly wrong in many cases. It had obviously been reverse-engineered from Flanagan’s guide, and not implemented directly from the standard, and consequently many details, like step (4) in the quote above, were missing or broken. It would have taken far longer than I took to write my implementation just to find and fix these kinds of errors in the semantics.

Anyway, by the time the two weeks were up and the beta release was made, I had produced a working JavaScript interpreter that included the core language semantics and a few of the more important global objects, and which was not much slower than the JavaScript implementation in Netscape Navigator. Mission accomplished! One thing that I had not had time to do was to write a garbage collector. This meant that any program using the engine would keep allocating until it eventually exited with an “out of memory” error.3 But I reasoned that this was a beta release, and most likely people would be running small and short-lived programs, and that given the limited time available, it was more important to add features than to improve the quality of the implementation. I decided that I wouldn’t even start work on the garbage collector until someone actually ran out of memory and complained about it to me.4