Part B: Building our own compiler

1. Building a JavaScript code generator

We will be starting with the below architecture. With the aim to produce a transformed file (index.es5.js) and source map (index.es5.js.map) after compilation.

Our src/index.es6.js will look like this (a simple “add” function):

So now we have our pre-compiled source code. We want to start looking at the compiler.

THE PROCESS

There are a couple of steps our compiler must perform:

1. Parse the code to AST

As this article is not focusing on parsing, we will use a basic 3rd party tool for this (esprima or escodegen)

2. Add a shallow clone of each node onto the AST

This idea was borrowed from recast. The idea is that each Node will hold itself as well as a clone of itself (i.e. the original). The clone is used to check if the Node has changed. More about this later.

3. Transformation

We will manually be doing this. We could have used a library such as ast-types or @babel/types as they have useful APIs.

4. Generate source code

Turn our AST into JavaScript.

5. Add source map support

4 and 5 are done at the same time as above. This will involve traversing the tree and detecting where the AST node has changed with its “original” property. For those instances store a mapping between the “original” and the “generated” code.

6. Write to build/

Finally write our generated source code and its source map to the appropriate file.

THE CODE

Let us look at these steps again, but this time in more detail.

Parse the code to AST

Using a basic 3rd party tool (I went for a simple one called ast), we grab our file contents and pass them into the libraries parser.

2. Add a shallow clone of each node onto the AST

First we define a function called “visit” with the job of traversing the tree and executing our callback function on every single Node.

Here we are doing a “depth-first search” as mentioned above. For a given Node it will:

Execute the callback Check for the location property, if so return early Check for any properties which are arrays, if so call itself with each child Check For any properties which are AST Nodes, if so call itself with the node.

Next we move onto producing our clones.

Our cloneOriginalAst function produces a clone of the Node and appends that onto the original.

For our cloning we use Object.assign so it is a shallow clone and copies the top-level properties. The nested props are still connected by pass-by-reference i.e. changing them will change the clone. We could have also used the spread operator here as that does the same thing. We will do our comparison using the top-level which is enough to compare 2 AST nodes and determine if the node has changed or not.

Overall our code here will return the same tree except with “original” property on every single Node.

3. Transformation

Next we will do our node manipulation. We will keep it simple so are going to just swap 2 nodes from our program. So we will start with:

number + 1

And will end with:

1 + number

Simple in theory right !

Our code to do the swap is below:

We have not used a clean API to do this (which many libraries provide) as we have manually swapped the 2 nodes.

An example of using a library with a helpful API could look something like below, provided by the documentation on ast-types.

This way is certainly safer, easier to follow and faster to develop with. So in general I would recommend using it for any complex AST manipulation, most big name compilers do.

4. Generate source code

Code generators are typically housed in a single file and are several thousand lines long. For example escodegen’s compiler is 2,619 lines (see here). That is on the smaller side compared to others (crazy right!)

I have used much of the same code for our compiler (as most generators need very similar logic to process AST into JavaScript) EXCEPT only what is absolutely necessary for us to process the code from our “index.es6.js” file.

Below I have defined the 3 types of code we have inside our compiler.

a) Node processors and character utilities

These are general utility functions used to process AST nodes (depending on the type e.g. a function declaration will have an identifier) and build source code. It also includes some common character constants (e.g. a “space”). They are called from our code “type statements” in the next section.

I would not worry too much about the details here unless you plan on writing a compiler. This was largely borrowed from the generator in escodegen here.

b) Type statements

This is an object holding functions which are tied to an AST node type. Each contains the logic necessary to process that AST node type and produce source code. For example for a function declaration it contains all possible variations of arguments, identifiers, logic and return types. There is a level of recursion that is common here i.e. for a type statement to trigger another type statement which might trigger another etc.

Here we ONLY have the necessary statement functions to process our “index.es6.js” file, so it is fairly limited. You can see how much code is required just to process our AST tree of 3–4 lines of code (in addition to that of the above section).

Again this has borrowed from “escodegen here” so please feel free to ignore the details, unless you plan to write your own compiler.

c) Process code statements

Lastly we are going to iterate over the program body (i.e. each line of code) and start running our generator. This will now return an array called “code” which contains every line of our newly generated source code.

6. Write to build/

We are going to skip step 5 for now and complete the core elements of our compiler. So for this step we will

Add a source map location to our generated code (we will build this in the next section)

Produce a bundle for the generated code (joining our array of code together), and copy the original code so that the browser can see it (this is only 1 approach to this).

5. Add source map support

There are 4 requirements when it comes to building a source map:

Store record of source file Store record of generated file Store mappings of line/columns Display in Source Map file using spec version3

For a quick win we can use the library which almost every JavaScript code generator uses called source-map. It is from Mozilla and handles storing of points 1–3 as well as the processing the mappings into Base64 VLQ (step 4).

Little reminder what a source map looks like with mappings highlighted (from way above):

The mappings are Base64 VLQ, but what is that?