There are many different kinds of bugs. There are the simple ones: you see the error message, facepalm yourself and fix it. There are the hard ones. Different components of a complex software interact with each other in unforeseen ways and some good debugging love is required.

Of course there are even harder bugs once concurrency joins the party, debugging becomes a nightmare or reproducing the bug is complicated. Everytime I try to find some concealed bug, I try to ask myself what could go wrong and look at the code before I use the debugger.

Sometimes you can fix a really hard bug by trying to reason about the code, rather than stepping through it. We had a bug in our register allocator once and the best way to fix it was to write the algorithm down on paper and to figure out what is going on.

Last but not least there is the Heisenbug. The most diabolic bug. Once you start trying to understand its behaviour, it disappears and is no longer present. The system changes its state as you start to observe it. Those bugs are really hard to track down. Especially when the code looks so inconspicuous.

function toInt(value, radix) {

return value / radix | 0;

}

What could possibly go wrong? We have to start from the beginning in order to understand how it took me three days of debugging, building V8 from source and why this is an actual Heisenbug.

Where do these Exceptions come from?

We build a compiler with a Java bytecode frontend and various backends, one being JavaScript. Our compiler is lazy in the same way a Java Virtual Machine evaluates and verifies bytecode. You may have a method referencing a class that isn't present at runtime. If that method is never called, you’ll never get the ClassNotFoundException. The same applies to our compiler. One could say its various phases, abstract interpretation and platform agnostic type system is already quite complex. Therefore I am always looking for a mistake in our system first. The web backend contains more than 35 compiler phases, excluding the optimizer pipeline. Plenty of stuff that can go wrong.

Invalid int: “15”

One day I was presented with a NumberFormatException inside an application that did run fine otherwise. The message was odd. It said that 15 is an invalid integer. I reloaded the app and it worked fine. Nothing to worry about. Maybe an old error I introduced somewhere and the JavaScript file was cached. I am running Ubuntu and the Chrome developer channel. I am used to odd behaviour and got other things on my plate that day.

The bug disappeared and I didn't experience it again. For a while.

Faster, Faster

Building a compiler means you are out for a long lasting quest. The ultimate goal: maximum performance. Our JavaScript backend makes a tradeoff between code size and performance. You could achieve a smaller footprint if you accept a certain performance hit. We tried this for instance with keywords like null, true and false. Binding them to global single-character variables reduces code size but comes with a 20% performance hit in the benchmarks.

Renaming has the biggest impact on code size. Simply speaking, every name in a Java class needs to be fully qualified to resolve in JavaScript. Here is an example:

class Foo {

public int value;

} class Bar extends Foo {

public int value;

} void set(Bar bar) {

bar.value = 123;

} int get(Foo foo) {

return foo.value;

} void main(String[] args) {

Bar bar = new Bar();

set(bar);

System.out.println(get(bar));

}

What does it print? The result will be 0, since Foo::value is not Bar::value. How do we model this behaviour when emitting JavaScript? We can't simply generate “this.value” since it’s ambiguous. One of our last phases resolves all names so that they become unique. In fact we'd probably emit code like “bar.Bar_value = 123” and “return foo.Foo_value” for the getter and setter function.

Minification plays an important role. Of course, assigning names could be done much smarter but it’s a time consuming task once you start thinking about it. In fact it is so complicated that we won’t do it by default.

The first minifier we wrote followed a simple algorithm. Use any valid identifier-start character and continue with all possible identifier-part characters. The first name we'd encounter would become “a”, the next “b” and so on until we reach “Z”. The next name would be “aa”, “ab” et cetera. What’s the problem with this algorithm?

First of all we could count the occurrences of a name in the whole program and assign it the shortest sequence as a heuristic. But even worse, what if we generate a name like “abs”, “sqrt” or “sin”? Programs are large enough so that we create names that start clashing with JavaScript APIs. defrac allows developers to extend native classes with custom functions and properties. In that case we could create a conflict. The first solution was to choose a prefix that isn't used in the APIs, like _ or $. Now all of our variables are named “_a” and so on. I didn't like this for various reasons. Other JavaScript frameworks could define their own properties, JavaScript developers could define their own properties and the whole interop layer becomes unstable as a result. Also if we start to use an extra byte, isn't there a better solution?

Unicode. (Wow, I couldn't believe I actually write that some day)

I started generating Unicode characters for our identifiers. They are only one character, but still two (or more!) bytes — no reduction in file size.

On the other hand V8 is a very complicated piece of software with all different kinds of heuristics and tricks to make JavaScript code fly. One is lazy compilation. And one heuristic when to inline a method is based on the number of characters it contains. Why the characters? Sounds stupid at first. But if you never generated the AST, what else could you use as a heuristic?

In V8 terms, “_a” is two characters whereas “ツ” is only one character. Pretty cool, huh? Due to our change in the minifier, we're able to trick V8 into optimizing more code. And that’s exactly when the exceptions started to show up. Only I didn't know. And I didn't notice at that time.

“Tobias isn't able to start Audiotool anymore!”

Audiotool is rebuilding their application with defrac. One morning André Michelle came to me and told me that someone from the office wasn't able to launch Audiotool anymore. On a Mac. With a beta version of Chrome.

I got interested and went to Tobias, asking what happened. There it was.

Illegal int: “251”

I immediately remembered the message from about a month ago. But this time it was the beta version of Chrome on a normal operating system. I tried to reproduce it on my system and loaded the HTML5 app. It presented me with “Illegal int: 16”. I reloaded, “Illegal int: 251”, again, “Illegal int: 134”. Why are those numbers so random? Usually, I'd expect it to fail with the same message and besides: 16, 251 and 134 are not illegal integers!

Backpropagation

Artificial neural networks learn with a technique called backpropagation. You present them with the desired output for a given input and propagate the weights of the synapses backwards throughout the layers of the network. I like to think of my way of tracking down bugs the same way. Audiotool is a large piece of software with more than 500kloc on the client. The JavaScript file we generate is approx. 2.5MB, minified. Obviously the first idea is to start from the generated code where the exception is thrown, backwards to the initial code triggering the issue.

This already happened a couple of times. We didn’t obey the JVM semantics quite right in the early days and saw some rendering glitches. The reason was that some totally unrelated code generated an invalid value that propagated into the rendering system. Those bugs are not easy to find. A stack-trace doesn't really help.

So I started looking at the implementation of Integer.parseInt; that’s the one that throws. Nothing suspicious. I built a small example app and tried parsing some integers. It worked. So if this code is correct, something else must be broken. As I said, Audiotool is quite big and the minified code contains all kinds of meaningless characters, no spaces and line breaks. I cloned the repo and created a debug build with defrac. Eager to understand what was happening I started the app — but I didn't get the exception anymore.

Searching for a culprit

Now there is an obvious difference. “web:config debug true” and the exception is gone. “web:config debug false” and it appears again. Each time with different value which is in fact a valid integer.

Let’s do a usage search for the the debug setting in the compiler.