Is a universal web bytecode worth the trouble of creating it? Is LLVM the solution? Which is better at running native code in the browser: Mozilla's asm.js or Google's PNaCl? This article collects opinions expressed on the web on these issues.

A comment by Raniz on an ArsTechnica post regarding video codecs written in JavaScript sparked a series of reactions in the comments section and on the web. Raniz suggested a “standardized bytecode for browsers [that] would (most likely) allow developers a broader range of languages to choose from”, giving developers the option to use the language they like for web programming instead of JavaScript. The bytecode would be, like the JVM or CLR bytecode, a common platform for web development. The idea sounds interesting at first glance, and some even suggested using LLVM’s bitcode as the intermediary “bytecode.” There are already LLVM compilers for many languages, including ActionScript, Ada, D, Fortran, Haskell, Java bytecode, Objective-C, Python, Ruby, Rust, Scala and C#.

The main problem with LLVM bitcode is that it is target dependent: the bitcode generated for one architecture differs from the bitcode generated for another. Java, by contrast, has identical bytecode for all targets, and the JVM takes care of generating native code for the machine it runs on. A universal web bytecode faces a series of other problems as well, some of them also plaguing LLVM bitcode (more details here). msclrhd noted several of them in his comment, from which we extract some excerpts:

The problem with standardizing on a bytecode is that you are restricting how the browser optimizes the JavaScript code… You also have the problem of what bytecode to standardize on -- each JavaScript engine will have a different set of bytecodes with different semantics. All engines will need to agree on the bytecode to use. There are also other considerations as the string representation differs between engines (V8/Chrome has an ASCII string variant; Mozilla keeps them all in UTF-16) and type representation (e.g. Firefox has "fatvals" that are 64-bit value types with 32-bits for the type and 32-bits for the value; 64-bit doubles take advantage of the representation of NaN values… If the bytecode is binary, you have endian issues, floating point representation issues, etc.
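The endianness concern raised in the comment can be made concrete with a small sketch. The TypeScript below is illustrative only: the helper names `encodeUint32` and `decodeUint32` are hypothetical and are not part of any bytecode format or engine discussed here. It uses the standard `DataView` API to serialize the same 32-bit value under both byte orders, then shows what happens when a reader assumes the wrong one.

```typescript
// Hypothetical helper: serialize a 32-bit unsigned integer with an explicit byte order.
function encodeUint32(value: number, littleEndian: boolean): number[] {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, value, littleEndian);
  const out: number[] = [];
  for (let i = 0; i < 4; i++) {
    out.push(view.getUint8(i));
  }
  return out;
}

// Hypothetical helper: read four bytes back under a chosen byte order.
function decodeUint32(bytes: number[], littleEndian: boolean): number {
  const view = new DataView(new ArrayBuffer(4));
  bytes.forEach((b, i) => view.setUint8(i, b));
  return view.getUint32(0, littleEndian);
}

const little = encodeUint32(0x11223344, true);  // bytes: 0x44 0x33 0x22 0x11
const big = encodeUint32(0x11223344, false);    // bytes: 0x11 0x22 0x33 0x44

// Reading little-endian bytes as big-endian yields a different number.
const misread = decodeUint32(little, false);    // 0x44332211
```

A binary format therefore has to fix one byte order in its specification, or every producer and consumer has to agree on it some other way.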

Alon Zakai, a researcher for Mozilla working on Emscripten and asm.js, wrote an entire blog post on universal web bytecode, outlining some of the difficulties to be encountered in pursuing such a goal:

Some people want one bytecode, others want another, for various reasons. Some people just like the languages on one VM more than another. Some bytecode VMs are proprietary or patented or tightly controlled by a single corporation, and some people don't like some of those things. So we don't actually have a candidate for a single universal bytecode for the web. What we have is a hope for an ideal bytecode - and multiple potential candidates.

Zakai also made a list of requirements such a bytecode should meet:

Support all the languages

Run code at high speed

Be a convenient compiler target

Have a compact format for transfer

Be standardized

Be platform-independent

Be secure

While Zakai gives a new bytecode little chance of meeting these requirements, he does see JavaScript as the right candidate: “arguably JavaScript is already very close to providing what a bytecode VM is supposed to offer, as listed in the 7 requirements above,” also mentioning what’s still missing in JavaScript:

At this point the main missing pieces are, first (as already mentioned) improving language support for ones not yet fully mature, and second, a few platform limitations that affect performance, notably lack of SIMD and threads with shared state. Can JavaScript fill the gaps of SIMD and mutable-memory threads? Time will tell, and I think these things would take significant effort, but I believe it is clear that to standardize them would be orders of magnitude simpler and more realistic than to standardize a completely new bytecode. So a bytecode has no advantage there.

After outlining more difficulties in creating a universal VM – type conflicts between languages, garbage collection issues – Zakai concludes:

So I don't think there is much to gain, technically speaking, from considering a new bytecode for the web. The only clear advantage such an approach could give is perhaps a more elegant solution, if we started from scratch and designed a new solution with less baggage. That's an appealing idea, and in general elegance often leads to better results, but as argued earlier there would likely be no significant technical advantages to elegance in this particular case - so it would be elegance for elegance's sake.

While it seems that a universal bytecode does not stand much chance of succeeding, there are still at least two major attempts at bringing other languages to the web. Both started with C/C++, but the efforts can be extended relatively easily to other languages, and, interestingly enough, both use LLVM:

Mozilla: C/C++ -> LLVM bitcode -> Emscripten -> asm.js -> Browser

Google: C/C++ -> LLVM bitcode -> PNaCl -> Browser

asm.js is an attempt at standardizing a subset of JavaScript that runs in any browser and contains constructs that a JavaScript engine can optimize more aggressively for speed. Emscripten is a separate project that generates asm.js from LLVM bitcode. According to Zakai, C++ code runs in Firefox via asm.js at 50% of the speed of native code, and they expect the performance to improve over time.
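To give a rough idea of what that subset looks like, the sketch below shows the coercion idiom asm.js uses as type annotations: `x | 0` marks a value as a signed 32-bit integer and unary `+x` marks it as a double. The function names are made up for illustration. A real asm.js module also needs a "use asm" directive and a specific module structure, both omitted here, so this code runs as ordinary JavaScript (written as TypeScript) and is not a validated asm.js module.

```typescript
// Integer addition in the asm.js annotation style.
// `x | 0` coerces a value to a signed 32-bit integer.
function addInt32(x: number, y: number): number {
  x = x | 0;
  y = y | 0;
  return (x + y) | 0;
}

// Unary plus marks a value as a double.
function halve(x: number): number {
  x = +x;
  return +(x / 2.0);
}

const wrapped = addInt32(2147483647, 1); // overflows and wraps to -2147483648
const truncated = addInt32(1.9, 2.9);    // fractions are dropped: 1 + 2 = 3
const half = halve(5);                   // 2.5
```

Because the coercions are plain JavaScript operators, an engine without asm.js-specific optimizations still executes the code correctly, only without the extra speed.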

PNaCl, recently announced by Google and covered in detail by InfoQ, runs C/C++ code in a browser sandbox at 80-90% of native code speed, with room to improve, according to David Sehr. While that performance is significantly better than Mozilla’s, it comes at a price: PNaCl has been in development for more than 2 years. Dealing with endian issues, different pointer sizes, different floating point representations, etc. across multiple architectures is pretty hard. It would be simpler to enhance Chrome to include asm.js optimizations. On the other hand, asm.js may be too slow, as yab**uz commented:

And I will never use asm.js. Simply because it's too slow on non asm.js supported browsers. Epic Citadel at 20 fps on the latest Core i7-3770K is a joke. Slower than Flash Player!

JavaScript, a language created by Brendan Eich in 10 days in 1995, was meant to be a client scripting language that would infuse some dynamism into the static web pages of that time. Perhaps nobody foresaw the role this little language would play almost two decades later, in spite of all the criticism and flaws it carried with it. JavaScript is heavily used today on the client side in all major browsers, and it is making inroads on the server side, especially because of Node.js’ popularity. That’s not because JavaScript is such a brilliant language, but because it’s so hard to bring major players together to work on a better solution and to switch all the gears of the software industry. Like HTTP and HTML, JavaScript is going to thrive in spite of its shortcomings and the fact that we all know that we could do better, if we just agreed on it.

Now that we are stuck with JavaScript, will we at least have a universal web bytecode? Do we need one? Will attempts to run code written in other languages in the browser, such as Mozilla’s asm.js or Google’s PNaCl, get traction? Which is better: asm.js or PNaCl? Have your say in the comments.