WebAssembly 102

This cartoon post does a great job of explaining what WebAssembly is and why it matters. If you’re unfamiliar with WebAssembly, give it a read — it’s the best 5 minute introduction I have found. I’m going to skip past the overview and jump right in to implementation details.

To compile any language to WebAssembly, you will need the Emscripten toolchain. I started by following the instructions and tutorial here to get that installed, the emsdk compiled, and my first hello.c running in the browser.

Output Format

One thing that might strike you as curious is the output format of emcc . While it can be used to output straight WebAssembly, there is a decent amount of setup that needs to be done on the browser to actually load and run the .wasm file. When you run emcc with the -o {filename}.html flag, Emscripten helpfully generates a loader index.js file and a sandbox index.html file that you can use to play around with your WebAssembly code.

The Emscripten sandbox

This sandbox is extremely valuable for debugging, it gives you a quick way to REPL your module and captures all of the output of your module on screen. I found myself, while debugging my build, running python -m SimpleHTTPServer 3005 in the output directory of emcc for a super light little development environment.

Once I had finished my development, I incorporated that same loading boilerplate into my app bundle, which exposes the Module object Emscripten uses to mount your WebAssembly interface to.

While it is easy to do development like this, it is not particularly efficient. You need to do a lot of manual reloading and copy your compiled files over to their target project after you are done. I think there is an opportunity for a Webpack plugin that will take a source c , cpp , or rust file, perform the emcc compilation, and automatically include the results in your project.

Bindings, Three Ways

After you have played around a bit with hello world and the sandbox, you will quickly see the need to do Javascript <-> WebAssembly bindings. A binding is a way to reference Javascript from WebAssembly, or vice versa. It describes the interface between WebAssembly and the rest of your code.

One problem I had getting started with WebAssembly was the sheer number of different ways you could do these bindings. By my count there are three unique ways to do WebAssembly -> Javascript:

ccall, cwrap and direct function calls

WebIDL

Embind

The first two ways of doing bindings are for C code only. They involve passing a command line flag to describe your exported functions:

emcc test.cpp -s WASM=1 -s EXPORTED_FUNCTIONS=”['_int_sqrt']" -o index.html

With this EXPORTED_FUNCTIONS flag, you tell the compiler to make a binding to that particular function. You can then reference it from Javascript in a number of ways:

That last method of invocation seems like it would be the simplest, and it certainly is for integers and strings. For more complicated data structures, however, it gets quite a bit trickier as you need to tell it exactly how it can convert your Javascript data into a structure that C can understand.

I would recommend this style if the code you are optimizing is constrained to a single function or small set of functions that take relatively simple input — numbers and strings are perfect. If you are going to be passing around larger structures, I would instead suggest writing in C++ so you can leverage Embind and WebIDL.

What is WebIDL?

WebIDL is a W3C specification that describes a standard language for interfaces between external code and the browser. Similarly, it can be used to describe the interface between WebAssembly and Javascript. It uses a very familiar C++-style notation:

For WebAssembly, Emscripten provides a python script that consumes an IDL file and turns it into glue code that you then include when compiling your wasm . Unfortunately, I was not able to get this pipeline working well for my classes and so I decided to try Embind instead.

I do think that as the WebAssembly ecosystem matures further, WebIDL will be the preferred method for describing the interfaces between WebAssembly and Javascript. It is a standard component of the browser contract already and provides a very logical way of expressing these interfaces.

Embind to the rescue

Embind allows us to write WebAssembly to Javascript bindings from directly within our C++ code. I found its syntax a little opaque at times, but it is by far the most well-supported binding method and worked well out of the box.

When using Embind, you provide a block of binding declarations somewhere in your cpp file. It looks like:

This declares 3 classes that I can now access through Javascript off of Emscripten’s Module object, and also exposes a meta-class of VectorHNode that describes an array of HNodes. I can use these (mostly) just like any old Javascript class, e.g.:

One thing to note is that I have declared the allow_raw_pointers flag on the Hierarchy.pack method. This is to tell Embind that I am going to use pointers of any kind as either an input or a return value from this method. Embind doesn’t have any way of type-checking these pointers so this is basically an opt-in to undefined behavior and Bad Things™ if I screw up either the method call or the handling of the pointer inside the method.

How do I debug this stuff?

Let’s assume that I goof and pass the wrong thing to my allow_raw_pointers() 'd method — what then? Luckily, we have several lines of defense against this sort of issue. The first is through the compiler — it will catch syntax errors and type mismatches, within your code and between your bindings and your code.

The wrapping code that Emscripten provides will also do its best to provide helpful error messages:

These combined will get you a good deal of the way, but when you’re instrumenting something more complex you will eventually get an error within WebAssembly that you don’t have a good way to debug. Thankfully, Emscripten provides a killer feature for c and cpp files — sourcemaps!

If you run your compilation with the -g4 flag, you will get bona-fide browser sourcemaps for your compiled code. While you don’t get the actual context, you do get breakpoints. Combined with robust logging from within your app, you can diagnose and fix pretty much any issue.

I will say though that while in-browser debugging via is pretty great, it doesn’t take the place of a test suite and C++ debugging environment. If I was going to do it over again, the first thing I would do would be to set up NetBeans with a CppTest suite. It might have taken a couple of hours to get going properly, but it would have easily paid itself off over the course of the project.

The final compilation command

After a lot of trial and error, I ended up with the following command:

# for dev

emcc hierarchy.cpp -g4 --bind -o index.html -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 # for prod

emcc hierarchy.cpp -O3 --bind -o index.html -s WASM=1 -s ALLOW_MEMORY_GROWTH=1

-g4 sets the debugging verbosity level to maximum. This includes sourcemaps in the output

-O3 sets the optimization level. It is incompatible with the -g flag

--bind tells Emscripten to use Embind bindings.

-s WASM=1 tells Emscripten to emit WASM instead of asm.js. It is worth testing both outputs as I have read that for some applications asm.js can actually be faster than WebAssembly.