Emscripten & asm.js: C++'s role in the modern web



Alon Zakai / @kripken



All major web browsers are written in C++, for the obvious reasons:

fast, familiar, library support

For the same reasons, people want to use C++ to write web content too, that is, websites. That's what this talk is about.

The Web

Largest open platform in existence

Modern, standards-compliant websites are built using HTML, CSS, and JavaScript (JS)

No C++ there :(

What about non-standardized approaches (ActiveX, Flash/Alchemy, PNaCl/PPAPI)?

Plugins and proposals for entirely new technologies on the web have failed to reach significant adoption or standardization, for both technical and non-technical reasons

And plugins are on the way out (no plugins on iPhone/iPad, etc.)

This is a good trend - standardization is why websites work on both your laptop and your phone

...where does that leave C++, then?

Well, JavaScript is already standardized, so how about if we... compile C++ into that?

This has happened with many languages, in fact: Java

C#

Python

New languages like TypeScript

etc.

Compiling to JavaScript?

JavaScript is a dynamic scripting language:

    var x = 42;
    var y = "a string";
    var z = x + y; // z = "42a string"
    eval("z = z.substr(1, 2)"); // z = "2a"
    [1, "two", { three: 3 }].forEach(function(item) {
      if (typeof item === typeof z) console.log([z, item]);
    }); // emits ["2a", "two"]

Kind of a weird compiler target...

But from the developer's point of view, compiling to JavaScript can be very conventional!

First, a reminder of compiling to a native executable:

    // hello.cpp
    #include <iostream>

    int main() {
      std::cout << "hello, world!" << std::endl;
    }

    $ g++ hello.cpp -o a.out
    $ ./a.out
    hello, world!

Compiling to JavaScript using Emscripten:

    $ em++ hello.cpp -o a.html
    $ firefox a.html # or any other browser

[Live demo: the compiled output running in an iframe on the original slide]



emcc and em++ are drop-in replacements for a native C or C++ compiler; the workflow is almost identical



Open source (MIT license) LLVM-based C++ to JavaScript compiler

C++ ⇒ LLVM ⇒ Emscripten ⇒ JavaScript

Emscripten builds on the LLVM family of projects:

    clang: C++ frontend
    LLVM: optimizer
    libc++: C++ standard library
    libc++abi: low-level C++ support

Currently an out-of-tree fork of LLVM, but we hope to get upstream eventually

Other libraries

Hybrid libc: musl + parts written in JavaScript

Implementations of SDL, OpenGL, etc., using Web APIs

You might be curious at this point what the emitted code looks like...

    // C++
    int func(int *p) {
      int r = *p;
      return calc(r, r << 16);
    }

⇒ Emscripten ⇒

    // JavaScript
    function func(p) {
      var r = HEAP32[p >> 2];
      return calc(r, r << 16);
    }

Almost direct mapping in many cases

Another example:

    // C++
    float array[5000];
    int main() {
      for (int i = 0; i < 5000; ++i) {
        array[i] += 1.0f;
      }
    }

⇒ Emscripten ⇒

    // JavaScript
    var buffer = new ArrayBuffer(32768);
    var HEAPF32 = new Float32Array(buffer);
    function main() {
      var a = 0, b = 0;
      do {
        a = (8 + (b << 2)) | 0;
        HEAPF32[a >> 2] = +HEAPF32[a >> 2] + 1.0;
        b = (b + 1) | 0;
      } while ((b | 0) < 5000);
    }

This "style" of code is a subset of JS called asm.js, which we'll discuss more later

So that's what the code can look like. But there are some fundamental differences here...

Builds

    C++: Need to recompile for another CPU or OS
    JS:  Single build runs the same everywhere



Single build prevents some optimizations

Undefined Behavior

    C++: Has undefined behavior, compiler can use it to optimize
    JS:  No undefined behavior



    dev machine | user machine
    C++ ⇒ JS    | JS ⇒ Executable
                | NO undefined behavior

Security

    C++: Applications can use the system libs, access the local filesystem, etc.
    JS:  Sandboxed, cannot see the machine it is running on



Applications must ship their own system libraries

We "fake" a filesystem to make porting easy

JS sandboxing helps in some unexpected ways!

Remember that we implement C++ functions using JS functions:

    // Simple C++ function compiled to JavaScript
    function func(p) {
      var r = HEAP32[p >> 2];
      return calc(r, r << 16);
    }

The JS call stack is managed, and unobservable/unmodifiable by executing code

Compiled C++ is therefore immune to some types of buffer overflow attacks
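A minimal sketch of why this holds, assuming a hypothetical 64-byte heap, a made-up memory layout, and a hand-written `unsafeCopy` standing in for compiled copy code that has no bounds check (none of these names come from Emscripten). The overflow clobbers neighboring data inside the typed array, but the JS call stack lives outside that array, so there is no return address for the copy to overwrite:

```javascript
// Hypothetical flat heap, standing in for the compiled program's memory.
var buffer = new ArrayBuffer(64);
var HEAP8 = new Int8Array(buffer);

// Hypothetical layout: an 8-byte buffer at address 0, followed directly
// by another variable at address 8.
var BUF = 0, NEIGHBOR = 8;
HEAP8[NEIGHBOR] = 7;

// Stand-in for compiled code that copies without a bounds check.
function unsafeCopy(dest, bytes) {
  for (var i = 0; i < bytes.length; i++) {
    HEAP8[dest + i] = bytes[i];
  }
  return dest; // the JS return address is not stored anywhere in HEAP8
}

// Copy 12 bytes into the 8-byte buffer: an overflow.
var overflowing = [1, 1, 1, 1, 1, 1, 1, 1, 99, 99, 99, 99];
var result = unsafeCopy(BUF, overflowing);

console.log(HEAP8[NEIGHBOR]); // neighboring heap data was clobbered
console.log(result);          // the function still returned to its caller
```

Data inside the heap can still be corrupted by such a bug, which is why the immunity covers only some types of attack.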

Numeric Types

    C++: char, short, int, int64, float, double
    JS:  double



We build for a 32-bit target, because 64-bit integers cannot all fit in doubles (but 32-bit ones can)
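The limit can be checked directly in any JS engine. This is a small illustration, not Emscripten code; the variable names are made up for the example. A double has a 53-bit significand, so every 32-bit integer has its own exact value, while integers above 2^53 start to collide:

```javascript
// 32-bit integers: neighboring values stay distinct in a double.
var INT32_MAX = 2147483647;
var UINT32_MAX = 4294967295;
var int32Distinct = INT32_MAX + 1 !== INT32_MAX;
var uint32Distinct = UINT32_MAX + 1 !== UINT32_MAX;

// 64-bit range: above 2^53, adding 1 can round back to the same double.
var TWO_53 = Math.pow(2, 53);
var collides = TWO_53 + 1 === TWO_53;

console.log(int32Distinct, uint32Distinct, collides); // true true true
```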

Perf Model

    C++: C-style code maps closely to CPU, higher-level C++ aspects can use RAII, etc., giving predictability
    JS:  virtual machine (VM), just in time (JIT) compilers w/ type profiling, garbage collection, etc.

But without good and predictable performance, this is pointless...

Historically, JS began as a slow interpreted language

Competition ⇒ type-specializing JITs

Those are very good at implicitly statically typed code:

    function add(x, y) {
      x = x | 0; // | 0 => int32
      y = y | 0;
      return (x + y) | 0; // int32 addition!
    }

That's what asm.js is: a subset of JavaScript where all the operations are clearly statically typed

Memory access

    var buffer = new ArrayBuffer(32768);
    var HEAP8 = new Int8Array(buffer);
    var HEAP16 = new Int16Array(buffer);
    var HEAP32 = new Int32Array(buffer);
    function mem_access() {
      return HEAP32[HEAP8[100] >> 2];
    }
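A few more coercions in the same style, as a hand-written sketch rather than actual compiler output. The function names are made up for the example; unary `+` marks a double, and `Math.imul` (a standard JS function not shown on the slide) performs a 32-bit integer multiply:

```javascript
function addInt(x, y) {
  x = x | 0;           // parameter is an int32
  y = y | 0;
  return (x + y) | 0;  // result wraps around to int32
}

function addDouble(x, y) {
  x = +x;              // unary plus: parameter is a double
  y = +y;
  return +(x + y);
}

function mulInt(x, y) {
  x = x | 0;
  y = y | 0;
  return Math.imul(x, y) | 0; // keeps only the low 32 bits of the product
}

console.log(addInt(2147483647, 1)); // -2147483648: wrapped around
console.log(addInt(1.9, 2.9));      // 3: arguments truncated to 1 and 2
console.log(addDouble(1.5, 2.25));  // 3.75
console.log(mulInt(65536, 65536));  // 0: low 32 bits of 2^32
```

This runs as ordinary JavaScript in any engine; the coercions only tell a type-specializing JIT what it could otherwise have to discover by profiling.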

Loads in C++ become reads from typed arrays in JS, which become loads in machine code

Emscripten's memory representation/layout is identical to LLVM's, including aliasing, so we can use all LLVM optimizations
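The aliasing can be seen by pointing several typed-array views at one buffer. This is a small sketch using the heap names from the slides plus an assumed unsigned byte view `HEAPU8` and a hypothetical address `ptr`:

```javascript
var buffer = new ArrayBuffer(32768);
var HEAPU8 = new Uint8Array(buffer);   // assumed unsigned byte view
var HEAP32 = new Int32Array(buffer);
var HEAPF32 = new Float32Array(buffer);

var ptr = 16; // hypothetical byte address, 4-byte aligned

// Store a 32-bit int; a byte address becomes an int32 index via >> 2.
HEAP32[ptr >> 2] = 0x01020304;

// Read the same memory back one byte at a time.
var bytes = [HEAPU8[ptr], HEAPU8[ptr + 1], HEAPU8[ptr + 2], HEAPU8[ptr + 3]];
console.log(bytes); // byte order depends on the platform's endianness

// Type punning through aliasing views: write a float, read its bits as an int.
HEAPF32[ptr >> 2] = 1.0;
var bits = HEAP32[ptr >> 2];
console.log(bits.toString(16)); // "3f800000", the IEEE 754 pattern for 1.0f
```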

Ok, we've just seen some encouraging things about speed, but before we saw some scary things too...?

Performance



[Chart: performance over time. Source: awfy; lower numbers are better]

Overall, performance is around 50-67% of native speed, and still improving

Missing pieces remain, like SIMD, but work is underway in the standards bodies

Already fast enough for many applications, even performance sensitive ones like games

In fact, the game industry has been an early adopter of compiling C++ to JavaScript, using Emscripten:

Unity, Unigine, Minko, Torque 2D, Unreal, Nebula3, Cocos2D-X, Godot, etc.

Products are shipping

[Online demos and links from Unity]

Adoption and usage in production show that while JS is a weird compiler target, the results can be robust and reliable

One way we work towards that is fuzzing using csmith; not currently aware of any Emscripten-specific bugs

While there are differences between browsers, having a single build for all of them improves reliability

Emscripten supports practically all C++ features, because clang does

But exception handling isn't something we just get for free

Emscripten supports C++ exceptions... differently

    // C++
    void func() {
      try {
        something();
      } catch (Type T) {
        handle(T);
      }
    }

⇒ Emscripten ⇒

    // JS
    function func() {
      invoke(10); // call a function pointer, checking for throw
      var T = get_thrown();
      if (T) {
        if (can_handle(T, 400)) { // 400 -> typeid of Type
          handle(T);
        } else {
          do_throw(T);
        }
      }
    }

Here are those runtime functions:

    // JS
    function invoke(ptr) {
      __thrown__ = 0;
      try {
        dyn_call(ptr);
      } catch (e) {
        __thrown__ = e;
      }
    }
    function can_handle(ptr, type) {
      // call into libc++abi internals
    }
    function do_throw(ptr) {
      throw ptr;
    }

We implement C++ exceptions using JS exceptions; the JS VM provides stack unwinding

Perf depends on the speed of JS exceptions
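The scheme can be tried end to end with stand-ins for the parts the slides leave out. In this sketch the function table `TABLE`, the `{ typeId, value }` exception objects, the body of `get_thrown`, the simplified `can_handle`, and the `log` array are all assumptions made for illustration; the real type check goes through libc++abi:

```javascript
var TABLE = [];        // hypothetical function table, indexed by "pointer"
var __thrown__ = 0;

function dyn_call(ptr) { TABLE[ptr](); }

function invoke(ptr) {
  __thrown__ = 0;
  try {
    dyn_call(ptr);
  } catch (e) {
    __thrown__ = e;    // remember the exception instead of propagating it
  }
}

function get_thrown() { return __thrown__; }

// Stand-in for the libc++abi type check: compare a numeric type id.
function can_handle(exc, typeId) { return exc.typeId === typeId; }

function do_throw(exc) { throw exc; }

var log = [];
TABLE[10] = function something() { do_throw({ typeId: 400, value: 42 }); };
TABLE[11] = function other() { do_throw({ typeId: 500, value: 7 }); };

// Plays the role of: try { <call> } catch (Type T) { handle(T); }
function func(ptr) {
  invoke(ptr);
  var T = get_thrown();
  if (T) {
    if (can_handle(T, 400)) {
      log.push("handled " + T.value);
    } else {
      do_throw(T);     // not our type: rethrow for an outer frame
    }
  }
}

func(10);              // caught by the matching handler
var escaped = null;
try { func(11); } catch (e) { escaped = e; }
console.log(log, escaped.typeId);
```

Every call that might throw goes through a JS `try`/`catch` in `invoke`, which is where the cost comes from.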

We can compile C++ into JavaScript and run it on the web, in a fast and standards-compliant way

JavaScript is a weird - but fun! - compiler target

That's it! Questions?

Will tweet link to slides: @kripken

http://emscripten.org
http://asmjs.org

Back to Memory

Recall that we represent memory using a single flat array. Pointers are indexes into the array:

    var buffer = new ArrayBuffer(32768);
    var HEAP8 = new Int8Array(buffer);
    function compiledCode(ptr) {
      HEAP8[ptr] = 12; // write to an address
      return HEAP8[ptr + 4]; // read from an address
    }

Which is basically how C and C++ see memory: a pointer can point anywhere in all of memory

But this is not how languages like JavaScript, C#, Java, Python etc. see memory

Each object or array in those languages is in its own "space", which is bounds-checked. Pointers cannot point just anywhere; they are references to distinct objects
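The two views of memory can be put side by side in a short sketch. The addresses `a` and `b` and the sample values are hypothetical, chosen only to show the contrast:

```javascript
// Flat memory, C/C++ style: one array, pointers are plain integers.
var buffer = new ArrayBuffer(32768);
var HEAP32 = new Int32Array(buffer);

var a = 64;       // hypothetical address of a 4-element int array
var b = a + 16;   // hypothetical address of the variable stored right after it
HEAP32[b >> 2] = 1234;

// Pointer arithmetic can walk off the end of one "object" into the next.
var pastTheEnd = a + 4 * 4;
var viaPointer = HEAP32[pastTheEnd >> 2]; // reads b's storage

// Managed memory, JS style: each array is its own bounds-checked space.
var arr = [1, 2, 3, 4];
var other = { value: 1234 };
var outOfBounds = arr[4]; // undefined: indexing arr can never reach other

console.log(viaPointer, outOfBounds); // 1234 undefined
```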