Bringing WebAssembly close to native speed would be a big step toward wide adoption.

The current WebAssembly SIMD proposal narrows the performance gap by letting numerically intensive WebAssembly programs use SIMD instructions to improve their runtime performance.

Over the past couple of months at Wasmer, we’ve been hard at work adding SIMD support to our server-side WebAssembly runtime. Our speed analysis of native vs. Wasm with SIMD vs. Wasm without SIMD shows some great results for the new implementation.

Wasmer is the first Wasm runtime to fully support WASI and SIMD! 🎉

What is SIMD?

SIMD stands for Single Instruction, Multiple Data.

With just one instruction we can perform an arithmetic operation on multiple data lanes at once.

Let’s say we want to multiply four numbers (i32) by two and get the results back:

1 × 2 = 2

2 × 2 = 4

3 × 2 = 6

4 × 2 = 8

Normally, to get the results (2, 4, 6, 8) we would have to perform four multiplications (one for each number).

By using SIMD we can do the same multiplication with just one CPU operation.

(1, 2, 3, 4) × (2, 2, 2, 2) = (2, 4, 6, 8)

This will be much faster to compute ⚡️.

In this example we are using a 128-bit SIMD register with 4 data lanes (32 bits each). However, the number of data lanes (and the total width of the register) can vary depending on our needs.
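To make the lane-by-lane picture concrete, here is a minimal C++ sketch of what a 4-lane i32 multiply computes. The I32x4 struct and the mul_lanes function are hypothetical names used only for illustration. This is a plain C++ model of the semantics, not Wasmer's implementation, and whether a compiler turns the loop into a single SIMD instruction depends on the target and the optimization flags.

```cpp
#include <cstdint>

// Hypothetical model of a 128-bit SIMD value: four 32-bit integer lanes.
struct I32x4 {
    std::int32_t lane[4];
};

// Multiplies the two inputs lane by lane. On hardware with 128-bit SIMD,
// this whole loop can correspond to one vector multiply instruction;
// it is written as a scalar loop here so the semantics are explicit.
constexpr I32x4 mul_lanes(const I32x4& a, const I32x4& b) {
    I32x4 out{};
    for (int i = 0; i < 4; ++i) {
        out.lane[i] = a.lane[i] * b.lane[i];
    }
    return out;
}

// The example from the text: (1, 2, 3, 4) × (2, 2, 2, 2) = (2, 4, 6, 8).
constexpr I32x4 doubled = mul_lanes(I32x4{{1, 2, 3, 4}}, I32x4{{2, 2, 2, 2}});
```

The scalar version needs four multiplications to fill the four lanes, which is exactly the work a single SIMD multiply replaces.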

Where can SIMD be useful?

SIMD can be especially useful for programs that perform many numeric operations (addition, subtraction, multiplication, …) over a large set of numbers.

Examples of this are:

Image/Audio/Video Processing

Crypto/Hashing

Physics engines

By leveraging SIMD, a WebAssembly program could see speedups of up to 16× on operations over 8-bit numbers (values from 0 to 255), or up to 2× on 64-bit numbers. WebAssembly SIMD operates on 128-bit vectors, so a vector holds 128 / 8 = 16 lanes of 8-bit values but only 128 / 64 = 2 lanes of 64-bit values.

Note: outside of WebAssembly, native SIMD instruction sets can operate on registers wider than 128 bits (for example, 256-bit AVX2 or 512-bit AVX-512 on x86). Thus the speedups can be even higher in certain native implementations.
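The upper bounds above are simply the register width divided by the element width. As a small sketch of that arithmetic, the lane_count helper and the kWasmSimdBits constant below are hypothetical names introduced for illustration only. The result is a theoretical ceiling, and real speedups are usually lower because of memory access and other overheads.

```cpp
#include <cstddef>

// Number of lanes that fit in a SIMD register of `register_bits` bits
// when each element is `lane_bits` bits wide. This is also the theoretical
// upper bound on the speedup over processing one element at a time.
// Assumes lane_bits is non-zero.
constexpr std::size_t lane_count(std::size_t register_bits, std::size_t lane_bits) {
    return register_bits / lane_bits;
}

// WebAssembly SIMD vectors are 128 bits wide.
constexpr std::size_t kWasmSimdBits = 128;

static_assert(lane_count(kWasmSimdBits, 8) == 16, "16 lanes of 8-bit values");
static_assert(lane_count(kWasmSimdBits, 32) == 4, "4 lanes of 32-bit values");
static_assert(lane_count(kWasmSimdBits, 64) == 2, "2 lanes of 64-bit values");

// A wider native register (256 bits here, as an example) doubles the lanes.
static_assert(lane_count(256, 32) == 8, "8 lanes of 32-bit values");
```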

Our work

Wasmer has 3 different backends: Singlepass, Cranelift and LLVM.

We decided to start the SIMD implementation in the LLVM backend to take advantage of LLVM’s implementation of SIMD instructions.

Adding SIMD support into Wasmer touched other external open-source projects such as:

WABT: The WebAssembly binary toolkit

WABT-rs: the Rust bindings to WABT

Wasmparser: a fast Rust WebAssembly parser

In the process of working on the SIMD feature for the Wasmer WebAssembly runtime, we created (and successfully merged) over 10 different PRs into these projects.

We also added an extensive set of SIMD spec tests to ensure we comply with the original specification.

We would like to use this article to personally thank all the maintainers of these repos for their incredible support and quick response times: Yury Delendik (wasmparser), Ben Smith and Thomas Lively (wabt), and Sergey Pepyakin (wabt-rs).

SIMD Speed Analysis

Now that SIMD support has landed in Wasmer (LLVM backend), we can analyze the speedup that we can achieve with it.

We created a SIMD example that simulates particle physics using C++, WASI and of course... WebAssembly!

Here are the timings of running our physics simulation:

Time to execute the particles emulation program (lower is better)

As you can see, the SIMD program runs almost as fast in Wasmer as it does as a native executable!

How can you use it?

The latest release of Wasmer (0.6.0) ships with SIMD support.

You can install Wasmer with:

curl https://get.wasmer.io -sSfL | sh

Note: you can also use Wasmer with SIMD on Windows by downloading the Wasmer installer.

Running WebAssembly SIMD programs in Wasmer is as easy as choosing the LLVM backend and passing the --enable-simd flag when executing a .wasm program with wasmer.

We will have SIMD enabled in the other backends soon. Stay tuned! 🙂

Happy Hacking!