We have been obsessed with making the Wasmer WebAssembly runtime faster: first by minimizing the compilation time by using caching and then by adding different compiler tiers into the runtime.

As time progressed, we started asking ourselves: what is the fundamental cause of VM-based programs being slower than native ones? And is there any way we can solve it for specific use cases?

In this article we will go over what we did to make Wasmer (optionally) run in the kernel… achieving over 10% speedup over native code on a WebAssembly TCP echo server! 🎉

Background

“The Second OS”

Many languages and runtimes, including WebAssembly (WASI implementations) and JavaScript (Node.js and browsers), have been trying to build another sandboxed “OS” on top of the real operating system. This second layer, however, incurs a significant performance overhead.

VM running in Ring 3

As shown above, in the traditional architecture, OS service requests (system calls) from a VM-based program have to go through two boundaries, before reaching the kernel.

Neither of those boundaries is cheap to cross. While a normal function call takes less than 5 nanoseconds, a system call originating from a program in the VM can cost hundreds of nanoseconds.

The Successor to Cervus

I wrote Cervus — another WebAssembly “usermode” subsystem running in the Linux kernel — about a year ago. Back then WASI didn’t exist, and neither did any “production-ready” non-Web WebAssembly runtimes, but the Cervus project proved that the idea was possible and had great potential.

Now, the WASM ecosystem is growing and the Wasmer runtime is in a very good place, so it’s time to build a complete in-kernel WASM runtime for real applications.

Why run WebAssembly in the kernel?

Mainly for performance and flexibility.

Since WASM is a virtual ISA protected by a Virtual Machine, we do not need to rely on external hardware and software checks to ensure safety.

Running WASM in the kernel avoids most of the overhead introduced by those checks, e.g. system calls (context switching) and copy_{from,to}_user, thereby improving performance.

VM running in Ring 0

Also, having low-level control means that we can implement a lot of features that were heavy or impossible in userspace, e.g. virtual memory tricks, direct hardware access, and handling of intensive kernel events (like network packet filtering).

Security

Running user code in kernel mode is always a dangerous thing.

Although we use many techniques to protect against different kinds of malicious code and attacks, it’s advised that only trusted binaries be run through this module in the short term, until we have fully reviewed the runtime’s codebase for security.

Here are some known security risks and what we did to fix them:

- Stack overflow: emit bounds-checking code from the codegen backend
- Out-of-bounds memory access: allocate a 6GB virtual address space for each WASM task so that out-of-bounds loads/stores cannot even be represented
- Lack of signal-based forceful termination: set the NX bit on WASM code pages when a fatal signal arrives
- Lack of floating-point register state preservation: explicitly save FP state on preemption with kernel_fpu_{begin,end} and preempt_notifier
- Red Zone not supported in kernel: avoid using the Red Zone in the codegen backend

Examples and benchmark

We have created two examples: echo-server and http-server (living in the examples directory of the Wasmer main repo).

When executed with the singlepass backend (unoptimized direct x86-64 code generation) and benchmarked locally using tcpkali/wrk, echo-server is ~10% faster (25210 Mbps vs. 22820 Mbps) than its native equivalent in userspace, and http-server is ~6% faster (53293 rps vs. 50083 rps).

Even higher performance is expected when the other two Wasmer backends with optimizations (Cranelift and LLVM) are updated to support generating code for the kernel.

Those two examples use WASI (for file abstraction and printing to console) and the asynchronous networking extension (via the kernel-net crate).

Take a look at them to learn how to do high-performance networking in kernel-wasm.

How to run it

Before running Wasmer in the kernel, ensure that:

- Your system is running Linux kernel 4.15 or higher.
- Your kernel has preemption enabled. Attempting to run WASM user code without kernel preemption will freeze your system.
- Kernel headers are installed and the build environment is properly set up.

First, clone the repo:

Then just run make in the root directory, and (optionally) in the networking and wasi directories:

make
cd networking && make
cd ../wasi && make
cd ..

Load the modules into the kernel:

sudo insmod kernel-wasm.ko
sudo insmod wasi/kwasm-wasi.ko
sudo insmod networking/kwasm-networking.ko

Make sure you are running the latest version (0.4.2) by executing:

wasmer self-update

Then run Wasmer with the kernel loader and the singlepass backend:

sudo wasmer run --backend singlepass --loader kernel the_file.wasm