IncludeOS: A minimal, resource efficient unikernel for cloud systems – Bratterud et al. 2015

There has been a lot of excitement around unikernels over the last year, especially with the recent acquisition of the Unikernel Systems team by Docker (MirageOS, Mergeable Persistent Data Structures, Jitsu: Just-in time summoning of Unikernels). Whereas MirageOS is built around an OCaml stack, in today’s paper choice we get a look at IncludeOS, which is built on a C++ stack. In true unikernel style, you include only the parts of the operating system you need, directly linked with your application. What makes me smile every time is that in IncludeOS this is literally achieved via ‘#include <os>’!

In this paper we present IncludeOS, a single-tasking operating system designed for virtualized environments. IncludeOS provides a novel way for developers to build their C++-based code directly into a virtual machine at compile-time…

A fully virtualized “Hello World” service in IncludeOS (which of course includes the necessary components of the OS) uses only 8.45MB of memory. An Ubuntu 14.04 OS image (the default guest OS for OpenStack) is around 300MB by comparison. Even a regular Java Hello World program (ignoring the OS) takes about 28MB for the Java process alone. If you’re spinning up lots of instances in your cluster, this reduction in memory overhead can result in significant savings – memory being one of the most expensive resources. A minimal IncludeOS VM can also boot in about 0.3s. A DNS service built with IncludeOS results in a 158K disk image (for comparison, the MirageOS DNS server image came in at 200K). Finally, IncludeOS is designed to be very efficient at runtime: when idle it uses no CPU at all.

The designers of IncludeOS were guided by the ‘Zero Overhead Principle’:

IncludeOS aims for true minimality in the sense that nothing should be included by default that the service does not explicitly need. This corresponds to the zero overhead principle of e.g. C++; “what you don’t use you don’t pay for.” … While many other projects are related, IncludeOS is different: where systems such as Mirage and OSv aim to provide a platform for high-level language runtimes, which impose significant resource penalties in themselves, IncludeOS aims to represent absolute minimality.

Including only what is needed in an OS image is a job that can be delegated to the GCC tool chain:

The mechanism used for extracting only what is needed from the operating system is the one provided by default by modern linkers. Each part of the OS is compiled into an object file, such as ip4.o, udp.o, pci_device.o etc., which are then combined using ar to form a static library os.a. When a program links with this library, only what’s necessary will automatically be extracted by the linker and end up in the final binary. To facilitate this build process a custom GCC toolchain has been created.

For the C standard library, IncludeOS uses RedHat’s newlib standard library implementation due to its small size, reliance on only a handful of system calls, and ability to be compiled into a statically linked library. “The C++ standard library is larger and trickier…” Currently IncludeOS uses Electronic Arts’ exception-free EASTL implementation; future work will include a port of a full-featured implementation.

IncludeOS currently has only one device driver, namely a VirtioNet Device driver. The key benefit of virtio is that the hypervisor does not need to emulate a certain physical device, but instead can insert data directly into a queue in memory shared by the guest. While Virtio 1.0 has recently emerged as an OASIS standard, none of the hypervisors used during development supported any of the new features. Therefore the driver currently only implements Virtio Legacy functionality, but development has been done with future support for Virtio 1.0 in mind.

The network stack was a more complex challenge, since existing network stacks are often entangled with the operating system and not designed with the zero overhead principle in mind. The IncludeOS project is working on a completely modularized networking stack – the current implementation is sufficiently advanced to support e.g. the DNS server implementation previously mentioned, and work is underway to complete a full TCP and IPv6 stack that will also be running standalone in Linux user space.

Currently, all IRQ handlers in IncludeOS simply (atomically) update a counter and defer further handling to the main event loop, which processes pending work whenever there is time. This eliminates the need for a context switch, while also eliminating concurrency-related issues such as race conditions. The CPU is kept busy by having all I/O be asynchronous, so that no blocking occurs. This encourages a callback-based programming model, such as is common in modern JavaScript applications.

This is one of a number of factors that contribute to IncludeOS’s excellent runtime performance:

There is no system call overhead as the OS and the service are one binary, eliminating the need for memory protection barriers.

There is no unnecessary overhead from timer interrupts.

There is no I/O waiting, since IncludeOS uses an asynchronous event-based I/O model.

There is no overhead from emulating the Programmable Interrupt Timer (i.e. no periodic timer interrupts, and no pre-emptive scheduling).

The number of protected instructions has been kept very low to reduce VM exits.

That being so, IncludeOS in its current form is not fit for every task. In particular, deferring all IRQs will cause the VM to seem unresponsive (i.e. not answer ping) under workloads requiring a lot of CPU activity per request (this is not the case for DNS)… For services requiring several seconds of CPU processing for each request, ICMP packets would simply be queued until the virtio queue was full, at which point they would be dropped, giving the impression of an unresponsive service.

A really interesting future development for IncludeOS is the design of a Node.js style framework for supporting high performance web-applications: