December 17, 2013 posted by Antti Kantee

A cyclic trend in operating systems is moving things in and out of the kernel for better performance. Currently, the pendulum is swinging in the direction of userspace being the locus of high performance. The anykernel architecture of NetBSD ensures that the same kernel drivers work in a monolithic kernel, userspace and beyond. One of those driver stacks is networking. In this article we assume that the NetBSD networking stack is run outside of the monolithic kernel in a rump kernel and survey the open source interface layer options.

There are two sub-aspects to networking. The first facet is supporting network protocols and suites such as IPv6, IPSec and MPLS. The second facet is delivering packets to and from the protocol stack, commonly referred to as the interface layer. While the first facet for rump kernels is unchanged from the networking stack running in a monolithic NetBSD kernel, there is support for a number of interfaces not available in kernel mode.

The Data Plane Development Kit is meant to be used for high-performance, multiprocessor-aware networking. DPDK offers network access by attaching to hardware and providing a hardware-independent API for sending and receiving packets. The most common runtime environment for DPDK is Linux userspace, where a UIO userspace driver framework kernel module is used to enable access to PCI hardware. The NIC drivers themselves are provided by DPDK and run in application processes.

For high performance, DPDK uses a run-to-completion scheduling model -- the same model is used by rump kernels. This scheduling model means that NIC devices are accessed in polled mode without any interrupts on the fast path. The only interrupts that are used by DPDK are for slow-path operations such as notifications of link status change.

The rump kernel interface driver for DPDK is available here. DPDK itself is described in significant detail in the documents available from the Intel DPDK page.

Like DPDK, netmap offers user processes access to NIC hardware with a high-performance userspace packet processing intent. Unlike DPDK, netmap reuses NIC drivers from the host kernel and provides memory-mapped buffer rings for accessing the device packet queues. In other words, the device drivers still remain in the host kernel, but low-level and low-overhead access to hardware is made available to userspace processes. In addition to the memory-mapping of buffers, netmap uses other performance optimization methods such as batch processing and buffer reallocation, and can easily saturate a 10GigE with minimum-size frames. Another significant difference to DPDK is that netmap allows also for a blocking mode of operation.

Netmap is coupled with a high-performance software virtual switch called VALE. It can be used to interconnect networks between virtual machines and processes such as rump kernels. The netmap API is used also by VALE, so VALE switching can be used with the rump kernel driver for netmap.

The rump kernel interface driver for netmap is available here. Multiple papers describing netmap and VALE are available from the netmap page.

A tap device injects packets written into a device node, e.g. /dev/tap , to a tap virtual network interface. Conversely, packets received by the virtual tap network can be read from the device node. The tap network interface can be bridged with other network interfaces to provide further network access. While indirect access to network hardware via the bridge is not maximally efficient, it is not hideously slow either: a rump kernel backed by a tap device can saturate a gigabit Ethernet. The advantage of the tap device is portability, as it is widely available on Unix-type systems. Tap interfaces also virtualize nicely, and most operating systems will allow unprivileged processes to use tap interface as long as the processes have the credentials to access the respective device nodes.

The tap device was the original method for accessing with a rump kernel. In fact, the in-kernel side of the rump kernel network driver was rather short-sightedly named virt back in 2008. The virt driver and the associated hypercalls are available in the NetBSD tree. Fun fact: the tap driver is also the method for packet shovelling when running the NetBSD TCP/IP stack in the Linux kernel; the rationale is provided in a comment here and also by running wc -l .

Xen hypercalls

After a fashion, using Xen hypercalls is a variant of using the TAP device: a virtualized network resource is accessed using high-level hypercalls. However, instead of accessing the network backend from a device node, Xen hypercalls are used. The Xen driver is limited to the Xen environment and is available here.

NetBSD PCI NIC drivers

The previous examples we have discussed use a high-level interface to packet I/O functions. For example, to send a packet, the rump kernel will issue a hypercall which essentially says "transmit these data", and the network backend handles the request. When using NetBSD PCI drivers, the hypercalls work at a low level, and deal with for example reading/writing the PCI configuration space and mapping the device memory space into the rump kernel. As a result, using NetBSD PCI device drivers in a rump kernel work exactly like in a regular kernel: the PCI devices are probed during rump kernel bootstrap, relevant drivers are attached, and packet shovelling works by the drivers fiddling the relevant device registers.

The hypercall interfaces and necessary kernel-side implementations are currently hosted in the repository providing Xen support for rump kernels. Strictly speaking, there is nothing specific to Xen in these bits, and they will most likely be moved out of the Xen repository once PCI device driver support for other planned platforms, such as Linux userspace, is completed. The hypercall implementations, which are Xen specific, are available here.

shmif

For testing networking, it is advantageous to have an interface which can communicate with other networking stacks on the same host without requiring elevated privileges, special kernel features or a priori setup in the form of e.g. a daemon process. These requirements are filled by shmif, which uses file-backed shared memory as a bus for Ethernet frames. Each interface attaches to a pathname, and interfaces attached to the same pathname see the the same traffic.

The shmif driver is available in the NetBSD tree.

We presented a total of six open source network backends for networking with rump kernels. These backends represent four different methodologies:

DPDK and netmap provide high-performance network hardware access using high-level hypercalls.

and provide high-performance network hardware access using high-level hypercalls. TAP and Xen hypercall drivers provide access to virtualized network resources using high-level hypercalls.

and drivers provide access to virtualized network resources using high-level hypercalls. NetBSD PCI drivers access hardware directly using register-level device access to send and receive packets.

access hardware directly using register-level device access to send and receive packets. shmif allows for unprivileged testing of the networking stack without relying on any special kernel drivers or global resources.

Choice is a good thing here, as the optimal backend ultimately depends on the characteristics of the application.