The anatomy of a Vulkan driver

Jason Ekstrand gave a presentation at the 2016 X.Org Developers Conference (XDC) on a driver that he and others wrote for the new Vulkan 3D graphics API on Intel graphics hardware. Vulkan is significantly different from OpenGL, which led the developers to make some design decisions that departed from those made for OpenGL drivers.

He started with an "obligatory brag slide" (slides [PDF]) that outlined the progress that had been made on the driver in only eight months, with roughly three and a half people. Ekstrand, Kristian Høgsberg, and Chad Versace, with help from a dozen others, got a Vulkan driver working that was released (as open source) on the same day that the Vulkan specification was released in February. Not everything was written from scratch; the driver uses the same internal representation and back-end compiler that Mesa uses. The driver passed the conformance tests on day one as well, which is not something that everyone in the industry can say, Ekstrand said.

Vulkan is a new industry-standard 3D rendering and compute API from Khronos, which is the same group that maintains OpenGL. It is not simply OpenGL++, he said, as it has been redesigned from the ground up. Vulkan is designed for modern GPUs and software. It will run on currently shipping (OpenGL ES 3.1 class) hardware.

A lot has happened since SGI released OpenGL 1.0 in 1992, which is why a new 3D API is needed. In the 24 years since that first release: GPUs have become more powerful and flexible, memory has become much cheaper, and multi-core CPUs are common. OpenGL has done "amazingly well" over that time, but it is showing its age at this point.

Multi-threaded programs are now commonplace, which makes OpenGL's state machine, with its singleton context, something of an anachronism. Off-screen rendering is common as well. Beyond that, GPU hardware has become more standardized, so application developers no longer want the API to hide the details of what the GPU is doing, as OpenGL does.

Vulkan takes a different approach. It has an object-based API where there is no global state. All state is stored in the command buffer and there can be multiple command buffers. It is more explicit about what the GPU is doing: texture formats, memory management, and synchronization are all client-controlled. Those things are needed to support multi-threading, but also make drivers simpler.

Vulkan drivers do no error checking. There is a set of open-source, vendor-neutral validation layers that do much the same checking as is done in Mesa, but they can be disabled entirely for production use. The idea is for application developers to check their Vulkan code during development, so "why burn 10% of my CPU doing validation" once the code is known to be error-free?

There is a short distance between the API call and the driver in Vulkan, rather than traversing multiple layers as in Mesa. There is also a short distance between the driver function and actually putting data into the command buffer for the GPU. There are "no extra layers", Ekstrand said.

To handle multiple generations of hardware, each with its own packet format and packing scheme, the Vulkan driver has header files that are generated using Python scripts to process an XML representation of the formats. There is a function that uses that header file information to pack the command data into the buffer in the right way. It has debugging support that can assert() for various problems and the code can be run under Valgrind to find other kinds of problems.

To handle four separate Intel GPU generations, the code is compiled four times to create one version per generation. That allows the driver to keep up with new hardware more easily. The hardware-generation checks for each command function (as in the Mesa driver) are compiled away and the right thing is done for the generation in use. This is one example of where the team got to rethink things because it is a new, from-scratch driver.

One of the challenges faced by the team was memory allocation. Vulkan provides a collection of heaps from which clients can allocate VkDeviceMemory objects. The client can place VkImage or VkBuffer objects at explicit offsets within the VkDeviceMemory object. This doesn't map well to allocation from libdrm, he said, but it does map well to Graphics Execution Manager (GEM) buffer objects. Other objects have small amounts of driver-allocated memory for state that the driver needs to track. The team had to figure out how to manage all those pieces of memory. Complicating matters, the Intel hardware has different base addresses for different types of allocations (e.g. shaders, surface states), so each piece of state information needs to be stored alongside others of the same type.

He and Høgsberg came up with a "crazy" memory allocation structure that they are pretty proud of, Ekstrand said. For device memory objects, GEM buffers are used; there is also a pool of GEM buffers that is used for batch buffers. For the state objects, there are block pools, each allocated as a buffer object that grows in both directions as needed. The pools are initialized to provide objects of a specific size. Allocating from either end of the pool is required because of some hardware-specific restrictions.

The block pools are implemented as a 2GB memfd that gets mmap()ed into the driver. An address in the middle is then turned into a GEM buffer object. The block pool is used to implement both a traditional "allocate and free" style state pool as well as a pool that is used for state associated with a command buffer. The latter pool has no free function; it simply gets reset when the command buffer is thrown away. It is a complicated infrastructure, but has worked well, he said.

Most hardware has support for compressed surfaces, but not all parts of the GPU understand all of the different formats. So a "resolve" operation is needed to decompress or recompress the surface at different points in the pipeline. Due to the multi-threaded nature of Vulkan, though, there is no real way to track when the resolves are needed on the CPU side. The Vulkan API provides two features ("render passes" and "layout transitions") that can help. Layout transitions are not currently used in the driver, but render passes delineate where resolves may be needed.

It is easier to write a Vulkan driver than one for OpenGL, Ekstrand said. The lack of error checking simplifies things to start with. The SPIR-V shader language is a bit easier to deal with than OpenGL's GLSL. Also, the Vulkan conformance tests consist of 115,000 tests that the driver developer doesn't have to write. It is a good set of tests, but there are still some holes, he said.

Some things are harder to do for Vulkan than for OpenGL. There is no CPU-side object state-tracking, for one thing. In addition, "applications have a lot more power for stupid". If the application is doing something wrong, which results in a bug filed against the driver, there is a good bit of work—without good tools—needed to track down the problem.

As far as sharing code between Vulkan and OpenGL drivers goes, there are a couple of different approaches. The approach taken was a "toolbox" that provides a number of different parts, from which a driver can be created. That approach has also provided better infrastructure for building other drivers in the future. Those looking for more details may want to view the YouTube video of the talk.

[I would like to thank the X.Org Foundation for sponsoring my travel to Helsinki for XDC.]

