Supporting virtual reality displays in Linux


At linux.conf.au (LCA) 2017 in Hobart, Tasmania, Keith Packard talked with kernel graphics maintainer Dave Airlie about how virtual reality devices should be hooked up to Linux. They both thought it would be pretty straightforward to do, so it would "only take a few weeks", but Packard knew "in reality it would take a lot longer". In a talk at LCA 2018 in Sydney, Packard reported back on the progress he has made; most of it is now in the upstream kernel.

Packard has been consulting for Valve, a game technology company, to add support for head-mounted displays to Linux. Those displays have an inertial measurement unit (IMU) for position and orientation tracking, along with a display and some optics. The display is roughly 2Kx1K pixels in the hardware he is working with; it is split in half, one side for each eye. The headsets also have a "bunch of lenses", which makes them "more complicated than you would hope".

The display is meant to block out the real world and to make users believe they inhabit the virtual reality. "It's great if you want to stumble into walls, chairs, and tables." Nearly all of the audience indicated they had used a virtual reality headset, leading Packard to hyperbolically proclaim that he is the last person in the universe to obtain one.

Display

There is a lot of computation that needs to be done in order to display a scene on the headset. The left and right eye buffers must be computed separately by the application. Those buffers must then undergo an optical transform to compensate for the distortion introduced by the lenses. In addition, every pixel of the OLED display has a slightly different intensity response, which produces highly visible artifacts as an image moves across the display with the head. So his hardware has a table that is used to correct the intensities before the buffer is displayed. A regular Linux desktop is not suitable to be shown on a virtual reality headset; these devices should inherently be separate from the set of normal desktop display outputs.

Displaying to one of these head-mounted devices has hard realtime requirements. If the scene does not track the head motion perfectly, users will "get violently ill". Every frame must be rendered pixel-perfect and delivered on time. The slowest acceptable frame rate is 90 frames per second, which is 50% faster than the usual desktop rate; the application "must hit every frame" or problems will occur. If the device simply presents a static image that does not change with head position, which can happen during debugging, it can cause wearers to fall down if they are standing, or fall out of their chairs if they are not.
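A quick bit of arithmetic (the frame rates are from the talk; the calculation is mine) shows how much the per-frame time budget shrinks at the headset's rate:

```python
# Per-frame time budgets at the usual desktop rate versus the
# headset's minimum rate, using the figures quoted in the talk.
desktop_hz, hmd_hz = 60, 90
desktop_budget_ms = 1000 / desktop_hz   # time available per frame at 60 Hz
hmd_budget_ms = 1000 / hmd_hz           # time available per frame at 90 Hz

print(f"60 Hz budget: {desktop_budget_ms:.1f} ms per frame")
print(f"90 Hz budget: {hmd_budget_ms:.1f} ms per frame")
print(f"rate increase: {hmd_hz / desktop_hz - 1:.0%}")
```

Everything the application does for a frame, including the lens-distortion and intensity corrections, has to fit inside that smaller budget, every time.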

He and Airlie discussed a few different possibilities for supporting these displays. Most required changes to the desktops, X, the kernel, or some combination of them. All except one would have serious latency problems because X is not a hard realtime environment; that could lead to missed frames or other glitches. X is kind of a "best effort" display manager, Packard said; you give it a frame and it will display it sometime in the future. These displays need more than that.

The idea they came up with was to allow applications to "borrow" a display and get X out of the way—because "it's not helping". It adds latency and the application doesn't need any of the services it provides. By hacking the kernel and X (something that he and Airlie are well qualified to do, Packard said), they could provide a way for X to talk to some displays and for applications to talk to others without the two interacting at all.

Direct rendering manager (DRM) clients already talk directly to the kernel without going through X, he said. Those clients have a file descriptor to use for that communication; either those descriptors could be made more capable or a new kind of file descriptor could be added. Beyond rendering, the applications need to be able to do mode setting, which is a privileged operation that is normally mediated by the window system (e.g. X). What is needed is a way to pull the display away from X temporarily, so that the head-mounted display application has full control of it. But there is also a need for X to be able to recover the display if the application crashes.

Leasing

So they came up with the idea of a "lease". The application would not own the device, but would have limited access for a limited amount of time. The lessor (e.g. X server) promises not to "come in and bug the application while it is busy displaying stuff"; it gives the application free rein on the display. The lessee (application) can set modes, flip buffers, and turn the display on or off for power saving; when the lease terminates (or the lessee does), the lessor can clean up.

Doing new things in the kernel often leads to getting sidetracked into a series of "yak-shaving exercises", Packard said. One of those came about when he was trying to change the kernel frame counter from 32 bits to 64 bits. The current vertical blank (VBLANK) API is a mess; a single ioctl() command is used for three separate functions. It only supports a 32-bit frame counter, which will wrap in a few years, so it cannot be treated as a unique value. A multi-year test-debug cycle is not conducive to finding and fixing any bugs that result from the wrap, so he was hesitant to rely on it. The API also only supports microsecond resolution, which was insufficient for his needs.

So he added two new ioctl() commands, one that would retrieve the frame counter and another to queue an event to be delivered on a particular count. Those counts are now 64-bit quantities with nanosecond resolution. He got sidetracked into solving this problem and "it took a surprising amount of time to integrate this into the kernel". It was such a small change to the API and "the specification is really clear and easy to read", so that meant that it got bikeshedded, which was frustrating. He thought it would be quick and easy to do, but "it took like three months". "Welcome to kernel development", he added.
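A sketch of why the width of the counter matters, along with the request layouts as I read them from the kernel's UAPI headers (the field names mirror the structures added in Linux 4.15; consult include/uapi/drm/drm.h for the authoritative definitions):

```python
import struct

# Layouts believed to mirror struct drm_crtc_get_sequence and
# struct drm_crtc_queue_sequence from the DRM UAPI; illustrative only.
GET_SEQUENCE = struct.Struct("=IIQq")    # crtc_id, active, sequence, sequence_ns
QUEUE_SEQUENCE = struct.Struct("=IIQQ")  # crtc_id, flags, sequence, user_data

# Why 32 bits was not enough: at 90 frames per second the old counter
# wraps in well under two years, so it cannot serve as a unique frame id.
fps = 90
seconds_per_year = 365.25 * 24 * 3600
wrap32_years = 2**32 / fps / seconds_per_year
wrap64_years = 2**64 / fps / seconds_per_year

print(f"32-bit counter wraps after {wrap32_years:.1f} years")
print(f"64-bit counter wraps after {wrap64_years:.2e} years")
```

The 64-bit sequence and signed nanosecond timestamp fields are what give the new ioctl() commands both the uniqueness and the timing resolution that the old VBLANK interface lacked.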

For DRM leasing, he created three patch sets to "tell the story" of the changes he wanted to make. The first chunk simply changed the internal APIs so that the hooks he needed were available; it made no functional change to the kernel. The second was the core of his changes; it added lots of code but didn't touch other parts of the system. That allowed it to be reviewed and merged without actually changing how the kernel functioned. Only with the last chunk, which exposed the new functionality to user space, does the kernel API change. Breaking the patches up this way makes it easier for maintainers to review, Packard said.

Two different kinds of objects can be leased: connectors, which correspond to the physical output connectors, and CRTCs, which are the scan-out engines. Packard noted that CRTC stands for "cathode-ray tube controller" but that few people actually have a cathode-ray tube display any more. In fact, the only person he knows who still has one plays a competitive video game; the one-frame latency introduced by an LCD display evidently interferes with optimal game playing. "Wow!"

He also added some new ioctl() commands to allow applications to lease a connector and CRTC. The lessee can only see and manipulate the connector and CRTC resources the lease has been granted for; only the DRM master (e.g. X server) has access to the full set. The application only needs to change in one place: instead of opening the graphics device, it should request a lease. Once the lease is granted, it will only see the resources that it has leased, so nothing else in the application needs to change, he said.
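To make the shape of the request concrete, here is a sketch of the create-lease structure as I read it from include/uapi/drm/drm_mode.h (struct drm_mode_create_lease); the object ids are hypothetical, and a real application would go through libdrm rather than packing this by hand:

```python
import struct

# Believed layout of struct drm_mode_create_lease (illustrative only):
#   __u64 object_ids    pointer to an array of object ids to lease
#   __u32 object_count  number of entries in that array
#   __u32 flags         e.g. O_CLOEXEC for the returned fd
#   __u32 lessee_id     out: id of the new lessee
#   __u32 fd            out: the lessee's new DRM file descriptor
CREATE_LEASE = struct.Struct("=QIIII")

# Hypothetical ids for one connector and one CRTC.
objects = (42, 43)
ids_array = struct.pack(f"={len(objects)}I", *objects)

O_CLOEXEC = 0o2000000
req = CREATE_LEASE.pack(0,  # placeholder for the ids array's address
                        len(objects), O_CLOEXEC, 0, 0)
print(f"request is {len(req)} bytes for {len(objects)} leased objects")
```

The key point is the pair of "out" fields: the kernel hands back a brand-new file descriptor that only sees the leased connector and CRTC, which is why the application needs no other changes.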

Demo time

He then demonstrated the code using an application he wrote (xlease) that gets a lease from the X server, starts a new X server, and tells the new server to use the leased device for its output. That came up with an xterm, but he could not actually type in that terminal window because the second X server had no input devices. He then ran the venerable x11perf 2D-performance test in a second window. He killed xlease, which killed the second X server; the original X server was able to recover and redisplay his desktop at that point.

The second X server was running as a normal, unprivileged user because it does not require special privileges to open the graphics device as the regular X server does. That could lead to a way to solve the longstanding request for multi-seat X. He has a graphics card with four outputs, for example, and X already handles multiple input devices, so it should be straightforward to set up true multi-seat environments. Each display would have its own X server and input devices simply by changing configuration files, he said.

The master X server can still see all of the resources, including those that have been leased out. It could still mess with the output on the leased devices; it has simply agreed not to do so, and the kernel does not enforce that agreement. Desktops such as GNOME or KDE/Plasma would be able to mess with those displays as well. So the X server effectively hides the leased resources from other clients; it simply pretends that there is nothing connected while the lease is active.

In an aside, Packard noted that he had seen patches to add leasing to the Wayland protocol the day before. There was no code yet, but "somebody is starting to think about how this might work in a Wayland environment, which is pretty cool".

All of the leasing mechanism sits atop an ability that was added for direct rendering infrastructure (DRI) 3: passing file descriptors from the server to X applications. Passing file descriptors via Unix-domain sockets has been possible for a long time, but was never used in X. DRI 2 had some "hokey mechanism" that was replaced by file-descriptor passing when DRI 3 came about. That required a bunch of changes and infrastructure in the X server that made this leasing work "a piece of cake", he said. The lease is represented as a file descriptor that can be passed to an application; using file descriptors that way is a powerful primitive that he thinks we will be seeing more of for X in the future.
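The underlying primitive is SCM_RIGHTS fd passing over a Unix-domain socket. A minimal sketch, using a pipe in place of a DRM device fd (requires Python 3.9+ for the send_fds()/recv_fds() convenience wrappers):

```python
import os
import socket

# Pass a file descriptor across a Unix-domain socket via SCM_RIGHTS --
# the same primitive DRI3 uses, and the one that carries a DRM lease
# from the X server to the application. A pipe stands in for the
# DRM device fd here.
r, w = os.pipe()
server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

os.write(w, b"leased!")
socket.send_fds(server, [b"x"], [r])   # the fd itself crosses the socket

msg, fds, flags, addr = socket.recv_fds(client, 1, 1)
data = os.read(fds[0], 7)              # read through the received fd
print(data.decode())
```

Note that the receiver gets its own duplicate of the descriptor; the kernel arranges for both to refer to the same open file, which is exactly what lets a lessee use a device the X server opened.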

There was a pile of minor fixes that he needed to make in X to support leasing. For example, the cursor needed to be disabled when the CRTC was leased so that the lessee doesn't end up with a random cursor on the screen. Similarly, the cursor needed to be restored to a known state when the lease terminates.

He has also done some work to provide leasing support for Vulkan. His original plan was to create a Vulkan extension that would be passed the file descriptor for it to use, but that ran aground on the rather heavyweight Vulkan standardization process for extensions. NVIDIA has an extension that does something related and he was able to modify that to handle leasing while still allowing it to be used for its original purpose with the binary-only NVIDIA driver.

He concluded the talk by saying that virtual reality on Linux is "working great and should be coming to your desktop soon". The kernel changes were merged for 4.15, while the X changes will be coming in xserver 1.20. The YouTube video of the talk is available for those interested.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Sydney for LCA.]

