Following on from part 1 in the series, today we cover more developments in low-level graphics, which we’ve enabled through the whole stack. Support for buffer modifiers is something we’ve worked on in the kernel, Mesa, Wayland, Weston, Mutter and GNOME Shell, and X.Org. Meanwhile, our community continues to grow thanks to Google Summer of Code. Read on for more.

More performance from buffer modifiers

Another kernel feature from the same era, which we are now able to take full advantage of, is buffer modifiers.

Although we tend to think of images as being laid out in memory the way they appear (left to right, then top to bottom), hardware would very much prefer that they weren't. The most efficient way for hardware to access buffers is in tiled modes, and even supertiled modes (with up to three levels of tiles containing tiles). Not only this, but GPUs are beginning to be able to compress data quite well.
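To make the difference concrete, here is a minimal sketch of how a pixel's byte offset is computed in a linear buffer versus a tiled one. The 4x4-pixel tile size and the function names are assumptions chosen purely for illustration; real hardware uses larger tiles and vendor-specific orderings.

```python
# Illustrative only: hypothetical 4x4-pixel tiles at 4 bytes per pixel.
# Real GPU tiling layouts are vendor-specific and considerably more involved.
TILE_W = 4
TILE_H = 4
BYTES_PER_PIXEL = 4


def linear_offset(x, y, width):
    """Byte offset of pixel (x, y) in a row-major (linear) buffer."""
    return (y * width + x) * BYTES_PER_PIXEL


def tiled_offset(x, y, width):
    """Byte offset of pixel (x, y) when pixels are stored tile by tile.

    Assumes width is a multiple of TILE_W.
    """
    tiles_per_row = width // TILE_W
    # Which tile the pixel falls in, counting tiles in row-major order.
    tile_index = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    # Position of the pixel inside that tile, also row-major.
    within_tile = (y % TILE_H) * TILE_W + (x % TILE_W)
    return (tile_index * TILE_W * TILE_H + within_tile) * BYTES_PER_PIXEL


if __name__ == "__main__":
    width = 16
    # Vertically adjacent pixels: one full row apart when linear,
    # only one tile row apart when tiled.
    print(linear_offset(0, 1, width) - linear_offset(0, 0, width))
    print(tiled_offset(0, 1, width) - tiled_offset(0, 0, width))
```

In this toy layout, vertically adjacent pixels sit much closer together in memory than in the linear case, which is the kind of locality that makes tiled access friendlier to caches.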

Whilst these layouts are great for performance, until recently we didn't have a good way to annotate buffers with their modifiers. For some time, we could pretend that the buffers were linear and hope that we could get away with it, but with the increasing complexity of current use cases, these heuristics are not good enough. This illusion of being linear was also underpinned by magic side-channels: each driver implementing tiling support had its own way of telling userspace what the true configuration of the buffer was. But if you were not that driver, you had no way of knowing: if you wanted to share buffers across different GPUs, or even different drivers, you had to pessimise and disable all support for tiling, with an often unacceptable performance cost.

At Collabora, we started developing an EGL extension, which we contributed upstream to Khronos along with an implementation for Mesa. This extension allows GPU drivers to advertise which format modifiers (tiling/compression modes) they support, and allows clients to explicitly specify modifiers on import. We also wrote a Wayland extension with support in Weston and GStreamer, allowing Wayland to be used as a two-way transit for modifier information.

Last year, sponsored by Intel, we finished the task by adding support to the window systems. We helped finalise support for modifier advertisement in KMS, and wrote an X11 equivalent to our Wayland protocol so legacy systems could also get the benefit of modifier support. We wired up support in Mesa so that it could discover the modifiers available to allocate with and communicate the chosen modifier back to the window system once done, for both Wayland and X11 clients running on any of EGL, GLX, or Vulkan. We then implemented support for these protocols in Mutter (GNOME Shell's display server), Xwayland (for supporting legacy X11 clients in a Wayland session), and the classic X.Org server. Whilst we were there, we implemented support for the atomic modesetting API in the classic X.Org server, though without using planes, as the X server is architecturally unable to do so.
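The advertise-then-choose flow can be sketched roughly as follows. This is a simplification under stated assumptions: the function name `choose_modifier` and the idea of a simple ordered preference list are illustrative, and in practice the driver makes the final choice from the set it is offered. The modifier values follow the vendor-in-top-byte encoding from the kernel's drm_fourcc.h.

```python
# Modifier tokens: vendor ID in the top 8 bits, vendor-defined value below.
LINEAR = 0                    # DRM_FORMAT_MOD_LINEAR
X_TILED = (0x01 << 56) | 1    # I915_FORMAT_MOD_X_TILED (Intel vendor ID 0x01)
Y_TILED = (0x01 << 56) | 2    # I915_FORMAT_MOD_Y_TILED


def choose_modifier(producer_preference, consumer_supported):
    """Pick the first modifier the producer prefers that the consumer accepts.

    producer_preference: modifiers the allocating driver can render to,
        best first (hypothetical ordering for this sketch).
    consumer_supported: modifiers advertised by the window system or display.
    Returns None when there is no common modifier at all.
    """
    for modifier in producer_preference:
        if modifier in consumer_supported:
            return modifier
    return None


if __name__ == "__main__":
    gpu_prefers = [Y_TILED, X_TILED, LINEAR]
    display_accepts = {X_TILED, LINEAR}
    chosen = choose_modifier(gpu_prefers, display_accepts)
    print(hex(chosen))
```

The important property is that both sides work from explicit lists, so a tiled layout is used only when the consumer has said it understands it, and the chosen modifier travels with the buffer rather than through a driver-private side channel.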

With the groundwork having been laid by Mesa, Robert Foss implemented support for buffer modifiers in the open-source drm_hwcomposer implementation, allowing Android to get the benefit of this support. This built on top of support added to the Etnaviv open-source driver, used in the NXP i.MX SoC family, by Lucas Stach of Pengutronix. The i.MX family is surprisingly complex, in that its GPU doesn't actually understand linear formats at all, requiring a completely separate copy to preserve the illusion that the buffer is laid out linearly. Expressing modifiers clearly and explicitly allows this copy to be elided where it is not necessary. Modifier support later spread to the open-source VC4 driver used in the Raspberry Pi (which suffered from similar shadow-copy problems), and now also the open-source NVIDIA Tegra driver.

The end result of all this work is that we have been able to eliminate the magic side channels which used to proliferate, and lay the groundwork for properly communicating this information across multiple devices as well. Devices supporting ARM's AFBC compression format, which is shared between video decoder, GPU, and display controller, are just beginning to hit the market. We are also beginning to see GPUs from different vendors share tiling formats, in order to squeeze the most performance possible from hybrid GPU systems.

Whilst it seems simple (take a 32-bit format token and 'just' add a 64-bit modifier token), the implementation was surprisingly complex. Part of the reason was that so much of the knowledge about modifiers was previously implicit, and mixing implicit and explicit systems is notoriously difficult. Unpicking all this took time and persistence, but it seems we've finally arrived at our end goal.
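For readers who have not seen the two tokens, the sketch below mirrors how the kernel's drm_fourcc.h builds them: a 32-bit format from four ASCII characters, and a 64-bit modifier with the vendor ID in the top byte. The `split_modifier` and `join_modifier` helpers are hypothetical names, added to illustrate how a protocol without a 64-bit integer type, such as Wayland's wire format, can carry the modifier as two 32-bit halves.

```python
def fourcc_code(a, b, c, d):
    """Pack four ASCII characters into a 32-bit little-endian format token."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


def fourcc_mod_code(vendor, value):
    """Build a 64-bit modifier: 8-bit vendor ID on top, 56-bit value below."""
    return ((vendor & 0xFF) << 56) | (value & 0x00FFFFFFFFFFFFFF)


DRM_FORMAT_MOD_VENDOR_NONE = 0x00
DRM_FORMAT_MOD_VENDOR_INTEL = 0x01

DRM_FORMAT_XRGB8888 = fourcc_code("X", "R", "2", "4")
DRM_FORMAT_MOD_LINEAR = fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, 0)
I915_FORMAT_MOD_X_TILED = fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_INTEL, 1)


def split_modifier(modifier):
    """Split a 64-bit modifier into (hi, lo) 32-bit halves (illustrative helper)."""
    return (modifier >> 32) & 0xFFFFFFFF, modifier & 0xFFFFFFFF


def join_modifier(hi, lo):
    """Reassemble a 64-bit modifier from its (hi, lo) halves (illustrative helper)."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


if __name__ == "__main__":
    print(hex(DRM_FORMAT_XRGB8888))
    print(hex(I915_FORMAT_MOD_X_TILED))
    print(split_modifier(I915_FORMAT_MOD_X_TILED))
```

A buffer is then described by the pair (format, modifier): the format says what the pixels mean, and the modifier says how they are arranged in memory.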

More performance from fewer copies in Xwayland

Finally, over the past year I had the pleasure of mentoring Roman Gilg, as he worked on reducing copies in Xwayland for Google Summer of Code 2018. Thanks to the tireless work of Martin Peres and others, Wayland was included in GSoC under the umbrella of the X.Org Foundation. Roman's project was inside the X server, making things better for people using legacy X11 clients such as Google Chrome or Steam, when running in a Wayland session.

When Wayland clients submit content to the server, they render to a buffer which the display server then uses directly until the client provides a new one. This is true if the content is displayed directly on a hardware overlay, or if it is composited on the GPU. However, in a composited X11 environment, the compositor only has a single buffer handle for each client window, even when it's getting updated. In this environment, when the client sends a buffer to the X server, the X server copies the content of the client buffer into the staging buffer it has created for the compositor to source window content from, and releases the client buffer immediately.

This extra copy not only causes extra GPU workload, but also an unsightly tearing effect: the server may be copying new content from the client buffer at the same time as the compositor is rendering. Even worse, whilst an X server running on bare hardware can skip this for full-screen applications if there is only one screen, Xwayland cannot skip it at all.

Roman's project involved a lot of work on the core of the X11 Present extension, allowing it to do copy-free 'flips' for any window if the backend supports it. He then implemented direct flips for Xwayland, so the client buffer would be directly passed through to the Wayland server without an extra copy. This means that windowed GPU-accelerated X11 clients can be faster under Wayland than when running natively under X11.
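The difference between the two presentation paths can be modelled with a toy sketch. The class names and the `present` method are hypothetical and bear no relation to the actual X server code; the point is only to show where the copy happens and when the client gets its buffer back.

```python
class CopyingServer:
    """Copy path: the compositor only ever sees one staging buffer."""

    def __init__(self, size):
        self.staging = bytearray(size)
        self.bytes_copied = 0

    def present(self, client_buffer):
        # Copy the client's pixels into the staging buffer ...
        self.staging[:] = client_buffer
        self.bytes_copied += len(client_buffer)
        # ... and release the client buffer immediately.
        return client_buffer


class FlippingServer:
    """Flip path: the client buffer itself becomes the window content."""

    def __init__(self):
        self.current = None
        self.bytes_copied = 0

    def present(self, client_buffer):
        # Swap references only; no pixel data is touched.
        released = self.current
        self.current = client_buffer
        # The previously displayed buffer is handed back for reuse.
        return released


if __name__ == "__main__":
    frame_a = bytearray(b"\x01" * 16)
    frame_b = bytearray(b"\x02" * 16)

    copier = CopyingServer(16)
    copier.present(frame_a)
    copier.present(frame_b)

    flipper = FlippingServer()
    flipper.present(frame_a)
    flipper.present(frame_b)

    print(copier.bytes_copied, flipper.bytes_copied)
```

In the flip model the server never reads or writes pixel data, which is why there is no window during which the compositor can observe a half-copied frame.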

This work is currently being reviewed, mainly by the tireless Michel Dänzer, and it's looking hopeful for landing quite soon. This will provide a great performance boost to X11 games (such as through Steam) in particular, but also to browsers and other clients. Congratulations to Roman for a successful Summer of Code!

A cast of many

Over its various iterations, the above work has also seen contributions at Collabora from Derek Foreman, Emil Velikov, Louis-Francis Ratté-Boulianne, Robert Foss, Tomeu Vizoso, and Varad Gautam. Externally, Ben Widawsky, Chad Versace, Daniel Vetter, Fabien Dessenne, Jason Ekstrand, Kristian Høgsberg, Michel Dänzer, Sergi Granell, and Tomohito Esaki in particular have provided assistance, code, review, and moral support. Thanks also to those who have sponsored our work on this: Intel, Google, Renesas, Zodiac Inflight Innovation, as well as Collabora's own internal efforts.