LCA: The ways of Wayland


Collabora's Daniel Stone presented the final piece of the linux.conf.au 2013 display server triptych, which started with a pair of talks from Keith Packard and David Airlie. Stone explained the concepts behind Wayland and how it relates to X11—because, as he put it, "everything you read on the Internet about it will be wrong."

The Dark Ages

Stone, who said that he was "tricked into" working on X about ten years ago, reviewed X11's history, starting with the initial assumption of single-keyboard, single-mouse systems with graphics hardware focused on drawing rectangles, blitting images, and basic window management. But then, he continued, hardware got complicated (from multiple input devices to multiple GPUs), rendering got complicated (with OpenGL and hardware-accelerated video decoding), and window management got awful (with multiple desktop environments, new window types, and non-rectangular windows). As time passed, things slowly got out of hand for X; what was originally a well-defined mechanism swelled to incorporate dozens of protocol extensions and thousands of pages of specifications—although on the latter point, Packard chimed in to joke that the X developers never wrote anything that could be called specifications.

The root of the trouble, Stone said, was that—thanks to politics and an excessive commitment to maintaining backward compatibility even with ancient toolkits—no one was allowed to touch the core protocol or the X server core, even as the needs of the window system evolved and diverged. For one thing, the XFree86 project, where much of the development took place, was not itself the X Consortium. For another, "no one was the X Consortium; they weren't doing anything." As a result, more and more layers got wrapped around the X server, working around deficiencies rather than fixing them. Eventually, the X server evolved into an operating system: it could run video BIOSes, manage system power, perform I/O port and PCI device management, and load multiple binary formats. But in spite of all these features, he continued, it was "the dumbest OS you've ever seen." For example, it could generate a configuration file for you, but it was not smart enough to just use the correct configuration.

Light at the end of the tunnel

Things did improve, he said. When the X.Org Foundation was formed, the project gained a cool domain name, but it also undertook some overdue development tasks, such as modularizing the X server. The initial effort may have been too modular, he noted, splitting the code into 345 git modules, but for the most part it was a positive change. With autotools, the X server was actually buildable. Modularization also allowed X developers to excise old and unused code; Stone said the pre-refactoring xserver 1.0.2 release contained 879,403 lines, compared to 562,678 lines today.

But soon they began adding new features again, repeating the pile-of-extensions model. According to his calculations, today X includes a new drawing model (XRender), four input stacks (core X11, XInput 1.0, 2.0, and 2.2), five display management extensions (core X11, Xinerama, and the three generations of RandR that Airlie spoke about), and four buffer management models (core X11, DRI, MIT-SHM, and DRI2). At that point, the developers had fundamentally changed how X did everything, and as users wanted more and more features, those features got pushed out of X into the client side (theming, fonts, subwindows, etc.) or into the window manager (e.g., special effects).

That situation leaves the X server itself with very little to do. Client applications draw everything locally, and the X server hands the drawing to the window manager to render it. The window manager hands back the rendered screen, and the X server "does what it's told" and puts it on the display. Essentially, he said, the X server is nothing but a "terrible, terrible, terrible" inter-process communication (IPC) bus. It is not introspectable, and it adds considerable (and variable) overhead.

Wayland, he said, simply cuts out all of the middleman steps that the X server currently consumes CPU cycles performing. Client applications draw locally, they tell the display server what they have drawn, and the server decides what to put onto the display and where. Commenters in the "Internet peanut gallery" sometimes argue that X is "the Unix way," he said. But Wayland fits the "do one thing, do it well" paradigm far better. "What one thing is X doing, and what is it doing well?"
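To make that flow concrete, here is a minimal sketch in C, against libwayland-client, of the client-side half: the client renders a complete frame into shared memory it owns, then simply hands the server a handle to the result. The draw_frame() helper and its solid-color fill are illustrative assumptions, not code from the talk; a real client would first bind wl_shm from the registry (see the registry sketch later in this article).

    /* A sketch of "clients draw locally": render into anonymous shared
       memory, then wrap it in a wl_buffer the server can reference. */
    #define _GNU_SOURCE
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <wayland-client.h>

    struct wl_buffer *draw_frame(struct wl_shm *shm, int width, int height)
    {
        int stride = width * 4;            /* XRGB8888: 4 bytes per pixel */
        int size = stride * height;

        /* Anonymous shared memory that both client and server can map;
           memfd_create() is Linux-specific (shm_open() works elsewhere). */
        int fd = memfd_create("frame", 0);
        ftruncate(fd, size);

        uint32_t *pixels = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);

        /* The client does all of the drawing itself; the server never
           sees a single rendering command. */
        for (int i = 0; i < width * height; i++)
            pixels[i] = 0xff2266aa;        /* a solid fill, for brevity */
        munmap(pixels, size);

        /* Tell the server what was drawn by wrapping the memory in a
           wl_buffer; compositing decisions stay with the server. */
        struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
        struct wl_buffer *buffer = wl_shm_pool_create_buffer(
            pool, 0, width, height, stride, WL_SHM_FORMAT_XRGB8888);
        wl_shm_pool_destroy(pool);
        close(fd);
        return buffer;
    }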

The Wayland forward

Stone then turned his attention to a more in-depth description of how Wayland works. The first important idea is that in Wayland, every frame is regarded as "perfect." That is, the client application draws it in completed form, as opposed to X, where different rectangles, pixmaps, and text can all be sent separately by the client, which can result in inconsistent on-screen behavior. DRI2 came close to fixing this, but it had limitations, chiefly that it had to adhere to the core X11 protocol.
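At the protocol level, "perfect" comes down to atomicity. In the hypothetical helper below (reusing the buffer from the earlier sketch), nothing the client requests takes effect until the final wl_surface_commit(), so a partially drawn frame can never reach the screen:

    void present_frame(struct wl_surface *surface, struct wl_buffer *buffer,
                       int width, int height)
    {
        wl_surface_attach(surface, buffer, 0, 0);        /* new content      */
        wl_surface_damage(surface, 0, 0, width, height); /* what changed     */
        wl_surface_commit(surface);                      /* apply atomically */
        /* Until the commit, the previous frame stays on screen intact,
           unlike X, where each drawing request can land independently. */
    }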

Wayland is also "descriptive" rather than "prescriptive," he said. In X, for example, auxiliary features like pop-up windows and screensavers are treated exactly like application windows: they grab keyboard and mouse input and must be positioned precisely on screen. Unpleasant side effects result, such as being unable to use the volume keys when a screensaver is active, or to trigger the screensaver while a menu is open. With Wayland, in contrast, the application tells the server that a surface is a pop-up and lets the compositor decide how to handle it. Yes, he said, it is possible that someone will write a bad compositor that mishandles such pop-ups, but that is equally true of window managers today; the solution is not to run the bad ones.
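As a rough sketch of what "descriptive" means in practice, here is how a client could mark a surface as a pop-up with the wl_shell interface that was current at the time (it has since been superseded by xdg-shell). The seat and the input-event serial are assumed to be in hand from the click that opened the menu:

    void open_popup(struct wl_shell *shell, struct wl_surface *popup,
                    struct wl_surface *parent, struct wl_seat *seat,
                    uint32_t serial)
    {
        struct wl_shell_surface *ssurf =
            wl_shell_get_shell_surface(shell, popup);
        /* "This surface is a pop-up at (20, 20) relative to its parent."
           How to grab input and when to dismiss it is compositor policy. */
        wl_shell_surface_set_popup(ssurf, seat, serial, parent, 20, 20, 0);
    }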

Wayland also uses an event-driven model, which simplifies (among other things) listening for input devices. Rather than asking the server for an initial list of input devices that must be parsed (and that is handled separately from subsequent device notifications), clients simply register for device notifications, and the Wayland server sends the same type of message for existing devices as it does for any subsequent hot-plug events. Wayland also provides "proper object lifetimes," which eliminates X11's fatal-by-default and hard-to-work-around BadDevice errors. Finally, it side-steps the problem that can occur when a toolkit (such as GTK+ or Clutter) and an application support different versions of the XInput extension. In X, the server gets only one report of which version is supported; whether that report reflects the toolkit's version or the application's is essentially random. In Wayland, each component registers and listens for events separately.
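A minimal sketch of that model: a single registry listener receives the same wl_seat announcement whether the device was present at startup or hot-plugged later, and device removal arrives as an ordinary event rather than an error:

    #include <stdio.h>
    #include <string.h>
    #include <wayland-client.h>

    static void seat_capabilities(void *data, struct wl_seat *seat,
                                  uint32_t caps)
    {
        /* Sent on bind and again whenever devices come and go. */
        printf("seat: pointer=%d keyboard=%d touch=%d\n",
               !!(caps & WL_SEAT_CAPABILITY_POINTER),
               !!(caps & WL_SEAT_CAPABILITY_KEYBOARD),
               !!(caps & WL_SEAT_CAPABILITY_TOUCH));
    }

    static const struct wl_seat_listener seat_listener = {
        .capabilities = seat_capabilities,
    };

    static void registry_global(void *data, struct wl_registry *registry,
                                uint32_t name, const char *interface,
                                uint32_t version)
    {
        /* The same event fires for startup devices and hot-plugs alike. */
        if (strcmp(interface, "wl_seat") == 0) {
            struct wl_seat *seat =
                wl_registry_bind(registry, name, &wl_seat_interface, 1);
            wl_seat_add_listener(seat, &seat_listener, NULL);
        }
    }

    static void registry_global_remove(void *data,
                                       struct wl_registry *registry,
                                       uint32_t name)
    {
        /* Proper object lifetimes: removal is an ordinary event,
           not a fatal BadDevice-style error. */
    }

    static const struct wl_registry_listener registry_listener = {
        .global = registry_global,
        .global_remove = registry_global_remove,
    };

    int main(void)
    {
        struct wl_display *display = wl_display_connect(NULL);
        struct wl_registry *registry = wl_display_get_registry(display);
        wl_registry_add_listener(registry, &registry_listener, NULL);
        while (wl_display_dispatch(display) != -1)
            ;    /* one code path handles both cases */
        return 0;
    }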

Go Weston

Stone capped off the session with a discussion about Weston, the reference implementation of a Wayland server, its state of readiness, and some further work still in the pipeline. Weston is reference code, he explained. Thus it has plugin-based "shells" for common desktop features like docks and panels, and it supports existing X application clients. It offers a variety of output and rendering choices, including fbdev and Pixman, which he pointed out to refute the misconception that Wayland requires OpenGL. It also supports hardware video overlays, which he said will be of higher quality than the X implementation.

The GNOME compositor Mutter has an out-of-date port to Wayland, he continued, making it, in essence, a hybrid X/Wayland compositor, as Weston is. GNOME Shell used to run on Mutter's Wayland implementation, he said, or at least "someone demoed it once in July ... so it's ready for the enterprise." In fact, Stone is supposed to bring the GNOME Shell code up to date, but he has not yet had time. There are GTK+, Clutter, and Qt implementations in upstream git, and there is a GStreamer waylandvideosink element, although it needs further work. In reply to a question from the audience, Stone commented that Weston's touchpad driver is still incomplete, lacking support for acceleration and scrolling.

Last but clearly not least, Stone addressed the state of Wayland support for remoting. X11's lousy implementation of IPC, he said, in which it acts as a middleman between the client and compositor, hits its worst-case performance when run over the Internet. Furthermore, the two rendering modes every application uses (SHM and DRI2) do not work over the network anyway. The hypothetical "best" way to implement remoting support, he explained, would be for the client application to talk only to the local compositor, and have that compositor speak to the remote compositor, employing image compression to save bandwidth. That, he said, is precisely what VNC does, and it is indeed better than X11's remote support. Consequently, Wayland developer Kristian Høgsberg has been experimenting with implementing this VNC-like remoting support in Weston, in a branch that interested parties can test. "We think it's going to be better at remoting than X," Stone said, or at the least, it cannot be worse.

For end users, it will still be a while before Wayland is usable on Linux desktops outside of experimental circumstances. The protocol was declared 1.0 in October 2012, as was Weston, but Weston is still a reference implementation (lacking features, as Stone described in his talk). It may be a very long time before applications are ported from X11 to Wayland, but by providing a feature-by-feature comparison of Wayland's benefits over X, Stone has crafted a good sales pitch for both application developers and end users.

