Completing the DRI3 Extension

This week marks a pretty significant milestone for the glorious DRI3000 future. The first of the two new extensions is complete and running both full Gnome and KDE desktops.

DRI3 Extension Overview

The DRI3 extension provides facilities for building direct rendering libraries to work with the X window system. DRI3 provides three basic mechanisms:

Open a DRM device. Share kernel objects associated with X pixmaps. The direct rendering client may allocate kernel objects itself and ask the X server to construct a pixmap referencing them, or the client may take an existing X pixmap and discover the underlying kernel object for it. Synchronize access to the kernel objects. Within the X server, Sync Fences are used to serialize access to objects. These Sync Fences are exposed via file descriptors which the underlying driver can use to implement synchronization. The current Intel DRM driver passes a shared page containing a Linux Futex.

Opening the DRM Device

Ideally, the DRM application would be able to just open the graphics device and start drawing, sending the resulting buffers to the X server for display. There's work going on to make this possible, but the current situation has the X server in charge of 'blessing' the file descriptors used by DRM clients.

DRI2 does this by having the DRM client fetch a 'magic cookie' from the kernel and pass that to the X server. The cookie is then passed to the kernel which matches it up with the DRM client and turns on rendering access for that application.

For DRI3, things are much simpler—the DRM client asks the X server to pass back a file descriptor for the device. The X server opens the device, does the magic cookie dance all by itself (at least for now), and then passes the file descriptor back to the application.

┌─── DRI3Open drawable: DRAWABLE driverType: DRI3DRIVER provider: PROVIDER ▶ nfd: CARD8 driver: STRING device: FD └─── Errors: Drawable, Value, Match This requests that the X server open the direct rendering device associated with drawable, driverType and RandR provider. The provider must support SourceOutput or SourceOffload. The direct rendering library used to implement the specified 'driverType' is returned in 'driver'. The file descriptor for the device is returned in 'device'. 'nfd' will be set to one (this is strictly a convenience for XCB which otherwise would need request-specific information about how many file descriptors were associated with this reply).

Sharing Kernel Pixel Buffers

An explicit non-goal of DRI3 is support for sharing buffers that don't map directly to regular X pixmaps. So, GL ancillary buffers like depth and stencil just don't apply here.

The shared buffers in DRI3 are regular X pixmaps in the X server. This provides a few obvious benefits over the DRI2 scheme: In the kernel, the buffers are referenced by DMA-BUF handles, which provides a nice driver-independent mechanism.

Lifetimes are easily managed. Without being associated with a separate drawable, it's easy to know when to free the Pixmap. Regular X requests apply directly. For instance, copying between buffers can use the core CopyArea request.

To create back- and fake-front- buffers for Windows, the application creates a kernel buffer, associates a DMA-BUF file descriptor with that and then sends the fd to the X server with a pixmap ID to create the associated pixmap. Doing it in this direction avoids a round trip.

┌─── DRI3PixmapFromBuffer pixmap: PIXMAP drawable: DRAWABLE size: CARD32 width, height, stride: CARD16 depth, bpp: CARD8 buffer: FD └─── Errors: Alloc, Drawable, IDChoice, Value, Match Creates a pixmap for the direct rendering object associated with 'buffer'. Changes to pixmap will be visible in that direct rendered object and changes to the direct rendered object will be visible in the pixmap. 'size' specifies the total size of the buffer bytes. 'width', 'height' describe the geometry (in pixels) of the underlying buffer. 'stride' specifies the number of bytes per scanline in the buffer. The pixels within the buffer may not be arranged in a simple linear fashion, but 'size' will be at least 'height' * 'stride'. Precisely how any additional information about the buffer is shared is outside the scope of this extension. If buffer cannot be used with the screen associated with drawable, a Match error is returned. If depth or bpp are not supported by the screen, a Value error is returned.

To provide for texture-from-pixmap, the application takes the pixmap ID and passes that to the X server which returns the a file descriptor for a DMA-BUF which is associated with the underlying kernel buffer.

┌─── DRI3BufferFromPixmap pixmap: PIXMAP ▶ depth: CARD8 size: CARD32 width, height, stride: CARD16 depth, bpp: CARD8 buffer: FD └─── Errors: Pixmap, Match Pass back a direct rendering object associated with pixmap. Changes to pixmap will be visible in that direct rendered object and changes to the direct rendered object will be visible in the pixmap. 'size' specifies the total size of the buffer bytes. 'width', 'height' describe the geometry (in pixels) of the underlying buffer. 'stride' specifies the number of bytes per scanline in the buffer. The pixels within the buffer may not be arranged in a simple linear fashion, but 'size' will be at least 'height' * 'stride'. Precisely how any additional information about the buffer is shared is outside the scope of this extension. If buffer cannot be used with the screen associated with drawable, a Match error is returned.

Tracking Window Size Changes

When Eric Anholt and I first started discussing DRI3, we hoped to avoid needing to learn about the window size from the X server. The thought was that the union of all of the viewports specified by the application would form the bounds of the drawing area. When the window size changed, we expected the application would change the viewport.

Alas, this simple plan isn't sufficient here—a few GL functions are not limited to the viewport. So, we need to track the actual window size and monitor changes to it.

DRI2 does this by delivering 'invalidate' events to the application whenever the current buffer isn't valid; the application discovers that this event has been delivered and goes to as the X server for the new buffers. There are a couple of problems with this approach:

Any outstanding DRM rendering requests will still draw to the old buffers. The Invalidate events must be captured before the application sees the related ConfigureNotify event so that the GL library can react appropriately.

The first problem is pretty intractable within DRI2—the application has no way of knowing whether a frame that it has drawn was delivered to the correct buffer as the underlying buffer object can change at any time. DRI3 fixes this by having the application in control of buffer management; it can easily copy data from the previous back buffer to the new back buffer synchronized to its own direct rendering.

The second problem was solved in DRI2 by using the existing Xlib event hooks; the GL library directly implements the Xlib side of the DRI2 extension and captures the InvalidateBuffers events within that code, delivering those to the driver code. The problem with this solution is that Xlib holds the Display structure mutex across this whole mess, and Mesa must be very careful not to make any Xlib calls during the invalidate call.

For DRI3, I considered placing the geometry data in a shared memory buffer, but my future plans for the "Present" extension led me to want an X event instead (more about the "Present" extension in a future posting).

An X ConfigureNotify event is sufficient for the current requirements to track window sizes accurately. However, there's no easy way for the GL library to ensure that ConfigureNotify events will be delivered to the application—other application code may (and probably will) adjust the window event mask for its own uses. I considered adding the necessary event mask tracking code within XCB, but again, knowing that the "Present" extension would probably need additional information anyhow, decided to create a new event instead.

Using an event requires that XCB provide some mechanism to capture those events, keep them from the regular X event stream, and deliver them to the GL library. A further requirement is that the GL library be absolutely assured of receiving notification about these events before the regular event processing within the application will see a core ConfigureNotify event.

The method I came up with for XCB is fairly specific to my requirements. The events are always 'XGE' events, and are tagged with a special 'event context ID', an XID allocated for this purpose. The combination of the extension op-code, the event type and this event context ID are used to split off these events to custom event queues using the following APIs:

/** * @brief Listen for a special event */ xcb_special_event_t *xcb_register_for_special_event(xcb_connection_t *c, uint8_t extension, uint16_t evtype, uint32_t eid, uint32_t *stamp);

This creates a special event queue which will contain only events matching the specified extension/type/event-id triplet.

/** * @brief Returns the next event from a special queue */ xcb_generic_event_t *xcb_check_for_special_event(xcb_connection_t *c, xcb_special_event_t *se);

This pulls an event from a special event queue. These events will not appear in the regular X event queue and so applications will never see them.

There's one more piece of magic here—the 'stamp' value passed to xcb_register_for_special_event. This pointer refers to a location in memory which will be incremented every time an event is placed in the special event queue. The application can cheaply monitor this memory location for changes and known when to check the queue for events.

Within GL, the value used is the existing dri2 'stamp' value. That is checked at the top of the rendering operation; if it has changed, the drawing buffers will be re-acquired. Part of the buffer acquisition process is a check for special events related to the window.

For now, I've placed these events in the DRI3 extension. However, they will move to the Present extension once that is working.

┌─── DRI3SelectInput eventContext: DRI3EVENTID window: WINDOW eventMask: SETofDRI3EVENT └─── Errors: Window, Value, Match, IDchoice Selects the set of DRI3 events to be delivered for the specified window and event context. DRI3SelectInput can create, modify or delete event contexts. An event context is associated with a specific window; using an existing event context with a different window generates a Match error. If eventContext specifies an existing event context, then if eventMask is empty, DRI3SelectInput deletes the specified context, otherwise the specified event context is changed to select a different set of events. If eventContext is an unused XID, then if eventMask is empty no operation is performed. Otherwise, a new event context is created selecting the specified events.

The events themselves look a lot like a configure notify event:

┌─── DRI3ConfigureNotify type: CARD8 XGE event type (35) extension: CARD8 DRI3 extension request number length: CARD16 2 evtype: CARD16 DRI3_ConfigureNotify eventID: DRI3EVENTID window: WINDOW x: INT16 y: INT16 width: CARD16 height: CARD16 off_x: INT16 off_y: INT16 pixmap_width: CARD16 pixmap_height: CARD16 pixmap_flags: CARD32 └─── 'x' and 'y' are the parent-relative location of 'window'.

Note that there are a couple of odd additional fields—off_x, off_y, pixmap_width, pixmap_height and pixmap_flags are all place-holders for what I expect to end up in the Present extension. For now, in DRI3, they should be ignored.

Synchronization

The DRM application needs to know when various X requests related to its buffers have finished. In particular, when performing a buffer swap, the client wants to know when that completes, and be able to block until it has. DRI2 does this by having the application make a synchronous request from the X server to get the names of the new back buffer for drawing the next frame. This has two problems:

The synchronous round trip to the X server isn't free. Other running applications may cause fairly arbitrary delays in getting the reply back from the X server. Synchronizing with the X server doesn't ensure that GPU operations are necessarily serialized between the application and the X server.

What we want is a serialization guarantee between the X server and the DRM application that operates at the GPU level.

I've written a couple of times (dri3k first steps and Shared Memory Fences) about using X Sync extension Fences (created by James Jones and Aaron Plattner) for this synchronization and wanted to get a bit more specific here.

With the X server, a Sync extension Fence is essentially driver-specific, allowing the hardware design to control how the actual synchronization is performed. DRI3 creates a way to share the underlying operating system object by passing a file descriptor from application to the X server which somehow references that device object. Both sides of the protocol need to tacitly agree on what it means.

┌─── DRI3FenceFromFD drawable: DRAWABLE fence: FENCE initially-triggered: BOOL fd: FD └─── Errors: IDchoice, Drawable Creates a Sync extension Fence that provides the regular Sync extension semantics along with a file descriptor that provides a device-specific mechanism to manipulate the fence directly. Details about the mechanism used with this file descriptor are outside the scope of the DRI3 extension.

For the current GEM kernel interface, because all GPU access is serialized at the kernel API, it's sufficient to serialize access to the kernel itself to ensure operations are serialized on the GPU. So, for GEM, I'm using a shared memory futex for the DRI3 synchronization primitive. That does not mean that all GPUs will share this same mechanism. Eliminate the kernel serialization guarantee and some more GPU-centric design will be required.

What about Swap Buffers?

None of the above stuff actually gets bits onto the screen. For now, the GL implementation is simply taking the X pixmap and copying it to the window at SwapBuffers time. This is sufficient to run applications, but doesn't provide for all of the fancy swap options, like limiting to frame rate or optimizing full-screen swaps.

I've decided to relegate all of that functionality to the as-yet-unspecified 'Present' extension.

Because the whole goal of DRI3 was to get direct rendered application contents into X pixmaps, the 'Present' extension will operate on those X objects directly. This means it will also be usable with non-DRM applications that use simple X pixmap based double buffering, a class which includes most existing non-GL based Gtk+ and Qt applications.

So, I get to reduce the size of the DRI3 extension while providing additional functionality for non direct-rendered applications.

Current Status

As I said above, all of the above functionality is running on my systems and has booted both complete KDE and Gnome sessions. There have been some recent DMA-BUF related fixes in the kernel, so you'll need to run the latest 3.9.x stable release or a 3.10 release candidate.

Here's references to all of the appropriate git repositories:

DRI3 protocol and spec:

git://people.freedesktop.org/~keithp/dri3proto master

XCB protocol

git://people.freedesktop.org/~keithp/xcb/proto.git dri3

XCB library

git://people.freedesktop.org/~keithp/xcb/libxcb.git dri3

xshmfence library:

git://people.freedesktop.org/~keithp/libxshmfence.git master

X server:

git://people.freedesktop.org/~keithp/xserver.git dri3

Mesa:

git://people.freedesktop.org/~keithp/mesa.git dri3

Next Steps

Now it's time to go write the Present extension and get that working. I'll start coding and should have another posting here next week.