The state of Nouveau, part I


Nouveau is an effort to create a complete open source driver for NVidia graphics cards for X.org. It aims to support 2D and 3D acceleration from the early NV04 cards up to the latest G80 cards, and to work across all supported architectures, such as x86, x86-64, and PPC. The project originated when Stéphane Marchesin set out to de-obfuscate parts of the NVidia-maintained nv driver. However, NVidia had corporate policies in place regarding the nv driver, and had no plans to change them at the time, so Stéphane's patches were refused.

This left Stéphane with the greatest open source choice: "fork it"! At FOSDEM in February 2006, Stéphane unveiled his plans for an open source driver for NVidia hardware called Nouveau. The name was suggested by his IRC client's French autoreplace feature which suggested the word "nouveau" when he typed "nv". People liked it, so the name stuck. The FOSDEM presentation got the project enough publicity to engage the curiosity of other developers.

Ben Skeggs was one of the first developers to sign up. He had worked on reverse engineering the shader components of the R300 (one of ATI's graphics chips) and on writing parts of the R300 driver; as a result, he had considerable experience with graphics drivers. He initially showed interest only in the NV40 shaders, but he got caught in the event horizon and has since worked on every aspect of the driver for NV40 and later cards.

The project engaged other developers with both short- and long-term interest. It also generated a large amount of attention thanks to a pledge drive started by an independent user.

However, the project was mainly developed on IRC and it was quite difficult for newcomers to get any insight into previous development; reading IRC logs is impractical at best. With this in mind, KoalaBR decided to start summarizing development in a series of articles known as the TiNDC (The irregular Nouveau Development Companion). This series of articles proved very useful for attracting developers and testers to the project. TiNDC issues are published every two to four weeks; as of this writing, the current issue is TiNDC #34.

Linux.conf.au 2007 saw the first live demo of Nouveau. Dave Airlie had signed up to give a talk on the subject; he managed to persuade Ben Skeggs that showing a working glxgears demo would be a great finish to the talk. Ben toiled furiously with the other developers to get the init code into shape for his laptop card and the presentation was a great success.

After Nouveau missed out on a Google Summer of Code place, X.org granted the project a Vacation of Code slot as an alternative. This saw Arthur Huillet join the team to complete proper Xv support in Nouveau. Arthur saw the light and continued with the project once the VoC ended. In autumn 2007, Stuart Bennett and Maarten Maathuis vowed to get Nouveau's RandR 1.2 support into better shape. Since then, a steady stream of patches has advanced the code greatly.

The project now has nine regular contributors (Stéphane Marchesin, Ben Skeggs, Patrice Mandin, Arthur Huillet, Pekka Paalanen, Maarten Maathuis, Peter Winters, Jeremy Kolb, and Stuart Bennett), along with many more part-time contributors, testers, writers, and translators.

NVidia card families

This article will use the NVidia GPU technical names as opposed to marketing names.

    GPU name     Product name(s)
    NV04/05      Riva TNT, TNT2
    NV1x         GeForce 256, GeForce 2, GeForce 4 MX
    NV2x         GeForce 3, GeForce 4 Ti
    NV3x         GeForce 5
    NV4x (G7x)   GeForce 6, GeForce 7
    NV5x (G8x)   GeForce 8

Where both "N" and "G" names exist, the "N" variant (NV4x, NV5x) will be used. Further information can be found on the Nouveau site.

Graphic Stack Overview

Before jumping into the Nouveau driver, this section provides a short background on the mess that is the Linux graphics stack. This stack has a long history dating back to Unix X servers and the XFree86 project. This history has led to a situation quite unlike the driver situation for any other device on a Linux system. The graphics drivers existed mainly in user space, provided by the XFree86 project, and little or no kernel interaction was required. The user-space component, known as the DDX (Device-Dependent X), was responsible for initializing the card, setting modes, and providing acceleration for 2D operations.

The kernel also provided framebuffer drivers on certain systems to allow a usable console before X started. The interaction between these drivers and the X.org drivers was very complex and a frequent source of problems over which driver "owned" the hardware.

The DRI project was started to add support for direct rendering of 3D applications on Linux. This meant that an application could talk to the 3D hardware directly, bypassing the X server. OpenGL was the standard 3D API, but it is a complex interface which is definitely too large to implement in-kernel. GPUs also provided completely different low-level interfaces. So, due to the complexity of the higher level interface and nonstandard nature of the hardware APIs, a kernel component (DRM) and a userspace driver (DRI) were required to securely expose the hardware interfaces and provide the OpenGL API.

Shortcomings of the current architecture have been noted over the past few years; the current belief is that GPU initialization, memory management, and mode setting need to migrate to the kernel in order to provide better support for features such as suspend/resume, proper cohabitation of X and the framebuffer driver, kernel error reporting, and future graphics card technologies.

The GPU memory manager implemented by Tungsten Graphics is known as TTM. It was originally designed as a general VM memory manager but initially targeted at Intel hardware. On top of this memory manager, a new modesetting architecture for the kernel is being implemented. This is based on the RandR 1.2 work found in the X.org server.

GPU architecture

Graphics cards are programmed in numerous ways, but most initialization and mode setting is done via memory-mapped IO. This is just a set of registers accessible to the CPU via its standard memory address space. The registers in this address space are split up into ranges dealing with various features of the graphics card such as mode setup, output control, or clock configuration. A longer explanation can be found on Wikipedia.
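As a rough sketch of what such memory-mapped register access looks like from a driver's point of view, the following simulates the MMIO window with an ordinary memory buffer (on real hardware the base pointer would come from mapping a PCI BAR); the range offsets are purely illustrative, not actual NVidia register addresses:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated MMIO window; on real hardware this would be a pointer
 * obtained by mapping the card's register BAR. */
#define MMIO_WINDOW_SIZE 0x10000
static uint8_t mmio[MMIO_WINDOW_SIZE];

/* Hypothetical range offsets, for illustration only. */
#define RANGE_MODE_SETUP   0x0000
#define RANGE_OUTPUT_CTRL  0x4000
#define RANGE_CLOCKS       0x8000

/* 32-bit register accessors, as a driver would define over the mapped BAR. */
static uint32_t reg_read(uint32_t offset)
{
    uint32_t v;
    memcpy(&v, mmio + offset, sizeof(v));
    return v;
}

static void reg_write(uint32_t offset, uint32_t value)
{
    memcpy(mmio + offset, &value, sizeof(value));
}
```

A real driver would add memory barriers and use `ioread32()`/`iowrite32()`-style accessors, but the addressing model is the same: one flat window, carved into functional ranges.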

Most recent GPUs also provide some sort of command processing ability where tasks can be offloaded from the CPU to be executed on the GPU, reducing the amount of CPU time required to execute graphical operations. This interface is commonly a FIFO implemented as a circular ring buffer into which commands are pushed by the CPU for processing by the GPU. It is located somewhere in a shared memory area (AGP memory, PCIGART, or video RAM). The GPU will also have a set of state information that is used to process these commands, usually known as a context.
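The ring-buffer scheme above can be sketched in a few lines of C. The CPU advances a "put" pointer as it writes commands and the GPU advances a "get" pointer as it consumes them (simulated here by `fifo_consume()`); the size and field names are illustrative, not the actual hardware layout:

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_ENTRIES 256

struct fifo {
    uint32_t buf[FIFO_ENTRIES];
    uint32_t put;   /* next slot the CPU will write */
    uint32_t get;   /* next slot the GPU will read  */
};

/* CPU side: push one command word; returns -1 if the ring is full. */
static int fifo_push(struct fifo *f, uint32_t cmd)
{
    uint32_t next = (f->put + 1) % FIFO_ENTRIES;
    if (next == f->get)
        return -1;          /* ring full: CPU must wait for the GPU */
    f->buf[f->put] = cmd;
    f->put = next;          /* on hardware: update the PUT register */
    return 0;
}

/* GPU side (simulated): consume one command word. */
static int fifo_consume(struct fifo *f, uint32_t *cmd)
{
    if (f->get == f->put)
        return -1;          /* ring empty: nothing to process */
    *cmd = f->buf[f->get];
    f->get = (f->get + 1) % FIFO_ENTRIES;
    return 0;
}
```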

Most modern GPUs only contain a single command processing state machine. However NVidia hardware has always contained multiple independent "channels" which consist of a private FIFO (push buffer), a graphics context and a number of context objects. The push buffer contains the commands to be processed by the card. The graphics context stores application specific data such as matrices, texture unit configuration, blending setup, shader information etc. Each channel has 8 subchannels to which graphics objects are bound in order to be addressed by FIFO commands.

Each NVidia card provides between 16 and 128 channels, depending on model; these are assigned to different rendering-related tasks. Each 3D client has an associated channel, while some are reserved for use in the kernel and the X server. Channels are context-switched by software via an interrupt (on older cards) or automatically by the hardware on cards after the NV30.
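The channel assignment described above can be sketched as a simple allocator. The pool size and the number of reserved channels below are illustrative (the text gives 16 to 128 channels depending on the model, with some set aside for the kernel and the X server):

```c
#include <assert.h>

#define NUM_CHANNELS      32   /* model-dependent: 16 to 128      */
#define RESERVED_CHANNELS 2    /* e.g. kernel + X server (illustrative) */

/* The reserved channels are marked busy from the start. */
static int channel_in_use[NUM_CHANNELS] = { 1, 1 };

/* Hand the next free channel to a 3D client; -1 if none are left. */
static int channel_alloc(void)
{
    for (int i = RESERVED_CHANNELS; i < NUM_CHANNELS; i++) {
        if (!channel_in_use[i]) {
            channel_in_use[i] = 1;
            return i;   /* this channel's FIFO now belongs to the client */
        }
    }
    return -1;
}

static void channel_free(int i)
{
    channel_in_use[i] = 0;
}
```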

Now what to store within the FIFO? Each NVidia card offers a set of objects, each of which provides a set of methods related to a given task, e.g. DMA memory transfers or rendering. These methods are the ones used by the driver (or, at a higher level, the rendering application). Whenever a client connects, it uses an ioctl() to create the channel. After that, the client creates the objects it needs via an additional ioctl().

Currently there are two possible types of clients: X (via the DDX driver) and OpenGL (via DRI/Mesa). An accelerated framebuffer using the new mode setting architecture (nouveaufb) will also be a future client, avoiding conflicts with nvidiafb.

Let's have a look at a small number of objects:

    Object name                 Description                                   Available on
    NV_IMAGE_BLIT               2D engine: blits from one image into another  NV03 NV04 NV10 NV20
    NV12_IMAGE_BLIT             An enhanced version of the above              NV11 NV20 NV30 NV40
    NV_MEMORY_TO_MEMORY_FORMAT  DMA memory transfer                           NV04 NV10 NV20 NV30 NV40 NV50

From this list, you can see that there are object types which are available on all cards (NV_MEMORY_TO_MEMORY_FORMAT) while others are only available on certain cards. For example, each class of card has its own 3D-engine object, such as NV10TCL on NV1x and NV20TCL on NV2x. An object is identified by a unique number: its "class". This ID is 0x5f for NV_IMAGE_BLIT, 0x9f for NV12_IMAGE_BLIT, and 0x39 for NV_MEMORY_TO_MEMORY_FORMAT. If you want to use the functionality provided by a given object, you must first bind that object to a subchannel. The card provides a fixed number of subchannels, which limits the number of "active" (or "bound") objects.

A command in the FIFO is made of a command header, followed by one or more parameters. The command header usually contains the subchannel number, the method offset to be called, and the number of parameters (a command header can also define a jump in the FIFO but this is outside the scope of this document). Each method the object provides has an offset which has to be set in the command. In order to limit the number of command headers to be written, thereby improving performance, NVidia cards will call several subsequent methods in a row if you provide several parameters.

How do we refer to an object? The data written to the FIFO doesn't hold any information about that... Binding an object to a subchannel is done by writing the object ID as an argument to method number 0. For example, 00044000 5c00000c binds object ID 5c00000c to subchannel 2. This object ID is used as a key in a hash table kept in the card's memory, which is filled in as objects are created.
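The command header layout just described can be sketched in C. The bit positions below (method offset in bits 0-12, subchannel in bits 13-15, parameter count in bits 18-28) are an assumption based on the NV04-style FIFO format, but they are consistent with the worked example in the text: one parameter, method 0, subchannel 2 encodes to 0x00044000.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed NV04-style FIFO command header layout:
 *   bits  0-12  method offset
 *   bits 13-15  subchannel
 *   bits 18-28  parameter count
 * These positions match the bind example in the text
 * (subchannel 2, method 0, one parameter -> 0x00044000). */

static uint32_t fifo_header(uint32_t subchannel, uint32_t method,
                            uint32_t nparams)
{
    return (nparams << 18) | (subchannel << 13) | method;
}

/* Decoders, useful for reading FIFO dumps. */
static uint32_t header_subchannel(uint32_t hdr) { return (hdr >> 13) & 0x7; }
static uint32_t header_method(uint32_t hdr)     { return hdr & 0x1fff; }
static uint32_t header_nparams(uint32_t hdr)    { return (hdr >> 18) & 0x7ff; }
```

With this layout, the auto-increment behaviour mentioned above falls out naturally: a header with `nparams > 1` applies the following parameter words to consecutive method offsets.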

The creation of an object relies on special memory areas. RAMIN is "instance memory", an area of memory through which the graphics engines of the card are configured. A RAMIN area is present on all NVidia chipsets in some form, but it has evolved quite a bit as newer chipsets have been released. Basically, RAMIN is what contains the objects. An object is usually not big (128 bytes in general, up to a few kilobytes in the case of DMA transfer objects).
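As a rough illustration of how objects might be carved out of RAMIN, here is a minimal bump allocator. The region size is an assumption, and a real driver tracks free lists and alignment constraints; the object sizes follow the text (typically 128 bytes, up to a few kilobytes for DMA objects):

```c
#include <assert.h>
#include <stdint.h>

#define RAMIN_SIZE (1 << 20)   /* 1 MiB of instance memory (illustrative) */

static uint32_t ramin_next;    /* offset of the next free byte in RAMIN */

/* Returns the RAMIN offset of the new object, or -1 if RAMIN is full. */
static int64_t ramin_alloc(uint32_t size)
{
    if (ramin_next + size > RAMIN_SIZE)
        return -1;
    uint32_t offset = ramin_next;
    ramin_next += size;
    return offset;
}
```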

Card-specific RAMIN areas:

    Pre-NV40   An area of dedicated internal memory, accessible through the
               card's MMIO registers.
    NV4x       A 16MiB PCI resource is used to access PRAMIN. This resource
               maps over the last 16MiB of VRAM. The first 1MiB of PRAMIN is
               also accessible through the (now "legacy") MMIO PRAMIN aperture.
    NV5x       A 32MiB PCI resource, which is unusable in the default power-on
               state of the card. It can be configured in a variety of ways
               through the NV5x virtual memory. The legacy MMIO aperture can
               be re-mapped over any 1MiB of VRAM desired.

There are also a few specific areas in RAMIN that are worth mentioning:

RAMFC, the FIFO Context Table. It is a global table that stores the configuration/state of the FIFO engine for each channel. It doesn't exist in the same way on NV5x, where the FIFO has registers that contain pointers to each channel's PFIFO state, rather than a single global table.

RAMHT, the FIFO hash table. A global table, used by PFIFO to locate context objects, except on NV5x, where each channel has its own hash table.

Additional information can be found on the Nv object types and Honza Havlicek pages on the Nouveau site.



