DirectDraw OpenGL renderer

The performance of 2D DirectDraw games in Wine is not always great. There are a number of reasons for this, and most of them lie outside Wine itself. Some games can benefit from Wine's ability to offload DirectDraw to OpenGL, and then there is also something called the DIB engine. You might think OpenGL solves all the performance problems, but that isn't the case, and this confuses a lot of users. That's why I decided to explain the issues and the meaning of the registry options. As will be seen, the performance gain depends on the way the game is written AND on the OpenGL features offered by your video driver.

DirectDraw

DirectDraw is used by games for transferring surfaces (2D arrays of pixels) to the screen. It is implemented in the DDraw DLL and offers a few different ways of working with surfaces. Most games use a combination of three main mechanisms: locking, blitting and GetDC/ReleaseDC.

Locking

The most basic mechanism is 'Lock / Unlock', which gives a program DIRECT access to the video card's framebuffer through a pointer (similar to X's DGA extension). Games that use it do most of their rendering in software and then use these calls to send the finished frames to the video card. Lots of older games, such as StarCraft and Red Alert II, work this way.
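To make the locking pattern concrete, here is a small C sketch. It does not call DirectDraw: the real IDirectDrawSurface::Lock returns the pointer and the row pitch in a DDSURFACEDESC (lpSurface, lPitch), whereas here a hypothetical struct fake_surface with fake_lock/fake_unlock helpers stands in for that so the code runs anywhere. What it shows is how a game typically uses a lock: take the pointer, step from row to row by the pitch (which can be larger than width times bytes per pixel), write every pixel on the CPU, then unlock.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 16-bit (RGB565) DirectDraw surface:
 * system memory plus a row pitch that may exceed width * 2 bytes. */
struct fake_surface {
    uint16_t *bits;
    int width, height;
    size_t pitch;          /* bytes per row, including padding */
    int locked;
};

/* Allocate the surface with rows padded to a multiple of 8 bytes. */
static int fake_create(struct fake_surface *s, int width, int height)
{
    s->width = width;
    s->height = height;
    s->pitch = ((size_t)width * 2 + 7) & ~(size_t)7;
    s->bits = calloc((size_t)height, s->pitch);
    s->locked = 0;
    return s->bits != NULL;
}

/* Hypothetical Lock(): hand out the pixel pointer and the pitch. */
static void *fake_lock(struct fake_surface *s, size_t *pitch)
{
    assert(!s->locked);    /* this sketch does not allow nested locks */
    s->locked = 1;
    *pitch = s->pitch;
    return s->bits;
}

/* Hypothetical Unlock(): from here on the "driver" may present the pixels. */
static void fake_unlock(struct fake_surface *s)
{
    s->locked = 0;
}

/* Render one frame in software: a red ramp from left to right. */
static void draw_frame(struct fake_surface *s)
{
    size_t pitch;
    uint8_t *base = fake_lock(s, &pitch);
    int span = s->width > 1 ? s->width - 1 : 1;

    for (int y = 0; y < s->height; y++) {
        /* Step by the pitch, not the width, to reach the next row. */
        uint16_t *row = (uint16_t *)(base + (size_t)y * pitch);
        for (int x = 0; x < s->width; x++)
            row[x] = (uint16_t)((x * 31 / span) << 11);   /* red in bits 15..11 */
    }
    fake_unlock(s);
}
```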

Blitting

The second mechanism for rendering is blitting. Using the Blt and BltFast calls, programs instruct DirectDraw to update a portion of the screen with an image they specify. You could say this is similar to locking, except that now DirectDraw does all the drawing instead of the game.
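Conceptually, a plain blit is a rectangle copy between two surfaces. The C sketch below performs that operation in software, assuming both surfaces share one pixel format and do not overlap. The names struct surf, struct rect and soft_blit are made up for illustration, and real DirectDraw blits can also clip, stretch and apply color keys, which are left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal surface: pixel bytes plus row pitch. */
struct surf {
    uint8_t *bits;
    size_t pitch;            /* bytes per row */
    int bytes_per_pixel;
};

/* Like a Windows RECT: right and bottom are exclusive. */
struct rect { int left, top, right, bottom; };

/* Copy src_rect from src to position (dst_x, dst_y) in dst.
 * Same pixel format on both sides; no clipping, stretching or color keying. */
static void soft_blit(struct surf *dst, int dst_x, int dst_y,
                      const struct surf *src, const struct rect *src_rect)
{
    assert(src_rect->left <= src_rect->right && src_rect->top <= src_rect->bottom);
    size_t row_bytes = (size_t)(src_rect->right - src_rect->left) * src->bytes_per_pixel;

    for (int y = src_rect->top; y < src_rect->bottom; y++) {
        const uint8_t *s = src->bits + (size_t)y * src->pitch
                           + (size_t)src_rect->left * src->bytes_per_pixel;
        uint8_t *d = dst->bits + (size_t)(dst_y + y - src_rect->top) * dst->pitch
                     + (size_t)dst_x * dst->bytes_per_pixel;
        memcpy(d, s, row_bytes);   /* row by row, because the pitches may differ */
    }
}
```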

GetDC/ReleaseDC

DirectDraw only offers features for drawing surfaces; it has, for instance, no mechanism for drawing text. When a game wants to draw text, it either has to upload bitmaps containing the text or render the text onto a surface. But as I said, DirectDraw can't render text, so how is that done? DirectDraw offers a mechanism, 'GetDC / ReleaseDC', which hands a surface to GDI so that GDI can draw on it.

Performance bottlenecks

By default, Wine's DirectDraw renders through GDI, which in turn uses X, but this way of rendering has several performance problems.

Depth

First of all, X can only render at the depth your desktop is running at. This means that when a game uses 16-bit color and your X server runs at 24-bit, there is a problem. In that case Wine converts all the colors to 24-bit using a DIB. This is slow, and it is especially bad when a game uses 8-bit color, because then a palette lookup has to be performed for every single pixel.
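The per-pixel work behind this conversion can be sketched in C as follows. It assumes the game's 16-bit format is RGB565 and that the 24-bit X visual stores each pixel in a 32-bit XRGB word, which is the usual layout; the function names are hypothetical and this is not Wine's conversion code. The 16-bit path costs a few shifts per pixel, while the 8-bit path costs a palette lookup for every pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one RGB565 pixel to XRGB8888, replicating the high bits into the
 * low bits so that full intensity (0x1F / 0x3F) maps to 0xFF. */
static uint32_t rgb565_to_xrgb8888(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1F, g = (p >> 5) & 0x3F, b = p & 0x1F;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

/* 16 bpp -> 32 bpp: shift and mask every pixel. */
static void convert_565_row(uint32_t *dst, const uint16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = rgb565_to_xrgb8888(src[i]);
}

/* 8 bpp -> 32 bpp: every pixel is an index into the 256-entry palette. */
static void convert_p8_row(uint32_t *dst, const uint8_t *src, size_t n,
                           const uint32_t palette[256])
{
    for (size_t i = 0; i < n; i++)
        dst[i] = palette[src[i]];
}
```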

Direct framebuffer access

As explained, DirectDraw is about direct framebuffer access. The problem is that Xorg doesn't offer such a mechanism. Long ago this was possible using DGA, but DGA had various limitations: it wasn't secure, and it couldn't be mixed with regular X11 drawing or OpenGL, which would have limited it to a few games. Instead, Wine emulates the framebuffer with a buffer in memory and, when needed, copies the data to the card using plain X calls. This isn't very efficient, and on top of that the depth conversion described above is needed most of the time.

GetDC/ReleaseDC

Using GetDC/ReleaseDC, games can draw with standard GDI. The main problem is that all GDI drawing goes through X, even offscreen drawing. Even under normal conditions this isn't ideal, because every roundtrip to the X server adds some latency. It becomes especially problematic when there is a depth mismatch: if the game uses, say, 16-bit color on a 24-bit desktop, all drawing has to be converted twice, first from 16-bit to 24-bit and then from 24-bit back to 16-bit.
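A simplified model of that double conversion in C: the 16-bit surface is expanded for X, GDI draws, and the result is reduced back to 16 bit. The function getdc_fill and the solid fill that stands in for GDI drawing are hypothetical, and the sketch assumes the whole surface is converted on every cycle. It shows that two full conversion passes wrap even a tiny drawing operation; for RGB565 the round trip is lossless, so the cost is time rather than quality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 -> 24 bit (stored as XRGB8888), as needed before X/GDI can draw. */
static uint32_t expand_565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1F, g = (p >> 5) & 0x3F, b = p & 0x1F;
    return (((r << 3) | (r >> 2)) << 16) | (((g << 2) | (g >> 4)) << 8)
           | ((b << 3) | (b >> 2));
}

/* 24 -> 16 bit: keep the top 5/6/5 bits, as handed back to the game. */
static uint16_t reduce_to_565(uint32_t p)
{
    uint32_t r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical GetDC/ReleaseDC cycle on an n-pixel 16-bit surface, where
 * "GDI" fills `count` pixels starting at `first` with a 24-bit color.
 * The whole surface is converted twice, however little is drawn. */
static void getdc_fill(uint16_t *surface, uint32_t *scratch, size_t n,
                       size_t first, size_t count, uint32_t color)
{
    assert(first + count <= n);
    for (size_t i = 0; i < n; i++)                  /* pass 1: 16 -> 24 bit */
        scratch[i] = expand_565(surface[i]);
    for (size_t i = first; i < first + count; i++)  /* the actual drawing */
        scratch[i] = color;
    for (size_t i = 0; i < n; i++)                  /* pass 2: 24 -> 16 bit */
        surface[i] = reduce_to_565(scratch[i]);
}
```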

DirectDraw OpenGL

As pointed out above, DirectDraw's GDI renderer has performance problems caused by depth conversion, the lack of direct framebuffer access and GetDC/ReleaseDC. I will go through each of these three bottlenecks and explain what OpenGL can offer. If you want to use OpenGL for DirectDraw, set the registry value DirectDrawRenderer (under HKEY_CURRENT_USER\Software\Wine\Direct3D) to opengl.

Depth

OpenGL supports a large number of texture formats with different depths. Uploading the DirectDraw surfaces into textures lets the video card do the depth conversion for free. This works fine, but not every DirectDraw format is available in OpenGL on every video card. A good example is 8-bit palettized surfaces. Older GeForce cards offer 'GL_EXT_paletted_texture', which provides an 8-bit paletted format, and when it is used (e.g. for StarCraft) performance is excellent. On modern GeForce/Radeon cards with fragment shaders we do the same conversion using shaders, but on other hardware we convert all surfaces in software, which is SLOW :(
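The decision described above can be summarized as a small C function. The GL format/type names are plain strings so the sketch compiles without GL headers. The mapping is an illustration under assumptions (RGB565 and X8R8G8B8 as the common 16- and 32-bit formats, a one-channel index texture for the shader path, and preferring the paletted-texture extension when both options exist), not Wine's actual format table.

```c
#include <stdbool.h>

/* A few DirectDraw surface formats, named in the usual D3D style. */
enum surf_fmt { FMT_P8, FMT_R5G6B5, FMT_X8R8G8B8 };

enum upload_kind {
    UPLOAD_NATIVE,        /* GL understands the layout as is */
    UPLOAD_PALETTED_EXT,  /* GL_EXT_paletted_texture, internal format GL_COLOR_INDEX8_EXT */
    UPLOAD_SHADER,        /* indices uploaded, fragment shader does the palette lookup */
    UPLOAD_SOFTWARE       /* CPU expands every pixel through the palette first */
};

/* Format/type pair that would be passed to glTexImage2D. */
struct upload_path {
    enum upload_kind kind;
    const char *gl_format;
    const char *gl_type;
};

static struct upload_path choose_upload(enum surf_fmt fmt,
                                        bool has_paletted_texture,
                                        bool has_fragment_shaders)
{
    switch (fmt) {
    case FMT_R5G6B5:     /* red in the top 5 bits, as GL_UNSIGNED_SHORT_5_6_5 expects */
        return (struct upload_path){UPLOAD_NATIVE, "GL_RGB", "GL_UNSIGNED_SHORT_5_6_5"};
    case FMT_X8R8G8B8:   /* 0xXXRRGGBB words, i.e. BGRA with the _REV packed type */
        return (struct upload_path){UPLOAD_NATIVE, "GL_BGRA", "GL_UNSIGNED_INT_8_8_8_8_REV"};
    case FMT_P8:
    default:
        if (has_paletted_texture)
            return (struct upload_path){UPLOAD_PALETTED_EXT, "GL_COLOR_INDEX", "GL_UNSIGNED_BYTE"};
        if (has_fragment_shaders)
            return (struct upload_path){UPLOAD_SHADER, "GL_ALPHA", "GL_UNSIGNED_BYTE"};
        /* No driver help: the slow path, converted to 32 bit on the CPU. */
        return (struct upload_path){UPLOAD_SOFTWARE, "GL_BGRA", "GL_UNSIGNED_INT_8_8_8_8_REV"};
    }
}
```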

Direct framebuffer access

OpenGL doesn't provide direct framebuffer access, but with GL_ARB_pixel_buffer_object (PBO) we can get something close to it. When this extension is present, you can map a buffer and get a pointer to memory (DMA memory or even video memory; you can't tell which). Its main benefit is that data can be uploaded and downloaded asynchronously: you don't have to wait until the data has been sent to the card and can start doing other things in the meantime. This improves performance considerably. Note that GL_ARB_pixel_buffer_object isn't available on all drivers: Nvidia offers it, and these days so does ATI, though not in every fglrx release.

Even without PBOs, OpenGL's upload path on most drivers is at least as fast as X, if not faster. OpenGL offers two ways of moving data to and from the card: through textures, or by accessing the framebuffer directly with glReadPixels/glDrawPixels. Which one is faster depends on the hardware: on modern Nvidia cards textures appear to be fast for uploads, while on older ATI cards, for instance, glDrawPixels was faster. The registry setting 'RenderTargetLockMode' selects how data is uploaded and downloaded, and it applies whether or not PBOs are available. The values are:

- readdraw (default): use glReadPixels for download and glDrawPixels for upload
- readtex: use glReadPixels for download, but textures for upload
- texdraw: use textures for download, but glDrawPixels for upload
- textex: use textures for both upload and download
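For reference, the four RenderTargetLockMode values map onto a download method and an upload method as in this C sketch. The enum names and parse_lock_mode are hypothetical, the string matching is exact and case-sensitive by assumption, and unknown or missing values fall back to the default, readdraw. It is not Wine's registry-parsing code.

```c
#include <stddef.h>
#include <string.h>

/* XFER_PIXELS: glReadPixels / glDrawPixels; XFER_TEXTURE: go through a texture. */
enum xfer { XFER_PIXELS, XFER_TEXTURE };

struct lock_mode { enum xfer download, upload; };

/* Map a RenderTargetLockMode string to its download/upload methods. */
static struct lock_mode parse_lock_mode(const char *value)
{
    static const struct { const char *name; struct lock_mode mode; } table[] = {
        {"readdraw", {XFER_PIXELS,  XFER_PIXELS}},   /* default */
        {"readtex",  {XFER_PIXELS,  XFER_TEXTURE}},
        {"texdraw",  {XFER_TEXTURE, XFER_PIXELS}},
        {"textex",   {XFER_TEXTURE, XFER_TEXTURE}},
    };

    if (value)
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (strcmp(value, table[i].name) == 0)
                return table[i].mode;
    return table[0].mode;   /* unknown or unset: fall back to readdraw */
}
```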

GetDC/ReleaseDC

When a game uses GetDC for drawing text, we are in big trouble, even bigger than with the GDI renderer. The GDI renderer is basically a software renderer, so the latest framebuffer image is already somewhere in memory; we can pass that memory to GetDC directly and the game can draw on it with GDI. With OpenGL, most of the time we don't have the latest framebuffer image in memory, so we need to download it back from the card, which is slow. Once we finally have a copy, we pass it to GDI, which in turn may have to do all the depth conversion magic. After the game is done with GDI, we need to re-upload the image to the card. In short, GetDC/ReleaseDC is very painful for OpenGL.
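A back-of-the-envelope estimate shows why this hurts. The C sketch assumes the whole surface has to be downloaded for GDI and uploaded again afterwards, and it ignores the depth conversion and any driver latency, so real costs are higher; the function names are hypothetical.

```c
#include <stdint.h>

/* Bytes moved across the bus for one GetDC/ReleaseDC cycle when the surface
 * lives only on the card: a full download for GDI plus a full re-upload. */
static uint64_t getdc_bus_bytes(uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel)
{
    uint64_t surface_bytes = (uint64_t)width * height * bytes_per_pixel;
    return 2 * surface_bytes;   /* download + upload */
}

/* The same per second, for a game that calls GetDC once per frame. */
static uint64_t getdc_bytes_per_second(uint32_t width, uint32_t height,
                                       uint32_t bytes_per_pixel, uint32_t fps)
{
    return getdc_bus_bytes(width, height, bytes_per_pixel) * fps;
}
```

For a 640x480 16-bit surface that is 614,400 bytes in each direction, about 1.2 MB per GetDC/ReleaseDC pair, or roughly 74 MB per second if the game does it once per frame at 60 frames per second.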

Summary

As explained, the standard DirectDraw GDI renderer has performance limitations. Depending on which DirectDraw rendering mechanisms a game uses, OpenGL can remove the bottlenecks when the right OpenGL extensions are available, and in that case the game can run close to its native speed. If a game relies on GetDC/ReleaseDC, however, OpenGL can only make the situation worse and performance will drop. So what is the final solution to the performance problems? Part of it is the DIB engine, which lets us do all offscreen drawing in software without a roundtrip to X. This will save a lot of time in the GetDC/ReleaseDC case.