TODAY NVIDIA IS refreshing its GeForce FX line at the top end and in the mid range. The new mid-range GeForce FX 5700 Ultra is arguably the biggest news, but the burden of spending time with the $499 graphics cards has fallen to me this time out. (Geoff gets to play with the $199 cards.) NVIDIA’s new flagship card is the top-of-the-line GeForce FX 5950 Ultra, which packs quite a wallop with 475MHz clock speeds and over 30GB/s of memory bandwidth. This upper-middle-class playtoy is the pinnacle of NVIDIA’s accomplishments in graphics. But is it fast enough to knock off ATI’s brand-new Radeon 9800 XT? We’re about to find out.

The 5950 Ultra takes a bow

In case you’ve been living under a rock or, heaven forbid, away from the computer screen for a few months, I should mention that NVIDIA’s previous top-end graphics card was the GeForce FX 5900 Ultra. As you might imagine, the 5950 Ultra replaces the 5900 Ultra, which makes sense, because the 5950 Ultra is just an amped-up version of the 5900 Ultra. The NV38 GPU core on the 5950 Ultra is a tweaked version of the NV35 core found in the 5900 Ultra. Specifically, NVIDIA and its manufacturing partner, TSMC, have improved the NV38 through changes to the chip manufacturing process, including the use of a low-k dielectric to help improve clock speeds. As a result, the NV38 runs at 475MHz, up 25MHz from the 5900 Ultra.

Beyond the manufacturing changes, there’s not much new to report in the NV38 chip proper. The GPU is still a four-pipe design with two texture units per pipe, and beyond that, much of the GPU’s internal architecture remains a bit of a mystery.

The GeForce FX 5950 Ultra card, however, is rather different from its predecessor. The first thing you’ll likely notice is the new cooler, with its larger blower enclosed in a plastic shroud that pulls air in via a PCI slot opening and blows it across a heatsink on the NV38 chip. You’ve heard of Abit’s Outside Thermal Exhaust System, or OTES. Now meet OTIS, the Outside Thermal Induction System. Not that NVIDIA calls it that, of course. OTIS was the town drunk.

NVIDIA claims the 5950 Ultra’s cooler is quieter than the one on the GeForce FX 5900 Ultra, and I suppose that’s true in most cases. I wasn’t displeased at all, though, by the noise levels of the 5900 Ultra cooler. (Both seem whisper quiet compared to the Dustbuster appendage on the old 5800 Ultra cards.) The 5950’s larger blower should move higher air volumes without creating as much noise, and this cooler design definitely channels air with more discipline.

I am puzzled, however, by NVIDIA’s decision to use a simple metal plate to pull heat away from the memory chips on the front of the card. The 5900 Ultra’s memory heatsinks were more… heatsinky. The 5950 Ultra reference card was perfectly stable in my testing, but that green plate got scary hot. Then again, most graphics cards that cost upwards of 300 bucks are scary hot most of the time.

The most notable improvement to the 5950 Ultra card isn’t the most visible, however. The biggest change is the faster clock rate on the DDR graphics memory, up 50MHz from 425MHz to 475MHz, or, effectively, from 850MHz to 950MHz once you factor in the double data rate action. That means memory bandwidth is up from 27.2GB/s to 30.4GB/s, well above the Radeon 9800 XT’s comparatively wimpy 23.4GB/s.

Incidentally, the 5950 Ultra won’t have a non-Ultra “GeForce FX 5950” counterpart. Instead, the current GeForce FX 5900 will continue as NVIDIA’s offering at the $299 price point.
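If you want to check the math, those bandwidth figures fall straight out of the effective memory clock and the bus width. Here’s a quick, purely illustrative sketch of the arithmetic (the helper function below is ours, not anything from NVIDIA or ATI):

```python
def memory_bandwidth_gbs(effective_mhz, bus_width_bits):
    """Peak memory bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8    # a 256-bit bus moves 32 bytes per transfer
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9

# DDR memory transfers data on both clock edges, so a 475MHz clock is effectively 950MHz
print(memory_bandwidth_gbs(850, 256))   # GeForce FX 5900 Ultra: 27.2
print(memory_bandwidth_gbs(950, 256))   # GeForce FX 5950 Ultra: 30.4
print(memory_bandwidth_gbs(730, 256))   # Radeon 9800 XT: ~23.4
```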

Pushin’ pixels

Let’s have a look at how the GeForce FX 5950 Ultra stacks up against the present and past competition in the high end of the market. These cards cost five hundred bucks for a reason, and that’s because they are supposed to offer better performance in new games, at high resolutions, with all the eye candy turned up. Such speed costs money: money for the best chips, money for the fastest RAM. These cards are all about pixel-pushing power, so memory bandwidth and fill rate are two of the key performance factors. The table below will show us the lay of the land, at least in theoretical terms.

                        Core clock  Pixel      Peak fill rate  Texture units  Peak fill rate  Memory clock  Memory bus    Peak memory
                        (MHz)       pipelines  (Mpixels/s)     per pipe       (Mtexels/s)     (MHz)         width (bits)  bandwidth (GB/s)
GeForce FX 5800 Ultra   500         4          2000            2              4000            1000          128           16.0
Parhelia-512            220         4          880             4              3520            550           256           17.6
Radeon 9700 Pro         325         8          2600            1              2600            620           256           19.8
Radeon 9800 Pro         380         8          3040            1              3040            680           256           21.8
Radeon 9800 Pro 256MB   380         8          3040            1              3040            700           256           22.4
Radeon 9800 XT          412         8          3296            1              3296            730           256           23.4
GeForce FX 5900 Ultra   450         4          1800            2              3600            850           256           27.2
GeForce FX 5950 Ultra   475         4          1900            2              3800            950           256           30.4

As you can see, the 5950 Ultra trails behind the 8-pipe Radeon chips in peak fill rate when only one texture is applied per pixel, but with multiple textures per pixel, the situation is rather dramatically reversed. Since games tend to use a mix of single and multitexturing, neither the GeForce nor the Radeon has a clear advantage here.

We can test our theoretical numbers by running some simple fill rate tests on the cards. The results of our synthetic tests track pretty closely with our theory, at least in terms of relative performance. The 5950 Ultra leads the pack in multitexturing, while the 9800 XT leads in single texturing. Of course, fill rate isn’t the only gating factor for performance in modern graphics cards. Newer games and apps use programmable pixel and vertex shaders to achieve advanced visual effects, and proficiency at simple pixel filling won’t suffice in such cases.
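For the record, the fill rate columns in the table are simple products of clock speed, pipeline count, and texture units per pipeline. The little sketch below (our own helper, for illustration only) shows where the numbers come from, and why the four-pipe, two-TMU GeForce design looks weak with a single texture per pixel but strong with two:

```python
def peak_fill_rates(core_mhz, pipelines, tmus_per_pipe):
    """Return peak (Mpixels/s, Mtexels/s): one pixel per pipe per clock,
    and one texture sample per texture unit per clock."""
    pixel_rate = core_mhz * pipelines
    texel_rate = pixel_rate * tmus_per_pipe
    return pixel_rate, texel_rate

print(peak_fill_rates(475, 4, 2))   # GeForce FX 5950 Ultra: (1900, 3800)
print(peak_fill_rates(412, 8, 1))   # Radeon 9800 XT: (3296, 3296)
```

With one texture per pixel, only the pixel rate matters and the Radeon’s eight pipes win; apply two textures per pixel and the GeForce’s spare texture units let it lay down more texels per clock.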



NVIDIA’s new push with Detonator 50

NVIDIA is obviously aware of the negative vibe created by GeForce FX performance problems in some DirectX 9 applications, and the company is working to improve both the FX lineup’s image and its performance. One of the keys to that effort is the new Detonator 50 driver, which should be available for download starting today.

The Detonator 50 series includes a new run-time compiler intended to produce code better optimized for the GeForce FX architecture, which seems to be especially sensitive to the way code is structured. The compiler embedded in a graphics driver translates API calls into machine code, as you can see in the simple block diagram on the right. All modern GPU drivers have such compilers, and they are especially important for programmable GPUs. NVIDIA claims its new compiler in Detonator 50 should be much more proficient at generating code friendly to the FX architecture.

Graphics driver-based compilers present several opportunities for optimization that are especially relevant in the case of NVIDIA’s NV3x chip, which appears to need extra help sometimes. One of the FX’s primary weaknesses is performance in DirectX 9, because the FX hardware doesn’t seem to map very well to the requirements of the API. (Much of the 3DMark03 controversy and the hubbub over Half-Life 2 performance can be traced to this fact.) NVIDIA very carefully alludes to this situation in its whitepaper on its new compiler:

“Delivering industry-leading graphics solutions entails a broad set of challenges and even some fortune-telling. Hardware designers not only must continually push the performance and functionality forward, but also anticipate the future direction for the major software application programming interfaces (APIs). Even with attention to every detail, coupling a new architecture with the long list of emerging application requirements from the various APIs can be daunting. When a new GPU is released, its new architecture may not suit the latest software programming techniques for one API, yet it may be ideally suited for the programming techniques of another.”

Hence the GeForce FX’s apparent prowess in the OpenGL-based DOOM 3, and its relative weakness in the DX9-based Half-Life 2. Microsoft seems to have taken a different direction with some portions of DirectX 9 than NVIDIA anticipated. OpenGL is easier for the FX chips, because NVIDIA can map API calls to its hardware more directly by creating its own extensions to OpenGL.

NVIDIA claims its new compiler can help bridge the gap in situations of API-hardware mismatch by automatically optimizing the machine code it produces. These optimizations come in several forms. One method is friendlier instruction ordering. When I spoke with NVIDIA’s Chief Scientist, David Kirk, a few weeks back, he indicated better instruction ordering is more important than datatype selection or any other optimization for the FX chips. NVIDIA offers one very simple example of instruction reordering for the FX: going from interleaved math and texture ops to serial operations of the same type (from math-texture-math-texture to math-math-texture-texture). The compiler can also translate shorter pixel shader programs that require multiple passes into longer shader programs that require fewer passes. Because NV3x chips can process exceptionally long pixel shader programs, this adjustment makes lots of sense. Also, compilers can work to minimize register use (another rumored FX sticking point).
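To make the reordering example concrete, here’s a toy sketch (ours, not NVIDIA’s actual compiler; the instruction names are made up) of a pass that turns an interleaved math-texture-math-texture sequence into serial runs of the same type. A real shader compiler would also have to respect data dependencies between instructions, which this deliberately ignores:

```python
def group_by_type(instructions):
    """Group math ops and texture ops into serial runs instead of interleaving them."""
    math_ops = [op for op in instructions if not op.startswith("tex")]
    tex_ops  = [op for op in instructions if op.startswith("tex")]
    return math_ops + tex_ops   # math-math-...-tex-tex

shader = ["math_1", "tex_1", "math_2", "tex_2"]
print(group_by_type(shader))    # ['math_1', 'math_2', 'tex_1', 'tex_2']
```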
The end result of such changes is pixel shader programs executed in fewer clock cycles. Done correctly, instruction reordering and shader optimization should improve performance without changing the value produced by the calculation; that is, image quality shouldn’t be affected.

However, NVIDIA still hasn’t sworn off optimizations that reduce color precision and thus image quality. We know that the FX chips seem to perform much better with lower-precision datatypes: best with integer, and better with 16-bit floating point than with 32 bits per color channel. NVIDIA’s compiler could conceivably translate higher-precision calculations into lower precision if it decides more precision is unneeded. According to Dr. Kirk, the standard for Pixel Shader 2.0 calculations isn’t 16 or 24 or 32 bits of precision; it’s matching the output generated by Microsoft’s reference rasterizer. NVIDIA’s brief on the new compiler says it won’t reduce image quality, but color precision isn’t discussed.

Of course, reductions in color precision can wreak havoc on shader output, especially when datatype selection is poor, so the compiler had best be very careful about making such changes. Some Pixel Shader 2.0 programs might make the transition to integer math gracefully, and others may not. Predicting such outcomes probably isn’t easy to do in all cases, even with a relatively intelligent compiler algorithm. (The quick sketch below shows how fast precision can evaporate when a calculation drops from 32 to 16 bits per channel.)

I suspect color precision and datatypes are more important to NV3x performance than NVIDIA is letting on, and I also suspect the fact that DirectX 9’s Pixel Shader 2.0 doesn’t expose NVIDIA’s integer FX12 pixel shaders directly will be an abiding problem for the NV3x chips. (If we knew more about the NV3x’s actual internal structure, we might be able to project better how all of this will likely play out.) ATI probably made the smarter compromise by simply converting all pixel shader calculations to 24 bits of floating-point data per color channel, because its R300-series chips, by all indications, have more peak floating-point processing power than the NV3x series. However, NVIDIA’s hardware does offer higher peak color precision and a flexible, CPU-like set of datatypes for pixel shader calculations.
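As a rough illustration of the risk (a minimal sketch using NumPy’s float16 and float32 scalars as stand-ins for 16-bit and 32-bit per-channel shader math; the numbers are contrived), watch what happens to a small difference between two texture-like terms once the intermediates are rounded to 16 bits:

```python
import numpy as np

# Stand-ins for 32-bit vs. 16-bit per-channel shader arithmetic (illustrative only).
def shade(dtype):
    base   = dtype(0.8314)          # one per-pixel color term
    detail = dtype(0.8301)          # a nearly equal term from another texture layer
    scale  = dtype(200.0)           # exaggerated contrast factor
    return (base - detail) * scale  # small difference blown up by the scale

print(shade(np.float32))   # ~0.26
print(shade(np.float16))   # ~0.29 once the inputs are rounded to 16 bits
```

A compiler that demotes precision blindly risks exactly this sort of drift, which is why any per-shader precision decision needs to be made carefully.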

On to the testing…

Please read this next bit carefully. We have plans to test the image quality produced by the 52.16 drivers in some detail, but unfortunately, we weren’t able to do so for this article due to time constraints. For that, I apologize. We have tested the GeForce FX 5950 Ultra and 5900 Ultra with the 52.16 drivers, including in Pixel Shader 2.0 programs, but we will address image quality properly and extensively in a future article.

Now, on to the benchmarks. As we did in our Radeon 9800 XT review, we have tested the 5950 Ultra almost exclusively in fill rate- and memory bandwidth-limited situations, at very high resolutions and with 4X antialiasing and 8X anisotropic filtering enabled. Some of the newer games are limited by pixel shader power, and higher resolutions will push the pixel shaders, too.

Benchmark results

We’ll start with some older games and move toward some newer games and benchmarks as we go.

Quake III Arena and Serious Sam SE

The 5950 Ultra’s massive memory bandwidth and killer multitextured fill rate both serve it well in these older games.

Unreal Tournament 2003 and Wolfenstein: Enemy Territory

These somewhat newer first-person shooters present more of a problem for the 5950 Ultra. The Radeon 9800 XT looks especially strong with antialiasing enabled, where it seems to hold a relative advantage. That’s a bit of a surprise, because the 5950 Ultra packs 7GB/s more memory bandwidth than the Radeon 9800 XT.

Splinter Cell

Interesting pattern here. Both GeForce FX cards are consistently faster than the Radeon 9800 XT, but the gap widens at 1280×1024 and then narrows again at 1600×1200.

Halo

We tested Halo with the “-use20” switch to enable Pixel Shader 2.0 shaders where possible. The 5950 Ultra comes out on top.

Gun Metal

Tomb Raider: The Angel of Darkness

The developers of this game have pulled the V49 patch that allows in-game benchmarking from their website, claiming the test doesn’t reflect gameplay. However, we’ve tested with it out of curiosity, because it seems to be a classic example of the GeForce FX struggling to cope with a DirectX 9 game. Make of the results what you will.

3DMark03

We know NVIDIA’s drivers are packed with application-specific optimizations for 3DMark03, but we’ve thrown in these results anyway, because they are an interesting test case of sorts. After a number of attempts, NVIDIA seems to have gotten its replacement shaders to duplicate the output of FutureMark’s original DirectX 9 shaders with a pretty decent degree of fidelity. The 5950 Ultra can’t quite keep up with the Radeon 9800 XT either overall or in any of 3DMark03’s component tests, but it consistently comes close.

AquaMark3

The 5950 Ultra does well in AquaMark3, but the tables turn when edge and texture antialiasing are applied.

ShaderMark 2.0

This new version of ShaderMark is an interesting animal. It’s a DirectX 9 benchmark that makes extensive use of Pixel Shader 2.0, and its various shaders are written in Microsoft’s High-Level Shading Language. The registered version of ShaderMark 2.0, which we’re using, can provide “partial precision” hints to tell NV3x cards to use 16 bits of precision per color channel in pixel shader calculations instead of 32. (ATI R3x0 chips always use 24 bits per color channel.) This program also includes some special “2.0+” shaders with extra features for the NV3x GPUs.

Some of the shaders would not run on the GeForce FX cards. As I understand it, one problem is the FX cards’ lack of support for floating-point texture formats. NVIDIA says the FX GPUs support floating-point texture formats in hardware, but adding this support to its drivers hasn’t been a priority. (The suspicious types among us may raise their eyebrows at that claim. At this relatively late date in the GeForce FX’s lifetime, ongoing lack of support for a key DX9 feature is starting to look like a potential hardware problem.) ShaderMark’s author told me he expected support for FP textures in the 50-series drivers, but it hasn’t happened as of version 52.16. Anyhow, you’ll see some blank scores below as a result.

Even with the new compiler and NV3x-specific precision hints and tweaks, the 5950 Ultra looks to be only about 70% the speed of the Radeon 9800 XT in pixel shader programs, sometimes more, sometimes less. This is why I said that ATI probably made the smarter choice in designing the R300-series pixel shaders as it did; it achieves higher peak FP pixel shader performance. ShaderMark fills the screen with nearly nothing but Pixel Shader 2.0 output, and the Radeon 9800 XT comes out looking quite a bit faster than the GeForce FX 5950 Ultra.

Real-time high-dynamic-range lighting effects

I probably shouldn’t include this little demo, because it doesn’t look entirely right on the GeForce FX. Again, I think support for floating-point texture formats is the problem. Still, this thing looks so very cool, and the effects in it are already making their way into games. Here again, the Radeon 9800 XT seems more at home.