Well. The run-up to the release of this here graphics card has certainly been unusual. AMD revealed a bunch of details about the Radeon R9 290X and the new “Hawaii” chip on which it’s based at a press event one month ago. As a result, most folks have had a pretty good idea what to expect for a while. Today, the 290X should be up for sale at online retailers, and it’s finally time for us to review this puppy. Let’s have a look, shall we?

Say aloha to Hawaii

Not only is the Radeon R9 290X a beefy graphics card intended to compete with the likes of the GeForce GTX 780, but it’s also something else: the platform for a truly new chip, with updated technology inside. Most of the rest of the cards in the Radeon R7 and R9 series introduced recently are renamed and slightly tweaked cards based on existing silicon. Not so here. The Hawaii GPU that powers the 290X represents the next generation of GPU technology from AMD, with a number of incremental improvements over the last gen.

          ROP pixels/  Texels filtered/   Shader      Rasterized        Memory interface  Est. transistor   Die size  Fab process
          clock        clock (int/fp16)   processors  triangles/clock   width (bits)      count (millions)  (mm²)     node
GK104     32           128/128            1536        4                 256               3500              294       28 nm
GK110     48           240/240            2880        5                 384               7100              551       28 nm
Cypress   32           80/40              1600        1                 256               2150              334       40 nm
Cayman    32           96/48              1536        2                 256               2640              389       40 nm
Tahiti    32           128/64             2048        2                 384               4310              365       28 nm
Hawaii    64           176/88             2816        4                 512               6200              438       28 nm

Above all else, though, Hawaii is bigger: larger than the Tahiti chip in the Radeon HD 7970 (and R9 280X), with more of everything that matters. As you can see in the table above, Hawaii has very high counts of every key graphics resource. In fact, Hawaii matches up well on paper against the GK110 chip that drives the GeForce GTX 780 and Titan, even though it’s over 100 mm² smaller in terms of die area—and both of those GPUs are manufactured on the same 28-nm fab process at TSMC.

At its core, Hawaii is based on familiar tech: the Graphics Core Next architecture first introduced in the Radeon HD 7000 series. However, this is the next iteration of GCN, with some minor tweaks to the compute units and larger changes elsewhere. Also, AMD has overhauled the layout in this GPU in order to ensure the right performance balance at its larger scale.

The chip’s graphics processing resources are broken down into four separate “shader engines,” each one almost an independent GPU unto itself. Graphics tasks are load balanced between the four engines. Each shader engine has its own geometry processor and rasterizer, effectively doubling the primitive rasterization rate versus Tahiti. That upgrade should improve performance when there are more polygons onscreen, particularly with higher levels of tessellation. In addition, the geometry units have been tweaked to improve data flow, which makes sense: by all accounts, the geometry amplification that happens during tessellation remains a hard problem for GPUs to handle.

If you’ve been following these things for even a little while, looking at these shader engines will make you feel old. Each one of them has four render back ends capable of blending and outputting 16 pixels per clock cycle. That was pretty much a whole GPU’s worth of pixel fill and antialiasing power back in the day. And by “the day,” I mean two weeks ago, when we reviewed the Radeon R7 260X. Each shader engine also has 11 of the GCN compute units that give the GPU its number-crunching power. Every CU has four 16-wide vector math units. That works out to 704 “shader processors” per engine, again almost enough to match the scale of a mid-range GPU.

Now, multiply everything in the above paragraph by four, and you’ve got Hawaii, with a total of 2816 shader processors and 64 pixels per clock of ROP power. At an even 1GHz, Hawaii is capable of 5.6 teraflops of single-precision compute, making it easily the new leader in consumer graphics chips. For compute-focused applications, it can handle double-precision floating-point math at one quarter that rate, still well in excess of a teraflop.
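For the arithmetic-inclined, those headline numbers fall straight out of the per-engine specs. Here’s a quick sanity check in Python, assuming the usual GCN convention of one fused multiply-add (two flops) per shader processor per clock:

```python
# Back-of-the-envelope check of Hawaii's shader resources and peak rates.
cus_per_engine = 11
sp_per_cu = 4 * 16                     # four 16-wide vector units per CU
engines = 4
shader_processors = cus_per_engine * sp_per_cu * engines   # 2816

clock_hz = 1.0e9                       # an even 1GHz
flops_per_clock = 2                    # one fused multiply-add per SP
sp_tflops = shader_processors * flops_per_clock * clock_hz / 1e12

print(f"single precision: {sp_tflops:.2f} tflops")       # ~5.63
print(f"double precision: {sp_tflops / 4:.2f} tflops")   # quarter rate, ~1.41
```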

All of this computing power is backed by a 1MB L2 cache. This cache is fully read/write capable and is divided into 16 partitions of 64KB each. The L2’s capacity is a third larger than Tahiti’s 768KB L2, and AMD says bandwidth is up by a third, as well. The firm claims Hawaii’s L1 and L2 caches can exchange data as fast as one terabyte per second, which is more than I can say for my USB 3.0 drive dock.

Oddly enough, the most intriguing thing about Hawaii’s basic architecture may be a fairly straightforward engineering tradeoff. The chip has eight 64-bit memory interfaces onboard, giving it, effectively, a 512-bit-wide path to memory. In order to make that wide memory path practical while keeping the chip size in check, the Hawaii team chose to exchange the complex memory PHYs in Tahiti for smaller, simpler ones. Complex PHYs, or physical interface devices, are necessary to drive GDDR5 DRAMs at peak clock frequencies, but they also eat up silicon space. AMD claims Hawaii’s 512-bit memory interface occupies 20% less die area than Tahiti’s 384-bit interface. As a result, Hawaii’s memory operates at lower speeds. The 290X’s GDDR5 runs at 5 GT/s, down from 6 GT/s for Tahiti-based cards like the Radeon HD 7970 GHz Edition. Still, overall memory bandwidth is up from 288 GB/s on Tahiti to 320 GB/s with Hawaii, thanks to the wider data path.
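The tradeoff is easy to verify with napkin math: peak bandwidth is simply bus width in bytes times the transfer rate. A minimal sketch:

```python
# GDDR5 peak bandwidth: (bus width in bytes) x (transfer rate in GT/s).
def peak_bandwidth_gb_s(bus_width_bits: int, rate_gt_s: float) -> float:
    return bus_width_bits / 8 * rate_gt_s

print(peak_bandwidth_gb_s(384, 6.0))   # Tahiti: 288.0 GB/s
print(peak_bandwidth_gb_s(512, 5.0))   # Hawaii: 320.0 GB/s
```

A wider, slower interface gets AMD more total bandwidth out of less die area, which is the whole point of the simpler PHYs.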

Of course, Hawaii’s advantage on this front extends beyond Tahiti. Nvidia chose a 384-bit interface and 6 GT/s memory rates for its competing GK110 chip, too.

The Radeon R9 290X

The end product of all of this silicon wizardry is the Radeon R9 290X, which has a single Hawaii GPU clocked at 1GHz and 4GB of GDDR5 memory running at 5 GT/s. Here’s how it stacks up, at least in theory, versus the competition.

                   Peak pixel fill   Peak bilinear filtering  Peak shader arithmetic  Peak rasterization  Memory bandwidth
                   rate (Gpixels/s)  int8/fp16 (Gtexels/s)    rate (tflops)           rate (Gtris/s)      (GB/s)
Radeon HD 5870     27                68/34                    2.7                     0.9                 154
Radeon HD 6970     28                85/43                    2.7                     1.8                 176
Radeon HD 7970     30                118/59                   3.8                     1.9                 264
Radeon R9 280X     32                128/64                   4.1                     2.0                 288
Radeon R9 290X     64                176/88                   5.6                     4.0                 320
GeForce GTX 770    35                139/139                  3.3                     4.3                 224
GeForce GTX 780    43                173/173                  4.2                     3.6 or 4.5          288
GeForce GTX Titan  42                196/196                  4.7                     4.4                 288

The R9 290X leads the pack by a mile in several key graphics rates, including ROP pixel rate, shader arithmetic, and memory bandwidth; it trails the GTX Titan slightly in texture filtering and primitive rasterization rates, and it essentially ties the GTX 780 in those same categories. The 290X’s combination of a killer ROP rate and gobs of memory bandwidth should make it particularly well suited for multi-monitor and 4K resolutions, especially when combined with high levels of multisampled antialiasing. Surely that’s the sort of target that Hawaii’s architects had in mind.
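Those peak numbers aren’t magic, either; they’re just Hawaii’s per-clock resources multiplied by the 290X’s 1GHz clock. A quick illustration:

```python
# The 290X's theoretical peaks, derived from Hawaii's per-clock resources.
clock_ghz = 1.0
rops = 64            # pixels blended and output per clock
texels_int8 = 176    # int8 texels filtered per clock (fp16 runs at half rate)
triangles = 4        # triangles rasterized per clock

print(f"pixel fill:     {rops * clock_ghz:g} Gpixels/s")        # 64
print(f"int8 filtering: {texels_int8 * clock_ghz:g} Gtexels/s") # 176
print(f"rasterization:  {triangles * clock_ghz:g} Gtris/s")     # 4
```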

These cards should be available for purchase today at online retailers for the low, low price of $549.99. That may sound like a lot, but it’s a hundred bucks less than the sticker on a GeForce GTX 780. Unlike many of AMD’s recent graphics cards, the 290X starts life without any sort of game bundle. I guess you can pick the games you want with that $100 savings.

You can see in the pictures that the 290X requires two aux power inputs, one six-pin and one eight-pin. The 290X’s circuit board is 10.5″ long, bog standard for this class of graphics card and a match for the GTX 780 and Titan. However, its plastic cooling shroud extends a little beyond the PCB, bringing the total length to just under 11″. Strangely enough, AMD hasn’t disclosed a power spec for the R9 290X, but the card’s connector config dictates a max power draw of 300W, so long as AMD has honored the PCI Express power limits.

Additional goodness: TrueAudio, displays, and XDMA

AMD has built several new technologies into its Hawaii chip, and some are complex enough that I can’t do them justice in the time I have to finish this review. Only two of AMD’s current graphics chips, Hawaii and Bonaire, have these next-level capabilities built in. I suspect we may see more GPUs from this same family in the coming months.

The most notable of the new features is probably the TrueAudio DSP block for accelerated processing of sound effects. There’s much to be said on this subject, and I intend to address TrueAudio in more detail in a separate article shortly. For now, you might want to check out my live blog from the GPU14 event for some additional details on this feature. We don’t yet have any software to take advantage of the TrueAudio hardware, but I suspect we’ll spend quite a bit of time with TrueAudio once the first games that support it arrive.

AMD has also freshened up the display block in its latest GPUs. You can see the connector payload above. Both of the DVI ports are dual-link, and the DisplayPort output is omni-capable: it supports multi-stream transport (MST) and can sustain the pixel rates needed to drive a single-tile 4K display at 60Hz, once such mythical beasts become available. Furthermore, Radeons will support the DisplayID 1.3 standard, written by an AMD engineer, that allows for auto-configuration of tiled 4K displays—provided those displays also support this standard. The current 4K monitors from Sharp and Asus do not, but AMD intends to recognize those panels and take care of them automagically in its drivers.
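If you’re wondering whether a single DisplayPort link really has the headroom for single-tile 4K at 60Hz, some rough math of my own (the blanking overhead figure is a ballpark assumption, not AMD’s spec) suggests it does:

```python
# Rough check that DisplayPort 1.2 can feed a single-tile 4K display at 60Hz.
# Ballpark figures of mine: 24-bit color, ~20% blanking overhead, and HBR2's
# ~17.28 Gbps of effective four-lane payload bandwidth after 8b/10b coding.
pixel_rate = 3840 * 2160 * 60          # ~498 Mpixels/s
bits_per_pixel = 24
overhead = 1.20                        # blanking intervals, roughly

required_gbps = pixel_rate * bits_per_pixel * overhead / 1e9
hbr2_payload_gbps = 17.28
print(f"needed: {required_gbps:.1f} Gbps of {hbr2_payload_gbps} available")
```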

Perhaps the biggest change here is the elimination of the requirement that multi-monitor Eyefinity configs include at least one DisplayPort connection. With the 260X and 290X, users can finally connect three monitors via the HDMI and DVI links alone. Huzzah.

You may have noticed the distinct lack of CrossFire connectors on the 290X. There are, uh, vestiges where the “golden fingers” connectors ought to be, but no actual fingers. That’s because AMD has replaced the CrossFire bridge connector with a new solution called XDMA. Rather than pass data from GPU to GPU over a dedicated bridge, XDMA incorporates a direct memory access (DMA) engine into the CrossFire image compositing block. This DMA facility can transfer data directly from GPU to GPU via PCI Express, without a detour into system memory.

XDMA is purportedly compatible with AMD’s frame pacing tech, which reduces the micro-stuttering problems associated with multi-GPU teaming. Even more importantly, the firm claims XDMA can handle resolutions above four megapixels, including Eyefinity multi-display configs and 4K monitors. Since current CrossFire configs have serious problems with such setups, this new data sharing method could bring a very notable improvement.

AMD insists XDMA carries no performance penalty compared to a dedicated CrossFire bridge. They make a good argument when they point out that the situation can’t get much worse than it is now with CrossFire at resolutions above four megapixels. The Radeon driver software shifts frame data from the secondary GPU into system memory and then to the primary GPU. The end result? At 4K resolutions, the transfers are too slow, and pretty much every other frame is dropped completely, never to be displayed. Quicker, more direct GPU-to-GPU data transfers can only help.
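To get a sense of whether PCI Express has the raw headroom for this job, consider some back-of-the-envelope numbers of my own, not AMD’s:

```python
# Back-of-the-envelope check: AFR frame traffic over PCIe at 4K.
# Assumes 4 bytes per pixel and that the secondary GPU supplies every
# other frame in a two-way CrossFire setup running at 60Hz.
frame_bytes = 3840 * 2160 * 4            # one 4K frame: ~33 MB
frames_per_second = 60 / 2               # the secondary GPU's share
traffic_gb_s = frame_bytes * frames_per_second / 1e9

pcie3_x16_gb_s = 15.75                   # theoretical peak for PCIe 3.0 x16
print(f"AFR traffic: {traffic_gb_s:.2f} GB/s")                         # ~1.0
print(f"fraction of a x16 link: {traffic_gb_s / pcie3_x16_gb_s:.1%}")  # ~6%
```

On paper, even a dual x8 config looks comfortable; latency and scheduling, not raw bandwidth, seem like the likelier pitfalls.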

The firm is confident that lower-bandwidth CrossFire configs aren’t any worse off without the dedicated bridge, either. In fact, they wanted to be sure before deciding to go with XDMA as their only solution, which is why 290X boards retain those phantom fingers. Early boards included bridge connectors for comparative testing. Once AMD was convinced the solution was solid, the fingers were, uh, snipped off.

So how well does XDMA work? We’re dying to try it, and we’ve asked AMD for a second 290X card explicitly for the purpose of testing CrossFire with XDMA at 4K resolutions, but we don’t yet have a second card. We’re hoping to rectify that problem shortly.

We have lots of questions about what sort of PCI Express configurations will prove to be suitable for high-resolution CrossFire configs. XDMA seems well-suited for systems based on Intel’s X79 chipset, with 16 lanes of PCIe 3.0 bandwidth running to two expansion slots, or for dual-GPU cards like the Radeon HD 7990 with PCIe switch chips onboard. Those are notable configs for high-end CrossFire setups. But will XDMA play well in systems with less bandwidth, like those based on Haswell or Richland processors with dual x8 PCIe links? Or systems with higher PCIe latency, like AMD’s 990FX platform? What happens with three- and four-card setups? We’ll have to push the limits in order to find out.

PowerTune gets smarter

The one other new feature in AMD’s latest GPUs is a smarter version of the PowerTune dynamic power management mechanism. This revised PowerTune is made possible by some enabling hardware: an interface to the card’s voltage regulator known as SVI2. SVI2 is built into several recent AMD chips, including Hawaii, Bonaire, and the Socket FM2 APUs. It allows these chips to gather real-time voltage and current feedback very quickly—the sampling rate is 40KHz, and the interface has a data rate of 20Mbps. The SVI2 interface also enables fast and fine-grained control over the power coming into the chip. AMD says it can make voltage switches in about 10 microseconds in steps as small as 6.25 mV, and the interface allows for multiple voltage domains per VR controller.

Armed with faster instrumentation and control, the new PowerTune is able to pursue the best balance of power consumption, temperature, performance, and fan speed available within its defined limits. The algorithm behind it all has evidently grown pretty complex. For instance, the 290X knows better than to crank up fan speeds in simple steps, because doing so can be acoustically jarring. Instead, it ramps up fan speeds gradually in order to maintain a small perceptual footprint.

Also, somewhat like Nvidia’s GPU Boost 2.0 algorithm built into its GTX 700-series cards, one of PowerTune’s key parameters is the card’s GPU temperature limit. The algorithm will seek to maximize performance without letting the GPU’s temperature exceed the programmed peak. Functionally, that means the GPU clock will vary somewhat in response to different workloads and variance in ambient temperatures. In that way, it’s no different than the latest GeForces or most desktop CPUs. The performance of our biggest, fastest chips hasn’t been entirely deterministic for a while now.
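To make the concept concrete, here’s a toy sketch of a temperature-targeting boost loop. To be clear, this is my illustration of the general idea, not AMD’s actual algorithm, and the constants are invented:

```python
# Toy model of a temperature-targeting boost loop, PowerTune-style.
# Illustrative only; the step sizes and clock limits are made up.
def next_clock(clock_mhz: float, gpu_temp_c: float,
               temp_limit_c: float = 95.0,
               min_mhz: float = 700.0, max_mhz: float = 1000.0,
               step_mhz: float = 10.0) -> float:
    """Back off when at the temperature limit; otherwise creep back up."""
    if gpu_temp_c >= temp_limit_c:
        return max(min_mhz, clock_mhz - step_mhz)
    return min(max_mhz, clock_mhz + step_mhz)
```

The real thing juggles power draw, fan speed, and acoustics as well, and it can react far faster thanks to SVI2’s high-speed telemetry.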

The difference with the Radeon R9 290X is one of degree—or degrees. You see, the default temperature limit for the 290X is a steamy 95°C, which will definitely keep your toes toasty on a cool fall evening. Meanwhile, the card’s peak fan speed is 40% of its max potential. AMD is pushing the GPU pretty hard and asking PowerTune to keep things in check. Practically speaking, that means GPU temperatures generally remain pretty steady at nearly 95°C after a few minutes of gameplay—and GPU clock speeds vary more widely than we’ve seen in other graphics cards. Although the 290X can operate at its advertised “Boost” clock speed of 1GHz, it will often dip below that frequency when a game is running.

If you’d like to trade acoustics for performance, the 290X offers an “uber” mode where the fan speed limit is 55%. Just flip the little DIP switch next to the, er, missing CrossFire connector and reboot to get there. The 290X is running close enough to the edge that this adjustment to the fan speed profile can have a clearly measurable impact on performance. We’ve tested the 290X at both settings, so you can see the difference the “uber” switch had on our open-air test rig.

For a quick illustration, here’s a look at the data from our power and noise testing session in Skyrim, as logged by GPU-Z. The time represented on the graph is our warm-up period of approximately four minutes.

The 290X stays at the 1GHz Boost clock for most of the period, but near the end, with the default fan mode, its clock speeds begin to fluctuate, ranging as low as 943MHz for a moment. With the higher fan speed in uber mode, the 290X’s frequency mostly stays put, with only a brief drop to around 970MHz.

Long-time readers, set your OCD tics at the ready, because performance testing just got a little more complicated. I conducted my first round of tests on the 290X before realizing how much impact GPU temperatures could have on performance. Going back over the data later, I saw that, in some games, the 290X’s performance dropped slightly with each successive test run. For example, in Tomb Raider, we saw FPS averages of 46.8, 45.3, and 44.1. We typically report the median score, which is 45.3 in this case.

To better understand the 290X’s behavior, I improvised a quick test over a longer time window. I ran our Tomb Raider test sequence twice starting with a cold card and then twice more at successive four-minute intervals. Here’s what I saw.

       First run   After 4 minutes   After 8 minutes   After 12 minutes
FPS    47          44                44                44

At least in this case, the 290X looks to be pretty quick to reach its max temperature, and the card’s performance doesn’t change too much after it gets there.

Still curious about GPU speeds over the long run, I fired back up our Skyrim load test and used GPU-Z to log clock frequencies over a period of about 30 minutes. I’d graph it for you, but, well, the window was open in Damage Labs during that time—and the 290X remained rock steady at 1GHz throughout. Evidently, ambient temperatures have a pretty big impact on the 290X’s behavior. We’ll have to mull over how this information should affect our testing procedures going forward. We may need to raise the number of testing sessions back to five per card (which is better anyhow), institute some sort of warm-up period prior to testing, or install stricter climate controls in Damage Labs. Hmmm.

For tinkerers, AMD exposes quite a bit of control over the 290X’s PowerTune settings in the Overdrive section of the Catalyst Control Center. The card doesn’t have any additional thermal headroom available, but you can push even harder on the fan speeds, power limits, and max clock frequencies, if you wish. I doubt those tweaks will net much additional performance without a more robust cooling solution.

Test notes

To generate the performance results you’re about to see, we captured and analyzed the rendering times of every single frame of animation during each test run. For an intro to our frame-time-based testing methods and an explanation of why they’re helpful, you can start here. Please note that, for this review, we’re only reporting results from the FCAT tools developed by Nvidia. We usually also report results from Fraps, since both tools are needed to capture a full picture of animation smoothness. However, testing with both tools can be time-consuming, and our window for work on this review was fairly small. We think sharing just the data from FCAT should suffice for now.
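For the curious, the core of the analysis is simple enough to sketch. Given a list of per-frame rendering times from a capture tool, our main metrics look something like this (a minimal sketch, not our actual tooling):

```python
import numpy as np

def summarize_run(frame_times_ms):
    """Boil one test run's frame times down to our usual metrics."""
    t = np.asarray(frame_times_ms, dtype=float)
    fps_average = 1000.0 * t.size / t.sum()    # traditional FPS average
    p99_ms = np.percentile(t, 99)              # 99% of frames finish quicker
    # "Badness": total time spent working on frames beyond each threshold.
    badness_ms = {th: float(np.clip(t - th, 0, None).sum())
                  for th in (50.0, 33.3, 16.7)}
    return fps_average, p99_ms, badness_ms
```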

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Our test systems were configured like so:

Processor         Core i7-3820
Motherboard       Gigabyte X79-UD3
Chipset           Intel X79 Express
Memory size       16GB (4 DIMMs)
Memory type       Corsair Vengeance CMZ16GX3M4X1600C9 DDR3 SDRAM at 1600MHz
Memory timings    9-9-9-24 1T
Chipset drivers   INF update 9.2.3.1023, Rapid Storage Technology Enterprise 3.5.1.1009
Audio             Integrated X79/ALC898 with Realtek 6.0.1.6662 drivers
Hard drive        OCZ Deneva 2 240GB SATA
Power supply      Corsair AX850
OS                Windows 7 Service Pack 1

                   Driver revision         GPU base core  GPU boost    Memory clock  Memory size
                                           clock (MHz)    clock (MHz)  (MHz)         (MB)
GeForce GTX 660    GeForce 331.40 beta     980            1033         1502          2048
GeForce GTX 760    GeForce 331.40 beta     980            1033         1502          2048
GeForce GTX 770    GeForce 331.40 beta     1046           1085         1753          2048
GeForce GTX 780    GeForce 331.40 beta     863            902          1502          3072
GeForce GTX Titan  GeForce 331.40 beta     837            876          1502          6144
Radeon HD 5870     Catalyst 13.11 beta     850            –            1200          2048
Radeon HD 6970     Catalyst 13.11 beta     890            –            1375          2048
Radeon R9 270X     Catalyst 13.11 beta     ?              1050         1400          2048
Radeon R9 280X     Catalyst 13.11 beta     ?              1000         1500          3072
Radeon R9 290X     Catalyst 13.11 beta 5   ?              1000         1250          4096

Thanks to Intel, Corsair, Gigabyte, and OCZ for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.

Also, our FCAT video capture and analysis rig has some pretty demanding storage requirements. For it, Corsair has provided four 256GB Neutron SSDs, which we’ve assembled into a RAID 0 array for our primary capture storage device. When that array fills up, we copy the captured videos to our RAID 1 array, comprised of a pair of 4TB Black hard drives provided by WD.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

In addition to the games, we used the following test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Texture filtering

We’ll begin with a series of synthetic tests aimed at exposing the true, delivered throughput of the GPUs. In each instance, we’ve included a table with the relevant theoretical rates for each solution, for reference.

                   Peak pixel fill   Peak bilinear filtering  Memory bandwidth
                   rate (Gpixels/s)  int8/fp16 (Gtexels/s)    (GB/s)
Radeon HD 5870     27                68/34                    154
Radeon HD 6970     28                85/43                    176
Radeon HD 7970     30                118/59                   264
Radeon R9 280X     32                128/64                   288
Radeon R9 290X     64                176/88                   320
GeForce GTX 770    35                139/139                  224
GeForce GTX 780    43                173/173                  288
GeForce GTX Titan  42                196/196                  288

Although the 290X has, in theory, much higher fill capacity than the Titan, this test tends to be limited more by memory bandwidth than anything else. None of the GPUs achieve anything close to their peak theoretical rates. The 290X’s additional ROP power will more likely show up in games using multisampled anti-aliasing.

The back-and-forth here is kind of intriguing. 3DMark’s texture fill test isn’t filtered, so it’s just measuring pure texture sample rates, and the Titan manages to outperform the 290X in that test. The results from the Beyond3D test tool are bilinearly filtered, and in the first of these, the 290X takes the top spot.

Once we get into higher-precision texture formats, a major architectural difference comes into play. Hawaii and the other Radeons can only filter FP16 texture formats at half the usual rate. Even the GK104-based GTX 770 is faster than the 290X with FP16 and FP32 filtering.

In all cases, though, the 290X offers a nice increase over the Radeon R9 280X—which is just a re-branded Radeon HD 7970 GHz Edition, essentially.

Tessellation and geometry throughput

                   Peak rasterization  Memory bandwidth
                   rate (Gtris/s)      (GB/s)
Radeon HD 5870     0.9                 154
Radeon HD 6970     1.8                 176
Radeon HD 7970     1.9                 264
Radeon R9 280X     2.0                 288
Radeon R9 290X     4.0                 320
GeForce GTX 770    4.3                 224
GeForce GTX 780    3.6 or 4.5          288
GeForce GTX Titan  4.4                 288

I’m not sure what to make of these results. I expected to see some nice gains out of the 290X thanks to its higher rasterization rates, but the benefits are only evident in TessMark’s x16 subdivision mode and with our low-res/extreme tessellation scenario in Unigine Heaven.

A couple of potential explanations come to mind. One, TessMark uses OpenGL, and it’s possible AMD hasn’t updated its OpenGL drivers to take full advantage of Hawaii’s quad geometry engines. Two, the drivers could be fine, and we could be seeing an architectural limitation of the Hawaii chip. As I noted earlier, large amounts of geometry amplification tend to cause data flow problems. It’s possible the 290X is hitting some internal bandwidth barrier at the x32 and x64 tessellation levels that’s common to GCN-based architectures. I’ve asked AMD to comment on these results but haven’t heard back yet. I’ll update this text if I find out more.

Shader performance

                   Peak shader arithmetic  Memory bandwidth
                   rate (tflops)           (GB/s)
Radeon HD 5870     2.7                     154
Radeon HD 6970     2.7                     176
Radeon HD 7970     3.8                     264
Radeon R9 280X     4.1                     288
Radeon R9 290X     5.6                     320
GeForce GTX 770    3.3                     224
GeForce GTX 780    4.2                     288
GeForce GTX Titan  4.7                     288

Welp. This one’s unambiguous. That massive GCN shader array is not to be denied. The 290X wins each and every shader test, sometimes by wide margins.

Now, let’s see how these things translate into in-game performance.

Crysis 3





Click through the buttons above to see frame-by-frame results from a single test run for each of the graphics cards. You can see how there are occasional spikes on each of the cards. They tend to happen at the very beginning of each test run and a couple of times later when I’m exploding dudes with dynamite arrows.

You can see from the raw plots that the 290X looks good, with more frames produced and generally lower frame rendering times than anything else we tested. Every card encounters a few slowdowns, and the spikes on the 290X aren’t anything exceptional.

The traditional FPS average and our frame-latency-focused companion, the 99th percentile frame rendering time, pretty much agree here. That’s a good indication that none of the graphics cards are encountering any weird issues. When they don’t agree, as sometimes happens, bad things are afoot. What they agree on is simple enough: the 290X is the fastest graphics card in this test. The uber fan mode doesn’t seem to make much difference here.





We can get a broader sense of the frame time distribution by looking at the tail end of the curve. In this case, both brands of GPUs, faster and slower models, all suffer from a small number of high-latency frames in the last ~2% of frames rendered. I suspect the performance problem here is at the CPU or system level, not in the graphics cards themselves, since it’s fairly consistent.





Our “badness” index concentrates on those frames that take a long time to produce. For the first two thresholds of 50 and 33 ms, the results are pretty similar among the newer GPUs, which again suggests a CPU bottleneck or the like. However, for slinging out frames 60 times per second, once every 16.7 milliseconds, the R9 290X is easily the best choice.

Far Cry 3: Blood Dragon













Click through the frame time distributions above, and you’ll see very few frames that take beyond 50 ms to render from any of the cards—even with the geezer of the group, the Radeon HD 5870. However, we can still get a sense of gaming smoothness from the numbers, and all of them point to the 290X as the top dawg in this test scenario. There are a few minor frame time spikes on the GTX 780 and Titan, although they don’t really amount to much. Still, the R9 290X is glassy smooth, especially in its uber fan mode, which makes a real difference here.

GRID 2





This looks like the same Codemasters engine we’ve seen in a string of DiRT games, back for one more round. We decided not to enable the special “forward+” lighting path developed by AMD, since the performance hit is pretty serious, inordinately so on GeForces. Other than that, though, we have nearly everything cranked to the highest quality level.













Everything from the GeForce GTX 770 on up turns in a near-flawless performance here, churning out nearly every frame in 16.7 ms or less. The 290X, though, is the very definition of flawless, never once missing a beat at 60Hz.

Tomb Raider

















Those averages around 40-45 FPS for the high-end cards don’t seem terribly impressive until you look at the frame time plots or our latency-focused metrics. Then you realize that the fastest cards never once produce a frame in more than 33 ms. That’s a steady 30 FPS or better for each of them. Of course, the 290X again leads the pack.

Guild Wars 2













This is an odd one, because the faster cards tend to have some minor frame time spikes to about 30 ms. You can see it in the plots. The GTX 780 and Titan suffer the most, but the 290X also participates in this problem. Seems like the cards with the highest frame rates are the most affected.

Nevertheless, the GTX 780 just edges out the R9 290X across multiple metrics for a rare outright performance win.

Power consumption

The Radeons have a unique capability called ZeroCore power that allows them to spin down all of their fans and drop into a very low-power state whenever the display goes into power-save mode. That’s why they tend to draw less power with the display off.

Please note that our load test isn’t an absolute peak scenario. Instead, we have the cards running a real game, Skyrim, in order to show us power draw with a more typical workload.

The 290X’s power draw under load is… considerable, at roughly 40W more than the GTX 780’s. The card’s cooler will have more heat to expel as a result.

Noise levels and GPU temperatures

Remember that 95°C PowerTune limit? Yeah, the 290X runs right up against it with either fan profile. AMD calls the card’s default fan profile “quiet” mode and the more aggressive 55% profile “uber” mode. You can see why I’ve resisted calling the default profile “quiet.” The 290X ain’t exactly that.

Switching the fan to uber mode pushes the 290X past 50 dBA, which is somewhere near my personal threshold of true annoyance. Premium graphics cards have been making strides toward good acoustic citizenship in recent years, and we lauded the Radeon HD 7990 for furthering that trend. The 290X sadly loses ground on this front. Yes, it’s possible to tweak PowerTune with a lower fan speed threshold, but you’re sure to lose performance if you do so.

As for the 290X’s penchant for blowtorch-like temperatures, well, AMD has definitely chosen a more aggressive tuning point than the 80°C GPU Boost target on the GTX 780 and Titan. All things considered, I’d rather not lose my fingerprints when going to swap out a video card. However, I can’t bring myself to fret over GPU temperatures of 95°C too much, since Nvidia chose the same target for the GTX 480 several years back. Heck, even the old GeForce 8800 GTX ran relatively hot, and I think some of those are still going strong to this day.

What AMD has done, though, is squeeze all of the thermal headroom out of this card. Don’t expect much overclocking success on a 290X with the stock cooler.

Conclusions

Ok, you know how this goes. We’ll magically compress our test results into a couple of our famous price-performance scatter plots. The performance scores are a geometric mean of the results from all the games we tested. We’ve converted our 99th-percentile frame time results into FPS for the sake of readability. As always, the better values will be situated closer to the top left corner of the plot. The worse buys will gravitate to the bottom right of the plot. Since the Radeon HD 5870 and 6970 aren’t current products anymore, we’ve shown them at their starting prices for comparison.
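If you’re wondering about the mechanics: the overall score is a geometric mean across the games, and the 99th-percentile conversion is just FPS = 1000 divided by the frame time in milliseconds. A sketch with made-up numbers, not our actual results:

```python
from math import prod

def geomean(xs):
    return prod(xs) ** (1 / len(xs))

# Hypothetical per-game 99th-percentile frame times, in milliseconds.
p99_ms = [24.1, 18.0, 15.2, 30.5, 21.7]
p99_fps = [1000.0 / ms for ms in p99_ms]   # converted to FPS for readability

print(f"overall 99th-percentile score: {geomean(p99_fps):.1f} FPS")
```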





Well, that was easy. The Radeon R9 290X is a bit faster than the GeForce GTX 780 and costs a hundred bucks less. Beats the $999 Titan, too. So yeah. AMD has substantially reduced the cost of graphics processing power in this category, and it has grabbed the overall performance crown from Nvidia in the process. What’s not to like about a faster-than-Titan graphics card for just over half the price?

Unfortunately, there are some pretty good answers to that question. I have to admit, I was more impressed with the Hawaii GPU’s architectural efficiency—it is a much smaller chip than the GK110, remember—before seeing the 290X’s power draw and temperature readings. Looks to me like AMD has captured the performance crown through a combination of newfangled architectural prowess and the time-honored tactic of pushing its silicon to the ragged edge. The Hawaii GPU brought the 290X to the cusp of success, but a bigger power envelope and a really aggressive PowerTune profile ensured the victory. That victory comes at a cost: a relatively noisy card, whether on the default or uber fan profiles, and positively toasty GPU temperatures. 290X owners will also see more variable clock speeds (and thus performance) than they’ve come to expect. These aren’t deal-breaker problems—not when there’s a $100 price difference versus the GTX 780 on the table—but they’re still hard to ignore.

More seriously, if you have any intention of using a Radeon R9 290X in a multi-GPU configuration at some point down the road, I’d advise you to put down the credit card and step away from the Newegg browser tab until we can test XDMA-based CrossFire thoroughly. Hopefully, we can do that soon, along with some additional single-card testing at 4K resolutions. If only we could raise the PowerTune limit on my frail human flesh, I’d have done some 4K testing already.