2014 has been a strange year for graphics chips. Many of the GeForce and Radeon graphics cards currently on the market are based on GPUs over two years old. Rather than freshening up their entire silicon lineups top-to-bottom like in the past, AMD and Nvidia have chosen to take smaller, incremental steps forward.

Both firms introduced larger chips based on existing GPU architectures last year. Two weeks ago, the Tonga GPU in the Radeon R9 285 surprised us with formidable new technology that’s still somewhat mysterious. And before that, this past spring, Nvidia unveiled its next-gen “Maxwell” graphics architecture on a single, small chip aboard the GeForce GTX 750 Ti. We could tell by testing that card’s GM107 GPU that Maxwell was substantially more power-efficient than the prior-gen Kepler architecture. However, no larger Maxwell-based chips were forthcoming.

Until today, that is.

At long last, a larger Maxwell derivative is here, powering a pair of new graphics cards: the flagship GeForce GTX 980 and its more affordable sibling, the GTX 970. These cards move the needle on price, performance, and power efficiency like only a new generation of technology can do.

The middle Maxwell: GM204

The chip that powers these new GeForce cards is known as the GM204. Although the Maxwell architecture is bursting with intriguing little innovations, the GM204 is really about two big things: way more pixel throughput and vastly improved energy efficiency. Most of what you need to know about this chip boils down to those two things—and how they translate into real-world performance.

Here’s a look at the basic specs of the GM204 versus some notable contemporaries:

| GPU | ROP pixels/clock | Texels filtered/clock (int/fp16) | Shader processors | Rasterized triangles/clock | Memory interface width (bits) | Estimated transistor count (millions) | Die size (mm²) | Fab process |
|---|---|---|---|---|---|---|---|---|
| GK104 | 32 | 128/128 | 1536 | 4 | 256 | 3500 | 294 | 28 nm |
| GK110 | 48 | 240/240 | 2880 | 5 | 384 | 7100 | 551 | 28 nm |
| GM204 | 64 | 128/128 | 2048 | 4 | 256 | 5200 | 416 (398) | 28 nm |
| Tahiti | 32 | 128/64 | 2048 | 2 | 384 | 4310 | 365 | 28 nm |
| Tonga | 32 (48) | 128/64 | 2048 | 4 | 256 (384) | 5000 | 359 | 28 nm |
| Hawaii | 64 | 176/88 | 2816 | 4 | 512 | 6200 | 438 | 28 nm |

Nvidia did well to focus on energy efficiency with Maxwell, because foundries like TSMC, which makes Nvidia’s GPUs, have struggled to move to smaller process geometries. Like the entire prior generation of GPUs, GM204 is built on a 28-nm process. (Although TSMC is apparently now shipping some 20-nm silicon, Nvidia tells us the 28-nm process is more cost-effective for this chip, and that assessment is consistent with what we’ve heard elsewhere.) Thus, the GM204 can’t rely on the goodness that comes from a process shrink; it has to improve performance and power efficiency by other means.

Notice that the GM204 is more of a middleweight fighter, not a heavyweight like the GK110 GPU in the GeForce GTX 780- and Titan-series cards. Nvidia considers the GM204 the successor to the GK104 chip that powers the GeForce GTX 680 and 770, and I think that’s appropriate. The GM204 and the GK104 both have a 256-bit memory interface and the same number of texture filtering units, for instance.

Size-wise, the GM204 falls somewhere in between the GK104 and the larger GK110. Where exactly is an interesting question. When I first asked, Nvidia told me it wouldn’t divulge the new chip’s die area, so I took the heatsink off of a GTX 980 card, pulled out a pair of calipers, and measured it myself. The result: almost a perfect square of 20.4 mm by 20.4 mm. That works out to 416 mm². Shortly after I had my numbers, Nvidia changed its tune and supplied its own die-size figure: 398 mm². I suppose they’re measuring differently. Make of that what you will.

The GM204’s closest competition from AMD is the new Tonga GPU that powers the Radeon R9 285. We know for a fact that not all of Tonga’s capabilities are enabled on the 285, though, and I have my own crackpot theories about how the full Tonga looks. I said in my review that I think it has a 384-bit memory interface, and after more noodling on the subject, I strongly suspect it has 48 pixels per clock of ROP throughput waiting to be enabled. Mark my words so you can mock me later if I’m wrong!

One reason I suspect Tonga has more ROPs is that it just makes sense to increase a GPU’s pixel throughput in the era of 4K and high-PPI displays. I believe the GM204’s ROPs are meant to be represented by the deep blue Chiclets™ surrounding the L2 cache in the fakey diagram above. At 64 pixels per clock, the GM204 has a third more per-clock ROP throughput than the big Kepler GK110 chip—and double that of the GK104. That’s an enormous increase over the previous generation, and it means the GM204 is ready to paint lots of pixels.

At 2048KB, the GM204’s L2 cache is relatively large, too. The GK104 has only a quarter as much cache, at 512KB, and even the GK110’s 1536KB L2 is smaller. Caches have been growing by leaps and bounds in recent graphics architectures as a means of both amplifying effective bandwidth and improving power efficiency (since memory access burns a lot of power).

The larger cache is just one way Nvidia has pursued increased efficiency in the Maxwell architecture. Many of the other gains come from the new Maxwell core structure, known as the streaming multiprocessor or SMM. The GM204 has a total of 16 SMMs. Each of them is broken into four “quads,” and each of those has a single 32-wide vector execution unit with its own associated control logic. Threads are still scheduled in “warps,” or groups of 32 threads, with one thread per “lane” executing in lockstep on each vec32 execution unit. Nvidia says the SMM’s new structure makes scheduling tasks on Maxwell simpler and more efficient, which is one reason this architecture uses less energy per instruction than Kepler. Maxwell’s efficiency improvements come from several sources, though, and I hope to have time to explore them in more depth in a future article.
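For the record, here’s how that organization adds up to the shader counts in the spec table. This is just back-of-the-envelope accounting in Python on my part, using only the figures above:

```python
# Back-of-the-envelope accounting of Maxwell's SMM organization,
# based on the figures Nvidia has disclosed for the GM204.

SMM_COUNT = 16        # streaming multiprocessors in a full GM204
QUADS_PER_SMM = 4     # each SMM is split into four "quads"
LANES_PER_QUAD = 32   # one 32-wide vector execution unit per quad

shader_processors = SMM_COUNT * QUADS_PER_SMM * LANES_PER_QUAD
print(shader_processors)  # 2048, matching the spec table

# The GTX 970's GM204 ships with three SMMs disabled:
print((SMM_COUNT - 3) * QUADS_PER_SMM * LANES_PER_QUAD)  # 1664
```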

For now, let’s look at the new GeForce cards.

The GeForce GTX 970 and 980

Nvidia’s new silicon has spawned a pair of video cards, the GeForce GTX 970 and 980. Pictured above is the 980, the new high end of Nvidia’s consumer graphics card lineup (excepting the ultra-expensive Titan series). Here’s the lowdown on the two new GeForce models:

| Card | GPU base clock (MHz) | GPU boost clock (MHz) | ROP pixels/clock | Texels filtered/clock | Shader processors | Memory path (bits) | GDDR5 transfer rate | Peak power draw | Intro price |
|---|---|---|---|---|---|---|---|---|---|
| GTX 970 | 1050 | 1178 | 64 | 104 | 1664 | 256 | 7 GT/s | 145W | $329 |
| GTX 980 | 1126 | 1216 | 64 | 128 | 2048 | 256 | 7 GT/s | 165W | $549 |

At $549, the GeForce GTX 980 ain’t cheap. What you’ll want to notice, though, is its lethal combination of clock speeds and power rating. The GTX 980’s full-fledged GM204 runs at a “boost” speed of over 1.2GHz—and that’s a typical, not peak, operating frequency in games. The card’s 4GB of GDDR5 memory runs at a nosebleed-inducing 7 GT/s, too. That’s one way to squeeze the most out of a 256-bit memory interface. Meanwhile, the GTX 980’s TDP is just 165W—well below the 250W rating of the GeForce GTX 780 Ti or the 195W rating of the previous-gen GTX 680. That’s quite a testament to the efficiency of the Maxwell architecture, especially since all of these chips are fabbed with 28-nm process tech.

Thanks to its frugal power needs, the GTX 980 requires only a pair of 6-pin aux power inputs—and it could darn near get by with just one of them. Although this card has the same familiar, aluminum-clad reference cooler as the last crop of GeForces, its port configuration is something new: a trio of DisplayPort outputs, an HDMI port, and a dual-link DVI connector. Given the ascendancy of DisplayPort for use with 4K and G-Sync monitors, this is a welcome change.

GeForce GTX 980 cards in the form you see above should be available from online retailers almost immediately, as I understand it. Nvidia had the first batch of cards produced with its reference cooler, and I expect custom designs from board makers to follow pretty quickly. Many of those are likely to be clocked higher than the reference board we have for testing.

I’ve gotta admit, though, that I’m more excited about the prospects for the GeForce GTX 970. This card has a much lower suggested starting price of $329, and rather than produce a reference design, Nvidia has left it up to board makers to create their own GTX 970 cards. Have a look at what Asus has come up with:

This is the Strix GTX 970 OC Edition, and it’s pretty swanky. The headline news here is this card’s 1114MHz base and 1253MHz boost clocks, which are quite a bit higher than what Nvidia’s reference specs call for. Heck, the boost clock is even higher than the GTX 980’s and could go a long way in making up for the loss of three SMMs in the GTX 970. Since the GTX 970 has the same 4GB of GDDR5 memory at 7 GT/s, this card’s delivered performance should be within shouting distance of the GTX 980’s. The price? Just $339.99.

Asus has tricked out the Strix with a bunch of special features, which I’d be happy to talk about if I hadn’t just received this thing literally yesterday. I have noted that the cooler’s twin fans only spin when needed; they go completely still until the GPU temperature rises above a certain level. For some classes of games—things like DOTA 2—Asus claims this card can operate completely fanlessly.

On the downside, I’m a little disappointed with the move back to dual DVI outputs and a single DisplayPort connector. I suppose the more conventional port setup will appeal to those with existing multi-monitor setups, but it may prove to be a frustrating limitation in the future.

On the, er, weird side, Asus has elected to give the Strix 970 a single aux power input of the 8-pin variety. That’s unusual, and Asus touts this config as an advantage, since it simplifies cable management. I suppose that’s true, and perhaps 8-pin power connectors are now common enough that it makes sense to use them by default. Still, I was surprised not to see a dongle in the box to convert two 6-pin connectors into an 8-pin one.

Here’s another version of the GTX 970 that just made its way into Damage Labs. The MSI GTX 970 Gaming 4G has the same clock speeds as the Strix, but its cooler is even flashier. MSI says this card will sell for $359.99. I haven’t yet managed to test this puppy completely, but we’ll follow up on it in a future article.

Nvidia is trimming its lineup to make room for these new GeForces. The firm is so confident in the Maxwell cards that it’s ending shipments of GeForce GTX 770, 780, and 780 Ti cards, effective now. Meanwhile, the GeForce GTX 760’s price is dropping to $219. That should be a pretty good clue about how the newest GeForces alter the landscape.

Sizing ’em up

Do the math involving the clock speeds and per-clock potency of the GM204 cards, and you’ll end up with a comparative table that looks something like this:

| Card | Peak pixel fill rate (Gpixels/s) | Peak bilinear filtering int8/fp16 (Gtexels/s) | Peak shader arithmetic rate (tflops) | Peak rasterization rate (Gtris/s) | Memory bandwidth (GB/s) |
|---|---|---|---|---|---|
| Radeon R9 285 | 29 | 103/51 | 3.3 | 3.7 | 176 |
| Radeon R9 280X | 32 | 128/64 | 4.1 | 2.0 | 288 |
| Radeon R9 290 | 61 | 152/76 | 4.8 | 3.8 | 320 |
| Radeon R9 290X | 64 | 176/88 | 5.6 | 4.0 | 320 |
| GeForce GTX 770 | 35 | 139/139 | 3.3 | 4.3 | 224 |
| GeForce GTX 780 | 43 | 173/173 | 4.2 | 4.5 | 288 |
| GeForce GTX 780 Ti | 45 | 223/223 | 5.3 | 4.6 | 336 |
| GeForce GTX 970 | 75 | 123/123 | 3.9 | 4.7 | 224 |
| Asus Strix GTX 970 | 80 | 130/130 | 4.2 | 5.0 | 224 |
| GeForce GTX 980 | 78 | 156/156 | 5.0 | 4.9 | 224 |
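If you’d like to check my math, those peak rates fall out of a few standard formulas: per-clock rates multiplied by the boost clock, two flops per shader ALU (counting a fused multiply-add as two operations), and transfer rate times bus width for bandwidth. Here’s a quick sketch using the GTX 980’s numbers; they round to the table entries above.

```python
# Rough peak-rate math for the GTX 980: per-clock rates times the
# boost clock, two flops per ALU (one fused multiply-add), and
# bandwidth as transfer rate times bus width.

boost_mhz  = 1216   # typical boost clock
rops       = 64     # ROP pixels per clock
texels_clk = 128    # int8 texels filtered per clock
shaders    = 2048
tris_clk   = 4      # rasterized triangles per clock
mem_gtps   = 7      # GDDR5 transfer rate, GT/s
bus_bits   = 256

print(rops * boost_mhz / 1000)        # ~77.8 Gpixels/s
print(texels_clk * boost_mhz / 1000)  # ~155.6 Gtexels/s
print(shaders * 2 * boost_mhz / 1e6)  # ~4.98 tflops
print(tris_clk * boost_mhz / 1000)    # ~4.86 Gtris/s
print(mem_gtps * bus_bits / 8)        # 224.0 GB/s
```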

The rates above aren’t destiny, but they do tend to be a pretty good indicator of how a given GPU will perform. Since the GM204 can run at higher clock speeds than the GK110, the GeForce GTX 980 is able to give even the mighty GTX 780 Ti a run for its money in terms of shader arithmetic—with a peak rate of five teraflops—and rasterization. The 980 trails a bit in the texture filtering department, but look at that pixel fill rate. Nothing we’ve seen before comes all that close.

Contrast that prowess to the GTX 980’s relatively modest memory bandwidth, which is no higher than the prior-gen GTX 770’s, and you might ask some questions about how this new balance of resources is supposed to work. The answer, it turns out, is similar to what we saw with AMD’s Tonga GPU a couple of weeks back.

Nvidia Senior VP of Hardware Engineering Jonah Alben revealed in a press briefing that Maxwell makes more effective use of its memory bandwidth by compressing rendered frames with a form of delta-based compression. (That is, checking to see whether a pixel’s color has changed from a neighboring pixel and perhaps only storing information about the amount of change.) In fact, Alben told us Nvidia GPUs have used delta-based compression since the Fermi generation. Maxwell’s compression is the third iteration. The combination of better compression and more effective caching allows Maxwell to reduce memory bandwidth use substantially compared to Kepler—by 17% to 29% in workloads based on popular games, according to Alben.
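Nvidia hasn’t published the actual compression scheme, so treat the following as a toy illustration of the general delta idea rather than Maxwell’s algorithm. Neighboring pixels in a rendered frame tend to be similar, so their differences are small numbers that pack into fewer bits:

```python
# Toy illustration of delta-based color compression. The real hardware
# scheme is proprietary; this just shows why deltas help.

def delta_encode(scanline):
    """Store the first pixel raw, then each pixel as a delta from its neighbor."""
    deltas = [scanline[0]]
    for prev, cur in zip(scanline, scanline[1:]):
        deltas.append(cur - prev)
    return deltas

def bits_needed(value):
    return max(1, abs(value).bit_length() + 1)  # +1 for a sign bit

scanline = [200, 201, 201, 203, 202, 202, 204, 203]  # one color channel
deltas = delta_encode(scanline)

raw_bits = len(scanline) * 8                          # 8 bits per raw value
delta_bits = 8 + sum(bits_needed(d) for d in deltas[1:])
print(deltas)                # [200, 1, 0, 2, -1, 0, 2, -1]
print(raw_bits, delta_bits)  # 64 vs. 22: the deltas fit in far fewer bits
```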

So what happens when we try 3DMark Vantage’s color fill test, which is limited by pixel fill rate and memory bandwidth, on the GTX 980?

Yeah, that works pretty darned well. The GTX 980 paints over twice as many pixels in this test as the GK104-based GTX 770, even though the two cards have the same 224 GB/s of memory bandwidth.

On paper, the GTX 980’s other big weakness looks to be texturing capacity, and in practice, the 980 samples textures at a lower rate than its competition. The GTX 970 even falls slightly behind the GTX 770 in this synthetic test, just as it does on paper. We’ll have to see how much of a limitation this weakness turns out to be in real games.

The GM204 cards have some of the highest rasterization rates in the table above, and they make good on that promise in these tests of tessellation and particle manipulation. The GTX 980 sets new highs in both cases.

In theory, the GeForce GTX 780 Ti has more flops on tap and higher memory bandwidth than the GTX 980, so it should perform best in these synthetic tests of shader performance. In reality, though, Maxwell delivers on more of its potential. Even with a big memory bandwidth handicap, the GTX 980 outperforms the GTX 780 Ti in both benchmarks. Only AMD’s big Hawaii in the Radeon R9 290X is more potent—and not by a huge margin.

Maxwell’s other innovations

In addition to the performance and efficiency gains we’ve discussed, Nvidia has built some nifty new features into Maxwell-based products. I’ve been awake for five days straight on a cocktail of pure Arabica coffee, Five Hour Energy shots, methadone, ginkgo biloba, and anti-freeze. The hallucinations are starting to get distracting, but I’ll attempt to convey some sense of the new features if I can. To make that happen, I’m resorting to an old-school TR crutch, the vaunted bulleted list of features. Here’s what else is new in Maxwell:

Dynamic Super Resolution — Some of us graphics nerds have been bugging Nvidia for years about exposing supersampled antialiasing as an easy-to-access control panel option or something along those lines. They’ve finally found a way to make it happen, and they’ve taken the concept one step further. Supersampling generally involves rendering two or more samples per pixel and then combining the results in order to get a higher-quality result, and it was in use in real-time graphics as far back as the 3dfx days. Multisampled AA, which more efficiently targets only object edges, has largely supplanted it. DSR brings supersampling back by letting users select higher resolutions, via in-game menus, than their monitors can natively support. For instance, a gamer with a 1080p display could choose the most popular 4K resolution of 3840×2160, which has exactly four times the pixels of his display. The graphics card will then render the game at a full 3840×2160 internally and scale the output down to 1920×1080 in order to match the display. In doing so, every single pixel on the screen will have been sampled four times, producing a smoother, higher-quality result than what’s possible with any form of multisampling.

DSR goes beyond traditional supersampling, though. Rather than just sample multiple times from within a pixel, it uses a 13-tap gaussian downsizing filter to produce a nice, soft result. The images it produces are likely to be a little softer and more cinematic-feeling. This filter has the advantage of being able to resize from intermediate resolutions. For instance, the user could select 2560×1440, and DSR would downsize to 1080p even though it’s not a perfect 2:1 or 4:1 fit.
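To make the resolve step concrete, here’s a toy sketch of 4:1 downsampling. Nvidia’s filter is a 13-tap gaussian; this stand-in uses a plain 2×2 box average, the classic ordered-grid supersampling resolve, just to show the principle:

```python
# Toy sketch of DSR's core idea: render at 4x the pixel count, then
# filter down to the display resolution. A 2x2 box average stands in
# for Nvidia's 13-tap gaussian here.

def downsample_2to1(image):
    """Average each 2x2 block of the hi-res image into one output pixel."""
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[y]), 2):
            block = (image[y][x] + image[y][x + 1] +
                     image[y + 1][x] + image[y + 1][x + 1])
            row.append(block / 4)  # every displayed pixel = 4 rendered samples
        out.append(row)
    return out

# A 4x4 "rendered" grayscale image becomes the 2x2 image actually displayed.
hires = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [255, 255, 0, 0],
         [255, 255, 0, 0]]
print(downsample_2to1(hires))  # [[0.0, 255.0], [255.0, 0.0]]
```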





Sounds good in theory, but I’ve not had the time to attach a lower-res monitor to my Maxwell cards to try it yet. (The images above come from Nvidia.) I’m sure we’ll revisit this feature in more detail later. Nvidia says DSR will begin its life as a Maxwell exclusive, but the company expects this feature to make its way to some older GeForce cards via driver updates eventually.

MFAA — M-F’in’ AA? No, tragically, the name is not that epic. It’s just “multi-frame sampled antialiasing,” apparently. I think every new GPU launch requires a novel AA method, and we keep getting some interesting new attempts, so why not? MFAA seeks to achieve the quality of 4X multisampling at the performance cost of 2X multisampling. To do so, it combines several elements. The subpixel sample points vary from one pixel to the next in interleaved, screen-door fashion, and they swap every other frame. The algorithm then “borrows” samples from past frames and combines them with current samples to produce higher-quality results—that is, smoother edges. Nvidia showed a demo of this feature in action, and it does seem to work. I have questions about exactly how well it works when the camera and on-screen objects are moving rapidly, since borrowing temporally from past frames probably falls apart with too much motion. Unfortunately, Nvidia wasn’t willing to say exactly how the MFAA routine decides what samples to borrow from past frames, so it’s something of a mystery. One wonders whether it will really be any better than pretty decent methods like SMAA, which are already widely deployed in games and offer similar promises of 4X MSAA quality at 2X performance. MFAA isn’t yet enabled in Nvidia’s drivers, so we can’t test it. One plus of MFAA, once it arrives, is that it can be enabled via a simple on-off switch; it doesn’t require integration into the game engine like Nvidia’s TXAA does. More interesting than MFAA itself is the fact that Maxwell has much more flexibility with regard to AA sampling points than Kepler. On Maxwell, each pixel in a 4×4 quad can have its own unique set of subpixel sample points, and the GPU can vary those points from one frame to the next. That means Maxwell could allow for much more sophisticated pseudo-stochastic sampling methods once it’s been in the hands of Nvidia’s software engineers for more than a few weeks.
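Since Nvidia won’t say how the sample borrowing works, anything specific is guesswork. Purely as a toy built from the disclosed ingredients (alternating 2X sample patterns plus reuse of the prior frame’s samples), the general shape might be something like this; the sample offsets here are hypothetical:

```python
# Purely speculative toy of MFAA's disclosed ingredients: 2X sample
# patterns that alternate every frame, with the previous frame's
# samples "borrowed" to approximate 4X quality. How the real
# algorithm decides what to borrow is not public.

PATTERN_A = [(0.25, 0.25), (0.75, 0.75)]  # hypothetical subpixel offsets
PATTERN_B = [(0.75, 0.25), (0.25, 0.75)]

def resolve_pixel(frame_index, shade, prev_samples):
    """Shade 2 samples this frame; blend with last frame's 2 samples."""
    pattern = PATTERN_A if frame_index % 2 == 0 else PATTERN_B
    current = [shade(x, y) for x, y in pattern]
    combined = current + prev_samples      # 4 effective samples
    color = sum(combined) / len(combined)  # breaks down under fast motion
    return color, current

# Example: an edge covers the left half of the pixel (x < 0.5 is lit).
shade = lambda x, y: 1.0 if x < 0.5 else 0.0
color0, samples0 = resolve_pixel(0, shade, prev_samples=[])
color1, samples1 = resolve_pixel(1, shade, samples0)
print(color0, color1)  # 0.5 from 2 samples, then 0.5 from 4 effective samples
```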

Substantial new rendering capabilities for DX12 — Yes, DirectX 12 isn’t just about reducing overhead. It will have some features that require new GPU hardware, and Maxwell includes several of them. In fact, Direct3D Lead Developer Max McMullen from Microsoft delivered the news at the Maxwell press event. What’s more, a new revision of Direct3D 11, version 11.3, will also expose these same hardware features. The highlights included ROVs, typed UAV loads, volume tiled resources, and conservative rasterization. I’d like to explain more about what precisely these features are and what they do, but that will have to wait for a future article. Interestingly enough, many of these features are not present in the GM107 chip that debuted earlier this year. GM204 contains some significant new technology.

Accelerated voxel-based global illumination — There’s some overlap between this point and the prior one, but it’s worth calling out the fact that Nvidia has built hardware into Maxwell—some of which won’t be exposed via DX12—to accelerate a specific method of global illumination the company has been developing for some time. Maxwell can “multicast” incoming geometry to multiple viewports in order to facilitate the conversion of objects into a low-res series of blocks, or 3D pixels, known as voxels. Once the voxel grid is created, it can be used to simulate light bounces in order to create high-quality, physically correct indirect lighting. That could prove to be a huge advance for real-time graphics and gaming, and it deserves more attention than I can give it right now.
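Nvidia’s actual pipeline is built around voxel cone tracing and is far more sophisticated, but a toy sketch can convey the basic voxelize-then-gather shape of the idea. Everything here (the grid size, the sample scene, the falloff) is made up for illustration:

```python
# Toy of the voxel GI concept: bin lit geometry into a coarse 3D grid,
# then approximate indirect light at a point by gathering from nearby
# lit voxels. Not Nvidia's VXGI pipeline, just the general shape.

GRID = 8  # an 8x8x8 voxel grid over the unit cube

def voxel_of(p):
    return tuple(min(GRID - 1, int(c * GRID)) for c in p)

# "Voxelize": directly lit surface points with their outgoing radiance.
lit_surface = [((0.1, 0.9, 0.5), 1.0), ((0.12, 0.88, 0.5), 0.8)]
radiance = {}
for point, light in lit_surface:
    v = voxel_of(point)
    radiance[v] = radiance.get(v, 0.0) + light

def indirect_light(p):
    """Gather bounce light from lit voxels, falling off with grid distance."""
    vx, vy, vz = voxel_of(p)
    total = 0.0
    for (x, y, z), light in radiance.items():
        dist = abs(x - vx) + abs(y - vy) + abs(z - vz)
        total += light / (1 + dist)  # crude distance falloff
    return total

print(indirect_light((0.5, 0.5, 0.5)))  # bounce light at the scene's center
```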

VR Direct — This is essentially a suite of features Nvidia has been implementing in its drivers to better support virtual reality headsets like the Oculus Rift. Most of those features have to do with reducing the latency between user input (head movements, usually) and visual output (when images reflecting the input reach the screen). Nvidia has even implemented in its drivers an improved version of Carmack’s “time warp” method of repositioning a frame post-rendering. At least, that is the claim. We’ve not yet been able to try a Rift with this feature enabled.

Our testing methods

We’ve tested as many different competing video cards against the new GeForces as was practical. However, there’s no way we can test everything our readers might be using. A lot of the cards we used are renamed versions of older products with very similar or even identical specifications. Here’s a quick table that will decode some of these names for you.

| Original card | Closest current equivalent |
|---|---|
| GeForce GTX 670 | GeForce GTX 760 |
| GeForce GTX 680 | GeForce GTX 770 |
| Radeon HD 7950 Boost | Radeon R9 280 |
| Radeon HD 7970 GHz | Radeon R9 280X |

If you’re a GeForce GTX 680 owner, Nvidia thinks you may want to upgrade to the GTX 980 once you’ve seen what it can do. Just keep in mind that our results for the GTX 770 should almost exactly match a GTX 680’s.

Most of the numbers you’ll see on the following pages were captured with Fraps, a software tool that can record the rendering time for each frame of animation. We sometimes use a tool called FCAT to capture exactly when each frame was delivered to the display, but that’s usually not necessary in order to get good data with single-GPU setups. We have, however, filtered our Fraps results using a three-frame moving average. This filter should account for the effect of the three-frame submission queue in Direct3D. If you see a frame time spike in our results, it’s likely a delay that would affect when the frame reaches the display.
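The filter itself is nothing exotic. Here’s a minimal sketch of a three-frame moving average; our actual scripts differ in the details, and this toy uses a centered window:

```python
# Minimal sketch of a three-frame moving average applied to Fraps
# frame times. Direct3D lets the CPU queue up to three frames ahead of
# the GPU, so a single CPU-side spike can be absorbed by the queue;
# averaging over three frames models that smoothing.

def moving_average_3(frame_times_ms):
    """Average each frame time with its two neighbors (edges clamped)."""
    smoothed = []
    for i in range(len(frame_times_ms)):
        window = frame_times_ms[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed

raw = [16.5, 16.8, 45.0, 16.2, 16.6]  # one spike in the raw Fraps data
print(moving_average_3(raw))
# ~[16.65, 26.1, 26.0, 25.9, 16.4]: the spike is spread out but still
# visible, so a real delay still shows up in our plots.
```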

We didn’t use Fraps with BF4. Instead, we captured frame times directly from the game engine itself using BF4‘s built-in tools. We didn’t use our low-pass filter on those results.

As ever, we did our best to deliver clean benchmark numbers. Our test systems were configured like so:

| Component | Details |
|---|---|
| Processor | Core i7-3820 |
| Motherboard | Gigabyte X79-UD3 |
| Chipset | Intel X79 Express |
| Memory size | 16GB (4 DIMMs) |
| Memory type | Corsair Vengeance CMZ16GX3M4X1600C9 DDR3 SDRAM at 1600MHz |
| Memory timings | 9-9-9-24 1T |
| Chipset drivers | INF update 9.2.3.1023, Rapid Storage Technology Enterprise 3.6.0.1093 |
| Audio | Integrated X79/ALC898 with Realtek 6.0.1.7071 drivers |
| Hard drive | Kingston HyperX 480GB SATA |
| Power supply | Corsair AX850 |
| OS | Windows 8.1 Pro |

| Card | Driver revision | GPU base clock (MHz) | GPU boost clock (MHz) | Memory clock (MHz) | Memory size (MB) |
|---|---|---|---|---|---|
| Radeon HD 7950 Boost | Catalyst 14.7 beta 2 | – | 925 | 1250 | 3072 |
| Radeon R9 285 | Catalyst 14.7 beta 2 | – | 973 | 1375 | 2048 |
| XFX Radeon R9 280X | Catalyst 14.7 beta 2 | – | 1000 | 1500 | 3072 |
| Radeon R9 290 | Catalyst 14.7 beta 2 | – | 947 | 1250 | 4096 |
| XFX Radeon R9 290X | Catalyst 14.7 beta 2 | – | 1000 | 1250 | 4096 |
| GeForce GTX 760 | GeForce 340.52 | 980 | 1033 | 1502 | 2048 |
| GeForce GTX 770 | GeForce 340.52 | 1046 | 1085 | 1753 | 2048 |
| GeForce GTX 780 | GeForce 340.52 | 863 | 902 | 1502 | 3072 |
| GeForce GTX 780 Ti | GeForce 340.52 | 876 | 928 | 1750 | 3072 |
| Asus Strix GTX 970 | GeForce 344.07 | 1114 | 1253 | 1753 | 4096 |
| GeForce GTX 980 | GeForce 344.07 | 1127 | 1216 | 1753 | 4096 |

Thanks to Intel, Corsair, Kingston, and Gigabyte for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.

Also, our FCAT video capture and analysis rig has some pretty demanding storage requirements. For it, Corsair has provided four 256GB Neutron SSDs, which we’ve assembled into a RAID 0 array for our primary capture storage device. When that array fills up, we copy the captured videos to our RAID 1 array, made up of a pair of 4TB Black hard drives provided by WD.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Thief

For this first test, I decided to use Thief‘s built-in automated benchmark, since we can’t measure performance with AMD’s Mantle API using Fraps. Unfortunately, this benchmark is pretty simplistic, reporting only an FPS average (along with maximum and minimum numbers, for whatever that’s worth).

Welp, this is a pretty nice start for the GTX 980. Nvidia’s newest is faster than anything else we tested at both resolutions, and the GTX 970 isn’t far behind. The fastest single-GPU Radeon, the R9 290X, can’t quite keep up, even when using AMD’s proprietary Mantle API.

The generational increase from the GK104-based GTX 770 to the GM204-based GTX 980 is enormous.

Watch Dogs





Click the buttons above to cycle through the plots. Each card’s frame times are from one of the three test runs we conducted for that card. Most of these cards run Watch_Dogs pretty well at these settings, with no major spikes in frame times.

The GTX 980 is the overall champ in the FPS average sweeps, and it backs that victory up by taking the top spot in our 99th percentile frame time metric. That means in-game animations should be generally smooth, not just a collection of high frame rates punctuated by slowdowns. Amazingly, even the Asus GTX 970 outperforms the GTX 780 Ti.
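Our 99th-percentile metric is simple at heart: sort a run’s frame times and find the value that 99% of frames come in under. Here’s a quick sketch of the idea (not our actual analysis code) with a made-up example of why it matters:

```python
# Sketch of the 99th-percentile frame time: sort the frame times and
# pick the value that 99% of frames come in under.

def percentile_frame_time(frame_times_ms, pct=99):
    ordered = sorted(frame_times_ms)
    index = int(len(ordered) * pct / 100)
    return ordered[min(index, len(ordered) - 1)]

def fps_average(frame_times_ms):
    return 1000 * len(frame_times_ms) / sum(frame_times_ms)

# A card can post a high FPS average while still stuttering:
times = [10] * 99 + [100]             # 99 quick frames, one long one
print(fps_average(times))             # ~91.7 FPS: looks great
print(percentile_frame_time(times))   # 100 ms: the stutter shows up
```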





We can better understand in-game animation fluidity by looking at the “tail” of the frame time distribution for each card, which shows us what happens in the most difficult frames. As you can see, the GTX 970 and 980 perform well right up to the last few percentage points worth of frames.





These “time spent beyond X” graphs are meant to show “badness,” those instances where animation may be less than fluid—or at least less than perfect. The 50-ms threshold is the most notable one, since it corresponds to a 20-FPS average. We figure if you’re not rendering any faster than 20 FPS, even for a moment, then the user is likely to perceive a slowdown. 33 ms correlates to 30 FPS or a 30Hz refresh rate. Go beyond that with vsync on, and you’re into the bad voodoo of quantization slowdowns. And 16.7 ms correlates to 60 FPS, that golden mark that we’d like to achieve (or surpass) for each and every frame.
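For reference, here’s a sketch of how that accounting works: for every frame that takes longer than the cutoff, accumulate the time spent past it.

```python
# Sketch of the "time spent beyond X" calculation: for each frame over
# the threshold, accumulate the excess time past the cutoff.

def time_beyond(frame_times_ms, threshold_ms):
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)

times = [15.0, 16.0, 40.0, 17.0, 55.0]  # frame times in ms
for cutoff in (50.0, 33.3, 16.7):       # the 20, 30, and 60 FPS marks
    print(cutoff, time_beyond(times, cutoff))
# 50.0 -> 5.0 ms, 33.3 -> 28.4 ms, 16.7 -> ~61.9 ms
```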

The GTX 980 stays almost entirely below the 16.7-ms threshold here, which means it’s not far from perfectly matching a 60Hz monitor’s desire for a new frame every refresh interval. When you slice it this way, the GTX 980’s lead over the competition looks even larger.

Overall, this is a nice set of results in that the frame-time-based metrics all seem to correspond with the FPS average. None of the cards are exhibiting the sort of bad behavior that our time-sensitive metrics would highlight. That said, the new GeForces perform very well, and the $339 Asus Strix GTX 970 very nearly matches the performance of AMD’s fastest single-GPU product, the Radeon R9 290X.

Crysis 3





A look at the frame time plots will show you that the Radeons encounter a couple of slowdowns during our test session. By the seat of my pants, I know that’s the spot where I’m shooting exploding arrows at the bad guys like one of the Duke boys. Nvidia cards used to slow down similarly at this same spot, but a driver update earlier this year eliminated that problem. As a result, GeForce cards take the top three places in our 99th percentile frame time metric, despite the R9 290X having the second-fastest FPS average.





Those slowdowns on the Radeons are evident in the last two to three percentage points worth of frames.





Our “badness” metric captures the difference most dramatically. The GeForces spend no time beyond our 50-ms cutoff and very little above the 33-ms mark, while the Radeons spend many tens of milliseconds waiting for those long-latency frames.

Battlefield 4

We tested those last few games at 2560×1440. Let’s switch to 4K and see how the new GeForces handle that.





I have to admit, I was barely able to play through our test sequence on the slower cards here. The Radeon R9 285’s Tonga devil magic didn’t do much for it at 4K in BF4 at its Ultra settings.









In fact, none of the cards handle this scenario particularly well. The GTX 980 is the least objectionable, followed by the GTX 970, which is a testament to these solutions’ pixel-painting prowess. You’d probably want to double up on video cards or reduce the image quality settings in order to play this game at 4K for any length of time.

Tomb Raider













Although the GTX 980 has a slight edge in terms of average FPS, the Radeon R9 290X performs a bit better overall when running Tomb Raider at 4K, according to our time-sensitive metrics. This isn’t the sort of difference one would tend to perceive, though—and even the fastest cards would struggle to provide a frame to a 60Hz display on every other refresh cycle. In other words, this ain’t the smoothest animation.

Borderlands 2













Borderlands 2 isn’t an especially challenging game for GPUs of this class to handle at 2560×1440, but I’d hoped we could learn something interesting here—and I think we have. Nearly every card has a 99th percentile frame time below 16.7 ms, which means all but the last 1% of frames are produced at a silky-smooth 60 cycles per second. The GeForces struggle just a little more than the Radeons in that last 1% of frames, though, as indicated by our “badness” metric.

Also, notice that I’ve added a new wrinkle to our “badness” results for this review: time spent beyond the 8.3-ms threshold. If you can stay below that threshold, you can pump out frames at 120Hz—perfect for a fast gaming display. Going with a fast graphics card does help considerably on this front, and the GTX 780 Ti gets the closest to achieving that goal.

A caveat: when you’re looking at frame times in such tiny intervals, things like CPU bottlenecks and run-to-run variations tend to play a larger role than they otherwise would. I think we may need to conduct more than three runs per card in order to get really reliable results on this front.

Power consumption

Please note that our “under load” tests aren’t conducted in an absolute peak scenario. Instead, we have the cards running a real game, Crysis 3, in order to show us power draw with a more typical workload.

We already know the GeForce GTX 980 generally outperforms the GTX 780 Ti and the Radeon R9 290X. Now we know that it does so while consuming substantially less power—virtually no more than the prior-gen GK104 GPU does aboard the GeForce GTX 770. That’s a remarkable achievement.

Noise levels and GPU temperatures

Don’t get too hung up on the noise levels at idle (and with the display off) here. The noise floor in my basement lab tends to vary a bit depending on factors I can’t quite pinpoint. I think the speed of the CPU fan may be the biggest culprit, but I’m not sure why it tends to vary.

Anyhow, the bottom line is that all of these cards are pretty quiet at idle. The only one that seems really different to my ears is the GTX 760, whose cooler is cheap and kind of whiny. The fact that the Asus Strix card completely stops its cooler at idle is most excellent and would count for more in an utterly silent environment.

The Nvidia reference cooler on the GTX 980 performs well here, almost exactly like it does on the GTX 770. Meanwhile, the dual-fan cooler on the Asus Strix GTX 970 has just as much heat to remove as Nvidia’s stock cooler—our power results tell us that—but it does so while making less noise and keeping its GPU at a much cooler temperature.

Conclusions

As usual, we’ll sum up our test results with a couple of value scatter plots. The best values tend toward the upper left corner of each plot, where performance is higher and prices are lower.





By either measure, the GeForce GTX 980 is the fastest single-GPU graphics card we tested. With the possible exception of the pricey Titan Black, it’s also the fastest single-GPU graphics card on the planet. Although $549 is a lot to pay, the GTX 980 manages to deliver appreciably better value than the GeForce GTX 780 Ti, which it replaces. The new GeForce flagship outperforms AMD’s Radeon R9 290X, as well. If you want the best, the GTX 980 is the card to get.

That’s an assessment based on price and performance, but we know from the preceding pages that the GTX 980’s other attributes are positive, too. The GM204 GPU’s power efficiency is unmatched among high-end GPUs. With Nvidia’s stock cooler, that translates into lower noise levels under load than any older GeForce or current Radeon in this class. I’m also quite happy with the suite of extras Nvidia has built into the Maxwell GPU, such as DSR for improved image quality and a big ROP count for high performance at 4K resolutions. This graphics architecture takes us a full generation beyond Kepler—and thus beyond what’s in current game consoles like the PlayStation 4 and Xbone.

All of that’s quite nice, if you insist on having the best and fastest. However, the GeForce GTX 970 is what really excites the value nexus in my frugal Midwestern noggin. For $339, the Asus Strix GTX 970 card we tested is astonishing, with overall performance not far from the GeForce GTX 780 Ti at a fraction of the price. This thing checks all of the boxes, with good looks, incredibly quiet operation, and relatively cool operating temperatures that suggest substantial overclocking headroom. As long as you can live with the fact that it has only one DisplayPort output, the Strix 970 looks like a mighty tempting upgrade option for anyone who has that itch.

Then again, MSI’s GTX 970 Gaming seems pretty nice, too. I’m not sure there are any bad choices here.

The folks who have tough choices to make are Nvidia’s competitors in the Radeon camp. What do you do when your single-GPU flagship has been bested by a smaller chip on a cheaper card? Cut prices, of course, and a well-placed industry source has informed us that a price reduction on the R9 290X is indeed imminent, likely early next week. We expect the R9 290X to drop to $399 and to receive a freshened-up Never Settle game bundle, too.

The revised price would certainly improve the 290X’s place on our value scatter plot. AMD would then be in the position of offering slightly better performance than the GeForce GTX 970 for a little more money—and with 100W of additional power consumption and associated fan noise. Does a game bundle make up for the extra power draw? I’m not sure what to make of that question. I’m also not sure whether a fully enabled Tonga variant could do much to alter the math.

I am happy to see vigorous competition and innovation giving PC gamers a better set of choices, though. Feels like it’s been a long time coming, and I’m still wondering at the fact that Nvidia was able to pull off this sort of advance without the benefit of a process shrink. We’ll have to dig deeper into some of Maxwell’s features, including multi-GPU operation, in the coming weeks.
