Navi’s July launch was a key highlight in a year full of AMD surprises. The Radeon RX 5700 and the RX 5700 XT are both among the most powerful GPUs AMD’s ever created. And, fabbed on the 7nm process, they’re also among the most power-efficient. And that’s progress for a company that, just a few years ago, told customers that 94 degrees was “normal operating temperature” for its flagship GPUs.

AMD Navi Deep Dive

With Navi, AMD introduced several key changes. Let’s take a look at each of them:

1.) The Move to 7nm

Navi 10 isn’t AMD’s first 7nm GPU silicon. That honor goes to the Radeon VII, a $700 halo product from last year that literally no one bought. And for good reason, too. The Radeon VII was a simple die-shrink of Vega, with faster HBM memory. AMD used the power efficiency gains from 7nm to crank clocks up on the Radeon VII, allowing it to run in the 1700-1800 MHz range.

Together with the increased memory bandwidth, it was noticeably faster than Vega 64. But that didn’t really mean anything: it choked in gaming workloads relative to the RTX 2080 and it consumed prodigious amounts of power. It really just existed for AMD to stake its claim to have introduced the first 7nm GPU to the market.

Navi 10 is the first real 7nm GPU. The move down from 14nm has been leveraged in the same way–AMD’s opted to crank clockspeeds up, as opposed to focusing on efficiency. Consequently, the RX 5700 XT sucks up 225W of power–not far from the 250W flagship mark. However, AMD’s been able to run the GPU at very high clocks. If we get past the whole base/game/boost clock business, we’re looking at typical clockspeeds in excess of 1800 MHz on reference models.

Meanwhile, an overclock can take most Navi 10 parts up to 2 GHz. This is much higher than Vega, where reference clocks topped out in the 1600 MHz range for the liquid-cooled Vega 64. The move to 7nm alone gives AMD 20-25 percent of “free” performance by allowing them to dial up the clockspeeds. But that’s not all Navi has going for it.

2.) GDDR6–Fast memory goes mainstream

Over the past few years, AMD flagships have shipped with exotic–and very expensive–HBM memory. I owned an R9 Fury. When that particular card launched in 2015, with 512 GB/s of memory bandwidth, Nvidia’s best–the 980 Ti–topped out at 336 GB/s. AMD again used HBM in the Vega line and Radeon VII. The trouble is that it adds a lot to the bill of materials, especially as the HBM stack size increases.

Consequently, there was no economical way of bringing that level of bandwidth down to mainstream price-points. Things changed once GDDR6 arrived on the scene. GDDR6 isn’t radically different from regular GDDR5. It just clocks a lot higher. By pairing 14 Gbps GDDR6 modules with a 256-bit memory bus, AMD was able to deliver 448 GB/s of bandwidth at a relatively low price.
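That 448 GB/s figure falls out of simple arithmetic: per-pin data rate times bus width, divided by eight to convert bits to bytes. Here’s a quick sketch (the function name is mine, not anything from AMD’s spec sheets):

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gbps) * bus width (bits) / 8 bits-per-byte."""
    return data_rate_gbps * bus_width_bits / 8

# Navi 10: 14 Gbps GDDR6 on a 256-bit bus
print(peak_bandwidth_gbs(14, 256))   # -> 448.0
# The R9 Fury's HBM, for comparison: ~1 Gbps per pin on a very wide 4096-bit bus
print(peak_bandwidth_gbs(1, 4096))   # -> 512.0
```

The same formula shows why HBM was so expensive to replicate cheaply: it gets its bandwidth from a massively wide bus rather than high clocks.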

Navi’s surfeit of memory bandwidth lets AMD go toe to toe with the competition at higher resolutions. While 1440p is the sweet spot for both the RX 5700 and 5700 XT, high memory bandwidth somewhat reduces the performance hit when going higher. It makes 4K gaming at 40 FPS and above viable on Navi. This was something that just wasn’t possible on earlier mainstream AMD cards. Trust me. I had an RX 480 and 580, too. I tried. I cried.

3.) Architectural Changes: RDNA is GCN-Plus

It would’ve been easy for AMD to have die-shrunk GCN, run Navi at higher clocks, and just call it a day. After all, that’s exactly what they did with Polaris and Vega. But with Navi, AMD’s finally moved beyond the now-ancient GCN microarchitecture. Navi 10 marks the first outing of RDNA. A big part of why the RX 5700 and 5700 XT perform so well, despite only having 2560 and 2304 shader cores respectively, comes down to efficiency gains from RDNA.

Navi does a lot more with far less. RDNA, at least as seen in first-gen Navi, isn’t completely divorced from GCN–it’s easier to think of it as an extensively customized GCN, optimized more for gaming workloads than compute. So what exactly is going on here?

The fundamentals of GCN remain unchanged. However, AMD introduced several meaningful optimizations in RDNA. A multi-level cache hierarchy gives the shaders faster access to memory, with less bottlenecking as data is shuttled back and forth from VRAM.

Instead of using four SIMD-16 units, RDNA features dual SIMD-32 units per CU. This makes the architecture much more efficient. GCN’s SIMD configuration meant that each SIMD could only complete an instruction once every four cycles. To keep a GCN CU “fed,” you needed to keep four instructions in flight at once. In the case of simple, one- or two-instruction calculations, that meant downtime as the CUs sat partially idle. RDNA’s SIMD configuration handles a single instruction in one cycle–this means less downtime and greater efficiency.
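The cycle-count difference can be expressed as a toy model. This is a deliberate simplification–it ignores latency hiding, co-issue, and wave scheduling–but it captures the basic arithmetic of pushing a wave through a SIMD:

```python
def issue_cycles(wave_size: int, simd_lanes: int) -> int:
    """Cycles for one SIMD to push a single wave instruction through all of its lanes."""
    return wave_size // simd_lanes

# GCN: a 64-wide wavefront on a 16-lane SIMD needs four passes -> 4 cycles
print(issue_cycles(64, 16))  # -> 4
# RDNA: a 32-wide wave on a 32-lane SIMD completes in a single cycle
print(issue_cycles(32, 32))  # -> 1
```

Single-cycle issue is what shrinks the downtime on short, simple shaders: the SIMD doesn’t have to wait out a four-cycle cadence before accepting the next instruction.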

Navi also doubles down on schedulers and scalar units. Each CU has an additional scalar unit for handling math, and an additional scheduler for increasing instruction throughput. Together, this means calculations happen faster, more of them happen at a time, and the CUs are running idle for less time.

All in all, this results in IPC gains of up to 1.25x. This means that, in a given workload, an identically specced Navi part will perform up to 25 percent better than its Vega counterpart. When combined with the higher clockspeeds and plentiful memory bandwidth, Navi runs rings around Vega performance-wise, even though it’s roughly 1,500 shaders short.
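As a back-of-envelope check, multiplying shader count, clockspeed, and the IPC factor shows how far those gains close the shader-count gap on paper. The clock figures below are my own assumptions drawn from the numbers earlier in this piece, not official boost specs, and the metric is deliberately naive:

```python
def paper_throughput(shaders: int, clock_mhz: int, ipc: float = 1.0) -> float:
    """Naive relative throughput: shaders * clock * IPC (arbitrary units)."""
    return shaders * clock_mhz * ipc

vega_64 = paper_throughput(4096, 1550)               # Vega 64 at an assumed ~1550 MHz
rx_5700_xt = paper_throughput(2560, 1800, ipc=1.25)  # typical Navi clocks, 1.25x IPC

# Despite 1,536 fewer shaders, Navi lands within ~10 percent on this crude metric
print(round(rx_5700_xt / vega_64, 2))  # -> 0.91
```

The rest of Navi’s real-world lead over Vega comes from utilization: as described above, RDNA keeps its shaders busy far more of the time than GCN ever managed.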

RDNA, at least its Navi iteration, isn’t fundamentally new. But enough has changed to allow AMD to compete with Nvidia in performance and efficiency. We’re looking forward to seeing what second-gen Navi offers, and not just in terms of hardware ray-tracing.