DDR4 officially debuted on the desktop in 2014, with the launch of Intel’s Haswell-E, but 2015 is when we should start to see the standard go mainstream. We’ve previously discussed how increases in DRAM clock don’t necessarily translate into increased overall performance (DDR3-2133 has better latency, for example, than DDR4-2133), but conventional wisdom says that this disadvantage shrinks over the long term as the new RAM standard scales to higher speeds.

Now, a new report from Anandtech tests to see how a modern Haswell-E platform’s performance increases with improvements to clock speed. The site tested multiple DDR4 kits at both DDR4-2133 and DDR4-3200.

Benchmark results — why doesn’t DRAM scale?

As Anandtech’s benchmarks show, the benefits of moving from DDR4-2133 to DDR4-3200 are tiny. The vast majority of consumer applications and games show gains of 0-5%. To be clear, Anandtech does find a few applications where this trend gets bucked — minimum frame rates are up a touch in several titles, particularly in SLI testing, and there’s one benchmark, the Redis memory key-store test, where moving from DDR4-2133 to DDR4-3200 gives a relatively huge benefit of 16% for a 50% clock rate boost. These gains, however, are erratic and unpredictable. The Redis test is designed to benchmark an online application database, and explicitly depends on high memory bandwidth and CPU performance. Outside of these tests, performance mostly doesn’t improve from faster main memory — so why not?
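To put the Redis figure in perspective, here is a minimal Python sketch of the arithmetic. The helper names are ours, introduced only for illustration. The 16% and 0-5% figures come from the results described above, and the data rates are treated as nominal values.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0


def scaling_efficiency(perf_gain_pct: float, clock_gain_pct: float) -> float:
    """Fraction of the clock increase that shows up as measured performance."""
    return perf_gain_pct / clock_gain_pct


# Nominal data rates for DDR4-2133 and DDR4-3200.
clock_gain = percent_increase(2133, 3200)

# 16% is the Redis result; 5% is the top of the typical 0-5% range.
redis_eff = scaling_efficiency(16.0, clock_gain)
typical_eff = scaling_efficiency(5.0, clock_gain)

print(f"Clock increase: {clock_gain:.1f}%")
print(f"Redis scaling efficiency: {redis_eff:.2f}")
print(f"Best typical scaling efficiency: {typical_eff:.2f}")
```

Even in the best case reported, roughly a third of the clock increase shows up as performance. For typical applications it is a tenth or less.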

First, modern CPUs and software are designed to hide or diminish the impact of latency (how long it takes the CPU to retrieve data) rather than to stress bandwidth (how much data can be transferred per unit of time). In the early days of computing, when CPUs had very small L1 or L2 caches, main memory latency and bandwidth had huge impacts on performance since they dictated how quickly the CPU could fetch new instructions and data.

Second, each new RAM standard tends to trade higher access latencies for greater total bandwidth. The chart above shows the absolute latency of SDRAM through DDR4. If you start at the left and work your way across the chart, you’ll see what happened: PC133 might have matched DDR when the latter ran at 266MHz (effective), but DDR eventually scaled to 400MHz and hit latencies lower than anything SDRAM could offer. Similarly, the absolute latency of DDR2-1066 was better than that of DDR-400. It took DDR3-2133 simply to equal DDR2-1066, and PCWatch estimated we’d need DDR4-4266 to match the latency of DDR3-2133. This chart is a bit old now, so I’m not claiming it’s 100% accurate, but the general relationship it depicts is still true. DDR4 is far more scalable than DDR3, but it’s still fighting back from behind the latency eight ball. Even if it weren’t, large caches, speculative prefetching, and on-die memory controllers have all reduced the impact of faster DRAM in most cases.
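As a rough illustration of what "absolute latency" means here, the sketch below converts a CAS latency in clock cycles to nanoseconds. The CL11 and CL15 timings are assumed, typical-looking values rather than figures from the chart or from Anandtech's kits, and CAS latency is only one component of total access time.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the I/O clock in MHz
    is half the data rate in MT/s. One cycle lasts 1000 / clock_mhz ns.
    """
    clock_mhz = data_rate_mts / 2.0
    return cas_cycles * 1000.0 / clock_mhz


# (label, data rate in MT/s, CAS latency in cycles).
# The timings are assumptions chosen for illustration.
kits = [
    ("DDR3-2133 CL11", 2133, 11),
    ("DDR4-2133 CL15", 2133, 15),
]

for label, rate, cl in kits:
    print(f"{label}: {cas_latency_ns(rate, cl):.2f} ns")
```

At the same data rate, the kit with more cycles of CAS latency takes longer in absolute terms, which is why a new standard at a matching clock can be slower to respond than its predecessor.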

The various caches on the CPU are designed to provide fast memory accesses and to limit the need to tap main memory in the first place. Meanwhile, bringing the memory controller on-die (as both AMD and Intel have done) slashed latency compared to previous systems. This diminishes the impact of reducing latency on the DRAM itself.
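One way to see why caches blunt DRAM improvements is the textbook average memory access time formula (hit time plus miss rate times miss penalty). The sketch below uses a single cache level, and every number in it is an illustrative assumption, not a measurement from any real CPU or from Anandtech's tests.

```python
def average_access_time_ns(hit_time_ns: float, miss_rate: float,
                           dram_latency_ns: float) -> float:
    """Average memory access time for a single cache level.

    AMAT = hit time + miss rate * miss penalty, where the miss penalty
    is taken to be the DRAM access latency.
    """
    return hit_time_ns + miss_rate * dram_latency_ns


HIT_TIME_NS = 4.0  # assumed cache hit time
MISS_RATE = 0.02   # assumed: 2% of accesses reach main memory

slow = average_access_time_ns(HIT_TIME_NS, MISS_RATE, 80.0)  # assumed slower DRAM
fast = average_access_time_ns(HIT_TIME_NS, MISS_RATE, 60.0)  # assumed 25% faster DRAM

improvement_pct = (slow - fast) / slow * 100.0
print(f"Average access time: {slow:.2f} ns -> {fast:.2f} ns")
print(f"Overall improvement: {improvement_pct:.1f}%")
```

Under these assumptions, cutting DRAM latency by 25% improves the average access time by only about 7%, because most accesses never leave the cache. A lower miss rate shrinks the benefit further.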

The end result of all these improvements is that DRAM clock rates rarely matter to desktop applications past a certain point. Additional layers of cache and sophisticated algorithms have blunted the impact of memory speed in the vast majority of cases. It’s now an unusual application (at least, as far as regular users are concerned) that shows a difference.

It’s worth noting that this rule doesn’t apply to AMD’s integrated GPUs (Intel’s on-die GPUs can also benefit from faster RAM, though typically to a smaller degree than AMD’s do). One reason that upcoming technologies like HBM and HMC are still expected to significantly improve PC DRAM performance is that they attack some of the design limits on current DRAM that prevent simple speed boosts from mattering more than they do. The ability to pass multiple simultaneous requests to different RAM blocks, for example, could dramatically slash latency in multi-threaded operations, and that could make new DRAM systems much faster in real-world applications.

Why move to DDR4 at all?

The lack of any performance gain in conventional desktop applications explains multiple facets of the DDR4 shift. First, Anandtech’s results shed light on why there’s no push towards a conventional DDR5: because the majority of applications aren’t expected to gain from faster clocks, the industry is looking towards new architectures to build improvements into future RAM standards. Second, it explains why we’re seeing LPDDR4 talked up more than conventional desktop DDR4. Both DDR4 and LPDDR4 offer power consumption and density improvements, but those improvements are most important in the tablet and smartphone spaces, where every milliwatt of power you can save translates into better battery life. DDR4 for desktops, servers, and laptops will still matter, but the gains are incremental and not likely to transform the value proposition of the hardware.

DDR4 also includes certain features that will allow for greater memory densities, and that’s always important to certain markets. Most PC users don’t need to stack 16-64GB DIMMs in a system, but the systems that do need those improvements will get better DRAM density out of DDR4 in the long run than DDR3 provided.

In short, the gains for the PC space will be in integrated desktop GPU performance, a very few desktop applications, better overall costs per GB, and lower power consumption. Don’t expect DDR4, even at high bandwidth, to drive enormous performance improvements over the long term.
