DRAM is pretty amazing stuff. The basic structure of the RAM we still use today was invented more than forty years ago and, just like its CPU cousin, it has continually benefited from huge improvements in fabrication technology and density. Less than ten years ago, 2GB of RAM was considered plenty for a typical desktop system; today, a high-end smartphone offers the same amount of memory at a fifth of the power consumption.

After decades of scaling, however, modern DRAM is starting to hit a brick wall. Much as the CPU gigahertz race ran out of steam, DRAM’s high latency and power consumption are now among the most significant bottlenecks in modern computing. As supercomputers move towards exascale, there are serious doubts about whether DRAM is actually up to the task, or whether a whole new memory technology is required. Clearly there are some profound challenges ahead, and there’s disagreement about how to meet them.

What’s really wrong with DRAM?

A few days ago, Vice ran an article that actually does a pretty good job of talking about potential advances in the memory market, but includes a graph I think is fundamentally misleading. That’s not to sling mud at Vice — do a quick Google search, and you’ll find this picture has plenty of company:

The point of this image is ostensibly to demonstrate how DRAM performance has grown at a much slower rate than CPU performance, thereby creating an unbridgeable gap between the two. The problem is, this graph no longer properly illustrates CPU performance or the relationship between it and memory. Moore’s law has stopped functioning at anything like its historic level for CPUs or DRAM, and “memory performance” is simply too vague to accurately describe the problem.

The first thing to understand is that modern systems have vastly improved the bandwidth-per-core ratio compared to where we sat 14 years ago. In 2000, a fast Pentium III or Athlon system had a 64-bit memory bus connected to an off-die memory controller clocked at 133MHz, for a peak bandwidth of 1.06GB/s, while CPU clocks were hitting 1GHz. Today, a modern processor from AMD or Intel is clocked between 3 and 4GHz, while DDR3-2133 runs off a 1066MHz clock (2133MT/s effective) for roughly 17GB/s of peak bandwidth per channel. Meanwhile, we’ve long since started adding multiple memory channels, and the memory controller has moved on die, where it runs far faster than the old off-die chipsets.
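The arithmetic here is easy to check: peak bandwidth is simply transfer rate times bus width times channel count. A minimal sketch, using the figures from the paragraph above (the function name is my own):

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits, channels=1):
    """Peak DRAM bandwidth in GB/s: transfers/sec x bytes per transfer x channels."""
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) * channels / 1e9

# PC133 SDR: 133 MT/s on a 64-bit bus, single channel
print(peak_bandwidth_gbs(133, 64))        # ~1.06 GB/s
# DDR3-2133: 2133 MT/s on a 64-bit bus, single channel
print(peak_bandwidth_gbs(2133, 64))       # ~17.1 GB/s
# The same module pair in dual-channel mode
print(peak_bandwidth_gbs(2133, 64, 2))    # ~34.1 GB/s
```

A dual-channel DDR3-2133 system thus offers roughly 32x the peak bandwidth of a PC133 system, against a CPU clock that rose only 3-4x over the same period.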

The problem isn’t memory bandwidth; it’s memory latency and memory power consumption. As we’ve previously discussed, DDR4 actually moves the dial backwards on the former while improving the latter only modestly. It now looks as though the first generation of DDR4 will have some profoundly terrible latency characteristics; Micron is selling DDR4-2133 timed at 15-15-15-50. For comparison, DDR3-2133 can be bought at 11-11-11-27, and that’s not even the highest-end premium RAM. This latency hit means DDR4 won’t actually match DDR3’s performance for quite some time, as shown here:

This is where the original graph does have a point: latency has only improved modestly over the years, and we’ll be using DDR4-3200 before we get back to DDR3-1600 latencies. That’s an obvious issue, but it’s actually not the problem that’s holding exascale back. The problem for exascale is that DRAM power consumption is currently much too high.
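Converting those CAS timings into nanoseconds makes the regression concrete: absolute latency is cycles times clock period, and the I/O clock runs at half the module’s transfer rate. A quick sketch, using the timings quoted above (the DDR3-1600 CL9 figure is a typical retail spec, not one from this article):

```python
def cas_latency_ns(cas_cycles, transfer_rate_mts):
    """Absolute CAS latency in nanoseconds.

    The I/O clock is half the transfer rate (double data rate),
    so one clock cycle lasts 2000 / transfer_rate_mts nanoseconds.
    """
    return cas_cycles * 2000.0 / transfer_rate_mts

print(round(cas_latency_ns(15, 2133), 1))  # DDR4-2133 CL15: 14.1 ns
print(round(cas_latency_ns(11, 2133), 1))  # DDR3-2133 CL11: 10.3 ns
print(round(cas_latency_ns(9, 1600), 2))   # DDR3-1600 CL9:  11.25 ns
```

By this measure, first-generation DDR4-2133 is roughly a third slower to first data than good DDR3-2133, which is why DDR4 needs much higher transfer rates before its absolute latency catches back up.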

The current goal is to build an exascale supercomputer within a 20MW power envelope sometime between 2018 and 2020. Exascale describes a system with an exaflop or more of processing power and perhaps hundreds of petabytes of RAM (current systems max out at around 30 petaflops and only a couple of petabytes of RAM). If today’s best DDR3 were used for the first exascale systems, the DRAM alone would consume 54MW of power. Clearly massive improvements are needed. So how do we find them?
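To see how far off that is, a back-of-envelope sketch helps. The 20MW and 54MW figures come from the paragraph above; the 100PB capacity is a hypothetical round number picked from the “hundreds of petabytes” range, so the implied watts-per-gigabyte value is illustrative only:

```python
# Figures from the article: 20 MW total system budget, 54 MW for DDR3 alone.
envelope_w = 20e6
dram_only_w = 54e6

# Hypothetical capacity: 100 PB of DRAM (1 PB = 1e6 GB, decimal units).
assumed_dram_gb = 100 * 1e6

implied_w_per_gb = dram_only_w / assumed_dram_gb
print(implied_w_per_gb)            # 0.54 -> implied W per GB for DDR3
print(dram_only_w / envelope_w)    # 2.7  -> memory alone overshoots the whole budget
```

Even under these generous assumptions, the memory subsystem by itself would draw 2.7x the entire machine’s power budget, before a single flop is computed.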
