There are major changes coming in the memory interface world, and recent interest in AMD and Nvidia’s plans to adopt the new High Bandwidth Memory standard makes this a good time to explain the three new standards: Wide I/O, HBM, and HMC. Let’s kick things off with a basic question — why do we need new memory standards in the first place?

DDR4 and LPDDR4 are both incremental, evolutionary improvements to existing DRAM designs. As we’ll explore in this story, both standards improve power consumption and performance relative to DDR3/LPDDR3, but they’re not a huge leap forward. Many of the underlying technologies baked into these standards were set a decade or more ago, when total system bandwidth was a fraction of current levels and CPUs were all single-core.

While the standard has evolved considerably from where it began, it’s worth remembering that the first modern SDRAM DIMMs debuted on a 66MHz interface and provided 533MB/s of bandwidth. DDR4-3200, in contrast, is clocked at up to 1600MHz and offers up to 25.6GB/s of memory bandwidth. That’s an increase of 48x over nearly 20 years, but it also means that we’ve pushed the standard a very long way. While there’s been debate over whether or not to define a traditional DDR5, the broad industry consensus is that new solutions are necessary.
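The figures above fall out of simple arithmetic: peak theoretical bandwidth is just the transfer rate multiplied by the bus width in bytes. A quick sketch (standard 64-bit DIMM widths assumed) reproduces the 48x jump:

```python
# Peak theoretical bandwidth of a DRAM interface:
# transfers per second * bus width in bytes.

def peak_bandwidth_gbs(transfers_per_sec, bus_width_bits):
    """Return peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return transfers_per_sec * (bus_width_bits / 8) / 1e9

# Early SDRAM: ~66 MHz, single data rate, 64-bit DIMM
sdram = peak_bandwidth_gbs(66.6e6, 64)    # ~0.533 GB/s

# DDR4-3200: 1600 MHz I/O clock, double data rate -> 3200 MT/s, 64-bit
ddr4 = peak_bandwidth_gbs(3200e6, 64)     # 25.6 GB/s

print(f"SDRAM-66: {sdram:.3f} GB/s, DDR4-3200: {ddr4:.1f} GB/s, "
      f"ratio: {ddr4 / sdram:.0f}x")
```

Note that DDR4-3200’s 25.6GB/s comes entirely from clock speed; the bus is no wider than it was in 1997, which is precisely the lever the new standards pull instead.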

Samsung’s Wide I/O: Ultra low-power bandwidth

Wide I/O and Wide I/O 2 have been backed by companies like Samsung and are designed to provide mobile SoCs with maximum bandwidth at the lowest possible power consumption. It’s a technology that’s been most interesting to companies building smartphones and embedded systems, where high-resolution displays have put enormous pressure on bandwidth while low power requirements are critical to battery life.

Wide I/O is designed specifically to stack on top of SoCs and use vertical interconnects to minimize electrical interference and die footprint. This optimizes the package’s size, but also imposes certain thermal limitations, since heat radiated from the SoC has to pass through the entire memory die. Operating frequencies are lower, but a very large number of I/O pins keeps bandwidth high, with a memory bus that’s up to 1024 bits wide.
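The wide-and-slow trade-off is easy to see numerically. In this sketch the 200 MT/s figure is purely illustrative (not a spec value), chosen to show that a 1024-bit bus can match a 64-bit DDR4-3200 DIMM while running at a small fraction of its transfer rate — and lower frequencies mean lower I/O power:

```python
# Peak bandwidth = bus width (bytes) * transfer rate.
# The 200 MT/s clock below is illustrative, not from the Wide I/O spec.

def peak_gbs(bus_width_bits, transfers_per_sec):
    return bus_width_bits / 8 * transfers_per_sec / 1e9

wide_io = peak_gbs(1024, 200e6)   # 1024-bit bus at a modest 200 MT/s
ddr4    = peak_gbs(64, 3200e6)    # 64-bit DIMM at 3200 MT/s

print(wide_io, ddr4)  # both 25.6 GB/s
```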

Wide I/O is the first version of the standard, but it’s Wide I/O 2 that’s expected to actually reach the mass market — though some have argued that true adoption won’t come until Wide I/O 3, which should finally open a meaningful performance gap over LPDDR4. The standard was ratified by JEDEC, but it’s often associated with Samsung due to that company’s extensive work on bringing it to market. Timing is unclear, but no major devices are expected to ship with Wide I/O in the first half of 2015. We may see some limited pickup in the back half of the year, possibly from Samsung’s own foundries.

Wide I/O is explicitly designed to be a 3D interface, but 2.5D interposer designs are possible. Since one of the major challenges of a 3D Wide I/O structure is cooling the SoC underneath the DRAM, it’s possible that the first chips will use 2.5D interposer designs.

Intel and Micron: Hybrid Memory Cube

In Corner #2, we have Hybrid Memory Cube, the joint Intel-Micron standard. HMC is designed to emphasize massive amounts of bandwidth at higher power consumption and cost than Wide I/O 2. Intel and Micron have claimed that up to 400GB/s of bandwidth may be possible via HMC, with production expected in 2016 and commercial availability in 2017.

HMC is not a JEDEC standard but has multiple development partners, including Samsung, Micron, Microsoft, Altera, ARM, Intel, HP, and Xilinx. One of the major goals of HMC is to strip out the duplicative control logic of modern DIMMs, simplify the design, connect the entire stack in a 3D configuration, then use a single control logic layer to handle all read/write traffic.

The promise of Hybrid Memory Cube is an architecture that’s explicitly designed to respond to multi-core scenarios and deliver data with much higher bandwidth and lower overall latency. HMC is extremely forward-looking, and it solves a number of problems related to exascale computing, but it’s also dependent on a number of profound improvements to semiconductor manufacturing. It’s the most expensive new standard, and the only one not ratified by JEDEC.

The slide above is from 2011, but the projections appear to still be accurate. At huge scale, memory power consumption from DDR3 and DDR4 is simply too high to allow for efficient scaling. Slashing memory power consumption by two-thirds would have a huge impact on supercomputing in the 2020 timeframe.

Next page: High bandwidth memory…