Micron announced today that it’s now shipping early samples of its Hybrid Memory Cube (HMC) to customers and early adopters. Currently the company is offering a 2GB HMC device composed of four-high stacks of 4Gbit memory dies, with a total bandwidth of 160GB/s for the 31×31 mm package. There’s also a smaller chip, at 16×19.5 mm, that offers two links instead of four and tops out at 120GB/s. A 4GB device, scheduled for early 2014, could hit up to 320GB/s depending on the final implementation.
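As a quick sanity check on those figures, the short Python sketch below works through the capacity and per-link bandwidth arithmetic. It assumes one four-high stack per 2GB device and an even split of the quoted bandwidth across links; neither assumption comes from Micron, and the names are purely illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the announcement.
# Assumptions (not from Micron): one four-high stack per device, and
# total bandwidth divided evenly across the links.

DIES_PER_STACK = 4
GBIT_PER_DIE = 4

capacity_gbit = DIES_PER_STACK * GBIT_PER_DIE
capacity_gbyte = capacity_gbit / 8  # 8 bits per byte


def per_link_bandwidth(total_gb_per_s, links):
    """Return GB/s per link, assuming an even split across links."""
    return total_gb_per_s / links


four_link = per_link_bandwidth(160, 4)  # 31x31 mm package
two_link = per_link_bandwidth(120, 2)   # 16x19.5 mm package

print(f"Capacity: {capacity_gbit} Gbit = {capacity_gbyte:.0f} GB")
print(f"Four-link package: {four_link:.0f} GB/s per link")
print(f"Two-link package: {two_link:.0f} GB/s per link")
```

Taken at face value, the two packages work out to different per-link rates (40GB/s versus 60GB/s), so the even-split assumption is probably too simple. The announcement doesn’t say how the links differ between the two parts.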

This leap forward has been in the works for several years; Intel first demonstrated HMC on stage with Micron at IDF nearly two years ago. Since then, Micron has been refining the technology and increasing its overall performance. At its heart, the HMC is built on DDR technology, but it leverages a 3D array of chips connected by TSVs (through-silicon vias), with a logic controller embedded in the stack. The Cube is then attached directly to the CPU in what Micron calls a “Short Reach” configuration.

It’s not a RAM replacement… yet

In the short term, HMC is expected to have no impact on DDR4. That technology is already ramping up, motherboard chipsets are queued for development, and work has already been done to create low-power versions and high-density server variants. Where Micron thinks HMC will be vital, for the next few generations, is in large-scale routers or other devices that move huge amounts of data across networks and can benefit from having high-bandwidth data storage for a few critical cycles while working out the optimal routing path.

The long-term implications for PCs, however, are extremely positive. Hybrid Memory Cube uses massive parallelism to reduce access latency, and its on-die logic controllers can theoretically power-gate the structure extremely efficiently to save power. Total bandwidth of up to 320GB/s would put the HMC’s sustained bandwidth on par with a modern L3 cache from Intel, but an L3 cache with 100x the storage space.

There are a few problems to solve before that can happen, though. First, there’s the simple issue of die size and location. An HMC has to be integrated into something, either on the motherboard or the CPU. The 31×31 mm package that Micron mentions covers 961 mm². That’s enormous by microprocessor standards; it’s far larger than the GTX Titan’s GPU, for example. Integrating the design into an Intel or AMD processor would require a great deal of engineering, and the motherboard socket size would be astronomical.
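The size and capacity comparisons are easy to reproduce. The Python sketch below computes the footprint of both packages from the dimensions Micron quotes, then works backwards from the “100x” claim to the L3 cache size it implies. Using the 2GB device (counted as 2,048MB) for that comparison is an assumption, since the text doesn’t say which part is meant.

```python
def area_mm2(width_mm, height_mm):
    """Footprint of a rectangular package in square millimetres."""
    return width_mm * height_mm


large_pkg = area_mm2(31, 31)    # four-link package
small_pkg = area_mm2(16, 19.5)  # two-link package

# Assumption: the "100x" comparison refers to the 2GB device.
HMC_CAPACITY_MB = 2 * 1024
implied_l3_mb = HMC_CAPACITY_MB / 100

print(f"Large package: {large_pkg} mm^2")
print(f"Small package: {small_pkg} mm^2")
print(f"Large/small footprint ratio: {large_pkg / small_pkg:.2f}")
print(f"L3 size implied by the 100x claim: {implied_l3_mb:.2f} MB")
```

Under that assumption, the comparison points to an L3 cache of roughly 20MB, and the smaller package comes in at about a third of the larger one’s footprint.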

The other question is the memory controller itself. The HMC implementation uses its own logic controller on-chip, but Intel and AMD already have DDR4 controllers baked into their own processors. Do you build a chip that interfaces with a second memory controller (thereby losing some of the benefits AMD and Intel have both enjoyed from having the IMC on-die), or do you wait until HMC has scaled to the point that it can completely replace DDR memory, and jump at that point?

Micron claims that HMC eliminates the so-called “memory wall”: the disparity between CPU clock speed/bandwidth and that of the memory systems that serve the CPU. This isn’t entirely true, since that gap is inherent to the construction of any sort of memory. There’s just no way to build a huge data array that can be searched in the same amount of time it takes to search a tiny one. But what HMC could do is knock the memory wall back a few paces, and give chips a much faster, wider pool of data than they’ve previously had access to.

HMC is another technology I should’ve actually wrapped into my recent enthusiast article on the future of the PC, though its time frame for introduction is uncertain. Three to five years seems a reasonable bet, and when it does appear, it could have an impact on performance similar to the impact of integrated memory controllers in previous years. DDR RAM has had an excellent run — it’ll be more than twenty years old by the time HMC is ready for mass deployment — but it may be time to explore other options with far better long-term prospects.
