That's cool and all, but what's in it for me?

The result of all this engineering is memory with vastly improved bandwidth. Compared to the 32-bit bus width per DRAM chip of GDDR5 (with 16 chips used on an R9 290X to achieve its 512-bit bus), the first version of HBM sports a huge 1024-bit bus width per stack, enabling over 100GB/sec of bandwidth per memory stack. With a reduced memory clock speed of 500MHz (1000MHz effective) and four stacks in use, that gives AMD's HBM a total memory bandwidth of 512GB/sec. Operating voltage is also down, from 1.5V to 1.3V.
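The arithmetic behind these figures is straightforward: bandwidth is the bus width in bytes multiplied by the effective transfer rate. A minimal sketch (the function name is illustrative, not from AMD's materials):

```python
def bandwidth_gbs(bus_width_bits, effective_clock_mhz):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits
    effective_clock_mhz: effective data rate per pin in MHz
    """
    # bytes transferred per cycle * transfers per second, scaled to GB/s
    return bus_width_bits / 8 * effective_clock_mhz * 1e6 / 1e9

# One first-generation HBM stack: 1024-bit bus at 1000MHz effective
print(bandwidth_gbs(1024, 1000))      # 128.0 GB/s per stack; x4 stacks = 512 GB/s

# For comparison, the R9 290X's GDDR5: 512-bit bus at 5Gbps (5000MHz) per pin
print(bandwidth_gbs(512, 5000))       # 320.0 GB/s
```

So each HBM stack delivers 128GB/sec, and four stacks reach the quoted 512GB/sec despite the far lower clock speed.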

Plus, there are massive physical space savings over GDDR5: 1GB of GDDR5 takes up 672mm² of board space, versus just 35mm² for a 1GB stack of HBM. This means you're likely to see much smaller form factors for graphics cards. AMD says its first-generation design will feature a PCB that's 50 percent smaller than the PCB in an R9 290X—just think what that might mean for dual-GPU designs or mobile graphics cards.

AMD also claims overclocking should be much easier than with GDDR5, thanks to the simpler clock system, which requires less voltage. Notably, even a small increase in clock speed will result in a large increase in bandwidth, thanks to the wide bus, so it'll be interesting to see how far overclockers can push the technology.
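Because bandwidth scales linearly with bus width, the same clock bump buys far more on HBM's aggregate 4096-bit bus (four 1024-bit stacks) than on a 512-bit GDDR5 bus. A quick hedged illustration of that scaling (function name is ours, not AMD's):

```python
def bw_gain_gbs(bus_width_bits, clock_delta_mhz):
    # extra GB/s gained from raising the effective memory clock
    return bus_width_bits / 8 * clock_delta_mhz * 1e6 / 1e9

# A modest +100MHz effective overclock:
print(bw_gain_gbs(4096, 100))   # 51.2 GB/s extra on four HBM stacks
print(bw_gain_gbs(512, 100))    # 6.4 GB/s extra on a 512-bit GDDR5 bus
```

In other words, every MHz of headroom is worth roughly eight times as much bandwidth on this HBM configuration as on the R9 290X's memory bus.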

Does all this sound too good to be true? Well, there's always a compromise somewhere, and in the case of the first-generation implementation of HBM it's total capacity. AMD's design places four 1GB stacks of memory around the GPU on an interposer, resulting in 4GB of on-package memory. That's not small as such, but with memory usage on the rise as resolutions increase, those in the market for a high-end graphics card (where AMD says HBM will be making its debut in a few months) might hope for more in order to avoid a costly upgrade again in a year or so. The 4GB limit also makes HBM a less than ideal solution for the workstation market, where GPU rendering must be held entirely within the memory buffer.

AMD's CTO, Joe Macri, explained the 4GB limitation to Ars in a telephone call:

"You're not limited in this world to any number of stacks, but from a capacity point of view, this generation-one HBM, each DRAM is a two-gigabit DRAM, so yeah, if you have four stacks you're limited to four gigabytes. You could build things with more stacks, you could build things with less stacks. Capacity of the frame buffer is just one of our concerns. There are many things you can do to utilise that capacity better. So if you have four stacks you're limited to four [gigabytes], but we don't really view that as a performance limitation from an AMD perspective."

"If you actually look at frame buffers and how efficient they are and how efficient the drivers are at managing capacities across the resolutions, you'll find that there's a lot that can be done. We do not see 4GB as a limitation that would cause performance bottlenecks. We just need to do a better job managing the capacities. We were getting free capacity, because with [GDDR5] in order to get more bandwidth we needed to make the memory system wider, so the capacities were increasing. As engineers, we always focus on where the bottleneck is. If you're getting capacity, you don't put as much effort into better utilising that capacity. 4GB is more than sufficient. We've had to go do a little bit of investment in order to better utilise the frame buffer, but we're not really seeing a frame buffer capacity [problem]. You'll be blown away by how much [capacity] is wasted."

When can I get my hands on HBM?

AMD says that HBM will debut on a high-end graphics card, though it declined to say which one exactly. It's hoping that HBM will "spread across many markets," including consumer APUs, supercomputers, and servers, but it'll be a GPU that gains the tech first, with all signs pointing towards the much rumoured R9 390X graphics card. Whatever that product ends up being, AMD says it'll be available to buy within "the next couple of months."

As for the competition, while AMD won't be the first to market with stacked memory—Samsung and Hynix are already producing high-density DDR4 modules that make use of stacked DRAM chips and TSVs—AMD will be the first to market with a GPU that makes use of the technology. Rival Nvidia has been talking up HBM for a long while, but it's not planning to release anything featuring it until the debut of its Pascal architecture, which is slated for release in 2016.

Then there's Micron's rival Hybrid Memory Cube (HMC) technology, which also uses TSVs to stack DRAM for even higher bandwidth. But so far, HMC has been limited to server and supercomputing use rather than anything consumer related. Intel's Knights Landing family of Xeon Phi supercomputing chips is one of the few things to actually ship with the stuff.

It is impressive that AMD is taking the leap and bringing HBM to a mainstream consumer product. Of course, we'll reserve our full judgement until we can actually test HBM on a shipping product. There's also the question of whether AMD plans to pair HBM with a substantially improved GPU, to really make the most of that extra bandwidth, instead of sticking with the now ageing GCN 1.1 architecture used in its flagship R9 290X. Even then, perhaps you'll need to be running games at 1440p or even 4K to notice the difference.

Whatever the outcome, though, HBM is an exciting technology, one that's going to give Nvidia a run for its money and make the next few rounds of GPU releases very interesting indeed.