AMD on Tuesday unveiled a technology more than seven years in the making that it believes will be a game changer: a superfast, stacked-chip memory called High Bandwidth Memory. Even better, the company crowed: Nvidia is at least a year behind it.

Using stacked RAM chips that communicate through tiny holes, High Bandwidth Memory offers three times the performance of traditional GDDR5 that’s used by GPUs today and drastically reduces power consumption too, said AMD Chief Technology Officer Joe Macri, who oversaw HBM’s development.

Why this matters: A graphics card’s memory bandwidth matters as much to game performance as the graphics processor. An increase in memory bandwidth almost always means more performance, too, when coupled with changes to the GPU.

The Problem

Modern graphics cards drink memory bandwidth like a big-block V8 drinks gas. The problem is that the current memory, GDDR5, is rapidly approaching the point of diminishing returns, Macri said. Adding more memory bandwidth with GDDR5 would consume too much power to be an effective performance boost.

Part of GDDR5’s problem is how the chips connect to the GPU. GDDR5 RAM uses contacts at the edges of the individual chips. To add more bandwidth, you add more chips. But those chips must be laid out alongside each other on a video card’s circuit board, which leads to a suburban sprawl-like issue. Besides consuming a lot of space on the printed circuit board, it also means very long wires, or traces, must be run to reach the GPU. And it’s not just the RAM chips: the power plants, or voltage regulators, needed to run that suburban sprawl also have to be factored in. Because you’re pushing more signals along longer wires, you have to use more power, which means larger voltage regulators.

PCWorld This is the RAM layout of AMD’s GDDR5-based Radeon R9 290X and illustrates how much space is used and how far the wires must travel to reach the GPU in the middle.

HBM addresses the limitations of GDDR5 by going vertical like a high-rise. By stacking four memory chips, AMD can get the RAM closer to the GPU, which shortens the wire length drastically. Unlike GDDR5, HBM RAM uses a technique called through-silicon vias, or TSVs, which run wires vertically through holes in a stack of chips. Each layer also connects directly to the next using tiny bump contacts.

Because the layers interconnect and the wires don’t have to go as far to reach the GPU, it’s possible to make the bus far wider without incurring the power consumption of GDDR5.

PCWorld HBM lets AMD reduce power and increase speed by physically moving the chips much closer than on a traditional GDDR5-based graphics card.

The dividends, Macri said, are radical. A GDDR5 chip today, for example, supports a 32-bit-wide memory bus at up to 7Gbps per pin, or about 28GBps of bandwidth per chip. A single stack of HBM RAM supports a 1,024-bit-wide bus and more than 125GBps of memory bandwidth. That HBM stack also runs at a far lower clock speed while delivering several times the memory bandwidth.
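The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough illustration, not AMD’s own math. It assumes peak bandwidth is simply bus width times per-pin data rate, that the GDDR5 figure of 7 is a per-pin rate in gigabits per second, and that an HBM stack runs at about 1Gbps per pin. That last value does not come from Macri’s remarks; it is picked here only because it lands close to the quoted 125GBps.

```python
def peak_bandwidth_gbytes(bus_width_bits, gbits_per_pin):
    """Peak bandwidth in gigabytes per second: width x per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbits_per_pin / 8


# Assumed per-pin data rates in Gbps; the 1.0 for HBM is an illustrative guess.
gddr5_chip = peak_bandwidth_gbytes(32, 7.0)
hbm_stack = peak_bandwidth_gbytes(1024, 1.0)

print(f"GDDR5 chip: {gddr5_chip:.0f} GBps")
print(f"HBM stack:  {hbm_stack:.0f} GBps")
print(f"Ratio:      {hbm_stack / gddr5_chip:.1f}x")
```

Under those assumptions, the wide bus does the heavy lifting: each HBM pin moves far less data than a GDDR5 pin, but there are 32 times as many of them.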

Power efficiency matters just as much. AMD says an HBM stack will hit 35GBps of memory bandwidth per watt consumed, vs. GDDR5’s 10.5GBps. Power efficiency isn’t just about mobile applications, either. By using less power to drive the memory, and thus creating less heat, you can take the savings and, say, increase the clocks of the GPU core.
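To get a feel for what those efficiency numbers could mean in watts, here is a minimal sketch. The 500GBps target is a hypothetical round number chosen for illustration, not a spec for any announced card, and the calculation assumes the per-watt figures hold at that bandwidth.

```python
def memory_power_watts(bandwidth_gbytes, gbytes_per_watt):
    """Estimated memory power draw: bandwidth divided by bandwidth-per-watt."""
    return bandwidth_gbytes / gbytes_per_watt


TARGET_GBPS = 500.0    # hypothetical target bandwidth, for illustration only
HBM_EFFICIENCY = 35.0  # GBps per watt, AMD's quoted figure
GDDR5_EFFICIENCY = 10.5  # GBps per watt, AMD's quoted figure

hbm_watts = memory_power_watts(TARGET_GBPS, HBM_EFFICIENCY)
gddr5_watts = memory_power_watts(TARGET_GBPS, GDDR5_EFFICIENCY)

print(f"GDDR5: {gddr5_watts:.1f} W")
print(f"HBM:   {hbm_watts:.1f} W")
print(f"Saved: {gddr5_watts - hbm_watts:.1f} W")
```

Whatever the real numbers turn out to be, the difference between the two results is the power budget that could, in principle, be handed back to the GPU core.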

PCWorld A conventional GDDR5 RAM chip connects through its edges while HBM will use tiny holes through the chips themselves to communicate.

There are several different ways to stack the chips. AMD’s approach to HBM is to use a “2.5D” technique with a passive interposer layer. This approach is different from a design method called “3D,” where the RAM chips are piled up on the GPU itself. Macri said people should not be misled into thinking this isn’t a three-dimensional design.

“It is a true 3D design method,” he said. “We’re not just designing in X and Y any more, we’re designing in X, Y and Z.”

PCWorld AMD’s HBM implementation is called 2.5D and uses a passive interposer layer as the foundation.

Macri didn’t mince words when speaking of his chief competitor. Nvidia touted a chip-stacking technique for its upcoming (but now delayed) Volta GPU as early as 2013. With Volta delayed, Nvidia’s first GPU with stacked memory likely won’t appear until 2016, when its Pascal GPU ships with a similar 2.5D stacking technique.

“Nvidia creates PowerPoints and talks in advance like they are the wonderful leader of everything,” Macri said derisively. “While they’re talking, we’re working.”

Macri said AMD has been the primary driver of HBM memory in the industry.

“We do internal development with partners, we then take that development to open standards bodies and we open it up to the world,” he said. “This is how Nvidia got GDDR3 and how they got GDDR5. This is what AMD does. We truly believe building islands is not as good as building continents.”

Macri is probably in a position to know: Besides being AMD’s Chief Technology Officer, Macri has long been a chairman at JEDEC, the group that blesses memory standards for the industry. AMD did indeed beat Nvidia in introducing graphics cards with GDDR3 and GDDR5, but being first with a new technology always brings concerns over yields.

Macri said HBM is new, but that doesn’t mean people should assume yield issues. Macri wouldn’t elaborate on yield from its chief partner in the project, Hynix, but said AMD wouldn’t adopt it for a consumer part if it didn’t think it could get enough HBM RAM to make GPUs.

HBM vs. Hybrid Memory Cube: Fight? Not.

HBM’s performance-to-power ratio is so appealing that Macri said he expects the new memory to be adopted in areas besides big GPUs. We can expect HBM to appear on CPUs with integrated graphics chips, as well as in servers and workstations. That would appear to put HBM on a collision course with a similar memory technology Intel and Micron are working on called Hybrid Memory Cube, or HMC. HMC is an advanced stacked memory design, but unlike AMD with HBM, Intel and Micron are trying to bring it to market without the slow rule-by-committee of JEDEC.

Macri said the problem Intel and Micron are trying to solve with HMC is one that affects supercomputers, and HMC isn’t likely to be at odds with HBM because it is a more expensive technology for a different application. Intel officials would seem to agree.

“As a strong supporter of industry standards, Intel is a partner in the High Bandwidth Memory (HBM) development within JEDEC and is exploring the usage of HBM in Intel platforms,” an Intel spokesman said in a statement to PCWorld. “While Intel is not a member of the Hybrid Memory Cube Consortium (HMCC), we did collaborate with Micron Technology on a custom derivative of HMC optimized for Intel Xeon Phi ‘Knights Landing’ processor.”

In other words, those expecting to see HBM and HMC re-litigate the brutal Rambus vs. DDR wars of the 1990s will be disappointed. Intel is down with HBM too.

Chiphell.com Chiphell.com claims this allegedly leaked image could be the liquid-cooled version of AMD’s Radeon 390X card. With HBM memory and the space savings, it could very well be.

First HBM cards limited to 4GB?

Although Macri did not discuss any particulars of AMD GPUs with HBM, the Internet has been rife with leaks and rumors about an HBM-equipped card believed to be named the Radeon R9 390X. Many believe the 390X will put AMD back in contention with Nvidia, which has held the performance crown for some time. AMD’s only answer to Nvidia’s GeForce 900-series cards has been to continue to cut prices for its top-end Radeon cards.

Even worse, those Radeon cards have long had a reputation for running hot and consuming more power than Nvidia’s products. If AMD is to be believed, HBM will certainly help with power and thermal issues, but one perceived weakness Nvidia is likely to try to exploit is the amount of RAM.

Macri didn’t specifically say that the company’s first HBM card would be limited to 4GB of RAM, but with most of AMD’s illustrations, including one used here, showing four stacks each with 1GB, it isn’t hard to do the math. With Nvidia pushing 12GB in its GeForce Titan X, it’s easy to see total memory being used as a marketing lever to sway buyers.
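That math, spelled out as a quick sketch. The stack count and per-stack capacity come from AMD’s illustrations rather than any confirmed product spec, and the aggregate bandwidth line assumes per-stack bandwidth simply adds up across stacks, which AMD has not confirmed for any shipping card.

```python
STACKS = 4            # stacks shown in AMD's illustrations
GB_PER_STACK = 1      # capacity per stack in the same illustrations
CHIPS_PER_STACK = 4   # memory chips per HBM stack, as described for HBM
GBPS_PER_STACK = 125  # quoted lower bound on per-stack bandwidth

total_gb = STACKS * GB_PER_STACK
mb_per_chip = GB_PER_STACK * 1024 / CHIPS_PER_STACK
total_gbps = STACKS * GBPS_PER_STACK  # assumes bandwidth scales linearly

print(f"Total capacity:      {total_gb} GB")
print(f"Capacity per chip:   {mb_per_chip:.0f} MB")
print(f"Aggregate bandwidth: at least {total_gbps} GBps (if stacks add up)")
```

The same sums show why capacity is the sticking point: with stacks fixed at 1GB each, the only way to add memory would be to add stacks.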

“This might be a marketing problem,” he said, “but the end user values performance, values form factor and values power consumption.”

Macri also said 4GB is an “enormous” amount of RAM and there are other ways to approach it than just ladling on more and more memory.

Highly customized GPUs from vendors will have fewer options with HBM memory.

Custom GPU designs might be a little less custom

One last issue Macri addressed was concern over highly customized video cards. Typically when a new graphics core is introduced, AMD and Nvidia make reference designs available to board partners. These are essentially completed boards with RAM, GPU, coolers and voltage components ready to go. But the more advanced vendors will set their engineers to work creating customized board layouts with beefier components and cooling to appeal to the gamer and overclocking crowd.

Because HBM will use memory integrated directly onto the interposer and be sold as a single package, board partners could potentially be limited in how much they can differentiate their cards from others.

Macri said that HBM won’t impact them too much.

“The degrees of freedom are still there,” he said. “But (board partners) will be turning the knobs a little differently.” He also said even though board partners have the capability to buy RAM from different vendors for custom designs, they typically don’t.