The new version of the high bandwidth memory standard promises greater speeds and feeds and that’s about it.

In March, Samsung introduced the first memory products conforming to JEDEC’s HBM2E specification, but so far nothing has come to market, a reflection of just how difficult this memory is to manufacture in volume.

Samsung’s new HBM2E (sold under the Flashbolt brand name, following the older Aquabolt and Flarebolt brands) offers 33% better performance than HBM2 and doubles the density to 16 gigabits per die. These devices are based on eight 16-Gbit memory dies interconnected using through-silicon vias (TSVs) in an 8-Hi stack configuration. That means a single package is capable of 410 GB/s of memory bandwidth and 16GB of capacity. DRAM transfer speeds can reach 3.2 Gbps per pin, up from 2.4 Gbps for Aquabolt HBM2.
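The headline numbers follow directly from the interface width, per-pin rate, and die count. A quick sanity check of Samsung's published figures:

```python
# Sanity-check the published HBM2E per-stack numbers from first principles.

PINS_PER_STACK = 1024      # HBM interface width, in bits
PIN_RATE_GBPS = 3.2        # per-pin transfer rate, Gbit/s
DIES_PER_STACK = 8         # 8-Hi stack
DIE_DENSITY_GBIT = 16      # 16-Gbit dies

bandwidth_gbs = PINS_PER_STACK * PIN_RATE_GBPS / 8   # bits -> bytes
capacity_gb = DIES_PER_STACK * DIE_DENSITY_GBIT / 8

print(f"Per-stack bandwidth: {bandwidth_gbs:.1f} GB/s")  # 409.6, quoted as ~410
print(f"Per-stack capacity:  {capacity_gb:.0f} GB")      # 16
```

The 33% performance claim is simply the per-pin rate ratio: 3.2 Gbps versus HBM2's 2.4 Gbps.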

Samsung used TSVs so that, instead of connecting DRAM chips with wirebond connections along their edges, the dies could be stacked vertically, with each die etched with vias that connect to the die below. Routing signals through thousands of TSVs required some creative electrical engineering to avoid signal loss, so multiple TSVs are used for each data bit.

HBM was getting to the point where the length of the wires was causing inductance issues. With HBM2, Samsung increased the memory frequency to 1.2 GHz and decreased the voltage to 1.2V at the same time. The company says that to lower the clock skew it had to decrease the deviation in data transfer speeds among the TSVs, and it increased the number of thermal bumps between the DRAM dies to distribute heat more evenly across each KGSD (known good stacked die).

Samsung has not disclosed how it achieved the 33% jump in performance, but it is likely an evolution of these same techniques.

“When we built HBM2, we wanted to expand the market breadth the device could attack, but also add in two dimensions—capacity and more bandwidth,” said Joe Macri, corporate vice president and chief tech officer of the compute and graphics division at AMD. AMD is a major partner with Samsung in the development of HBM. “It’s still 1,024 bits wide, but doubled the frequency to two gigatransfers per second and added Error Correction Code (ECC) to get into data center and AI and machine learning, since the entire data center market is built on a trusted data model.”

With HBM2E, AMD, one of the co-developers of HBM, is turning the same levers again. “The only bits added to the interface were to increase addressability, but it’s the same interface, it just runs at a higher rate of 3.2 gigatransfers per second,” Macri said.

With its 1,024-bit data bus, HBM2E runs very wide, but not very fast. Two gigabits per second per pin is DDR3-class speed, notes Frank Ferro, senior director of product management at Rambus. “By going wide and slow you keep the power and design complexity down on the ASIC side. Wide and slow means you don’t have to worry about signal integrity. They stack the DRAM in a 3D configuration, so it has a very small footprint,” he said.

An HBM stack of four DRAM dies has two 128‑bit channels per die, for a total of eight channels and a bus width of 1,024 bits in total.
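That channel organization can be written out explicitly, along with the "wide and slow" arithmetic it enables (the 2 Gbps figure below is the DDR3-class per-pin rate Ferro cites):

```python
# HBM channel organization for a 4-Hi stack, per the figures above.
DIES = 4
CHANNELS_PER_DIE = 2
BITS_PER_CHANNEL = 128

channels = DIES * CHANNELS_PER_DIE        # 8 independent channels
bus_width = channels * BITS_PER_CHANNEL   # 1,024-bit interface

# "Wide and slow": large bandwidth even at modest per-pin rates.
per_pin_gbps = 2.0
bandwidth_gbs = bus_width * per_pin_gbps / 8
print(f"{bus_width}-bit bus at {per_pin_gbps} Gbps/pin = {bandwidth_gbs:.0f} GB/s")
```

By comparison, a narrow interface such as a 32-bit GDDR channel would need per-pin rates many times higher to deliver the same bandwidth, which is exactly the signal-integrity burden HBM avoids.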

From 2 to 2E

The change from HBM2 to 2E is not revolutionary. It is pretty much a speeds-and-feeds update, but that’s more than enough for now, according to Samsung.

“The key motivation in market outreach with HBM2E is higher capacity, and HBM3 will have even higher bandwidth and more capacity,” said Tien Shiah, senior manager for memory at Samsung. “HBM2 is limited in scope to the memory of the co-processor, primarily for AI and machine learning applications. But the forthcoming higher capacities will allow system architects to consider HBM in a much greater number of applications for more powerful next-generation artificial intelligence, machine learning, and Exaflop/Post Exaflop supercomputing.”

Hugh Durdan, vice president of strategy and products at eSilicon, agrees. “It’s more evolutionary than revolutionary,” he said. “I see extensions and enhancement of existing types of designs. HBM2E is an extension of HBM2. It’s faster, but more importantly it adds address bits that let you build a memory set four times as large and increase the capacity of what you can put next to your SoC.”

Samsung is positioning HBM2E for the next-gen datacenter running HPC, AI/ML, and graphics workloads. By using four HBM2E stacks with a processor that has a 4096-bit memory interface, such as a GPU or FPGA, developers can get 64 GB of memory with a 1.64 TB/s peak bandwidth—something especially needed in analytics, AI, and ML.
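The system-level figures are just the per-stack numbers scaled by four:

```python
# Four HBM2E stacks on one interposer, per the configuration described above.
STACKS = 4
STACK_WIDTH_BITS = 1024
STACK_CAPACITY_GB = 16
PIN_RATE_GBPS = 3.2

total_width = STACKS * STACK_WIDTH_BITS       # 4,096-bit memory interface
total_capacity = STACKS * STACK_CAPACITY_GB   # 64 GB
peak_bw_tbs = STACKS * STACK_WIDTH_BITS * PIN_RATE_GBPS / 8 / 1000

print(f"{total_width}-bit interface, {total_capacity} GB, "
      f"{peak_bw_tbs:.2f} TB/s peak")  # 4096-bit, 64 GB, 1.64 TB/s
```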

Although vendors can stack HBM up to 12-Hi, AMD’s Macri believes all vendors will stay with 8-Hi stacks. “There are capacitive and resistance limits as you go up the stack. There’s a point you hit where, in order to keep the frequency high, you add another stack of vias. That creates extra area in the design. We’re trying to keep density and cost in balance,” he said.

Boosting AI

Existing apps will see a benefit from HBM2E just because of the faster speeds and feeds. “A 50% capacity increase allows you to add so much more to the working set. So things you couldn’t cut up to fit now fit. As we scale our capability, the memory is scaling with us,” said AMD’s Macri.

At the same time, HBM2E opens the door to artificial intelligence in general, and machine learning in particular, because machine learning is massively data-intensive, requiring terabytes of data to be processed to train a model. This is where HBM2E is expected to shine.

“One of the challenges in AI (from edge to cloud) is getting sufficient memory near the compute to ensure highest performance,” said Patrick Dorsey, vice president of product marketing in the Programmable Solutions Group at Intel. “As AI network models continue to grow in complexity and performance needs, the inherent flexibility of FPGAs to scale in compute capacity with a higher bandwidth HBM roadmap enables FPGAs to address new algorithms that were not possible before.”

High-performance compute and AI applications often require high-performance data compression and decompression, and HBM-based FPGAs can more efficiently compress and accelerate larger data movements. Dorsey sees FPGA with HBM enabling AI, data analytics, packet inspection, search acceleration, 8K video processing, high performance computing, and security.

Other uses

HBM primarily has been a GPU play, but Intel said its Stratix 10 FPGAs use HBM2, and its newly announced Agilex FPGAs will support next-generation HBM integration.

Intel isn’t alone. A flood of AI chips is in the works or on the market from a variety of companies, including Microsoft, Google, and dozens of startups, and all of them are looking at HBM, said Shiah.

“We see two architectures emerging. One is in-memory processing, to address the compute memory problems. The other is very near-memory processing. Virtually every AI chip company that we’ve come across is either looking at HBM or going to HBM. There might be some startups trying to do something different with SRAM, but every major AI chip company that is either in production or exploring designs is using HBM,” he said.

Shiah noted that many AI applications are memory bandwidth constrained, as opposed to compute constrained. Companies have used the roofline model to determine this is the case. “The current industry limitation is the speed and capacity of high speed memory. HBM2E and HBM3 will address the memory bandwidth problem with much faster and higher capacity memory,” he said.
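The roofline model Shiah refers to is easy to sketch: a kernel is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below the machine's ratio of peak compute to peak memory bandwidth. The accelerator and kernel numbers below are illustrative assumptions, not figures from any specific product:

```python
def attainable_gflops(intensity, peak_gflops, peak_bw_gbs):
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, intensity * peak_bw_gbs)

# Hypothetical accelerator: 100 TFLOPS peak, one HBM2E stack at 410 GB/s.
PEAK_GFLOPS = 100_000
PEAK_BW_GBS = 410

ridge = PEAK_GFLOPS / PEAK_BW_GBS  # FLOPs/byte needed to saturate compute
print(f"Ridge point: {ridge:.0f} FLOPs/byte")

# Illustrative kernels: many AI kernels sit far below the ridge point.
for name, intensity in [("elementwise add", 0.25), ("large matmul", 300)]:
    perf = attainable_gflops(intensity, PEAK_GFLOPS, PEAK_BW_GBS)
    bound = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"{name}: {perf:,.0f} GFLOPS attainable ({bound})")
```

For memory-bound kernels, doubling HBM bandwidth doubles attainable performance with no change to the compute units, which is why the HBM roadmap matters so much to AI chip vendors.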

Another new area where HBM2E is expected to make inroads is network packet switching, which needs the bandwidth to pump through the exabytes of network traffic the Internet handles every day.

eSilicon’s Durdan predicts that network switch chips will see line rates continue to increase, from the current 28Gbps up to 56Gbps, and they eventually will go up to 112Gbps. HBM2E and HBM3 will be required to keep up. “Every time you double the line rate you process twice as much data, so you need memory capacity to keep up with data to be processed. [HBM2E] will enable higher capacity chips. The processors get faster, and the links between processors and software will get faster. Memory ends up being a key piece of the equation,” he said.
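Durdan's point about line rates compounding into memory demand can be made concrete with a rough calculation. The port count and the 2x buffering factor here are illustrative assumptions, not vendor figures:

```python
# Rough buffer-memory bandwidth demand as switch line rates double.
# Assumptions: a 32-port switch, and each packet written to and then
# read back from buffer memory (2x the aggregate line rate).

PORTS = 32

for lane_gbps in (28, 56, 112):
    aggregate_gbs = PORTS * lane_gbps / 8   # total traffic, GB/s
    buffer_bw_gbs = 2 * aggregate_gbs       # write + read per packet
    print(f"{lane_gbps} Gbps lanes: ~{buffer_bw_gbs:.0f} GB/s buffer bandwidth")
```

Under these assumptions, 112 Gbps lanes already demand on the order of two HBM2E stacks' worth of bandwidth for buffering alone, which is the pressure Durdan describes.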

Cost remains an issue

Making 2.5D packages is not a cheap process, and HBM DRAM prices have been high due to limited availability. Both have limited the use of HBM to high-end networking and graphics devices.

Jim Handy, principal analyst with Objective Analysis, estimates the cost of a DRAM wafer at $1,600. HBM2 adds another $500 to that cost, a roughly 30% premium. “DRAM makers would charge a lot more than 30% for it,” he said. “I expect [HBM] will remain in the GPU space mostly, because it’s expensive. If the market gets big enough, then production will get cheaper and open the doors to new apps. I wouldn’t be surprised if in 10 years the majority of apps use HBM memory, but I also wouldn’t be surprised if they don’t.”

HBM is not widely used in GPUs, either. Despite its broad product line, Nvidia uses HBM2 in only four of its cards—Tesla P100, Tesla V100, TITAN V, and Quadro GV100. AMD uses it in its Radeon VII and Radeon Instinct MI lines.

The price has remained high because there has not been a widespread adoption by companies to work on a new cost structure or increase supplies. Samsung is the only supplier of HBM2E DRAM for now.

Hynix has an HBM2 product, but not 2E, and it has not said when it will have a product on the market. A Micron spokesperson noted that Micron is developing HBM2E as well, and is engaging industry enablers to understand their needs for implementing the technology.

(Micron has since abandoned the rival Hybrid Memory Cube architecture, which never achieved JEDEC standard status like HBM. The HMC domain, http://www.hybridmemorycube.org/, has since expired.)

On the horizon: HBM3

JEDEC is not standing still. HBM3 was announced by Samsung and Hynix at the 2016 Hot Chips conference with the usual changes—increased memory capacity, greater bandwidth, lower voltage, and lower costs. Bandwidth is expected to be 512 GB/s or greater. The memory standard is expected to be released next year.

Then there are HBM3+ and HBM4, reportedly to be released between 2022 and 2024, with more stacking and higher capacity. HBM3+ is supposed to offer 4 TB/s throughput and 1024 GB addressable memory per socket.

As it stands now, details on HBM3 are sparse. “It’s still a few years away, the standard is not defined yet. People are thinking of what they might like, but it’s too far out to think about it,” said Durdan.

Shiah said that in the past, HPC/AI was a major driver for the HBM roadmap, so speeds and feeds were the priority. But that will change. “Newer applications, however, are likely to require other attributes, such as extending the operating temperature, which we will be considering in our HBM2E and HBM3 designs,” he said.
