Semiconductor Engineering sat down to talk about new DRAM options and considerations with Frank Ferro, senior director of product management at Rambus; Marc Greenberg, group director for product marketing at Cadence; Graham Allan, senior product marketing manager for DDR PHYs at Synopsys; and Tien Shiah, senior manager for memory marketing at Samsung Electronics. What follows are excerpts of that conversation.

SE: What are the big challenges in memory today?



L-R: Frank Ferro, Graham Allan, Tien Shiah, Marc Greenberg. Photo credit: Semiconductor Engineering/Susan Rambo

Ferro: The main ones are latency and bandwidth, and those haven’t changed very much. But the key difference now is that we’re seeing a swing back from where we had plenty of bandwidth and the compute was the bottleneck to where memory is the bottleneck again. That has given rise to a number of different technologies. HBM basically came out of nowhere a couple years ago, and now GDDR is showing up on everyone’s radar. To some extent we’re still at the mercy of the memory vendors because we’re building physical layers based on their specs and the JEDEC specs. But architecturally, we are looking at ways to innovate. That’s the next step.

Allan: Our customers are faced with a large array of choices, and if you fit into the category of someone who doesn’t need a vanilla interface and you have something that can be challenging or a relatively high-bandwidth requirement, you’re left with all these choices and you don’t know which one to go for. It’s a very difficult market to navigate when you’re not one of the major CPU vendors because you don’t get a lot of insight from the DRAM vendors. There are a lot of secrets in the memory market, such as how much does a DRAM cost. You can find out what DDR4 costs in very large volumes, but if you want to know what HBM or GDDR6 sell for, you can’t get that answer. And it changes every month, because DRAM prices fluctuate quite a bit. One of our main activities is walking customers through all of these different choices and what the tradeoffs are.

Shiah: There’s an exponential increase in terms of the amount of data that is being processed. If you look at what we’ve gone through in terms of computers, we started with a single computer and then we connected those to the Internet, and then we moved to everyone essentially carrying a computer in their pocket. Now we’re in this AI wave. This transition to AI has happened faster than previous waves, and with each wave we’ve seen exponential growth in the amount of data. So we’ve been accumulating data because storage is cheap. The key now is being able to process that data. The companies that can make sense of that data are going to be the ones that are successful. The big cloud guys initially have accumulated a lot of data and they are using AI to process it. Enterprises are trying to do that now, as well. But if you use a roofline model to determine whether applications are compute- or memory-constrained, the AI applications are being constrained by memory bandwidth. To address that, we’re seeing a lot of interest in HBM. We’ve seen orders of magnitude growth in revenue in HBM.
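The roofline test Shiah mentions can be sketched in a few lines. This is a minimal illustration, not anything from the discussion: the function and the hypothetical accelerator numbers (100 TFLOP/s peak compute, 900 GB/s of memory bandwidth) are assumptions chosen only to show how the comparison works.

```python
# Roofline sketch: a kernel is memory-bound when its arithmetic intensity
# (FLOPs per byte moved to/from memory) falls below the machine balance
# (peak FLOP/s divided by peak memory bandwidth).
def bound_by(flops_per_byte: float, peak_tflops: float, peak_bw_gbs: float) -> str:
    """Return 'memory' or 'compute' depending on which roofline applies."""
    machine_balance = (peak_tflops * 1e12) / (peak_bw_gbs * 1e9)  # FLOPs per byte
    return "memory" if flops_per_byte < machine_balance else "compute"

# Hypothetical accelerator: 100 TFLOP/s peak, 900 GB/s of HBM bandwidth.
# Machine balance is ~111 FLOPs/byte, so a kernel doing 10 FLOPs per byte
# of data moved is limited by memory bandwidth, not compute.
print(bound_by(10, 100, 900))   # -> memory
print(bound_by(200, 100, 900))  # -> compute
```

Many AI inference kernels sit on the low-intensity side of that balance point, which is the sense in which they are "constrained by memory bandwidth."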

Greenberg: The diversity of different memory choices is causing people to struggle. I have customers that come to me and say, ‘I’d really like to go to GDDR6, or maybe LPDDR5 or HBM2.’ We try to guide them. For the server-based equipment, they’re choosing two on the same die. I’ve seen people with DDR5 and HBM2 or GDDR6 on the same die. There was a time when the memory industry focused on two memory types, DDR and LPDDR. GDDR was out there, but only for graphics and a small number of other applications. All of those products are now stepping outside of their boundaries. GDDR is stepping far out of its graphics origin. LPDDR is stepping far outside of low-power mobile memory. We see a lot of overlap and confusion.

SE: There are a number of new architectures, such as advanced packaging, in-memory and near-memory computing. Will those architectures solve the memory bottleneck?

Ferro: The diversity of memories is part of those architectures. There are architectures that are more application-specific. So if you have to do a specific type of compute function, you look at the best memory for that. In networking, you may see HBM doing packet buffering or DDRs doing some other kind of housekeeping function. The SoCs and the networking processors are starting to optimize memory subsystems based on the type of workloads they’re seeing. In AI, the training is moving toward HBM. Inference can’t afford HBM, so they’re leaning toward GDDR6. Even some of the low-end AIs are going to LPDDR4 and eventually LPDDR5 because their network cards are constrained by a specific power budget. So you have one AI card using LPDDR, the next one up using GDDR6, and the high-end ones using HBM.

Greenberg: The AI market is very young, and a lot of the AI companies haven’t decided what they want to be when they grow up. So they’re all set on a particular memory type, and then you talk to them a month later and their roadmap has changed and they pick a different memory type for a different market. It’s not only the memory industry that’s driving this. It’s also AI, which is one of the prime users of these high-bandwidth memories. And because that’s a young market, these companies are twisting and turning in their plans, as well.

SE: Is that AI in the data center for training, or is it the inference, as well?

Greenberg: It’s both. There is definitely data center AI, which is a big market. But it’s also people looking to do AI at the edge. People want to do AI in between. And then there’s automotive, which is another market.

Allan: If you look at the combination of DRAM, or large-scale memory, and computing, on the order of 20 years ago I worked for a company that was a DRAM design company. We were working with the University of Toronto at the time, and we created computational RAM, or CRAM. It had small computational units right in line with the DRAM. It’s a tremendous way to access the bandwidth of the DRAM and get computing power, but it doesn’t work, and I don’t think it ever will. DRAMs are the only things that are ever going to get manufactured in volume. DRAMs are high-volume beasts. If they weren’t, they would be incredibly expensive. Their lines are full pumping out DDR4 and LPDDR4, and they’re bringing on DDR5 and GDDR. They already have these lines packed. To try to shoehorn something in there that’s unique, with an unproven business model, doesn’t work because they can make money right away with what they’re already making. It’s going to be very hard to change that. It would require a new application with very, very high volume.

SE: That’s especially true for in-memory compute, right?

Allan: Yes, and that’s a combination where you’re trying to avoid the interface by not having two different chips. It’s kind of the anti-chiplet approach.

Greenberg: The analogy we give people is, function (x) = x + 1. We do that all the time in computing. Normally with a memory today, what you do is send a command to memory, read this memory, transfer the data back to the CPU, and the CPU will do the dumbest thing it knows how to do, which is to add 1 to the number and then send it back to the DRAM again across a PCB. There’s a significant amount of time it takes to do that. It ties up the CPU for a couple clock cycles, and it takes a lot of energy. If you were able to address x incremented by 1, and you only send one transaction out to memory, that’s fine. That’s the simplest example of compute in memory. It’s very energy efficient, and people have been trying to do that for 20 years.
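Greenberg's f(x) = x + 1 analogy can be made concrete by counting bus transfers. The sketch below is illustrative only; the function names and the bus-event model are invented for this example, not part of any real DRAM command set.

```python
# Counting bus traffic for f(x) = x + 1, per Greenberg's analogy.
# Conventional path: the CPU issues a read, the data crosses the PCB back,
# the CPU adds 1, and the result crosses the PCB again as a write.
def conventional_increment(bus_log, mem, addr):
    bus_log.append("READ")   # read command sent to the DRAM
    x = mem[addr]
    bus_log.append("DATA")   # data crosses the PCB back to the CPU
    x += 1                   # the CPU does "the dumbest thing it knows"
    bus_log.append("WRITE")  # result crosses the PCB again
    mem[addr] = x

# Compute-in-memory path: a single command, the add happens in the DRAM,
# and no data ever crosses the interface.
def in_memory_increment(bus_log, mem, addr):
    bus_log.append("INCR")   # hypothetical in-memory increment command
    mem[addr] += 1

mem = {0: 41}
log = []
conventional_increment(log, mem, 0)
print(mem[0], len(log))   # 42, after 3 bus events
log2 = []
in_memory_increment(log2, mem, 0)
print(mem[0], len(log2))  # 43, after 1 bus event
```

Three bus events collapse to one, which is where the energy and latency savings of compute-in-memory come from in this simplest case.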

Allan: But you still have to read it in order to do anything with it.

Greenberg: At some point.

Allan: We are taking baby steps. With DDR5 you can write all zeroes.

Greenberg: That’s a memory clear function, which is useful.

Allan: Yes. Things that don’t need any computational power.

Ferro: Even with something like HBM3 coming down the pike, you have logic that sits there and it’s in the DRAM already. So there is an opportunity to have silicon that’s in standard DRAM. You’re not doing anything super-custom. But you can take advantage of that logic layer and see what you can do in there. It’s still DDR, and you can organize and read it.

Allan: That’s the base-layer die, and there’s quite a bit of empty room on there.

Ferro: Yes, and it has the best of both worlds. It still looks like a standard DRAM and has the footprint that’s compatible, but there’s area for logic. But it also has to be standard and high volume to drive the cost down.

Shiah: If you look at the roadmap for HBM, primarily it has been addressing the HPC (high-performance computing) and AI types of markets. With AI you need faster training times and more accurate training. Faster training times involve faster memory. With every generation of HBM we upped the speed. If you want more accurate training you need deeper neural networks. And with more layers and deeper neural nets, you need more capacity. So every generation of HBM we upped the capacity. With HBM2E, we upped the speed by 33% and doubled the density. So instead of having 4- and 8-gigabyte stacks, now we have 8- and 16-gigabyte stacks. We’re addressing both vectors that are important for AI, which are faster training and more accurate training. Those kinds of applications initially were driving the AI market. But we’re seeing that other spaces are starting to adopt HBM, too, just because of the need to process data and move it around. Another big area where we’re seeing HBM adoption is in networking, whether it’s in the terabit-class switches that need packet buffers, or some of the evolving smart NICs (network interface cards). The smart NICs do a lot of offloading, both in terms of the network stack offload as well as the direct connection to the storage. Those are new applications. It’s all about processing data and moving around data.
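The per-stack bandwidth implied by Shiah's HBM2E figures is simple to work out. Assuming HBM2's commonly quoted 2.4 Gbps-per-pin rate as the baseline (an assumption; the HBM2 spec defined multiple speed grades), the 33% bump and HBM's 1024-bit interface give:

```python
# Per-stack bandwidth arithmetic for the HBM2E speed bump quoted above.
# Assumption: baseline HBM2 pin rate of 2.4 Gbps (other grades existed).
pins = 1024                          # HBM interface width, in bits
hbm2_rate = 2.4                      # Gbps per pin (assumed baseline)
hbm2e_rate = hbm2_rate * (1 + 1/3)   # ~3.2 Gbps per pin after the 33% bump
bw_gbs = hbm2e_rate * pins / 8       # GB/s per stack
print(round(bw_gbs))                 # -> 410 GB/s per HBM2E stack
```

Roughly 410 GB/s per stack, with four or more stacks per package in HPC and AI designs.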

SE: Where is the bottleneck today? Is it in the memory? Is it moving the data to the memory? Or is it in the processor?

Greenberg: We see a memory bandwidth problem. That’s the desire to go to these very high-bandwidth memories, whether it’s HBM or GDDR6. That extreme bandwidth comes with a cost, and people are always trying to balance the cost of implementing the solution versus how much bandwidth can you get. That’s always a consideration. If cost isn’t an object, you would go for the highest bandwidth you can achieve, because generally memory bandwidth is always good for computational systems. But at some point you have to consider your budget.

SE: Is that where GDDR6 wins out?

Shiah: In certain applications, yes. If you look at things like HPC and AI, especially for the high-end applications, the dollar per performance is less of a factor in those systems. They’re going for the highest performance. HBM is enabling fast supercomputers. There are other applications where economics do come into play. They are looking at things like the performance per dollar per gigabit. There’s a break-even analysis you can do with HBM versus GDDR for dollar per gigabit of each technology compared to the bandwidth you can enable.
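The break-even analysis Shiah describes can be sketched as dollars per GB/s of delivered bandwidth. Every price below is a hypothetical placeholder, since, as Allan noted earlier, real HBM and GDDR6 pricing is not public and fluctuates monthly; the configurations are also assumptions chosen for illustration.

```python
# Illustrative break-even sketch: cost per GB/s of bandwidth.
# All prices are hypothetical placeholders, not market data.
def cost_per_gbs(price_per_gb: float, capacity_gb: float, bandwidth_gbs: float) -> float:
    """Dollars per GB/s of bandwidth for a given memory configuration."""
    return (price_per_gb * capacity_gb) / bandwidth_gbs

# Assumed configurations: one 8 GB HBM2 stack at ~307 GB/s (2.4 Gbps x 1024
# bits), versus four 2 GB GDDR6 devices at 14 Gbps x32 (4 x 56 = 224 GB/s).
hbm_cost = cost_per_gbs(price_per_gb=20.0, capacity_gb=8.0, bandwidth_gbs=307.0)
gddr_cost = cost_per_gbs(price_per_gb=8.0, capacity_gb=8.0, bandwidth_gbs=224.0)
print(f"HBM2:  ${hbm_cost:.2f} per GB/s")
print(f"GDDR6: ${gddr_cost:.2f} per GB/s")
```

Under these assumed prices GDDR6 wins on dollars per GB/s, which matches the article's point that inference parts lean toward GDDR6; the calculus flips once the required bandwidth exceeds what a practical number of GDDR6 channels, board area, and power can supply.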

Ferro: Even with the emergence of GDDR6, we had a number of HBM customers who were a little bit surprised at the overall cost of the system. It’s still early. They need to get into the physical design of 2.5D manufacturing, and the cost of the interposer and warpage of the interposer and heat. There were a lot of physical designs that went back to PCB design to save costs. We had customers looking at HBM and opting to go to GDDR6, and we had some go the other way. There’s a gray area, but once you identify the application then GDDR6 or HBM becomes a clear choice.
