AMD CEO Lisa Su and several of the company's famous past and current architects, like Jim Keller and Mike Clark, receive much of the public recognition for the company's amazing resurgence. But Mark Papermaster has served as the company's CTO and SVP/EVP of Technology and Engineering since 2011. He's been at the helm of AMD's technology development throughout its David-versus-Goliath comeback against industry behemoth Intel, giving him incredible insight into the company's past, present, and future.

We sat down with Papermaster during the Supercomputing 2019 conference to discuss the company's latest developments, including shortages of Ryzen processors, what we can expect from future CPUs, the company's new approach of enabling a mix of faster and slower cores, thoughts on SMT4 (quad-threaded processor cores), and the company's take on new Intel technologies, like Optane Persistent Memory DIMMs and OneAPI.

Given AMD's success in the data center, we also discussed whether EPYC Rome is dampening industry interest in x86 alternatives, like ARM.

How Many Cores Are Enough?

It takes a lot of engineering wizardry to enable, but a big part of AMD's success stems from the rather simple concept of delivering more for less. For enthusiasts and data center architects alike, that starts with more cores. AMD's Zen has spurred a renaissance in core counts, boosting the available compute power we can cram into a single processor, forcing Intel to increase its core counts in kind. That incredible density started with the EPYC lineup that now stretches up to 64 cores, besting Intel's finest in the data center.

On the consumer side, the Ryzen 9 3950X brings an almost-unbelievable boost to 16 cores on mainstream platforms, a tremendous improvement over the four-core standard of a mere two years ago. As AMD moves to smaller process nodes, we could theoretically see another doubling of processor cores. That makes a lot of sense for the data center, but it raises the question of how many cores an average consumer can actually use. We asked Papermaster if it would make sense to move up to 32 cores for mainstream users:

"I don’t see in the mainstream space any imminent barrier, and here's why: It's just a catch-up time for software to leverage the multi-core approach," Papermaster said. "But we're over that hurdle, now more and more applications can take advantage of multi-core and multi-threading.[...]"

"In the near term, I don’t see a saturation point for cores. You have to be very thoughtful when you add cores because you don’t want to add it before the application can take advantage of it. As long as you keep that balance, I think we'll continue to see that trend."

Are Processors Going to Get Slower as they Shrink?

Over the years, we've become accustomed to higher clock speeds with smaller nodes. However, we've reached the point where smaller nodes that enable more cores can also suffer reduced frequencies, as we've seen with Intel's Ice Lake family. As potent as TSMC's engineering team is, frequency gains could diminish, or even reverse, as it moves to the smaller 5nm process. Papermaster is confident in AMD's ability to offset those challenges, though.

"We say [Moore's Law] is slowing because the frequency scaling opportunity at every node is either a very small percentage or nil going forward; it depends on the node when you look at the foundries. So there's limited opportunity, and that's where how you put the solution together matters more than ever," Papermaster said.

"That's why we invented the Infinity Fabric," he explained, "to give us that flexibility as to how we put in CPU cores, and how many CPU cores, how many GPU cores, and how you can have a range of combinations of those engines along with other accelerators put together in a very efficient and seamless way. That is the era of a slowed Moore's Law. We’ve got to keep performance moving with every generation, but you can't rely on that frequency bump from every new semiconductor node."

AMD will also evolve its Infinity Fabric to keep up with higher-bandwidth interfaces, like DDR5 and PCIe 5.0. "In an era of slowed Moore's Law where you are getting less frequency gain, and certainly more expense at each technology node, you do have to scale the bandwidth as you add more engines going forward, and I think you're going to see an era of innovation of how in doing so you design to optimize the efficiency of those fabrics," Papermaster said.

Ryzen 3000 Shortages

AMD's boosted core counts come as a byproduct of TSMC's denser 7nm process, but the company initially suffered from nagging post-launch shortages of its high-end SKUs and had to delay its flagship desktop processor, leading to questions about AMD's ability to satiate demand. Those questions were amplified by reports that TSMC has extended lead times for its highly sought-after 7nm process, and by the fact that AMD competes for wafer output with the likes of Apple and Nvidia.

"We're getting great supply from our partner TSMC," Papermaster said. "Like any new product, there is a long lead time for semiconductor manufacturing, so you have to guess where the consumers are going to want their products. Lisa [Su] talked about the demand simply being higher than we anticipated for our higher-performance and higher-ASP [products], the Ryzen 3900 series. We've now had time to adjust and get the orders in to accommodate that demand. That's just a natural process; in a way, it’s a good problem to have. It means the demand was even higher than we originally thought."

As a natural result of semiconductor fabrication, each wafer yields dies with different capabilities, which are then binned (sorted) according to those capabilities. AMD's fastest processors require the cream-of-the-crop dies, and the company simply wasn't receiving enough of them. We asked if getting more high-end dies is simply a function of ordering more wafers:

"We work closely with the foundry to get the right mix on any chip. You have various speed ranges that come out of the manufacturing line. You have to decide in advance what you think is the distribution of chips and work with the foundry partner to make sure you call the demand right," Papermaster elaborated.
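To make the binning trade-off concrete, here is a minimal sketch that assumes each die has already been tested for its maximum stable frequency and that the product stack is defined by simple minimum-frequency thresholds. The bin names, thresholds, and sample frequencies are hypothetical placeholders, not AMD's actual binning criteria, and real binning also weighs power, leakage, and defective cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    name: str       # hypothetical SKU tier label
    min_ghz: float  # hypothetical minimum tested frequency to qualify


# Hypothetical bins, ordered from most to least demanding.
BINS = [
    Bin("premium", 4.6),
    Bin("mainstream", 4.3),
    Bin("value", 4.0),
]


def bin_dies(die_max_ghz):
    """Assign each tested die to the most demanding bin it qualifies for.

    Dies that miss every threshold are counted as 'reject'.
    """
    counts = {b.name: 0 for b in BINS}
    counts["reject"] = 0
    for freq in die_max_ghz:
        for b in BINS:
            if freq >= b.min_ghz:
                counts[b.name] += 1
                break
        else:  # no bin matched
            counts["reject"] += 1
    return counts


# Hypothetical test results for six dies from one wafer.
print(bin_dies([4.7, 4.4, 4.1, 3.9, 4.65, 4.35]))
```

Ordering more wafers scales every bin roughly in proportion, so the share of premium dies is set by the frequency distribution coming off the line. That is why, as Papermaster notes, the expected distribution has to be planned with the foundry in advance.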

Unlocking Faster Performance With New Boost Technology

The looming frequency-scaling challenges can be addressed through a range of techniques, but AMD already has an innovative technology that helps wring the utmost performance from every core.

Just as the capabilities of each die harvested from a wafer vary, every core on a chip has differing capabilities. Like all processors, AMD's chips come with a mix of faster and slower cores, but we discovered that the company uses a technique to extract higher frequencies from the faster cores. That stands in contrast to the standard approach in the PC industry of adjusting to the lowest common denominator, in which every core is limited to what the slowest core can sustain. We asked Papermaster about the rationale behind the new technology:

"There's typically a fairly small variation of the performance across cores," Papermaster responded, "but what we enable on our chips is the opportunity to boost and maximize the performance of any given chip. We're enabling these boost technologies to the advantage of our end customers, to make sure that we are optimizing power, yet delivering the best performance."
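The contrast between the two approaches can be sketched in a few lines. This is an illustrative model only: it assumes each core has a known maximum frequency (the per-core numbers are hypothetical) and ignores the power, thermal, and current limits that real boost algorithms must respect. It does not describe AMD's actual implementation.

```python
def lowest_common_denominator(core_max_ghz):
    """Conventional approach: every core is capped at the slowest core's limit."""
    slowest = min(core_max_ghz)
    return [slowest] * len(core_max_ghz)


def per_core_limits(core_max_ghz):
    """Per-core approach: each core may boost up to its own tested limit."""
    return list(core_max_ghz)


def best_single_thread_ghz(limits):
    """Frequency a lightly threaded task gets if scheduled on the fastest core."""
    return max(limits)


# Hypothetical maximum frequencies for a four-core chip.
cores = [4.40, 4.55, 4.45, 4.60]

print(best_single_thread_ghz(lowest_common_denominator(cores)))
print(best_single_thread_ghz(per_core_limits(cores)))
```

Even with the "fairly small variation" Papermaster describes, the per-core model lets a lightly threaded workload run at the fastest core's limit instead of the slowest core's.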

Does SMT4 Make Sense?

There have been persistent rumors and reports in the media that AMD will adopt SMT4, which enables each processor core to run four threads instead of the standard two. Knowing that AMD won't reveal direct information about its forthcoming chips, we asked Papermaster for his opinion on the technology coming to the desktop:

"We've made no announcements on SMT4 at this time," Papermaster responded. "In general, you have to look at simultaneous multi-threading (SMT): There are applications that can benefit from it, and there are applications that can't. Just look at the PC space today, many people actually don’t enable SMT, many people do. SMT4, clearly there are some workloads that benefit from it, but there are many others that it wouldn’t even be deployed. It's been around in the industry for a while, so it's not a new technology concept at all. It's been deployed in servers; certain server vendors have had this for some time, really it's just a matter of when certain workloads can take advantage of it."

Papermaster's Thoughts on Persistent Storage (Optane) on the Memory Interface

Intel lists Optane Memory among its technological advantages over its peers, but like all processors that use standardized interfaces, AMD's EPYC also supports Optane when used as a storage device.

However, Intel also offers Optane Persistent Memory DIMMs, which drop into standard memory slots and serve as memory. Intel uses a proprietary interface to enable that functionality, so AMD's EPYC platforms don’t support the feature. We asked Papermaster about AMD's take on persistent memory, and whether we could see similar DIMM support from AMD in the future using 3D XPoint memory, the technology behind Optane, from its ally Micron.

"Eventually, the way the industry is heading is to enable storage class memory to be off the I/O bus," Papermaster said. "That's where they really want to be because that's where it is more straightforward from the software stack to leverage these dense storage class memories (SCM). So, you're seeing an evolution there, you're seeing the industry working on SCM solutions. There's been a number of industry standards to align on that interface, and now CXL has taken off. We've joined it along with many other members of the industry, and so you're starting to see convergence on that interface for these types of devices. It's going to take a little time because they're going to have to get out there, and then the applications have to be tuned and qualified to run and really leverage this."

We dove in a bit deeper, asking whether Papermaster sees more industry interest in standards-based I/O interfaces (like NVMe) than in using the memory bus, to which he responded, "I believe so. I think that's where you're really going to see SCM become pervasive in the industry."

Would AMD Adopt Intel's OneAPI?

Intel's OneAPI is a collection of libraries that enable programmers to write code that is portable between different architectures, thus allowing programs that run on CPUs to seamlessly transfer over to other architectures, like GPUs, FPGAs, and AI accelerators.

Interestingly, Intel recently announced that OneAPI will work with other vendors' hardware and that those vendors are free to adopt the technology. We asked Papermaster if AMD would consider adopting OneAPI.

"We've already been on a heterogeneous software stack strategy and implementation for some time. We already released the Radeon Open Compute stack at a production level two years ago, so we have a path that is open and allows a very straightforward path to compiling workloads that are heterogeneous across our CPUs, our GPUs, and also interface with standards like OpenMP so you can then create high performance compute cluster capabilities."

"This is a path that we've already been on in AMD for some time, and we're glad to see the endorsement from our competitor that they see it the same way."

Is EPYC Sucking the Oxygen out of ARM?

As we've seen with Intel's recent shortages, a monopoly-like hold on the processor market isn’t good for pricing or supply stability. As such, the industry has long pined for alternative processors, but in reality, it isn't searching for an x86 alternative. Rather, it wants an Intel alternative.

Moving to ARM and other architectures requires expensive and time-consuming re-coding and validation of existing software, while AMD's EPYC Rome, as an x86 processor, is plug-and-play with existing x86 software, avoiding the additional expense of moving to a different architecture.

Many have opined that EPYC Rome is sucking the oxygen out of industry interest in other architectures, like ARM, due to those advantages. We asked Papermaster for his take:

"x86 is the dominant architecture for computing today, and there's just such a massive amount of software code for x86, and such a massive toolchain that makes it easy for developers on this platform. So, we just see such a long and healthy opportunity, and frankly for AMD, with the strength of our roadmap, a tremendous share gain opportunity for us," Papermaster said.

"We're very focused on our strategy to ensure that every generation we have brings tremendous value to our customers, and in doing so, I do think it makes it harder for new architectures to enter. You'll see specialized applications that are less architecture-dependent. Because they’re specialized, they don’t care as much about that broad x86 base. So I do think, as you already see today, a small market for specialized architectures that'll continue, but we couldn’t be more excited about the future prospects for x86, and for our AMD roadmap in that market."

AMD to Support BFloat16

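The industry has broadly adopted BFloat16, a numerical format that originated at Google and boosts performance for certain AI workloads. It keeps the sign bit and 8-bit exponent of a standard 32-bit float but only 7 bits of mantissa, halving storage and bandwidth requirements while preserving the dynamic range that many machine learning workloads need. The industry is inexorably shifting to AI-driven architectures, and large hyperscalers have signaled that they require hardware that supports the new format. Papermaster revealed that AMD would support BFloat16 in future revisions of its hardware.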

"We're always looking at where the workloads are going. BFloat 16 is an important approximation for machine learning workloads, and we will definitely provide support for that going forward in our roadmap, where it is needed."
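To show what the "approximation" Papermaster mentions looks like in practice, the sketch below converts a 32-bit float to BFloat16 and back. Because BFloat16 is the upper 16 bits of the IEEE 754 single-precision encoding, conversion amounts to rounding away the low 16 bits. This uses round-to-nearest-even and deliberately skips NaN handling; it is an illustration of the format, not a description of how AMD hardware performs the conversion.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a value (as float32) to bfloat16 and return its 16-bit pattern.

    bfloat16 keeps float32's sign bit and 8-bit exponent but only the top
    7 mantissa bits. Rounding is round-to-nearest, ties-to-even.
    NaN inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1           # lowest bit that survives truncation
    rounded = bits + 0x7FFF + lsb    # ties go to the even neighbor
    return (rounded >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


for value in (1.0, 3.14159, 1e38):
    approx = bf16_bits_to_float(float_to_bf16_bits(value))
    print(value, "->", approx)
```

Pi-like values lose precision (3.14159 becomes 3.140625), but a value as large as 1e38 still fits, whereas IEEE half precision tops out at 65504. That trade of precision for range is why the format suits machine learning workloads.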

On a Personal Note...

In a turnaround of fortunes that hardly anyone could have predicted several years ago, AMD has taken the process lead from Intel and has an innovative architecture that is pressuring its competitor in every segment in which AMD competes. We asked Papermaster if he personally thought the plan would be this successful when it was laid out four years ago.

"We set out a roadmap that would bring AMD back to high performance and keep us there. It is independent of our competitors' roadmaps and semiconductor node execution on 10nm. And we'll continue to drive our roadmap in that way. We called a play, we've been executing as we called it, and that's what you're going to see at AMD, just tremendous focus on execution. If we do that, then it is less about focusing on our competition, and about being the very best we can be with every single generation."

Papermaster has been at the helm of developing nearly all of AMD's newest technologies, so we asked what makes him the proudest about the turnaround:

"It's the team at AMD. The AMD commitment to win is unsurpassed. We're a smaller player in the industry, and the company as a whole just punches above its weight class, if you were to make a boxing analogy. It's so exciting to be a part of that team and to see that personal dedication, that willingness to really listen to customers, understand what problems they want solved, and then go to the drawing boards and innovate and really surprise the industry."

"And then the other piece I'm proud of is that focus on that execution. It’s the ability to be a street-fighter, and then focus and hunker down and execute and deliver what we promised."