After more than two decades with Hewlett Packard Enterprise, Darren Cepulis came to Arm in 2013, becoming part of the small cadre of people working in the chip designer’s high-performance computing (HPC) business. As Cepulis – now the datacenter architect and HPC segment manager for Arm – recalls, there were a couple of researchers, one or two software developers and himself. The company had made its bones for decades designing small, power-efficient CPUs for such devices as cell phones and tablets.

Around the time Cepulis came to Arm, the company had already begun pushing its processor architecture into the datacenter to chip away at Intel's long dominance in servers, hoping to capitalize on the growing importance enterprises were placing on energy efficiency in their datacenter systems, with many ranking it as high a priority as performance. The goal was to become the top alternative to Intel's x86 Xeon processors, an ambition that has yet to be fully realized.

The company also saw an opportunity in HPC, where processors from Intel and Advanced Micro Devices were being joined by GPU accelerators from Nvidia and AMD. Things in HPC were changing and that opened up the space for Arm, though it had a long way to go, Cepulis says.

“When we first came in, there was no clear idea even about the ecosystem we needed to have in place for a successful deployment,” he tells The Next Platform this week in an interview at Arm’s TechCon show in San Jose, Calif. “That was the biggest thing, wrapping our arms around that. Also, understanding what we had in terms of silicon partners and how we needed to engage with them, understanding the different strategic sites and what the ecosystem priorities and demands were, both bare minimum and what we needed to be successful. Also, five years ago, there wasn’t really any [Arm] silicon that was HPC-applicable or on par with existing offerings from other entities out there. From a software ecosystem standpoint, there wasn’t much of a view and we had a lot of gaps. Fast-forward to now and you see some actual deployments.”

Arm is beginning to see some fruits from a half-decade of work of trying to muscle into the HPC space, as The Next Platform recently outlined. There’s Astra, a system from Hewlett Packard Enterprise built for Sandia National Labs and based on the OEM’s Apollo 70 systems, running on more than 5,000 Arm-based ThunderX2 chips from Marvell. It was the only Arm-based system to make it onto the most recent Top500 supercomputer list, coming in at number 156. Fujitsu is continuing to build what will be Japan’s first exascale supercomputer, Fugaku – the successor to the K system – which will be powered by the system maker’s own Arm-based A64FX, complete with 52 cores, and is expected to be ready to go in 2021.

Europe continues to be a friendly place for Arm in HPC. The Barcelona Supercomputing Center’s Mont-Blanc project has included multiple generations of pre-exascale prototypes powered by Arm chips, the most recent an Atos BullSequana supercomputer running ThunderX2 chips. The Catalyst UK project – which includes the likes of HPE, Marvell, SUSE and Mellanox – will deliver Arm-powered HPC clusters to three UK universities, and the French Atomic Energy Commission has a BullSequana system of its own. The UK also houses Isambard, the first Arm-based production supercomputer, built on Cray’s XC50 architecture.

Others are on the way. There might not be many production supercomputers out there right now, but there are a lot of test systems, Cepulis says.

“If you look back just one year, there wasn’t a lot of HPC hardware out there that was in people’s hands,” he says. “If you fast-forward just twelve months, we have testbeds everywhere and that enables people to get a feel for how their software will run on it and builds the confidence that they need, so when the next real RFP or real deployment comes around, they can potentially be bidding Arm-based platforms. Also, we have multiple silicon partners in play, with Marvell and with Fujitsu and others potentially coming down the pipe.”

Right now, choices in silicon are somewhat limited. Marvell’s ThunderX2 is the only real commercial Arm-based server chip, with some system makers – Fujitsu being an example – building their own processors atop the Arm architecture. Other chip makers, like startup Ampere, are also developing Arm server chips, though it’s uncertain when they will reach any real volume in the market. Still others that had at one time looked to build Arm server processors – including AMD, Broadcom, and Qualcomm – eventually backed off, with AMD opting instead to focus its efforts on the Epyc chip portfolio based on the company’s Zen microarchitecture. That strategy has clearly worked for AMD, and it has hindered an Arm collective that was, to a degree, banking on AMD either failing to get its Epyc chips out the door or failing to compete well if it did.

A Strategic View Of HPC

When Arm first began eyeing the HPC space, it was seen as a key business pillar, separate from other infrastructure areas like telecommunications, hyperscale environments and the cloud, he says. Arm referred to it as “strategic HPC.”

“We call it strategic in that we’re not trying to tackle everything HPC in the world,” Cepulis says. “There’s a lot in the enterprise spaces and things and you can go into a lot of different areas with that. We’re focused on those strategic sites – government-driven, public-driven sites – where they care about ecosystem and collaboration and co-design, things that Arm is traditionally good at working and building and collaborating on.”

That collaboration extends both to other groups within the company and to other tech vendors – hardware and software alike – as well as HPC sites such as the national labs and other research facilities. A research group within Arm helps build outside relationships, and the collaboration can also be seen as demand for HPC capabilities – tool chains and optimized libraries, for example – comes from outside traditional areas, such as Arm’s automotive unit, he says.

The HPC group “runs like a lot of the other business units,” Cepulis says. “The one thing about HPC is that you see HPC sort of creep across all of the areas. You look at automotive, you can call that embedded HPC but it’s still HPC. You see the same thing in the edge or even the mobile space, where they’re starting to take advantage of certain optimized libraries and things like that that will speed up their own different applications. It can be things like machine learning and artificial intelligence even coming along. Some of the ML and AI wants to play on a big HPC-like system, but lately it seems like how much can you push that out towards the edge – how much can we put that on the edge device or the mobile devices – so there is a lot of overlap in terms of ecosystem and focus as things go forward.”

The HPC space is a good one for Arm to play in, he says. The organizations tend to be comfortable embracing new technologies and are always looking for more choice in the products they use. Silicon partners also have done well in improving their offerings, such as optimizing memory, throughput and performance. In 2016, Arm introduced the Scalable Vector Extension (SVE) to the Armv8-A architecture to improve parallelism and performance, all good things for HPC workloads. In April, the company unveiled SVE2, extending the benefits beyond HPC and into such areas as the edge.

“If you look at HPC from a business standpoint, it has been a fairly commoditized sort of thing,” Cepulis says. “It’s been traditionally two-socket Xeon boxes and maybe you throw in a bunch of GPUs, and that can only get you so far and it’s not clear you can get to exascale just doing that. You’ve got to do some additional things like what Fujitsu is doing. They took a ground-up design that was targeting large-scale HPC, ML [and] AI spaces and just executed quite well in that regard. That’s a good example of where I think we’ll see good success, and our partners will have to keep working toward those same densities and speeds as well.”

The Growth Of Arm HPC

Growth in the company’s HPC group has gone hand-in-hand with growth in its presence in the HPC space. During weekly calls with the HPC group, there can be as many as 50 people on the line, Cepulis says. At the SC supercomputing show in New Orleans in 2014, Arm sent six people. For SC19 next month in Denver, Arm will have about 40 at the event.

He expects the business to continue trending up. Thanks to its dominance in the mobile chip space, the company has seen 150 billion Arm-based chips shipped over the years, so the volume is there. When the company began its push into HPC, however, its silicon was still two manufacturing nodes behind Intel, sitting at 28 nanometers compared with Intel’s 14 nanometers. Since then, the Arm ecosystem has accelerated while Intel has stalled: some Arm partners are coming out with 7-nanometer chips and looking at 5 nanometers while Intel is still trying to get to 10 nanometers. Much of that innovation was fueled by the mobile side of the business, but the HPC business “can draft on that,” which benefits end users, Cepulis says.

“Based on our architecture, our cores tend to be smaller,” he says. “You can drive a much better compute density if your cores are a fifth the size of an X86 equivalent core. You can put a lot more in a particular die, so you can drive compute density there. It’s those synergies. We didn’t actually build that, but we’re certainly going to leverage that and our partners are going to leverage that so that their next-generation designs can come in at 7 nanometers or 5 nanometers, depending on where their own roadmaps show up, and that’s a huge advantage in terms of performance-per-watt and things like that and how much compute you can fit on a die. There’s a lot of work going on that can be leveraged and is very much applicable across the board.”