At STH, we firmly believe that alternative architectures help spur the technology industry’s innovation. The new entrant for 64-bit Arm servers is the Cavium ThunderX2. We have long held that Cavium is the only vendor publicly selling an Intel Xeon alternative. With the ThunderX2, there is now a dual socket capable 64-bit Arm CPU that has up to 32 cores and 128 threads in each socket. The Cavium ThunderX2 that we see today has its origins in the Broadcom Project Vulcan, so many of the features we saw in the Cavium ThunderX, such as 40GbE ports, are not present. Instead, we have an Arm chip that can go toe-to-toe with Intel and AMD and come out ahead in some cases. Best of all, the list price of the 32 core top-bin CN9980 part is $1795, about half that of the competitive Intel and AMD chips.

In this article, we are going to take you on a comprehensive journey exploring different aspects of the Cavium ThunderX2. We are going to look at how the ecosystem and platforms have evolved and why ThunderX2 is usable by a broader set of organizations than previous generations. There is a set of performance benchmarks where we explore how the 256 threads in a dual ThunderX2 system perform against Intel Xeon and AMD EPYC. We also have a few numbers exploring the SKU stack in terms of 24, 28, 30 and 32 core versions. Finally, we are going to end with a look at power consumption and the competitive landscape. Suffice it to say, grab a cup of coffee and dive in.

Previous Cavium ThunderX2 Pieces

We have covered the ThunderX2 for some time, so for our readers the launch of ThunderX2 may seem like déjà vu. Here is a sample of ThunderX2 coverage on STH to date:

Cavium ThunderX2 CPU and SKU Stack

While the original ThunderX was a BGA design, the Cavium ThunderX2 comes in both BGA and LGA form factors. The impact is tangible. It can be deployed soldered onto motherboards, as with previous generations, or now as a socketed part. That has major advantages for the supply chain, as it allows a server OEM to stock platforms and then socket CPUs as they are ordered. With this generation, Cavium has a SKU stack that can interchangeably utilize a standard socket. Here is a picture of that socket:

Just to give a sense of scale, here is the ThunderX2 package alongside four of the most popular x86 package types today: AMD EPYC, Xeon Scalable, Xeon E5-2600 V3/V4 and Xeon E5-2600 V1/V2.

One can see that AMD and Intel have components on the bottom of the socket and a very interesting pin/pad layout. The square bottom of a ThunderX2 package is just a large pad grid.

When taking the above photo, one thing became clear. The ThunderX2 is significantly slimmer than its counterparts, and we are using the 32 core ThunderX2 in this photo.

Cavium ThunderX2 Features, SKUs and Specs

The Cavium ThunderX2 is very competitive with both Intel Xeon Scalable and AMD EPYC 7000 series parts in terms of performance, but also in terms of features. Here is the key features overview of the Cavium ThunderX2:

There are up to 32 cores and 128 threads per socket. Unlike competitive Arm development chips, ThunderX2 is a dual socket capable design, and indeed we tested a dual socket server with a total of 64 cores and 256 threads. Cache is 32KB L1 and 256KB L2 per core, plus a 32MB distributed L3 cache. Cavium also has a 600Gbps interconnect (CCPI2). Interconnects are hard, especially with multi-socket designs. That is a key feature that separates ThunderX2 engineering from some of the single-socket only Arm options.
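The thread counts quoted above follow from simple multiplication. As a rough sketch, assuming 4 hardware threads per core (as noted later in this piece) and using a hypothetical helper name `hardware_threads` that is ours rather than anything from Cavium:

```python
def hardware_threads(cores_per_socket: int, threads_per_core: int = 4, sockets: int = 1) -> int:
    """Total hardware threads exposed to the OS for a given configuration."""
    return cores_per_socket * threads_per_core * sockets


# Core counts mentioned in this article for the SKUs we look at.
for cores in (24, 28, 30, 32):
    single = hardware_threads(cores)
    dual = hardware_threads(cores, sockets=2)
    print(f"{cores} cores: {single} threads per socket, {dual} threads in a dual socket server")
```

For the 32 core part, this gives the 128 threads per socket and 256 threads per dual socket server cited in the text.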

Memory bandwidth is excellent with up to 8x DDR4-2666 memory controllers, which is equivalent to AMD EPYC and more than Intel Xeon Scalable. These memory channels even support RAS features and NVDIMMs.
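To put the channel count in perspective, here is a back-of-the-envelope sketch of theoretical peak memory bandwidth per socket. It assumes a standard 64-bit (8 byte) DDR4 data bus per channel, and it assumes 6 channels of DDR4-2666 for Intel Xeon Scalable, a figure that is not stated in this article. Real-world bandwidth will be lower than these theoretical numbers.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    bus_bytes=8 assumes a 64-bit data bus per DDR4 channel.
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000


# 8 channels of DDR4-2666, as described for ThunderX2 (and AMD EPYC).
print(f"8 channels: {peak_bandwidth_gbs(8, 2666):.1f} GB/s")

# Assumption: 6 channels of DDR4-2666 for Intel Xeon Scalable.
print(f"6 channels: {peak_bandwidth_gbs(6, 2666):.1f} GB/s")
```

Under these assumptions, the eight-channel configuration works out to roughly 170.6 GB/s per socket versus about 128.0 GB/s for six channels.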

PCIe support extends up to PCIe 3.0 x16 slots, with a total of 56 PCIe 3.0 lanes. One can bifurcate the PCIe lanes down to x1, and there are a total of 14 PCIe controllers for system vendors to utilize. Other features like SR-IOV are supported, which helps maintain parity with the x86 ecosystem.
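For a sense of what those lanes mean in bandwidth terms, here is a minimal sketch using the standard PCIe 3.0 figures of 8 GT/s per lane with 128b/130b encoding. The numbers are theoretical per-direction maximums before protocol overhead, and the helper name `pcie3_gbs` is hypothetical.

```python
PCIE3_GT_PER_S = 8.0        # PCIe 3.0 raw signaling rate per lane
PCIE3_ENCODING = 128 / 130  # 128b/130b line encoding efficiency


def pcie3_gbs(lanes: int) -> float:
    """Theoretical PCIe 3.0 bandwidth in GB/s, per direction, for a lane count."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8  # divide by 8: bits to bytes


for lanes in (1, 4, 8, 16, 56):
    print(f"x{lanes}: {pcie3_gbs(lanes):.2f} GB/s per direction")
```

By this estimate, a single x16 slot tops out near 15.75 GB/s per direction, and all 56 lanes together come to roughly 55 GB/s per direction per socket.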

PCIe is a big deal since it allows the platform to be utilized with high-speed devices like GPUs, FPGAs, NVMe SSDs, and high-speed networking. This level of connectivity puts the ThunderX2 squarely between the AMD EPYC and Intel Xeon Scalable lines, which is a major achievement in itself.

In terms of actual launch SKUs, Cavium has a list of around 40 SKUs ranging from 16 to 32 cores. The company sent us specs for five of them, ranging from 24 to 32 cores and 96 to 128 threads. Cache comes in at 32MB per chip.

If you want to see the full SKU stack and its positioning relative to Intel Xeon Scalable, from Cavium’s point of view, here is the current SKU stack we were provided with:

We are going to go into how each Cavium ThunderX2 core can handle 4 threads, the performance of the chip, and the power consumption soon. The old adage that more is usually, but not always, better holds true here, as does the saying “TDP does not equal power consumption.” We are going to get to that, but we are first going to set the stage in terms of the context behind why the Cavium ThunderX2 is the most important Arm data center release this year.