A project we work on with every new processor generation is building a comparative dataset of intra- and intergenerational CPU performance. Today we are providing some initial benchmarks of the quad Intel Xeon Platinum 8180 configuration, since we wanted to start at the top end of the Intel Xeon Scalable Processor Family. Over the next few days, we will release plenty of additional data. We are also working to run some of our larger benchmarks, along with a comparison to AMD EPYC when those systems are ready. While you can buy this quad Intel Xeon Platinum 8180 configuration today (July 11, 2017) from several vendors, AMD EPYC is weeks or months away from availability.

Since this is STH after all, we wanted to get some numbers up for launch week.

Test Configuration

Our test configuration used an Intel platform.

CPU(s): 4x Intel Xeon Platinum 8180 28 core/ 56 thread CPUs (112 cores/ 224 threads total) with 2.5GHz base and 3.7GHz turbo clocks, 38.5MB L3 cache each

Platform: Intel S4PR1SY2B

RAM: 768GB in 24x SK hynix 32GB DDR4-2666 2Rx4 DIMMs

OS SSD(s): 1x Intel DC S3700 400GB

OSes: Ubuntu 14.04 LTS, Ubuntu 17.04, CentOS 7.2
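
The aggregate figures in the list above follow directly from the per-socket numbers. As a quick sanity check, here is a minimal Python sketch that reproduces them. The variable names are ours, chosen for illustration, and two threads per core assumes Hyper-Threading is enabled, which matches the 28 core/ 56 thread figure.

```python
# Per-socket figures taken from the configuration list above.
SOCKETS = 4
CORES_PER_CPU = 28
THREADS_PER_CORE = 2   # assumes Hyper-Threading is enabled
L3_MB_PER_CPU = 38.5
DIMMS = 24
DIMM_GB = 32

total_cores = SOCKETS * CORES_PER_CPU
total_threads = total_cores * THREADS_PER_CORE
total_l3_mb = SOCKETS * L3_MB_PER_CPU
total_ram_gb = DIMMS * DIMM_GB
dimms_per_socket = DIMMS // SOCKETS

print(f"{total_cores} cores / {total_threads} threads")
print(f"{total_l3_mb} MB L3 across all sockets")
print(f"{total_ram_gb} GB RAM, {dimms_per_socket} DIMMs per socket")
```

Six DIMMs per socket lines up with the six memory channels per CPU in this generation, so the system as tested is populated at one DIMM per channel.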

This is the platform that Intel sent us for review. For those wondering, the max power we saw on this system was 1336W on our 208V lab racks.
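
For those sizing rack power, the peak figure converts to current draw with simple arithmetic. The sketch below is a rough estimate under stated assumptions: it treats the 1336W reading as real power at a power factor of 1.0, and it compares it to the 205W per-CPU TDP quoted later in this piece. The function name is ours for illustration.

```python
PEAK_WATTS = 1336      # max power observed on the test system
LINE_VOLTS = 208       # lab rack voltage
CPU_TDP_WATTS = 205    # per-CPU TDP quoted in the final words
SOCKETS = 4


def amps_at(watts: float, volts: float, power_factor: float = 1.0) -> float:
    """Estimate current draw; power_factor=1.0 is an assumption."""
    return watts / (volts * power_factor)


peak_amps = amps_at(PEAK_WATTS, LINE_VOLTS)
cpu_tdp_total = SOCKETS * CPU_TDP_WATTS
cpu_share = cpu_tdp_total / PEAK_WATTS

print(f"Peak draw: {peak_amps:.2f} A at {LINE_VOLTS} V")
print(f"Combined CPU TDP: {cpu_tdp_total} W ({cpu_share:.0%} of peak)")
```

TDP is a thermal design figure rather than a measured draw, so the CPU share here is only a ballpark.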

The numbers we have should be comparable to what you will see with a quad Intel Xeon Platinum 8180M setup, as the clock speeds are the same. The one likely difference is that buyers of that SKU would be using a different memory configuration, which may have a minor impact on performance.

Quad Intel Xeon Platinum 8180 Benchmarks

For our testing, we are using Linux-Bench scripts which help us see cross platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. We are already testing the next-generation Linux-Bench that can be driven via Docker and uses newer kernels to support newer hardware. The next generation benchmark suite also has an expanded benchmark set that we are running regressions on. For now, we are using the legacy version that now has over 100,000 test runs under its belt.

Python Linux 4.4.2 Kernel Compile Benchmark

This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then run make utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
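
The compiles per hour figure is just the wall-clock build time inverted and scaled to an hour. A minimal sketch of that conversion is below. The function name and the 90 second timing are hypothetical, used only for illustration, and are not taken from Linux-Bench or from our results.

```python
def compiles_per_hour(wall_seconds: float) -> float:
    """Convert one kernel build's wall-clock time into compiles per hour."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 3600.0 / wall_seconds


# Hypothetical timing, not a measured result from this review.
example_seconds = 90.0
print(f"{compiles_per_hour(example_seconds):.1f} compiles per hour")
```

Expressing the result this way means higher is better, which keeps it consistent with most of the other charts.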

The quad Intel Xeon Platinum 8180 setup is an absolute beast here. It easily bests the fastest Broadwell system we tested, the quad Intel Xeon E7-8890 V4 Dell PowerEdge R930.

c-ray 1.1 Performance

We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular for showing differences between processors under multi-threaded workloads.

C-ray 1.1 generally fits within caches, so it will scale just about as fast as one can throw cores at it. Here is the key takeaway: we are adding an 8K resolution to the next batch of testing. The quad Intel Xeon Platinum machines obliterate what we had as a “hard” test in 2012. What a difference five years make.

7-zip Compression Performance

7-zip is a widely used compression/ decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.

In terms of compression performance, one can see that the quad Xeon Platinum machines perform well, as expected. There is a nice bump over the Xeon E7-8890 V4 generation.

NAMD Performance

NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. We are going to augment this with GROMACS in the next-generation Linux-Bench and are building a dataset of those results for future publication.

In terms of NAMD, it is not a huge surprise that the fastest chips perform the best. We added the 8x Intel Xeon E7-8870 V2 results in there just to show the impact for those looking to upgrade. One can get by with half of the CPU sockets of just three generations ago.

Sysbench CPU test

Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
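
For readers unfamiliar with it, the sysbench CPU test spends its time checking numbers for primality by trial division up to a configurable limit. The sketch below illustrates that style of workload in Python. It is our own approximation, not sysbench's actual code, and the function name and the limit of 20000 are assumptions for illustration. Timings from it are not comparable to sysbench results.

```python
import math
import time


def count_primes(max_prime: int) -> int:
    """Count primes in [3, max_prime) using trial division."""
    count = 0
    for candidate in range(3, max_prime):
        limit = math.isqrt(candidate)
        for divisor in range(2, limit + 1):
            if candidate % divisor == 0:
                break  # found a factor, so candidate is composite
        else:
            count += 1  # no divisor found, so candidate is prime
    return count


start = time.perf_counter()
found = count_primes(20000)  # hypothetical limit chosen for illustration
elapsed = time.perf_counter() - start
print(f"found {found} primes in {elapsed:.3f} s")
```

Because every candidate is independent, this kind of work spreads cleanly across threads, which is why high core count parts do so well in the test.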

We left an older Nehalem result in this set along with some Sandy Bridge results. If you were using a lower-end quad E5-4620 V1 machine, we are at the point where you can consolidate multiple machines onto dual socket Xeon Platinum and Gold.

OpenSSL Performance

OpenSSL is widely used to secure communications between servers. It is an important component in many server stacks. We first look at our sign tests:

Here are the verify results:

The bottom line here is simple: we are seeing a fairly massive improvement in OpenSSL speeds. These tests were done without using an onboard Intel QuickAssist PCH. In this generation, a PCH with QuickAssist can greatly enhance some OpenSSL performance.

UnixBench Dhrystone 2 and Whetstone Benchmarks

Among our longest running tests are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone benchmarks. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used, so it is a good comparison point.

And Whetstone:

We left the single thread results in here just to show how comical it is getting. Remember, the quad Intel Xeon Platinum 8180 configuration is a 224 thread machine. At some point, single threaded performance matters, but TCO business cases are going to be made largely on consolidation. Having more threads helps.

Single Redis Instance Benchmarks

We unleash a single Redis instance for these benchmarks and generate set/ get requests against the instance. With only one instance running, this workload is bound more by clock frequency and memory bandwidth than by how many cores the system has.

Lots of memory bandwidth helps the Platinum 8180 stay on top of these results. The speeds are fairly well grouped. We will be using this as a base for one of our multi-application Docker tests in the next-generation Linux-Bench.

Inter-Socket Latency with Intel Memory Latency Checker

We did want to touch upon one hot topic, especially in light of our recent NUMA piece with Intel and AMD. We are going to have more on this soon with dual socket results and AMD EPYC results. Here is a teaser of what one can expect:

Putting this into perspective, Intel actually has inter-socket idle latency that is better than what our AMD EPYC 7601 system with DDR4-2400 is currently putting out. That is a phenomenal result and will help explain some of the performance findings we have later.

Final Words

Overall, this system is a beast, and it should be. The list price for this configuration is likely around $50,000 and up, so it is not inexpensive. On the other hand, if you need a scale up node, perhaps due to licensing costs, this system is hard to beat. Even with the 28 core die, Intel is able to raise the TDP of the chips to 205W and maintain very respectable clock speeds.

In the near future, we are going to have several other performance data points on some of the larger applications we test. The above should provide at least a relative sense of performance on Intel’s top end 4-socket Skylake-SP configuration.

You can read more about the new chips at our Intel Xeon Scalable Processor Family launch coverage headquarters.