This week Intel released their initial Meltdown and Spectre performance impacts. The ranges were essentially 0-25% in what is a complete gift to AMD. Unlike years past, Intel has a real competitor in the market with AMD EPYC which has more cores, more RAM capacity and more PCIe lanes in 1P and 2P configurations. Meltdown and Spectre are the hot topics of January 2018 and Intel has been providing updates on how the design flaw that led to the vulnerability impacts performance.

Now that Intel has initial performance results, we are going to sanity check them against Red Hat’s numbers and then explain why this is important for AMD EPYC.

Intel’s Initial Response to Meltdown and Spectre Design Flaw Performance Impacts

Before we get into this too deeply, we wanted to say that we are happy that Intel is showing that in most cases performance is going down with the patches. The first step in the process is acknowledging the slower performance. Here are Intel’s results:

Here is the source document with all of the testing specs.

Validating Intel’s Numbers With Red Hat’s Numbers

Our readers will notice that this somewhat jibes with what Red Hat suggested it saw with the design flaw patches:

Measurable: 8-19% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-19%. Examples include OLTP Workloads (tpc), sysbench, pgbench, netperf (< 256 byte), and fio (random I/O to NvME).

Modest: 3-7% – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measurable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005, Queries/Hour and overall analytic timing (sec).

Small: 2-5% – HPC (High Performance Computing) CPU-intensive workloads are affected the least with only 2-5% performance impact because jobs run mostly in user space and are scheduled using cpu-pinning or numa-control. Examples include Linpack NxN on x86 and SPECcpu2006.

Minimal: Linux accelerator technologies that generally bypass the kernel in favor of user direct access are the least affected, with less than 2% overhead measured. Examples tested include DPDK (VsPERF at 64 byte) and OpenOnload (STAC-N). Userspace accesses to VDSO like get-time-of-day are not impacted. We expect similar minimal impact for other offloads.

NOTE: Because microbenchmarks like netperf/uperf, iozone, and fio are designed to stress a specific hardware component or operation, their results are not generally representative of customer workload. Some microbenchmarks have shown a larger performance impact, related to the specific area they stress.

At the top of Intel’s table are Integer, Floating Point, Linpack and STREAM results, which showed essentially a 0-1.3% impact. Red Hat put these workloads into its “Small” category with a 2-5% impact.

The Server Side Java workload is a SPECjbb2005 workload. Red Hat has this in the “Modest” 3-7% impact category so we see Intel and Red Hat diverge here.

For the “Energy Efficiency” benchmark Intel uses SPECpower_ssj2008 which is a good one for Intel to use here as it shows no loss from patches. This is not a benchmark Red Hat addressed.

Intel says fio will see a 0-22% impact while Red Hat says 8-19%. Red Hat also mentions that fio “results are not generally representative of customer workload.” This is a big deal for storage customers and the entire hyper-converged industry.
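The sanity check above boils down to asking whether Intel's range and Red Hat's range for a workload overlap. Here is a minimal Python sketch of that check using only the percentages quoted in this article. The function name, the dictionary layout, and the choice to treat Intel's "no impact" Server Side Java figure as a 0-0% range are our own illustrative assumptions, not anything Intel or Red Hat published.

```python
def ranges_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# (Intel range, Red Hat range) in percent slowdown, as quoted in this article.
# Assumption: Intel's "no impact" Server Side Java result is modeled as (0.0, 0.0).
comparisons = {
    "Integer/FP/Linpack/STREAM": ((0.0, 1.3), (2.0, 5.0)),
    "Server Side Java": ((0.0, 0.0), (3.0, 7.0)),
    "fio": ((0.0, 22.0), (8.0, 19.0)),
}

agreement = {
    name: ranges_overlap(intel, red_hat)
    for name, (intel, red_hat) in comparisons.items()
}

for name, agrees in agreement.items():
    print(f"{name}: {'overlap' if agrees else 'diverge'}")
```

Only the fio ranges overlap. On the compute-bound and Server Side Java workloads, Intel's figures sit below the bottom of Red Hat's ranges.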

With that sanity check out of the way, let us look for a second at what Intel previously presented in terms of Skylake-SP performance.

Intel Xeon Scalable v. AMD EPYC

When we turn our attention to the Intel Xeon Scalable v. AMD EPYC numbers Intel published late last year, we see why this is a gift to AMD. Here is our full discussion of the benchmarks. First, we wanted to pull Intel’s third-party workloads:

When we look at SPECrate2017_fp_base, AMD was already ahead on this benchmark by a narrow margin. Adding a 0.8% performance decrease will slightly increase AMD EPYC leadership here. Likewise, on the SPECrate2017_int_base number, Intel will lose ground to AMD EPYC as it will see a 1.3% decrease in performance.

From that we are going to pull Intel’s summary chart of enterprise workloads from Intel’s internal testing:

The light blue bar shows the Intel Xeon Platinum 8160 compared to the AMD EPYC 7601 in dual socket configurations. They are in the same ballpark price-wise. Note here that the Intel Xeon Platinum 8160 was exactly 1.00 in the Server Side Java benchmark. Conveniently, this is also an area where Intel claims no Meltdown or Spectre impact. Here is what is interesting: Red Hat says there is a modest 3-7% impact in this type of workload. If you use Intel’s estimate, it is still on par with AMD EPYC. If you use Red Hat’s number, then AMD EPYC is now ahead.

Intel suggests that the OLTP workload sees about 4% lower performance. OLTP workloads tend to be the ones where we are seeing larger impacts, and Red Hat also sees them as a high-impact area. Still, if the Intel Xeon Platinum was 27% faster pre-patch, we would expect its lead to be no more than about 22% now (1.27 x 0.96 is roughly 1.22).

The one we really want to see the results of is the “Storage bound workload” for the NoSQL database. Intel claimed a 27% lead over AMD EPYC but it is showing that some storage bound workloads using fio can see a 22% decrease. If Intel had a 22% decrease there, it would actually push Intel below AMD EPYC on that benchmark. Unfortunately, Intel did not release this result.
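The back-of-the-envelope math in this section can be sketched in a few lines of Python. This is a rough illustration, not a benchmark result. It assumes the patch slowdown applies multiplicatively to Intel's pre-patch score and that the AMD EPYC result stays fixed at 1.00. The function name is ours, and the 22% figure is Intel's worst-case fio number, not a measured result for the NoSQL workload.

```python
def post_patch_ratio(pre_patch_ratio, slowdown_pct):
    """Scale an Intel-vs-AMD performance ratio by a percentage slowdown on the Intel side.

    Assumes the AMD EPYC baseline (1.00) is unchanged by patching.
    """
    return pre_patch_ratio * (1.0 - slowdown_pct / 100.0)

# OLTP: 27% lead pre-patch, about 4% slowdown per Intel.
oltp = post_patch_ratio(1.27, 4)

# Storage bound NoSQL: 27% lead pre-patch, hypothetical 22% worst-case fio slowdown.
nosql_worst = post_patch_ratio(1.27, 22)

# Server Side Java: parity pre-patch, Red Hat's 3-7% "Modest" range.
java_low = post_patch_ratio(1.00, 3)
java_high = post_patch_ratio(1.00, 7)

print(f"OLTP: {oltp:.2f}")
print(f"NoSQL (worst case): {nosql_worst:.2f}")
print(f"Server Side Java: {java_high:.2f} to {java_low:.2f}")
```

Under these assumptions the OLTP lead shrinks to about 22%, the worst-case storage bound figure lands just under 1.00, and Server Side Java falls to between 0.93 and 0.97 of the AMD EPYC result.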

Final Words

The real takeaway here is that in Intel’s first set of performance impact numbers, it focused heavily on workloads where the Meltdown and Spectre patches have less impact. We want to see the Intel v. AMD EPYC chart again, updated with Meltdown and Spectre patches. Until Intel produces this, it is open season for EPYC. We understand why Intel has not done this. Proper benchmarking takes weeks and the software and tuning sides are still being worked out. That is to be expected with impacts of this magnitude, so we think it is prudent for Intel to hold off on publishing too many numbers. With that said, this is a game changer in the market.

Essentially all of the numbers produced for Intel v. AMD are going to be reset, and that is great for AMD. AMD has a strong I/O story and it can now point to Intel’s numbers as another reference point as to why it has a better platform. It has more PCIe lanes, more RAM capacity and more cores, and Intel can no longer as easily dismiss those advantages by saying “our cores are faster.”

As the performance impact evidence piles up, this is an absolute gift to AMD.

Feel free to share your perspective on the STH forums.