Throughput and per-thread performance

The Qualcomm Centriq 2400 processor, based on the Qualcomm Falkor CPU, QDT’s own Armv8-based custom CPU core design, delivers leading-edge aggregate performance, as shown by SPECint_rate2006² score estimates. These scores are based on the open source gcc compiler, using -O2 flags, consistent with how cloud developers compile their own code³.

Many cloud applications require real-time responsiveness, which demands strong single-thread performance while the machine is running multiple threads at high utilization. For this, the single-thread SPECint_2006 benchmark is not the relevant choice, as it measures performance when the machine is at its minimum loading. Instead, we took the aggregate performance of the machine using SPECint_rate2006 and divided it by the number of active hardware threads. The result reflects the performance of any individual thread when the server is operating at its design point of maximum multi-threaded performance. By that metric, the Qualcomm Centriq 2400 has not only reached high aggregate performance, but it has done so without compromise on per-thread performance.
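As a rough illustration of the per-thread metric described above, the calculation is a single division. The function name and the example score below are hypothetical placeholders, not measured results:

```python
def per_thread_score(rate_score: float, active_threads: int) -> float:
    """Aggregate SPECint_rate2006 score divided by active hardware threads."""
    if active_threads <= 0:
        raise ValueError("active_threads must be positive")
    return rate_score / active_threads


# Hypothetical placeholder score for a 48-thread run; NOT a published result.
example = per_thread_score(rate_score=600.0, active_threads=48)
print(f"per-thread score under full load: {example:.2f}")
```

Because the score is taken while every thread is busy, the quotient describes loaded per-thread performance rather than the best case on an otherwise idle machine.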

Many CSPs require predictable performance to meet their customer demands and SLAs. The specified peak frequency for the Qualcomm Centriq 2400 family is independent of the number of cores that are active. This means that CSPs can minimize performance variability as more cores are switched on to handle increased load.

Power efficiency

The Qualcomm Centriq 2400 delivers better performance per watt than competing x86 server processors⁴. We’ve taken a typical Qualcomm Centriq 2460 processor and run SPECint_rate2006, measuring the average power for each sub-test. All tests ran at the full 2.6 GHz peak frequency. As a first-order view, the average (both mean and median) power of those measurements was 65W. Running the same test on an Intel Xeon Platinum 8176, which has similar SPECint_rate2006 performance when compiled with gcc -O2, we measured significantly higher power: the Xeon ran at 100% of its 165W thermal design power (TDP), burning over 2.5x as much electricity for similar performance!
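The "over 2.5x" figure follows directly from the two power numbers quoted above. A minimal sketch of that arithmetic, with illustrative constant and function names:

```python
# Power figures quoted in the text above.
CENTRIQ_2460_AVG_POWER_W = 65.0   # average measured during SPECint_rate2006
XEON_8176_POWER_W = 165.0         # 100% of its 165 W TDP


def power_ratio(power_a_w: float, power_b_w: float) -> float:
    """Return how many times more power A draws than B."""
    return power_a_w / power_b_w


ratio = power_ratio(XEON_8176_POWER_W, CENTRIQ_2460_AVG_POWER_W)
print(f"Xeon draws {ratio:.2f}x the power of the Centriq 2460")
```

Since the text describes the two parts as having similar SPECint_rate2006 performance, the same ratio is a first-order estimate of the performance-per-watt gap.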

Another important metric is the processor TDP, as servers will be designed based on the specified TDP. Stepping back from the highest bin parts, we can compare the Qualcomm Centriq 2452 processor with the Intel Xeon Gold 6152. Using SPECint_rate2006 performance divided by TDP, the Qualcomm Centriq 2452 has 33% better performance per watt. Looking at the inverse, with racks typically limited in power capacity, that translates to a significant increase in the amount of compute capacity that can be packed into a rack. (Actual increase depends on server overhead power, server utilization, and rack capacity, among other things.)
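To see why the rack-level gain depends on server overhead, consider a simplified model in which each server holds one processor and draws its TDP plus a fixed overhead. Every number below (rack budget, scores, TDPs, overhead) is a hypothetical placeholder chosen only to make the arithmetic visible, not a specification of any real part:

```python
import math


def rack_throughput(rack_budget_w: float, cpu_score: float,
                    cpu_tdp_w: float, overhead_w: float) -> float:
    """Total score of the single-socket servers that fit in a rack power budget."""
    per_server_w = cpu_tdp_w + overhead_w
    servers = math.floor(rack_budget_w / per_server_w)
    return servers * cpu_score


# Hypothetical parts: equal score, but part A has 33% better score per TDP watt.
part_a = rack_throughput(10_000, cpu_score=100, cpu_tdp_w=75, overhead_w=150)
part_b = rack_throughput(10_000, cpu_score=100, cpu_tdp_w=100, overhead_w=150)
print(f"rack-level gain: {part_a / part_b:.2f}x")
```

In this toy setup the processor-level advantage shrinks at the rack level because the fixed overhead is paid per server, which is the caveat noted in the parenthetical above.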

Idle power is also an important metric for many datacenter customers, as unnecessary power draw during idle periods can result in significant energy consumption costs over the period of an infrastructure’s useful life. The Qualcomm Centriq 2400 family delivers extremely low idle power. We’ve measured power during OS idle at 8W even when the deepest idle state is limited to C1 in order to minimize idle exit latency. With deeper idle states enabled, measured power plummets to below 4W, using Qualcomm Centriq 2400’s fast power collapse with hardware save/restore logic. In environments where server utilization is low, this combination of low power during both active and idle states translates to significant energy savings and a much greener datacenter.
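The effect of idle power on lifetime energy can be estimated with simple arithmetic. In the sketch below, the 8W and 4W figures come from the text (4W is an upper bound, since the text says "below 4W"), while the idle fraction and service life are assumptions picked purely for illustration:

```python
HOURS_PER_YEAR = 24 * 365


def idle_energy_kwh(idle_power_w: float, idle_fraction: float,
                    years: float) -> float:
    """Energy consumed while idle, in kWh, over the given service life."""
    idle_hours = idle_fraction * HOURS_PER_YEAR * years
    return idle_power_w * idle_hours / 1000.0


# Assumptions for illustration only: idle half the time, four-year life.
c1_only = idle_energy_kwh(8.0, idle_fraction=0.5, years=4)
deep_idle = idle_energy_kwh(4.0, idle_fraction=0.5, years=4)
print(f"C1 only: {c1_only:.2f} kWh, deeper idle states: {deep_idle:.2f} kWh")
```

The per-processor figure is small, but it scales linearly with the number of sockets in a datacenter.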

Total cost of ownership

The biggest factor in the TCO of running a datacenter, however, is the acquisition cost of the servers, and the processor is one of the most expensive components in a server. The Qualcomm Centriq 2400 processor delivers phenomenal performance per dollar. With a list price⁵ of $1,995, the 48-core Qualcomm Centriq 2460 processor delivers 4X better performance-per-dollar versus Intel’s highest-performance Skylake processor, the Intel Xeon Platinum 8180. With a list price of $1,373, the 46-core Qualcomm Centriq 2452 processor offers 3X better performance-per-dollar versus Intel Xeon Gold 6152. And, with a list price of $888, the 40-core Qualcomm Centriq 2434 processor offers 2X better performance-per-dollar versus Intel Xeon Silver 4116⁶.
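A performance-per-dollar comparison of this kind divides each processor's benchmark score by its list price and then takes the ratio. The sketch below shows only the shape of the calculation: the $1,995 price is from the text, while the relative scores and the competitor price are hypothetical placeholders, not actual benchmark results or list prices:

```python
def perf_per_dollar_ratio(score_a: float, price_a: float,
                          score_b: float, price_b: float) -> float:
    """Return (score_a / price_a) divided by (score_b / price_b)."""
    return (score_a / price_a) / (score_b / price_b)


CENTRIQ_2460_LIST_PRICE = 1995.0   # from the text
PLACEHOLDER_RIVAL_PRICE = 10_000.0  # hypothetical, NOT a real list price
PLACEHOLDER_RIVAL_SCORE = 1.2       # hypothetical score relative to 1.0

advantage = perf_per_dollar_ratio(1.0, CENTRIQ_2460_LIST_PRICE,
                                  PLACEHOLDER_RIVAL_SCORE,
                                  PLACEHOLDER_RIVAL_PRICE)
print(f"performance-per-dollar advantage: {advantage:.2f}x")
```

Note that a part can have a lower absolute score and still come out ahead on this metric if its price is proportionally lower.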

Qualcomm Centriq 2400 delivers many other key benefits for the cloud, such as quality of service management, in-line memory bandwidth compression, and secure root of trust at the silicon level, which we detailed here and here.

Driving an open ecosystem

Driving an open ecosystem around the Qualcomm Centriq 2400 processor is a critical pillar of our strategy. To us, open ecosystem means embracing open standards and collaboration with hardware, software, and system vendors. Through these collaborations, we’re delivering best-of-breed solutions for our customers to deploy on Qualcomm Centriq 2400 processors.

Over the past few years, the Arm-based processor ecosystem has made tremendous progress in enabling server software for the cloud. Most open source software is already available on Arm-based server processors. Foundational software such as firmware, operating systems, compilers, virtualization and containers is supported on Arm processors, and infrastructure software such as language runtimes, databases (NoSQL and SQL), web front end, data analytics, and orchestration is also supported on Arm processors.

Key cloud workload targets

With leading-edge performance, innovative features, and an open ecosystem, the Qualcomm Centriq 2400 family is optimized for cloud-native workloads. Workloads that are a good fit for Qualcomm Centriq 2400 processors include web front end, NoSQL databases, big data analytics, content delivery networks, video and image processing applications, image recognition, health and life sciences applications, and software-defined NVMe storage farms. At our launch event today, we’re demonstrating many of these cloud workloads running on Qualcomm Centriq 2400 processor-based servers.

In optimizing for cloud workloads, there is understandably a set of workloads that we are not currently targeting. Some traditional enterprise IT workloads that don’t scale with cores fall into this category. A good example here would be transactional databases that use scale-up servers to be able to handle large databases.

Summary

We’re excited about bringing to market the world’s first and only 10nm server processor. Qualcomm Centriq 2400 delivers exceptional throughput performance, leadership performance-per-watt and performance-per-dollar, and drastically shifts the economics of ownership and operation for cloud datacenter operators. We’re looking forward to continuing to work with our customers and partners to drive further innovations into datacenter infrastructure.