Intel has finally unveiled Xeon Phi (codenamed Knights Corner), a range of 22nm coprocessors with more than 50 cores each, built on the new Many Integrated Core (MIC) architecture. According to industry sources, Xeon Phi made such an impression at last week’s International Supercomputing Conference (ISC) that it has stolen numerous upcoming 100-petaflops supercomputer installations away from Nvidia’s Tesla coprocessor.

Xeon Phi is the end result of a project that began with the Larrabee architecture in 2006. Larrabee was initially meant to be a GPGPU, much like Nvidia’s Fermi and Kepler cores, but based on the x86 instruction set. Larrabee was eventually scrapped in 2010, but Many Integrated Core emerged from its ashes, and this time MIC would simply be a high-performance computing accelerator. The graphics guts were stripped out, and all that remained were 50+ Pentium 1 (P54C) cores, with the addition of some juicy floating-point and vector units. Intel has confirmed that each MIC core, like Larrabee, has a monstrous 16-wide vector ALU capable of 512-bit SIMD (16 single-precision values per instruction).
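To make the "16-wide" figure concrete, here is a minimal sketch of how lane count follows from register width. The function name `simd_lanes` is purely illustrative, and the 32-bit and 64-bit element sizes are the standard single- and double-precision float widths rather than anything Intel states in this announcement.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of a given width fit in one SIMD register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of element width")
    return register_bits // element_bits


REGISTER_BITS = 512  # SIMD width quoted in the article

if __name__ == "__main__":
    # Single-precision floats are 32 bits wide, giving the "16-wide" ALU.
    print(simd_lanes(REGISTER_BITS, 32))  # 16
    # Double-precision floats are 64 bits wide, so half as many fit.
    print(simd_lanes(REGISTER_BITS, 64))  # 8
```

In other words, the 16-wide figure applies to single-precision work; double-precision code would process eight values per vector instruction.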

These Xeon Phi coprocessors (which come in a PCIe add-in card form factor) will be available in a few flavors, probably starting at 50 cores and with 8-16GB of GDDR5 RAM. Intel is targeting real-world performance of 1 teraflops per coprocessor, which is well above the Tesla M2090 (a Fermi-based card) and AMD’s HD 7970. The key difference, though, is that Xeon Phi uses the mature and very-well-understood x86 architecture, and is supported by Intel’s best-in-class compiler toolchain. Nvidia’s Kepler-based Tesla cards might be faster than 1 teraflops, but that’s theoretical performance. In practice, writing and compiling software that effectively uses hundreds of CUDA cores is incredibly hard.

And therein lies the crux: Xeon Phi might not have the edge on raw performance, but it’s far easier to deploy. The vast majority of current HPC installations use Intel or AMD x86 chips and software. Moving to CUDA or OpenCL is hard, expensive, and time-consuming work. According to VR-Zone, Xeon Phi is apparently so desirable that it has replaced Tesla as the coprocessor of choice in many upcoming 100-petaflops supercomputers, due for completion in the next few years. The same sources told VR-Zone that porting code to Intel’s MIC architecture took days, while porting to Nvidia’s CUDA took months.

A 100-petaflops x86 supercomputer would have somewhere in the region of 80,000 Ivy Bridge Xeons and 80,000 Xeon Phis, for a total of around 5 million x86 processor cores and, assuming the Phi has a TDP of around 200 watts, a power draw of 25 megawatts. That works out to 4 gigaflops per watt. The current world’s fastest computer, IBM’s Blue Gene/Q Sequoia, draws 7.9 megawatts at 16 petaflops, or about 2 gigaflops per watt, so a Xeon Phi installation would be roughly twice as power efficient.
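The arithmetic behind these estimates can be sketched in a few lines of Python. The chip counts, power draws, and petaflops figures come from the article; the 8 cores per Ivy Bridge Xeon is an assumption (the article gives no per-chip core count), as are all function and constant names.

```python
XEON_COUNT = 80_000   # article's estimate
PHI_COUNT = 80_000    # article's estimate
PHI_CORES = 50        # article: "probably starting at 50 cores"
XEON_CORES = 8        # ASSUMPTION: not stated in the article


def total_cores(xeons=XEON_COUNT, phis=PHI_COUNT,
                xeon_cores=XEON_CORES, phi_cores=PHI_CORES):
    """Total x86 cores across host CPUs and coprocessors."""
    return xeons * xeon_cores + phis * phi_cores


def gflops_per_watt(petaflops, megawatts):
    """Efficiency in gigaflops per watt (1 PF = 1e6 GF, 1 MW = 1e6 W)."""
    return (petaflops * 1e6) / (megawatts * 1e6)


if __name__ == "__main__":
    print(f"cores: {total_cores():,}")  # cores: 4,640,000
    phi_system = gflops_per_watt(100, 25)    # hypothetical Xeon Phi system
    sequoia = gflops_per_watt(16, 7.9)       # IBM Blue Gene/Q Sequoia
    print(f"Xeon Phi system: {phi_system:.2f} GF/W")
    print(f"Sequoia:         {sequoia:.2f} GF/W")
    print(f"ratio:           {phi_system / sequoia:.2f}x")
```

Under these assumptions the core count comes to about 4.6 million, consistent with the "around 5 million" estimate, and the efficiency gap is a little under 2x.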

The Intel Xeon Phi is expected to be commercially available towards the end of the year. Technically, you’ll be able to plug one into your desktop’s PCIe slot — but it will be incredibly expensive, and probably not worthwhile unless you’re into modeling nuclear explosions at home.