We recently had a chance to sit down with Intel, talk about the company’s upcoming Xeon Phi/Many Integrated Core (MIC) hardware, and discuss how the card is a vital step along the path to exascale computing. Intel isn’t ready to disclose the card’s full technical specifications, but a number of scientific and research institutions have released papers detailing their experience with Knights Ferry, Xeon Phi’s predecessor. Just so we’re clear on terminology: MIC is an architecture; Knights Ferry (KNF) and Knights Corner (KNC) are specific implementations of that architecture; and Xeon Phi is the brand name under which KNC will be sold.

From Larrabee to Knights Ferry

Intel’s MIC (pronounced “Mike”) began as a hybrid GPU/HPC (high-performance computing) product known as Larrabee. Intel officially announced Larrabee in 2007 and soon claimed that the card would usher in a new era of ray-traced video games and incredible performance. Intel eventually shelved its GPU ambitions once it became clear that Larrabee wasn’t going to be able to match the high-end hardware then available from Teams Green and Red (Nvidia and AMD), and rebranded Larrabee as an HPC-only part. The new design was dubbed Knights Ferry (KNF), and Intel began shipping it out to HPC developers in 2010.

So how much of Larrabee is left in Intel’s MIC? It depends on where you look. All of the display hardware and integrated logic necessary to drive a GPU is gone, but the number-crunching capabilities of the cores themselves appear largely unchanged. One difference we know of is that while Larrabee and KNF focused on single-precision floating point math, the upcoming Knights Corner will offer strong double-precision support as well. Compare Larrabee’s block diagram, above, with Knights Ferry, below.

A Knights Tale

So let’s talk about Knights Corner/Xeon Phi. Xeon Phi scales up Knights Ferry; Intel isn’t giving many details yet, but we know the card will pack 50 or more cores and at least 8GB of RAM. In this space, total available memory is an important feature. Knights Ferry, with its 32 cores and maximum of 2GB of RAM, could offer only 64MB of RAM per core; a 50-core Xeon Phi with 8GB to 16GB of RAM would offer roughly 164MB to 328MB per core.
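The per-core figures above are simple division, and a short Python sketch makes it easy to try other configurations. The 50-core, 8GB and 16GB Xeon Phi rows are the article's working assumptions rather than confirmed specifications, and the helper name `mb_per_core` is ours.

```python
# Back-of-the-envelope memory per core, using the figures quoted in the text.
MB_PER_GB = 1024  # binary units, which is what yields 64MB for Knights Ferry


def mb_per_core(ram_gb: float, cores: int) -> float:
    """Return card memory divided evenly across cores, in MB."""
    return ram_gb * MB_PER_GB / cores


configs = [
    ("Knights Ferry, 32 cores, 2GB", 2, 32),
    ("Xeon Phi (assumed), 50 cores, 8GB", 8, 50),    # assumption, not a spec
    ("Xeon Phi (assumed), 50 cores, 16GB", 16, 50),  # assumption, not a spec
]

for label, ram_gb, cores in configs:
    print(f"{label}: {mb_per_core(ram_gb, cores):.1f} MB per core")
```

An even split is only a rough guide: nothing here says the memory is actually partitioned per core, so treat the result as a ratio for comparing cards, not as a hard limit.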

It’s logical to think Intel’s core counts and RAM configurations will vary depending on yields and customer needs. Customers with large, regular datasets might see better scaling from a 50-core chip with 16GB of RAM, while those with small datasets might do best with an 8GB card and 64 cores. The layout of the Aubrey Isle die at the heart of Knights Ferry, pictured above, makes a 64-core target chip a strong possibility, with varying numbers of cores disabled to improve yields.

The cores at the heart of Intel’s first Xeon Phi are based on the P54C revision of the original Pentium and appear largely unchanged from the design Intel planned to use for Larrabee. Despite some squabbling from Team Green, we recommend not conflating the phrase “based on” with “hampered by.” Intel returned to the P5 microarchitecture for Larrabee because it made good sense to do so, but Knights Corner isn’t a bunch of early 1990s hardware glued on a PCB. Intel has added 64-bit support, larger on-die caches (the Pentium Classic never had an on-die L2, or an L1 with 1-cycle latency), a 512-bit, bi-directional ring bus that ties the entire architecture together, advanced power consumption management circuitry, and 32 vector registers, each 512 bits wide. It’s the last of these that gives Xeon Phi its oomph; by comparison, a top-end Core i7 today has 16 AVX registers, each 256 bits wide.
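To put the register comparison in numbers, here is a small Python sketch that works out how many floating-point values fit in one register and how large each register file is. It uses only the register counts and widths quoted above; the function name `register_file` is ours, and the result says nothing about clock speed, issue width, or real-world throughput.

```python
def register_file(num_regs: int, bits_per_reg: int) -> dict:
    """Summarize a vector register file from its register count and width."""
    return {
        "sp_lanes": bits_per_reg // 32,   # 32-bit single-precision values per register
        "dp_lanes": bits_per_reg // 64,   # 64-bit double-precision values per register
        "total_bytes": num_regs * bits_per_reg // 8,
    }


knc = register_file(num_regs=32, bits_per_reg=512)  # Knights Corner, per the text
avx = register_file(num_regs=16, bits_per_reg=256)  # Core i7 with AVX, per the text

for name, rf in (("Knights Corner", knc), ("Core i7 (AVX)", avx)):
    print(f"{name}: {rf['sp_lanes']} SP lanes, {rf['dp_lanes']} DP lanes, "
          f"{rf['total_bytes']} bytes of vector registers")

# Ratio of total vector register capacity between the two designs.
print("Capacity ratio:", knc["total_bytes"] // avx["total_bytes"])
```

Each register is twice as wide and there are twice as many of them, so the vector register file holds four times as much data as the AVX one.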

From a computational perspective, calling Knights Corner a “modified Pentium” is like calling the starship Enterprise a modified Space Shuttle. The updated P54C core is better thought of as a launch gantry; it’s the framework Intel used for creating something new, not the vehicle itself.

Is Knights Corner x86 compatible? Mostly — or, perhaps more accurately, it’s x86-compatible enough. Intel’s software blog states the following: “Programs written in high-level languages (C, C++, Fortran, etc.) can easily remain portable despite any ISA or ABI [application binary interface] differences. Programming efforts will center on exploiting the high degree of parallelism through vectorization and scaling: Vectorization to utilize Knights Corner vector instructions and scaling to use more than 50 cores. This has the familiarity of optimizing to use a highly-parallel SMP system based on CPUs.”
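The two levels of parallelism Intel describes, scaling across cores and vectorizing within each core, can be modeled in a few lines of Python. This is a conceptual sketch only: it runs sequentially, it is not Intel's programming model, and the 50-core count, the 16-lane width (a 512-bit register holding 32-bit floats), and all function names are assumptions chosen for illustration.

```python
VECTOR_LANES = 16  # assumed: 512-bit register / 32-bit floats
NUM_CORES = 50     # assumed core count


def split_across_cores(n, cores):
    """Yield (start, stop) index ranges, one contiguous block per core."""
    base, extra = divmod(n, cores)
    start = 0
    for core in range(cores):
        stop = start + base + (1 if core < extra else 0)
        yield start, stop
        start = stop


def saxpy_block(a, x, y, start, stop, lanes=VECTOR_LANES):
    """Update y = a*x + y over one block in lane-sized groups; return the step count."""
    steps = 0
    for i in range(start, stop, lanes):
        end = min(i + lanes, stop)
        # On vector hardware, each group like this could map to one vector operation.
        y[i:end] = [a * xi + yi for xi, yi in zip(x[i:end], y[i:end])]
        steps += 1
    return steps


def saxpy(a, x, y, cores=NUM_CORES):
    """Apply saxpy block by block; return the number of vector steps per core."""
    return [saxpy_block(a, x, y, s, e) for s, e in split_across_cores(len(x), cores)]


n = 4000
x = [float(i) for i in range(n)]
y = [1.0] * n
steps_per_core = saxpy(2.0, x, y)
print("Vector steps per core:", max(steps_per_core))
print("Total vector steps:", sum(steps_per_core))
```

With 4,000 elements, each of the 50 blocks holds 80 values and needs 5 lane-wide steps instead of 80 scalar ones. In a real C, C++, or Fortran program, the compiler and a threading runtime would handle both levels of this split.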

There are a handful of x86/x86-64 instructions, including a few fairly common ones, that KNC won’t support. The vector and scalar instructions that KNF and KNC introduced are also unique: KNC doesn’t support traditional SIMD instruction sets like MMX, SSE, or AVX… yet. That “yet” is important, because it’s virtually guaranteed that Intel will cross-pollinate its instruction sets at some point in the future. The Transactional Synchronization Extensions (TSX) set to debut in Haswell might be extremely useful for Knights Corner’s successor.
