In a move that could have broad implications for the high-performance computing (HPC) market, Intel and Cray have announced a wide-ranging collaboration in which engineers from the two companies will work together on future products and projects.

With the first Intel-Cray products appearing in the 2010-2011 timeframe, it's clear that three Intel technologies have caught Cray's eye: the native 32nm Sandy Bridge microarchitecture, the QuickPath Interconnect (QPI) scheme, and the forthcoming discrete, x86-based graphics product, codenamed Larrabee. Cray will plug all of these components into its SeaStar interconnect fabric, and when combined with Cray Linux they'll make for an HPC and floating-point monster.

The first piece of the Intel-Cray puzzle is Sandy Bridge, which will debut in 2010 as Intel's first native 32nm processor. Unlike its predecessor, Westmere, which is a 32nm die shrink of the 45nm Nehalem, Sandy Bridge will be designed from the ground up for the 32nm node. One of Sandy Bridge's most widely talked-about features is a brand-new set of 256-bit vector extensions called Advanced Vector Extensions (AVX). In addition to vector registers that are double the size of the current 128-bit SSE registers, AVX will also introduce a nondestructive three-operand format for the first time in the x86 ISA's history. (For more on the operand format issue, see Chapter 8 of my book.)
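To make the operand format point concrete, here is a toy Python model of the difference between a destructive two-operand add and a nondestructive three-operand add. This is only a sketch: the dictionary-based register file and the helper function names are my own illustrative inventions, and the instruction mnemonics in the comments are there for orientation rather than as an exact instruction trace.

```python
# Toy register file: a dict mapping register names to lists of floats.
# Everything here is a hypothetical model, not real hardware behavior.

def add_two_operand(regs, dst_src, src):
    """SSE-style destructive add: dst_src = dst_src + src.
    The first source register is overwritten by the result."""
    regs[dst_src] = [a + b for a, b in zip(regs[dst_src], regs[src])]

def add_three_operand(regs, dst, src1, src2):
    """AVX-style nondestructive add: dst = src1 + src2.
    Both source registers keep their original contents."""
    regs[dst] = [a + b for a, b in zip(regs[src1], regs[src2])]

def sse_style_sum_keeping_inputs(regs):
    """Sum xmm0 and xmm1 into xmm2 while preserving xmm0.
    Returns the number of modeled instructions issued."""
    regs["xmm2"] = list(regs["xmm0"])      # roughly: movaps xmm2, xmm0
    add_two_operand(regs, "xmm2", "xmm1")  # roughly: addps  xmm2, xmm1
    return 2

def avx_style_sum_keeping_inputs(regs):
    """Sum ymm0 and ymm1 into ymm2; no copy is needed.
    Returns the number of modeled instructions issued."""
    add_three_operand(regs, "ymm2", "ymm0", "ymm1")  # roughly: vaddps ymm2, ymm0, ymm1
    return 1
```

In this model, keeping both inputs alive costs the two-operand version an extra register copy, while the three-operand version gets the same result in a single step. That saved copy is the whole appeal of the nondestructive format.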

In all, AVX's vector length and new operand format will massively boost the peak theoretical floating-point performance of Intel's microarchitecture, making it a vector beast that can eat up as much bandwidth as Cray can throw at it. (Note that AltiVec coders are turning their noses up at AVX, but it still seems like it has to be an improvement. I hope to take up this issue and others in an AVX vs. Larrabee Vec-16 vs. SSE5 article at some point.)

Speaking of bandwidth, Intel's QuickPath Interconnect, which debuts with Nehalem later this year, will be firmly established by the time Sandy Bridge drops. QPI will give Cray the ability to do with Intel's processors what it already does (and will apparently continue to do) with AMD's: plug them into the company's high-bandwidth interconnect fabric, SeaStar, via a bridge chip.



Cray XT3 nodes with SeaStar interconnect and AMD64/HyperTransport hardware

The graphic above is taken from a Cray XT4 brochure, and it shows two Opteron nodes attached via HyperTransport to Cray's 3D SeaStar interconnect fabric. An Intel-based Cray supercomputer will probably look similar, but with QPI swapped for HyperTransport and Sandy Bridge and/or Larrabee processors swapped for Opterons.



Cray SeaStar node

The HyperTransport interface in the SeaStar router node shown above would be swapped with a QPI interface, an effort that would require some engineering help from Intel and is likely to be the first place that the two companies collaborate.

Moving on to Intel's Larrabee, which is also slated for the 2010 timeframe, Cray CEO Peter Ungaro told TGDaily that Cray expects to integrate the many-core GPU product into its Intel-based supercomputers.



Intel's Larrabee

What's not clear is the method that Cray would use to integrate Larrabee into its products. The most obvious option would be to put Larrabee-based PCIe daughtercards into some of the Sandy Bridge nodes via a PCIe-to-QPI bridge. Another, possibly cheaper and more power-efficient, option would be available if Larrabee supports QPI; in this case, Larrabee coprocessor sockets could be dropped right into the QPI fabric along with Sandy Bridge processors. Intel hasn't said whether Larrabee has a QPI interface, however; all that's clear from an earlier leaked slide is that it does have a PCIe interface.

The reason for including Larrabee in the new Cray supercomputers is pretty straightforward: each Larrabee core has a 512-bit vector floating-point unit that can grind through twice as many floating-point operations per vector instruction as Sandy Bridge's 256-bit AVX unit. The combination of Sandy Bridge's 256-bit vectors and Larrabee's 512-bit vectors will make for a very potent x86 vector processing machine, and it's the kind of thing that NVIDIA is already trying to get out in front of with its escalating verbal war against Intel CPUs.
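The width comparison comes down to simple arithmetic, which the short Python sketch below spells out. It assumes 32-bit single-precision elements and counts only how many of them fit in one vector register; it says nothing about clock speeds, core counts, or how many vector instructions each chip can issue per cycle, none of which Intel has detailed. The `lanes` helper and the `WIDTHS` table are names I made up for illustration.

```python
FLOAT32_BITS = 32  # assumption: single-precision elements

def lanes(register_bits, element_bits=FLOAT32_BITS):
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits

# Register widths as given in the article (in bits).
WIDTHS = {"SSE": 128, "AVX": 256, "Larrabee": 512}

if __name__ == "__main__":
    for name, bits in WIDTHS.items():
        print(f"{name}: {bits}-bit register, {lanes(bits)} floats per instruction")
    ratio = lanes(WIDTHS["Larrabee"]) // lanes(WIDTHS["AVX"])
    print(f"Larrabee processes {ratio}x as many floats per instruction as AVX")
```

Under that assumption, SSE handles 4 floats per instruction, AVX handles 8, and Larrabee handles 16, which lines up with the "Vec-16" name mentioned earlier.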

One thing that worries some developers as they look at Larrabee and Sandy Bridge is that they'll have different vector extensions, and both of these will be different from AMD's forthcoming SSE5 extensions. There are no details yet on how Intel will get AVX and Larrabee's 512-bit extensions to work together, but I'm sure that whatever they have in mind involves this EXOCHI "accelerator exoskeleton with an IA 'look-n-feel'" that I haven't quite figured out yet.

For their part, NVIDIA and AMD aren't standing still in the HPC space by any means. Word recently got out that NVIDIA Tesla GPUs will play a prominent role in an upcoming French supercomputer, and by the time 2010 rolls around those Tesla GPUs will look a lot like Larrabee in that they'll have a fairly robust and flexible ISA that exposes a ton of vector floating-point resources.

AMD is also moving ahead with its HPC-oriented GPGPU plans, and the company already has a well-established place in Cray's XT3 and XT4 supercomputers that is likely to persist. Still, Sunnyvale can't be happy with Intel moving in this close on a piece of critical turf in the HPC arena that has been so kind to Opteron.

Further reading