At Hot Chips this past week, AMD unveiled more details of the upcoming 12-core "Magny-Cours" processor that it hopes will help it stay competitive in the server game. Due in 2010 on AMD's 45nm SOI process, Magny-Cours uses the same basic core microarchitecture as the current Shanghai quad-core server processor, so if there's any improvement in per-thread performance it will have to come from better system design.

The basic idea behind Magny-Cours is simple: take two six-core Istanbul processors, downclock them a bit to reduce power, and squeeze them into a multichip module (MCM) so that they can fit into a single socket. By using an MCM, AMD will be able to fit 12 cores into the same thermal and power envelope as Istanbul.

Making this work requires a few tradeoffs, and one of them is the MCM itself. AMD had previously ridiculed Intel's use of an MCM for its first dual-core effort, the Pentium 4-based Smithfield, as not "true" dual-core. It repeated the charge with Intel's first quad-core, which was also an MCM. But with Nehalem cleaning up in absolute per-core performance, AMD is having to hustle to maintain a credible server presence, and part of that hustle is adopting the MCM strategy that it had formerly ridiculed.

For system architecture reasons, AMD's MCM picture is a little more complex than Intel's was, because each Istanbul chip has its own on-die dual-channel DDR3 memory controller, along with four HyperTransport links. Obviously, you can't push each chip's full interconnect bandwidth through a single socket, so AMD had to cut out some links.

The company's MCM 2.0 design has four total HT ports (two per chip) and four DDR3 memory ports (two per chip) on each MCM. On each individual chip, one of these two HT links is x16 and the other is x8. The two chips are connected to each other inside the module by a separate x16 HT link.
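To make the port arithmetic concrete, here is a small Python sketch that tallies the figures quoted above. The class and method names are invented for illustration, and the model assumes that each chip's x16 and x8 links are the ones that leave the package, which is one reading of the description rather than a statement of AMD's actual pinout.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Die:
    """One six-core chip inside the module (figures as quoted in the article)."""
    cores: int = 6
    memory_channels: int = 2             # dual-channel DDR3 controller
    external_ht_widths: tuple = (16, 8)  # assumption: these two links leave the package


@dataclass(frozen=True)
class Module:
    """Hypothetical model of the two-chip MCM."""
    dies: tuple
    internal_ht_width: int = 16          # x16 link joining the two chips

    def cores(self):
        return sum(d.cores for d in self.dies)

    def memory_channels(self):
        return sum(d.memory_channels for d in self.dies)

    def external_ht_ports(self):
        return sum(len(d.external_ht_widths) for d in self.dies)

    def external_ht_lanes(self):
        return sum(sum(d.external_ht_widths) for d in self.dies)

    def channels_per_core(self):
        # Exact ratio, so the comparison with a single chip is not blurred by rounding.
        return Fraction(self.memory_channels(), self.cores())


mcm = Module(dies=(Die(), Die()))
print("cores per socket:", mcm.cores())
print("memory channels per socket:", mcm.memory_channels())
print("external HT ports per socket:", mcm.external_ht_ports())
print("memory channels per core:", mcm.channels_per_core())
```

Under these assumptions the module doubles the memory channels per socket, but the ratio stays at one channel for every three cores, the same as for a single six-core chip. The socket as a whole gets more bandwidth; each core does not.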

Even with four HT links and four memory channels to keep the MCM fed, 12 cores is still a lot to pack into a single socket, and bandwidth starvation is a concern. To help alleviate the bandwidth pressure, AMD made a very smart tradeoff with Istanbul in the form of HT Assist, and this tradeoff is carried over to Magny-Cours, where it's even more necessary.

One of the big challenges in multiprocessor system design is keeping the various processors' caches in sync with one another. Solutions to this problem all involve some amount of communication among the processors, and this "snoop" traffic eats up valuable bus bandwidth. The solution that AMD has adopted with Istanbul and Magny-Cours sets aside 1MB of each chip's 6MB L3 cache to store a directory of the contents of the other chips' caches. By consulting this local directory, each chip can avoid broadcasting many of the snoop requests that it would otherwise have to send to the other chips.
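The effect of a directory on snoop traffic can be illustrated with a toy Python model. This is a deliberately simplified sketch, not AMD's protocol: it ignores writes, evictions, cache-line states, and the directory's limited capacity, and every name in it (`ToySystem`, `run_workload`, the node count, the addresses) is made up for the example. It only counts how many probe messages a read miss generates with and without a directory.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical system, e.g. four chips across two sockets


class ToySystem:
    """Toy coherence model that counts probe messages on read misses."""

    def __init__(self, num_nodes, use_directory):
        self.num_nodes = num_nodes
        self.use_directory = use_directory
        self.caches = [set() for _ in range(num_nodes)]
        # Directory: cache-line address -> set of nodes holding a copy.
        self.directory = defaultdict(set)
        self.probes_sent = 0

    def read(self, node, addr):
        if addr in self.caches[node]:
            return  # local hit, no coherence traffic
        if self.use_directory:
            # Probe only the nodes the directory says hold this line.
            holders = self.directory[addr] - {node}
            self.probes_sent += len(holders)
        else:
            # No directory: every other node must be probed.
            self.probes_sent += self.num_nodes - 1
        self.caches[node].add(addr)
        self.directory[addr].add(node)


def run_workload(use_directory):
    """Each node reads ten private lines plus one line shared by all nodes."""
    system = ToySystem(NUM_NODES, use_directory)
    for node in range(NUM_NODES):
        for k in range(10):
            system.read(node, node * 100 + k)  # private to this node
        system.read(node, 9999)                # shared by every node
    return system.probes_sent


print("probes with broadcast snooping:", run_workload(use_directory=False))
print("probes with a directory:", run_workload(use_directory=True))
```

In this toy workload, broadcast snooping sends 132 probes and the directory version sends 6, because reads of data that nobody else has cached generate no probes at all. The real savings depend heavily on how much data is actually shared among the chips.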

The reason HT Assist works is that die area is relatively cheap compared to bus and socket bandwidth, so any trick that lets you trade some on-die transistors for a boost in real-world bus bandwidth is a win. This is in fact the basic idea behind caches of all kinds, and HT Assist's directory is really just another type of cache. Per-socket bandwidth will become increasingly precious as the number of cores in each socket goes up, and as a result we're going to see multicore floorplans swing back to the way things were at the end of the single-core era, i.e., where a processor die is mostly a large chunk of very fast memory with some compute blocks embedded in it.

After Magny-Cours, AMD intends to keep upping the per-socket core count while maintaining backwards compatibility with Magny-Cours' and Istanbul's socket, power, and thermals.