AMD and Intel are both taking to the stage at this week's Computex to talk up their plans for the next major turn of Sutherland's wheel of (graphics) reincarnation, a turn that happens at 32nm for both companies. At this point, graphics processing moves back onto the same die as the processor, although it still keeps most of the specialized hardware that characterizes its less-integrated incarnations. (A full turn of the wheel, when graphics hardware becomes more fully generalized and less distinguishable from general-purpose CPU hardware, is still a bit further off.)

AMD gave a demo of its upcoming Llano part, which will boast a fully DirectX 11-compatible GPU integrated onto a 32nm CPU die come early 2011. Llano will be part of AMD's first wave of 32nm parts, and in this respect AMD lags Intel significantly, with the latter being on 32nm already. Llano represents the first real "Fusion" product for AMD, but it appears to be fusion in its crudest form—take one GPU and one CPU, and put them on the same piece of silicon, along with a memory controller and some I/O.

AMD's Fusion slides, reproduced here at Engadget (I saw a similar presentation at the Netbook Summit last week), frame the GPU part as an example of heterogeneous multiprocessing: a large array of small vector cores alongside a small number of much larger general-purpose cores. But the fact that these vector cores are non-x86 and will look to the OS like an ATI GPU indicates that this really is a GPU that has been put on the same die. (Until the vector cores and the general-purpose cores can run the same code, all this heterogeneous chip multiprocessing stuff is fancy talk for "we put a CPU and a GPU on the same die.")

(As an aside, what's most remarkable to me about Fusion is how little the big picture has changed since 2006. Everything about what they've announced so far has been in the cards since shortly after the ATI acquisition, and while the dates have been pushed back on AMD's roadmaps, my initial analysis of Fusion still stands.)

Intel is taking a similar approach with Sandy Bridge, which is its second-generation family of chips on 32nm (the "tock" in its tick-tock model). Also due out in early 2011, Sandy Bridge will combine Intel's first native 32nm microarchitecture (supposedly a major advance vs. Nehalem) with a GPU and a northbridge. Architectural details are scarce for both the CPU and GPU sides of Sandy Bridge, but Intel is making big claims for performance boosts in both components.

A major repartition

For both AMD and Intel, GPUs will move on-die in budget and mobile clients first, because those segments are more cost-sensitive and less performance-sensitive. In other words, the market wants cheaper, not necessarily faster. But in the case of the transition to Sandy Bridge and Llano, the market will actually get both.

For the first time in the history of the mainstream x86 client, the CPU, GPU, and memory controller will all live on the same die. This is important, because it's going to drive up bandwidth and drive down latency, so that the CPU and GPU will both benefit from a closer coupling to one another and to main memory. This should give both Intel's and AMD's mobile platforms a real boost compared to the current generation.

Right now, Intel's 32nm client platform is a bit of a downgrade from the 45nm server platform in at least one respect, because the latter benefits from a superior system architecture: the memory controller is on the die with the CPU, and a discrete GPU can be tightly coupled as a coprocessor by virtue of the fact that it's connected directly to the CPU socket. The mobile architecture, in contrast, is a standard CPU + Northbridge/IGP architecture of the kind that we've had for years, but with both the CPU and Northbridge/IGP in the same package and socket. As a result, the Westmere client platform doesn't get the same memory latency and bandwidth advantages as the Nehalem platform, because it doesn't have an on-die memory controller.

But when the memory controller moves back onto the CPU die and takes the GPU with it, that's going to give an instant, one-time boost to the overall platform's performance and efficiency. In the case of both Intel and AMD, this boost should be large enough that, if you can hold off on your next laptop upgrade until next year, you should. These kinds of discontinuities, where a major, disruptive repartitioning of the standard system architecture drives a one-off performance boost, are quite rare. They're worth holding out for if you can manage it.

Intel's Mooly Eden has claimed that Sandy Bridge will bring greater than a 4x boost to performance, all at once. As with all claims about unreleased hardware, this is to be taken with a grain of salt, but the fact that Eden is even willing to put these claims out there is indicative of the impact that will come from having all of those components under one roof. (Some of that boost is also supposed to come from various unspecified yet allegedly dramatic microarchitectural tricks. More details on that when we get them, though.)

NVIDIA left out? Not quite.

When the music stops and the GPU and CPU land on the same die, you might think that there's one company that will be left without a chair. I'm talking, of course, about NVIDIA. But interestingly enough, what technology taketh away, it can giveth back—and NVIDIA will probably get a new lease on life.

Sandy Bridge is rumored to have an on-die PCIe controller (2.0 in mobile, 3.0 in desktop), and this will be perfect for discrete graphics, both mobile and desktop. If it does turn out that there is on-die PCIe across segments with Sandy Bridge, then Intel has just rolled out the red carpet for NVIDIA to come in with a solid GPU-as-coprocessor that doesn't need much in the way of Optimus-style trickery to provide a very efficient performance boost. Indeed, for my money, a Sandy Bridge + discrete NVIDIA combo will be the premium mobile platform to beat next year, and I personally can't wait until it comes to the MacBook Air (my laptop of choice).

AMD could surprise us, though, because the platform repartitioning will give AMD the chance to do some good system-level engineering and really integrate its CPU/GPU combo with its discrete GPU in ways that are only feasible when the CPU maker and the discrete GPU maker are one and the same.

Between Llano and Sandy Bridge, 2011 will be a great year for laptops, so if you can hold out until then, you'll be rewarded with a major upgrade over whatever you're carrying now.