AMD's Godfrey Cheng has a post on the company's blog, saying that AMD's upcoming CPU + GPU "Fusion" chips will not, in fact, kill the venerable and much-loved discrete GPU.

Cheng claims that he's frequently asked if AMD's "APU" plans—where APU stands for "Accelerated Processing Unit," and is AMD's term for a chip that includes a CPU and GPU on the same die—will result in the eventual demise of discrete graphics. It's plausible that he gets asked this quite a bit, because this has been a common misconception ever since the idea was first floated with the AMD/ATI merger.

Cheng's detailed answer is quite good, and it's likely that anyone from Intel or NVIDIA would've written substantially the same thing. In a word, no: the discrete GPU will not die out, because its particular arrangement of vector hardware and memory represents a very efficient use of transistors for certain types of very important workloads. Some variant of that arrangement will always be better for those workloads than a general-purpose CPU, which is why Intel's erstwhile Larrabee project was aimed at pushing x86 a bit further in that direction.

(Of course, Larrabee's hardware didn't go far enough in the GPU direction, and the repeated hardware delays meant that the software guys couldn't close the rest of the 3D game performance gap between Larrabee and a traditional GPU within the delayed launch timeframe... but that's another story.)

So, in the medium-term, the fusion strategy—in both its Intel and AMD incarnations—will result in an expanded palette of computing options. The range of hardware configurations at price/performance/power points appropriate for different tasks—gaming, supercomputing, desktop 3D rendering, mobile computing—will grow, and finding an optimum configuration to fit a specific task will be a bit easier.

But in the long term, we still can't help but wonder if the GPU won't get absorbed back onto the main processor die, just as all other math and image coprocessors have been before it. Maybe this time around, however, it's the CPU that will get absorbed.

What if the GPU eats the CPU, instead?

Discrete GPUs currently sit on a daughtercard at one end of a PCIe bus, and the reason this bus doesn't act as a bottleneck for most 3D gaming and rendering is that the GPU is surrounded by a very large pool of fast GDDR DRAM. That local pool essentially makes a discrete graphics card its own system-within-a-system and, because it's so self-contained, it's able to limit the PCIe traffic between the CPU/DDR and GPU/GDDR. It's an awkward, inefficient, expensive arrangement, but it's currently the best one if you want raw performance.
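To make that concrete, here's a minimal CUDA sketch of the pattern that keeps discrete cards fast: pay the PCIe toll once on the way in, run many rounds of work out of GPU-local GDDR, and pay the toll once more on the way out. The kernel, buffer size, and iteration count are illustrative assumptions of ours, not anything from Cheng's post.

```cuda
// Hedged sketch: how a discrete GPU's local GDDR pool keeps the PCIe bus
// from becoming a bottleneck. scale_kernel, N, and ITERATIONS are all
// hypothetical stand-ins for real per-frame work.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 1.001f;   // stand-in for real rendering/compute work
}

int main() {
    const int N = 1 << 20;
    const int ITERATIONS = 1000;
    size_t bytes = N * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, bytes);

    // One trip across PCIe to load the working set into GDDR...
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // ...then many rounds of work entirely out of GPU-local memory,
    // generating no bus traffic at all.
    for (int iter = 0; iter < ITERATIONS; ++iter)
        scale_kernel<<<(N + 255) / 256, 256>>>(dev, N);

    // One trip back for the result.
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    printf("first element: %f\n", host[0]);

    cudaFree(dev);
    free(host);
    return 0;
}
```

The bus only hurts when the working set won't stay resident on the card—which is exactly the situation that big pool of GDDR exists to prevent.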

From a system design standpoint, it would be much cheaper and more efficient to ditch the daughtercard entirely, and put all of the compute hardware, both scalar (CPU) and vector (GPU), into a more tightly coupled arrangement. This is essentially AMD's pitch for APUs in supercomputing—that the tighter CPU/GPU coupling will make Fusion APUs a better fit for high-performance computing (HPC). It's not entirely clear that this will, in fact, be true for HPC any time soon, but the cost and efficiency advantages of this arrangement can hardly be denied.
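For contrast, here's a hedged sketch of what that tighter coupling buys you on the software side, using CUDA's zero-copy mapped memory as a rough stand-in for an APU's shared physical memory pool: the explicit copies from the previous sketch simply disappear. As before, the kernel name and buffer size are our own illustrative choices.

```cuda
// Hedged sketch: one allocation visible to both CPU and GPU, with no
// cudaMemcpy anywhere. increment_kernel and N are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int N = 1 << 20;

    // Allow the GPU to map host allocations into its address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // A single buffer shared by both sides.
    float *host;
    cudaHostAlloc(&host, N * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < N; ++i) host[i] = 0.0f;

    float *dev;
    cudaHostGetDevicePointer(&dev, host, 0);

    increment_kernel<<<(N + 255) / 256, 256>>>(dev, N);
    cudaDeviceSynchronize();

    // The CPU reads the GPU's result straight out of the shared buffer.
    printf("first element: %f\n", host[0]);

    cudaFreeHost(host);
    return 0;
}
```

On a discrete card, zero-copy like this means every GPU access crosses the PCIe bus; on an APU, where CPU and GPU sit on the same die next to the same DRAM, the same programming model comes without that penalty.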

But what about performance? At some very high transistor count, a future Fusion part could end up with a raw performance advantage as well. Specifically, it may turn out that few workloads really benefit from more than four cores, and that most of those that do will run better on GPU hardware.

If this happens, then why not put those four CPU cores on a high-end GPU? In other words, in a world where Moore's Law continues to drive transistor counts up but where exceeding four CPU cores offers rapidly diminishing returns vs. a four-core + GPU combination, the best arrangement would seem to be one that looks essentially like a large GPU with four CPU cores attached to it.
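That diminishing-returns intuition is just Amdahl's law. Assuming—purely for illustration—a workload that's 90 percent parallelizable, a few lines of host code show how quickly extra CPU cores stop paying for themselves:

```cuda
// Back-of-the-envelope Amdahl's law, host-only code. The 90% parallel
// fraction is an assumed figure for illustration, not a measurement.
#include <cstdio>

int main() {
    const double parallel_fraction = 0.90;   // assumption
    for (int cores = 1; cores <= 16; cores *= 2) {
        double speedup = 1.0 /
            ((1.0 - parallel_fraction) + parallel_fraction / cores);
        printf("%2d cores: %.2fx\n", cores, speedup);
    }
    return 0;
}
```

Under those (admittedly assumed) numbers, four cores get you 3.08x, eight cores only 4.71x, and sixteen a mere 6.40x—while the 90 percent that does parallelize is exactly the portion a GPU's wide vector hardware handles more efficiently per transistor. Spending the die area on GPU hardware instead of cores five through twelve looks like the obvious move.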

Think about the ultimate x86 gaming system of 2015: a processor that combines four general-purpose CPU cores with a massive amount of GPU vector hardware and cache sounds ideal. With this arrangement, the relative amount of die area that goes to those four CPU cores can shrink as the (effectively infinitely scalable) cache and vector hardware grow with transistor counts, to the point where you ultimately end up with a "GPU" that has four little CPU cores embedded in it.

Of course, you wouldn't be able to physically turn on all that hardware at once, so dynamic power optimization would be key to making such a part work. But in terms of cost, efficiency, and raw performance, it would probably beat the pants off of a 12-core x86 chip + discrete GPU combination for games and most of the other tasks people care about.