Researchers at North Carolina State University have developed a technique to take advantage of the "fused architecture" emerging on multicore CPUs that puts central processing units and graphics processing units on the same chip. The technology, called CPU-assisted general purpose computation on graphics processor units (CPU-assisted GPGPU), uses software compiled to leverage the architecture so that the CPU and GPU collaborate on computing tasks, boosting processor performance by more than 20 percent on average in simulations.

The approach, outlined in a paper by NC State Associate Professor of Electrical and Computer Engineering Dr. Huiyang Zhou, Ph.D. candidates Yi Yang and Ping Xiang, and AMD GPU Architect Mike Mantor, is designed for fused architecture chipsets with a shared L3 cache and shared off-chip memory for CPUs and GPUs. It leverages the computing power of the GPU while taking advantage of the CPU's more flexible data retrieval and better handling of complex tasks.

The current generation of hybrid CPU/GPU systems, including Intel's "Sandy Bridge" and AMD's "Llano," has helped create more energy-efficient systems and reduce manufacturing costs, Zhou said. "However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren't as efficient as they could be. That's the issue we're trying to resolve."

GPUs are obviously designed for handling graphics, but they are also very good at handling large numbers of parallel processes, particularly in applications where the same process needs to be applied to large amounts of data. Traditionally, one of the biggest problems when using GPUs for general purpose computing has been that they don't handle complex, branchy, pointer-heavy code well, which is where CPUs excel. The long pipelines of most GPUs instead favor sequential, streaming reads, and applications with a high ratio of arithmetic operations to the amount of data that has to be moved to and from memory. Hybrid chips like Sandy Bridge have less main memory bandwidth than typical discrete GPUs (albeit with lower latency), so keeping the fast level 3 cache filled with data is essential if developers want to avoid starving the GPU of data.
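To make the "ratio of arithmetic to data moved" idea concrete, here is a rough back-of-the-envelope sketch. It is not from the researchers' paper: the helper name `arithmetic_intensity`, the problem size, and the assumption that each operand crosses the memory bus exactly once (ideal caching) are all illustrative choices.

```python
# Back-of-the-envelope arithmetic intensity (floating-point ops per byte moved).
# Assumptions (not from the paper): 4-byte floats, and every input/output
# element travels to or from memory exactly once.

BYTES_PER_FLOAT = 4


def arithmetic_intensity(flops, bytes_moved):
    """Return floating-point operations per byte of memory traffic."""
    return flops / bytes_moved


n = 1024

# Vector add c[i] = a[i] + b[i]: n additions, two reads and one write per element.
vector_add = arithmetic_intensity(n, 3 * n * BYTES_PER_FLOAT)

# n-by-n matrix multiply: about 2*n^3 operations (multiply plus add),
# with three n-by-n matrices moved once each in the ideal case.
matmul = arithmetic_intensity(2 * n**3, 3 * n * n * BYTES_PER_FLOAT)

print(f"vector add: {vector_add:.3f} flops/byte")
print(f"matrix multiply: {matmul:.1f} flops/byte")
```

Under these assumptions the vector add performs roughly a twelfth of an operation per byte, while the matrix multiply performs well over a hundred. That gap is why streaming workloads like the first one depend so heavily on memory and cache behavior.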

CPU-assisted GPGPU uses the CPU's faster L3 cache pre-fetching to feed data to the GPU, cutting out the performance drag that comes with GPU code accessing memory. A program compiled for CPU-assisted GPGPU launches a "pre-execution" program at startup on the CPU to pre-fetch data to be processed by GPU code and load it into the level 3 cache onboard the chip. That allows process threads running on the GPU to hit the L3 cache directly, rather than fetching from memory, reducing latency and significantly boosting performance. In some cases, the performance of simulated applications improved by up to 113 percent, the researchers claimed.
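The pre-execution idea can be illustrated with a toy model. The sketch below is not the researchers' implementation or simulator. It assumes a small LRU cache standing in for the shared L3, a 64-byte cache line, and a purely streaming access pattern, and every name in it (`SharedCache`, `run`, `batch`) is hypothetical. It only shows why having a CPU-side helper touch data before the GPU threads ask for it turns misses into hits.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed)


class SharedCache:
    """Toy LRU cache standing in for a shared L3; capacity is in cache lines."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _touch(self, line):
        """Bring a line into the cache; return True if it was already there."""
        present = line in self.lines
        if present:
            self.lines.move_to_end(line)
        else:
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return present

    def prefetch(self, addr):
        """CPU-side pre-fetch: fills the cache but is not counted as a GPU access."""
        self._touch(addr // LINE_SIZE)

    def load(self, addr):
        """GPU-side load: counted as a hit or a miss."""
        if self._touch(addr // LINE_SIZE):
            self.hits += 1
        else:
            self.misses += 1


def run(addresses, batch, capacity_lines, pre_execute):
    """Process addresses in batches, optionally pre-fetching each batch first."""
    cache = SharedCache(capacity_lines)
    for start in range(0, len(addresses), batch):
        chunk = addresses[start:start + batch]
        if pre_execute:
            for addr in chunk:  # CPU pre-execution runs ahead of the GPU batch
                cache.prefetch(addr)
        for addr in chunk:  # GPU threads then read the same data
            cache.load(addr)
    return cache.hits, cache.misses


# Streaming read: one access per cache line, no reuse.
addresses = [i * LINE_SIZE for i in range(1024)]
baseline = run(addresses, batch=64, capacity_lines=256, pre_execute=False)
assisted = run(addresses, batch=64, capacity_lines=256, pre_execute=True)
print("baseline (hits, misses):", baseline)
print("assisted (hits, misses):", assisted)
```

In this idealized setup every GPU load misses without the helper and hits with it. Real hardware would fall somewhere in between, since the pre-fetches themselves consume memory bandwidth and have to stay ahead of the GPU threads.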

Why simulated? AMD's current hybrid processor, the Llano, lacks a shared L3 cache, so it won't support the approach. And Intel's Sandy Bridge offers only limited GPGPU functionality. In a phone interview with Ars, Dr. Zhou explained that in theory the research could be applied to Intel's current Sandy Bridge architecture, which provides a shared last-level cache for its CPU and GPU cores. But he said that Sandy Bridge's GPU "isn't that powerful" and Intel's current software support "doesn't include support for OpenCL and other GPGPU stuff." However, he said, he expects that the hardware support for CPU-assisted GPGPU applications will be in upcoming generations of hybrid platforms from both Intel and AMD, and software support will follow. And, he added, "it's already assumed that the GPU (in Intel's Ivy Bridge processors) will be much more powerful than Sandy Bridge."

Real World Technologies editor David Kanter said that he expects to see "a lot more work in this area, as engineers and researchers must improve performance significantly, while maintaining or reducing power consumption." But he noted that there wasn't information in the research about the power consumption impact of the technology. Zhou said that the research hasn't yielded any hard numbers on what the power consumption impact would be.

Zhou said that his team's research had been funded by grants from the National Science Foundation and AMD, and was just the latest collaboration with Mantor. But the research up until now has been fundamental scientific research, and he couldn't say how it might be commercialized by AMD or Intel.