New CPU architectures don’t come along very often, which is why more details on the Harmony Unified Processing Architecture being built by Chinese developer ICube are so interesting. Historically, new instruction set architectures (ISAs) have been risky bets. Not only are they exceptionally difficult to design, but it also takes enormous additional effort to create tools that can leverage their new capabilities. Even then, companies face an uphill fight to persuade vendors and software developers to recompile existing software to take advantage of the new design.

ICube is led by Fred Chow and Simon Moy. Chow is primarily a software designer and was chief architect of the Open64 compiler and of the Pathscale version of that product, while Moy was a top-line engineer with Nvidia for seven years and worked on both the first GPUs and the G80. Details on ICube’s silicon are still limited, but the expertise of the two men helps shed a bit of light on what the chip looks like.

The Harmony Unified Processing Architecture (and the first iteration of that architecture, the IC1) are described as consisting of “the Multi-Thread Virtual Pipeline parallel computing core (MVP), an independent instruction set architecture, an optimizing compiler, and the Agile Switch dynamic load balancer.” Elsewhere, the chip is described as a “parallel computing stream processor core.” We also know, based on available literature, that the chip uses both SMP (Symmetric Multi-Processing) and SMT (Simultaneous Multi-Threading).

VR-Zone describes the chip as an “elegant 32-bit RISC core, not unlike the original MIPS.” The IC1 implements 4-way SMT, meaning each core can operate on up to four threads. The UPU approach means that execution resources, memory space, and register data are shared across the entire chip, so there’s no such thing as a “CPU workload” versus a “GPU workload.”
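To make the idea of 4-way SMT more concrete, here is a minimal toy model in Python. It is not based on ICube’s actual design: the issue width of two, the three-cycle memory stall, the round-robin thread selection, and every name in it (`run_core`, `MEM_STALL`, the "alu" and "mem" instruction labels) are assumptions made purely for illustration. It only shows the general principle that a core holding several thread contexts can keep issuing work from one thread while another is stalled.

```python
from collections import deque

MEM_STALL = 3  # assumed stall, in cycles, after a memory instruction


def run_core(threads, issue_width=2, max_contexts=4):
    """Simulate a toy SMT core; return cycles until all instructions issue.

    threads: list of instruction lists, each instruction "alu" or "mem".
    Each cycle, up to issue_width instructions are issued, at most one
    per ready thread. A "mem" instruction stalls its own thread afterwards.
    """
    if len(threads) > max_contexts:
        raise ValueError("more threads than hardware contexts")
    queues = [deque(t) for t in threads]
    stalled_until = [0] * len(queues)  # first cycle each thread is ready again
    cycle = 0
    start = 0  # rotating start index so no thread is always favoured
    n = len(queues)
    while any(queues):
        issued = 0
        for offset in range(n):
            if issued == issue_width:
                break
            tid = (start + offset) % n
            if queues[tid] and stalled_until[tid] <= cycle:
                op = queues[tid].popleft()
                if op == "mem":
                    stalled_until[tid] = cycle + 1 + MEM_STALL
                issued += 1
        start = (start + 1) % n
        cycle += 1
    return cycle


if __name__ == "__main__":
    program = ["alu", "mem", "alu", "mem"]  # hypothetical workload
    alone = run_core([program])
    together = run_core([program] * 4)
    print("one thread:", alone, "cycles")
    print("four threads sharing the core:", together, "cycles")
    print("four threads run back to back:", 4 * alone, "cycles")
```

In this toy model, four copies of the same short program finish in fewer cycles when they share the core than when they run one after another, because the stall in one thread is hidden by instructions issued from the others. Real SMT hardware is far more involved, and nothing here should be read as a description of how the MVP core schedules its threads.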

The IC1 is designed for handheld and mobile devices and runs Android. The company’s efforts in this area could be seen as the “other” arm of China’s initiative to develop its own competitive CPU architectures. Much of the research to date has focused on the country’s Loongson/Godson-3 processors, but these are chips intended for mainstream PC form factors and China’s homegrown supercomputers. ICube’s IC1 gives China a homegrown alternative for building its own phones and devices rather than being beholden to foreign companies for hardware.

Where are the x86 versions?

In AMD’s case, on the way — but not for a few years. Intel’s plans on this front are less clear. Larrabee, Intel’s onetime GPU project that became the basis for the Knights Corner Many Integrated Core (MIC) co-processor, was a CPU-GPU hybrid. There’s no reason Intel couldn’t eventually integrate a MIC-style design alongside a conventional CPU architecture.

The question of whether or not AMD and Intel would ever adopt a homogeneous approach to CPU and GPU calculations is interesting — but we’re inclined to think they wouldn’t. The entire reason GPUs evolved in the first place is that it makes more sense to do certain types of work with specialized architectures.

Shrinking process technology may have made it cost-efficient to reintegrate those functions on-die, but no one has yet designed a traditional x86 CPU that delivers high-end GPU performance. It simply may not make sense to do so.