As the barriers to CPU scaling have risen with each successive node shrink, the number of scientists looking for alternate methods of driving higher performance and/or saving power has also steadily risen. We recently covered three of the most intriguing areas for boosting compute performance, including adopting new methods of CPU cooling, new semiconductor manufacturing technologies, and the use of entirely new types of CPU cores. One other approach, and something we've touched on before, is the idea of using circuits that are intentionally manufactured to be inexact and imprecise; circuits that are deliberately allowed to get things wrong, some of the time.

That’s exactly the opposite of how computers are typically built. Semiconductors today are manufactured to tolerances of a nanometer or less, fab air quality is controlled to the point where contaminants are measured in parts-per-trillion, and we’re working on building chips using extreme ultraviolet light with a wavelength of just 13.5nm. But it’s precisely because manufacturing to such tight tolerances is so difficult that scientists are working to find ways to build chips that can handle failure gracefully — and in some cases, even embrace imprecision.

Christian Enz, the new Director of EPFL’s Institute of Microengineering, believes that the time for “good enough” is now, and is pushing research in this new field. According to Enz, “the ‘good enough’ approach has been getting some traction in the corporate sector, because chipmakers can’t see any real alternative. Intel, for example, is interested in ‘good enough’ engineering. In addition, there are teams of research scientists working on it all over the world.”

Perfect imperfection

First, the good news. If an application can tolerate “3.14” as opposed to “3.14159265358,” you can save quite a bit of power in computation. In some cases, dropping significant figures can also improve performance, though this is highly application-dependent. The primary goals are power savings and simplified circuit design. Today, a great deal of engineering effort goes into ensuring that circuits return the proper result every single time. Because chips are extremely complex and defect densities are notoriously difficult to control, engineers compensate with additional circuitry that adds die size and erodes the performance and power-consumption benefits of moving to a smaller process node.
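The precision-for-power tradeoff can be sketched in software. The snippet below is illustrative only, not real hardware behavior: the hypothetical helper `truncate_mantissa` emulates a cheaper, narrower arithmetic unit by keeping only the top few bits of a double's 52-bit mantissa, which is roughly the kind of controlled imprecision an approximate circuit would introduce.

```python
import math
import struct

def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Zero out all but the top `keep_bits` of a float64's 52-bit mantissa,
    emulating a hypothetical low-power arithmetic unit with fewer bits."""
    # Reinterpret the float's raw 64 bits as an integer
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    # Mask off the low-order mantissa bits we no longer "pay" for
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack(">d", struct.pack(">Q", bits & mask))[0]

# Full precision vs. an imagined 8-bit-mantissa approximate unit
exact = math.pi
approx = truncate_mantissa(math.pi, 8)
print(exact)                          # 3.141592653589793
print(approx)                         # 3.140625
print(abs(exact - approx) / exact)    # relative error under 0.4%
```

With only 8 mantissa bits the result is still within about 0.03% of the true value, which is more than good enough for tasks like image rendering or audio decoding, while a real approximate circuit would spend far fewer transistors and far less energy producing it.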

Start jumping into processor manuals, and you quickly find evidence that this process doesn’t always work properly. Here are a few examples, drawn from the CPU microarchitecture manuals of analyst Agner Fog:

Intel’s Ivy Bridge can execute only one prefetch instruction every 43 clock cycles. Sandy Bridge executes two prefetch instructions per clock cycle.

AMD claims Bobcat and Jaguar can decode up to 32 bytes per clock cycle. In reality, the two chips top out at 16 bytes per clock cycle.

Intel’s 45nm Atom, which debuted in 2008, has always had an FPU bug: two consecutive FPU instructions fail to pair and instead execute with a delay of one clock cycle between them.

AMD’s Bulldozer and Piledriver handle 256-bit AVX instructions very poorly. Piledriver has a latency of 17-20 clock cycles on 256-bit AVX stores.

And these are just a handful of the high-level problems. Both Intel’s and AMD’s errata documents contain pages of documentation on even lower-level bugs that crop up in every version of a chip.

Next page: How to build an intentionally imprecise computer chip