At the ACM International Conference last week, researchers from Rice University, Singapore’s Nanyang Technological University, Switzerland’s Center for Electronics and Microtechnology, and UC Berkeley unveiled a microprocessor that’s designed to get the answer wrong, at least some of the time. The idea of building microprocessors that deliberately allow for incorrect results has been kicking around for years; the prototype silicon shown last week is the first time anyone has demonstrated the idea in native hardware.

The problems facing conventional CMOS scaling are well documented; could PCMOS (probabilistic complementary metal-oxide semiconductor) provide an answer? As the name implies, it’s an approach that’s compatible with conventional CMOS manufacturing. The Rice paper claims a 15x power improvement over normal silicon, so let’s take a look at why.

Nearly all of the problems facing conventional semiconductor scaling are rooted in the need to control variance, or noise. Noise, in this context, is defined as “unwanted additions to a signal.” It’s impossible to eradicate and literally universal; we estimate the apparent age of the universe by measuring the distribution of the cosmic background radiation. In semiconductor manufacturing, unwanted variance has become a major problem in multiple ways. Electrical noise sets a lower bound on CPU voltage: below a certain threshold, the “on” signal that tells a transistor to flip from a 0 to a 1 gets lost in the background. Lower voltages can therefore only be achieved in concert with lower noise levels, and hitting those levels becomes exponentially more difficult as process nodes shrink.

PCMOS flips (PDF) conventional thinking on its head. Instead of trying to generate and measure a distinct signal that stands out clearly from noise, why not measure the noise distribution and, from that, detect the presence or absence of a signal? Such an approach only works if the system is designed to tolerate incorrect conclusions, but when it can, the power savings are significant.
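To make that concrete, here’s a quick sketch of the idea in simulation; it’s ours, not the researchers’, and the specific numbers (the swing v, the noise spread sigma) are invented purely for illustration. A bit is stored as a voltage of plus or minus v, read through additive Gaussian noise, and decided by sign. The point is that knowing the noise distribution lets you predict, before you ever look at the signal, how often any individual read will come back correct.

```python
# Toy model of a probabilistic read, not the Rice/NTU hardware.
# v and sigma are made-up numbers chosen only to illustrate the idea.
import random
from math import erf, sqrt

v, sigma, trials = 0.15, 0.1, 200_000
bits = [random.choice([0, 1]) for _ in range(trials)]

def read(bit):
    level = v if bit else -v                   # the "true" stored value as a voltage
    return 1 if level + random.gauss(0, sigma) > 0 else 0   # decide by sign

correct = sum(read(b) == b for b in bits) / trials
predicted = 0.5 * (1 + erf(v / (sigma * sqrt(2))))   # Gaussian CDF at v/sigma

print(f"measured accuracy:                 {correct:.4f}")
print(f"predicted from noise distribution: {predicted:.4f}")
```

An application that can live with that error rate gets to keep the low voltage swing; one that can’t has to pay for more.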

We borrowed one of the simpler graphs from the paper to show the potential energy savings of a probabilistic design. The y-axis shows the amount of power required to flip a probabilistic switch, while the value p denotes the chance that the switch flips correctly. The legend reflects base voltage and the type of workload being run.

At the far end, the graph demonstrates the “cost” of perfection. Ensuring that the switch always flips properly (p = 1) requires an input of roughly 1.2 × 10^-13, while flipping at a 90% accuracy rate (p = 0.9) requires nearly a full order of magnitude less power. This is legitimately cool stuff, and we’re going to talk more about deployment and usage, but it’s not the fundamental breakthrough or Moore’s law savior it’s been portrayed as in some circles.
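For a rough sense of where that order-of-magnitude gap comes from, here’s a back-of-the-envelope model. Two assumptions here are ours, not the paper’s exact math: switching energy scales with the square of the supply voltage, and the noise on the node is Gaussian, so the accuracy p is just the Gaussian CDF evaluated at the voltage-to-noise ratio.

```python
# Illustrative trade-off only: E ~ V^2 and Gaussian noise are our simplifying
# assumptions, not the model used in the Rice paper.
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf                 # inverse Gaussian CDF
sigma = 1.0                                    # noise spread, arbitrary units

def relative_energy(p):
    v = sigma * phi_inv(p)                     # voltage needed to hit accuracy p
    return v * v                               # energy ~ V^2, constants dropped

near_perfect = relative_energy(0.9999999)      # "always" flips correctly
relaxed      = relative_energy(0.9)            # wrong one time in ten
print(f"energy ratio: {near_perfect / relaxed:.1f}x")
```

Drop the accuracy requirement from “essentially always” to 90% and the voltage you need falls by roughly 4x, so the energy falls by an order of magnitude or more. The real curves depend on the actual noise and circuit, but the shape of the trade-off is the same.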

Perfection has always been expensive

Back in the early days of computing, if you wanted a dedicated hardware unit for handling floating-point math (math with a decimal point), you had to buy a separate chip that fit into a socket on the motherboard. Intel designed an FPU for both the original 8086 and the 286 that followed; these were dubbed the 8087 and 80287, respectively.

When it built the 80387, Intel elected to do something unusual, even though it took the company until 1987 (two years after the 386 launched) to bring the FPU to market: the 80387 was the first x86 chip to be fully compliant with the IEEE-754 standard for floating-point arithmetic. It’s a useful example in this case, because Intel was willing to go to significant trouble to design a chip that would meet the standard at a time when die space was at a much higher premium than it is today.

The x87 FPU provided (and still provides) single-precision (32-bit), double-precision (64-bit), and extended-precision (80-bit) operating modes. What may surprise you is that by default, the x87 FPU used all 80 bits in order to guarantee sustained precision over many operations. One of the chief designers of the IEEE-754 standard, William Kahan, noted that the standard was designed “to serve the widest possible market… Error-analysis tells us how to design floating-point arithmetic, like IEEE Standard 754, moderately tolerant of well-meaning ignorance among programmers.”
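You can still see why that default mattered with nothing more exotic than a short script. The sketch below is ours and only a loose stand-in for the x87 itself: it accumulates the same 64-bit inputs in a plain double-precision accumulator and in an extended-precision one. Note the assumption that numpy’s longdouble maps to the 80-bit x87 format, which it does on many x86 Linux and macOS builds but not everywhere.

```python
# Rough illustration of why wider intermediate precision helps: same inputs,
# different accumulator widths.
# ASSUMPTION: np.longdouble is 80-bit x87 extended precision on this platform
# (true on many x86 Linux/macOS builds, only 64-bit on some others).
import math
import numpy as np

values = [0.1] * 1_000_000            # a million copies of the double closest to 0.1
exact = math.fsum(values)             # correctly rounded reference sum

acc64 = np.float64(0.0)               # plain double-precision accumulator
acc80 = np.longdouble(0.0)            # extended-precision accumulator (where available)
for v in values:
    acc64 += np.float64(v)
    acc80 += np.longdouble(v)

print("double accumulator error:  ", abs(float(acc64) - exact))
print("extended accumulator error:", abs(float(acc80) - exact))
```

The extended-precision accumulator lands essentially on the correctly rounded answer; the double-precision one drifts by several orders of magnitude more, which is exactly the kind of accumulated error the 80-bit default was meant to absorb.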

One of the reasons CPU designers have pursued tighter tolerances and greater robustness is that it’s almost always cheaper to get the answer right the first time than it is to have to re-do the problem starting from scratch. Even simple, single-bit error correction adds a degree of complexity; true validation logic that performs independent analysis and verification of results generated elsewhere is extremely expensive and difficult to integrate.
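“A degree of complexity” is easy to underestimate. Here’s a minimal Hamming(7,4) single-error corrector, just to show what even the simplest scheme drags in: three parity bits for every four data bits, plus syndrome logic on every read. This is a textbook code sketched by us as an illustration, not anything specific to the Rice work.

```python
# Textbook Hamming(7,4): corrects any single flipped bit in a 7-bit codeword.
def hamming74_encode(d):                 # d: four data bits [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                    # covers codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                    # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                    # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]  # positions 1..7

def hamming74_decode(c):                 # c: 7-bit codeword, possibly corrupted
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3      # 0 means no single-bit error detected
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1             # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]]      # recover d1..d4

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                             # corrupt one bit in transit
print(hamming74_decode(word))            # -> [1, 0, 1, 1]
```

And that’s just spotting and repairing one flipped bit; independently verifying an entire result is a different league of expense.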

The work being done at Rice won’t change that. It’s not going to rewrite Moore’s law or fundamentally change the shape of CPU design. That doesn’t mean it’s not important.
