The International Solid-State Circuits Conference (ISSCC) is this week, and it’s a time when companies and researchers meet to discuss cutting-edge advances in semiconductor technology. Intel is giving several presentations at the conference this year, with new details on the future of low-power computing and some previously unknown information about Haswell CPUs.

Haswell: The devil’s in the details

When Intel launched a version of Haswell with 40 GPU execution units and 128MB of on-package EDRAM, codenamed Crystal Well, it played coy with many of the details. Die size, clock speed, and organizational structure were all swept under the rug — until now. We now know that the Crystal Well EDRAM is a separate (but on-package) 77-square-millimeter chip clocked at 1.6GHz with a 1V operating voltage. The interface between the CPU/GPU and Crystal Well is called OPIO (On-Package I/O), a simple, flexible design that Intel has deployed in two forms. On Haswell-ULT (ultra-low power) chips, the OPIO link is a 4GB/s bridge between the on-package Platform Controller Hub and the rest of the chip. When deployed alongside Crystal Well, OPIO can transfer 102GB/s — at a nominal cost of just 1W of power.
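Those two figures imply a strikingly low energy cost per bit moved. A quick back-of-the-envelope conversion, using only the 102GB/s and 1W numbers from Intel's disclosure (the arithmetic itself is just unit conversion):

```python
def picojoules_per_bit(bandwidth_gb_s: float, power_w: float) -> float:
    """Energy cost of moving one bit across a link, in picojoules."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8  # GB/s -> bits/s
    return power_w / bits_per_second * 1e12     # J/bit -> pJ/bit

# Crystal Well OPIO: 102GB/s at a nominal 1W
print(round(picojoules_per_bit(102, 1.0), 2))  # -> 1.23 pJ/bit
```

Roughly 1.2 picojoules per bit is the kind of interconnect efficiency that makes a 128MB on-package cache practical in a mobile power budget.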

Other disclosures the company made confirmed some of our speculation from a year ago. When Intel announced that Haswell would have an on-die voltage regulator, we speculated that the FIVR (Fully Integrated Voltage Regulator) was a step Intel took to speed the chip's transitions from idle to full load and back again. 0W has become the new 1GHz — the faster a chip can move in and out of idle, the more horsepower it can bring to bear on specific tasks and the more power it can save in the transitions.

As AnandTech reports, our speculation on this front appears to have been correct. FIVR is highly efficient (90% under load), can enter or exit sleep states in 320 nanoseconds, and can ramp to full Turbo in just 100 nanoseconds.
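To put those nanosecond figures in perspective, here's a sketch converting transition times into clock cycles. The 320ns and 100ns values are from Intel's disclosure; the 3.5GHz clock is an illustrative assumption, not a figure from the paper:

```python
def cycles_elapsed(transition_ns: float, clock_ghz: float) -> int:
    """Clock cycles that tick by during a power-state transition."""
    return round(transition_ns * clock_ghz)  # ns * GHz = cycles

# Assumed 3.5GHz operating clock (hypothetical, for scale)
print(cycles_elapsed(320, 3.5))  # -> 1120 cycles to enter/exit sleep
print(cycles_elapsed(100, 3.5))  # -> 350 cycles to ramp to full Turbo
```

A transition costing only a thousand-odd cycles is cheap enough to take constantly, which is exactly what makes aggressive race-to-idle strategies pay off.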

The last Haswell-specific tidbits are the die sizes Intel has finally revealed for its various parts.

The quad-core GT3e parts (that's the full 40-EU GPU combined with 128MB of EDRAM) are fairly large, at 260 square millimeters for the CPU/GPU die and an additional 77 square millimeters for the memory. A conventional quad-core with a GT2 design (20 EUs) is just 177 square millimeters. Eyeballing the 4+2 die against the 4+3 die, it's obvious that Intel has given up some transistor density to hit its integration targets at the top of the stack.
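The area trade-off is easy to see in the numbers themselves. A rough tally using the die sizes above (note that attributing the entire delta to the GPU is a simplification, since the GT3 design changes other structures too):

```python
gt3e_cpu_gpu_mm2 = 260  # quad-core CPU + 40-EU GT3 GPU die
gt3e_edram_mm2 = 77     # separate Crystal Well EDRAM die
gt2_mm2 = 177           # quad-core CPU + 20-EU GT2 GPU die

# Total on-package silicon for the GT3e configuration
total_gt3e_mm2 = gt3e_cpu_gpu_mm2 + gt3e_edram_mm2

# Extra die area on the main chip versus the GT2 part
gpu_delta_mm2 = gt3e_cpu_gpu_mm2 - gt2_mm2

print(total_gt3e_mm2)  # -> 337 mm^2 of silicon on package
print(gpu_delta_mm2)   # -> 83 mm^2 larger than the GT2 die
```

In other words, doubling the EU count (plus associated changes) adds nearly half the GT2 die's area again, before the EDRAM chip is even counted.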

The “Iris Pro” GT3e configurations have only shipped in a handful of systems to date, but we expect to see the core debut more widely with the advent of Broadwell. Intel may not be pursuing AMD’s goal of HSA and full GPU integration, but moving to a smaller process node will still give it more die space to devote to boosting GPU performance — and Broadwell is expected to be a considerable leap forward compared to Haswell.

Long-term efficiency jumps

One continuing research area for all semiconductor companies is power efficiency. With conventional voltage scaling no longer moving at anything like pre-2005 levels, companies like Intel and AMD have created increasingly sophisticated clock gating and power management systems to ensure that total system power is kept as low as possible. In 2012, Intel showcased Claremont, an Intel Pentium design implemented on a 32nm process that used Near-Threshold Voltage (NTV) technology to drastically reduce its operating power.

Today, Intel showcased some of its efforts in this area since Claremont debuted. The company has designed a graphics core that's capable of 2.7x the gigaflops-per-watt efficiency of a conventional GPU core, while maintaining a 1.4x GFLOPS/W advantage at peak performance. In other words, the chip is 2.7x more efficient at standard operating voltages and 1.4x more efficient at the peak operating voltage (presumably closer to modern-day Turbo Mode).
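Read literally, those multipliers translate directly into throughput at a fixed power budget. A sketch of the arithmetic — the 2.7x and 1.4x figures are from Intel's paper, but the baseline efficiency and the 10W budget are hypothetical placeholders:

```python
def gflops_at_budget(base_gflops_per_w: float, multiplier: float, budget_w: float) -> float:
    """Throughput achievable at a fixed power budget, given an efficiency multiplier."""
    return base_gflops_per_w * multiplier * budget_w

# Hypothetical conventional GPU core baseline: 50 GFLOPS/W, 10W budget
base = 50.0
print(gflops_at_budget(base, 2.7, 10))  # NTV core at standard voltage
print(gflops_at_budget(base, 1.4, 10))  # NTV core at peak voltage
```

The point isn't the absolute numbers, which depend entirely on the assumed baseline, but that a 2.7x per-watt gain compounds with every watt a power-constrained design can spend.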

These technology papers are delivered in dry, technical prose, but they touch on concepts that are vital to the long-term future of computing. If wearables are ever going to become more than semi-functional curiosities, they need CPU cores that can run 2-3x longer and perform far more work in the same power envelope. Building exascale-level supercomputers and simulating the human brain in real time will require circuits far more efficient than anything we've built to date.

The iterative improvements debuting in 2014 with products like Broadwell or AMD’s Beema and Mullins platforms may seem a far cry from the advances predicted by papers like these, but there’s a direct link between the two that stretches back years. The technologies unveiled in 2014 will be the technologies that make exascale computing possible by the early 2020s.