By Kevin Krewell, principal analyst, Tirias Research

SAN FRANCISCO — This year’s processor session at the ISSCC led off with two presentations by AMD (for the first time), followed by presentations from Samsung and MediaTek on their latest 5G smartphone chips, a research proof-of-concept design from CEA Tech, an automotive system on chip (SoC) from Texas Instruments (TI), and the latest IBM Z series mainframe processor.

And because this is primarily a circuit design conference, each vendor focused on one or more aspects of circuit design that were unique to its processors.

The International Solid-State Circuits Conference (ISSCC) is one of the longest-running technical conferences in the semiconductor industry; it takes place here every February. The conference brings together academic and industry participants to discuss the latest challenges in chip circuit design.

This year’s conference mined a deep vein of topics, including phase-locked loops, low-power circuits, memory, SerDes, DSPs, and processor design. The processor session in particular features the leading vendors, along with projects from research institutes and academia. The sessions are jam-packed with dense chip design details. What follows are highlights of the more interesting details from the processor session.

AMD Zen 2 and EPYC chiplets

The two AMD sessions dovetailed with each other: one discussed the design of the Zen 2 CPU core used in the newest EPYC server processor, and the other covered the EPYC chiplet architecture that allowed AMD to deliver 64 CPU cores in one socket without a massive die. The chiplet design also allowed AMD to apply three die designs to a plethora of products and markets.

The AMD Zen 2 presentation described the challenges of making the first x86 processor using TSMC’s 7 nm process. The design goal for the EPYC server processor was to double the number of CPU cores within the same socket without exceeding the socket power envelope. In addition, each CPU core was designed to deliver a 15% instructions-per-cycle performance uplift on the SPECint 2006 benchmark. Many of the architectural changes in Zen 2 have been discussed before; in the ISSCC talk, AMD focused on the circuit design challenges.

The AMD design is very modular. The basic element is the CPU Complex (CCX) with 4 CPU cores, L2 and L3 caches, and an Infinity Fabric system interconnect. With a 4-core module, AMD could scale the design from notebooks (4-8 cores) up to servers (with up to 64 cores). The CCX module was shrunk from 44 mm2 in the prior generation to 31.3 mm2 in Zen 2, despite adding more L3 cache.

The 7 nm process design required adding more metal levels. As a result, the metal-layer routing rules changed, and the design migrated from 10.5-track to 6-track standard cells. The lower track count posed challenges (less cell height and less drive strength) but brought lower leakage, reduced capacitance per cycle by 9%, and produced a smaller die area.

AMD used a variety of design techniques, such as clock shaping, and employed five different flip-flop designs, which were important for critical timing loops. The designers also shifted 3% more of the power budget to combinatorial logic to gain performance. With these and other circuit optimizations, AMD could raise the clock speed to 4.7 GHz and lower the operating voltage when running at clock speeds comparable to the original Zen core.

The second AMD presentation described the changes in AMD’s chiplet strategy for Zen 2-based server products. A key benefit for AMD was that, with just three die tape-outs, it could build products for multiple markets. There were also thermal benefits to using chiplets, as the chips were spread out across the package.

AMD’s goal was to deliver substantially more performance per socket, and the result was double the number of CPU cores in the second-generation EPYC processor. This puts AMD on track to double performance (on SPECint 2006) every 2.5 years. The new EPYC processor also achieved improved memory latency. Using chiplets let AMD build server chips that would have been neither feasible nor economical as a monolithic die, which would have approached reticle limits at 64 cores.

AMD also optimized the cost structure and improved die yields by using much smaller chiplets. AMD reserved the expensive 7 nm process for the Core Cache die (CCD) and moved the DRAM and PCIe logic to a 12 nm I/O die fabricated by GlobalFoundries. Each CCD is composed of two CCX modules, each with four Zen 2 cores and its L2 and L3 caches; 86% of the CCX is dedicated to the CPUs and L3 cache. Each CCD still needs to be a mini SoC, with power management, the Infinity Fabric system interconnect, clocks, etc., on die.
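The yield advantage of small chiplets follows from classic defect-density yield models. As a rough illustration (the defect density and die areas below are assumed for the sketch, not AMD's actual figures), a simple Poisson yield model shows how sharply yield falls as die area grows:

```python
import math

def poisson_yield(area_mm2: float, defect_density_per_cm2: float) -> float:
    """Classic Poisson yield model: Y = exp(-A * D0),
    where A is die area and D0 is the defect density."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defect_density_per_cm2)

D0 = 0.2  # assumed defects/cm^2 for a young 7 nm process (illustrative only)

chiplet = poisson_yield(74, D0)      # small CCD-sized die (~74 mm^2, assumed)
monolithic = poisson_yield(740, D0)  # hypothetical monolithic die at 10x the area

print(f"chiplet yield:    {chiplet:.1%}")
print(f"monolithic yield: {monolithic:.1%}")
```

With these assumed numbers, the small die yields roughly 86% of good dies per wafer, versus roughly 23% for the hypothetical monolithic die, which is why splitting the cores across several small 7 nm chiplets (and putting the I/O on a cheaper 12 nm die) improves the cost structure.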

These requirements brought many challenges. With the memory controller now centralized on a separate die serving all CCX modules, the new EPYC processor could achieve improved average memory latency. But even best-case accesses now require going off the CCD to reach memory, so the AMD design focused on reducing Infinity Fabric latency; as a result, the best-case latency is only 4 nanoseconds longer.

Because AMD committed to keeping the EPYC package size and pinout unchanged, close silicon/package co-design was needed as the die count increased from four in the first EPYC to nine in the second generation. The routing paths were very tight and required routing signals under the inner CCD chiplets to reach the CCDs farther from the centralized I/O die.

Many of the other ISSCC talks also featured circuits that compensate for internal voltage dips (droop) when processors are under heavy load. AMD uses a current shunt (extra current) to fight droop and can also stretch clocks. AMD's low-dropout (LDO) regulator design also allows per-core linear regulation, which saves power by adapting the voltage to each core's capability.