This is the moment that all PC hardware enthusiasts have eagerly awaited for nearly half a decade. AMD today has officially pulled back the covers on its brand new high performance Zen CPU microarchitecture, its features, specs and most importantly performance. That's right, this is the very first time that the company has actually demonstrated the performance of Zen directly against its competition from Intel.

This all took place at a private media event that the company held in San Francisco, at which AMD's President & CEO Lisa Su took the stage alongside Chief Technology Officer Mark Papermaster to talk all about Zen. So what exactly did they have to say? To summarize it in two words: a LOT!

Zen, AMD’s Most Important Product In More Than A Decade

Many Years In The Making

Zen has been one of AMD's most eagerly anticipated products for as long as I can remember. It’s the company’s first attempt to compete in the high-end enthusiast CPU market since the introduction of the Bulldozer microarchitecture five years ago. Zen breaks new ground for AMD in many ways. It’s the company’s first ever CPU architecture to feature simultaneous multithreading. It’s also the first AMD product since the days of the original Athlon, more than a decade ago, to be built on a process technology that's very close to parity with Intel's.

This fact alone is huge. It means that for the very first time since the early 2000s AMD’s CPU products won’t be at an inherent disadvantage due to Intel’s process lead. From an architectural point of view, Zen is a brand new clean-slate design that’s been led from the get-go by accomplished CPU architect Jim Keller, the very same engineer who played a pivotal role in designing the original Athlon XP and Athlon 64 processors, the most competitive CPU products in the history of the company.

Zen is AMD’s biggest long-term technology bet and one of the largest engineering efforts the company has ever undertaken. President & CEO Lisa Su stated that this year’s products, culminating in Zen and Polaris, represent the company’s most competitive roadmap in more than a decade.

AMD President & CEO Lisa Su – Q4 2015 Earnings Call



“We remain focused on completing our strategic work around three key growth pillars. First, in PCs, even in a declining overall market, we believe we can regain client compute and discrete graphics share for the year, driven by gaming, VR, commercial, and our most competitive product roadmap in more than a decade.

We have clear opportunities to regain GPU share in 2016 based on the performance per watt of our new GPUs and software leadership. Earlier this quarter at CES, we announced our new Polaris GPU architecture, which we expect to begin shipping in the middle of 2016.”



The microarchitecture taped out back in 2015 and is already sampling. Consumer desktop products are on track to be available en masse in 2017. However, we know that AMD is working on far more than just high performance desktop CPUs. The company has had a 32-core Zen server CPU, a 16-core Zen HPC APU and a quad-core Zen consumer APU in the works for several years.

The Zen Microarchitecture

Below we have a visual representation of Zen's high-level design from AMD. Interestingly enough, it looks very much like our very own in-house diagram that we published last year. The integer cluster in each Zen core has six pipes: four ALUs (arithmetic logic units) and two AGUs (address generation units).































These AGUs can perform two 16-byte loads and one 16-byte store per cycle via a 32 KB, 8-way set associative, write-back L1 data cache. According to AMD, the move from a write-through to a write-back cache has noticeably reduced stalls in several types of code paths. Load/store cache operations in Zen also reportedly exhibit lower latency than in Excavator.
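To put those load/store figures in perspective, here is a rough back-of-the-envelope sketch in Python. The per-cycle numbers and the cache size and associativity come from the paragraph above. The 64-byte cache line and the 3.0 GHz clock (borrowed from the demo system described later in this article) are assumptions made purely for illustration, not figures AMD has given for the L1.

```python
# Rough sketch: peak L1 data cache traffic per Zen core.
# Figures from the article: two 16-byte loads + one 16-byte store per cycle,
# 32 KB 8-way set associative L1D.
LOAD_BYTES = 16
LOADS_PER_CYCLE = 2
STORE_BYTES = 16
STORES_PER_CYCLE = 1
L1D_SIZE_BYTES = 32 * 1024
L1D_WAYS = 8

# Assumptions for illustration only (not stated for the L1 in the article).
ASSUMED_LINE_BYTES = 64
ASSUMED_CLOCK_HZ = 3.0e9


def peak_bytes_per_cycle():
    """Return (load_bytes, store_bytes) moved per cycle at peak."""
    return LOAD_BYTES * LOADS_PER_CYCLE, STORE_BYTES * STORES_PER_CYCLE


def l1d_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set associative cache."""
    return size_bytes // (ways * line_bytes)


def bandwidth_gb_per_s(bytes_per_cycle, clock_hz):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) at a given clock."""
    return bytes_per_cycle * clock_hz / 1e9


if __name__ == "__main__":
    loads, stores = peak_bytes_per_cycle()
    print("peak load bytes/cycle:", loads)
    print("peak store bytes/cycle:", stores)
    print("sets:", l1d_sets(L1D_SIZE_BYTES, L1D_WAYS, ASSUMED_LINE_BYTES))
    print("peak load GB/s:", bandwidth_gb_per_s(loads, ASSUMED_CLOCK_HZ))
```

These are theoretical peaks per core; real code will rarely sustain them.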





The floating point unit is capable of performing two FMAC operations or a single 256-bit AVX operation per cycle, exactly as we detailed in our exclusive architectural deep-dive last year, funnily enough.
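As a rough illustration of what that means for peak floating point throughput, the sketch below assumes that each of the two FMAC units is 128 bits wide. That width is our inference from "two FMAC operations or a single 256-bit AVX operation" and is not a figure stated here by AMD. Each fused multiply-accumulate is counted as two floating point operations per lane.

```python
# Sketch: theoretical peak FLOPs per cycle per core from the FPU description.
AVX_WIDTH_BITS = 256
FMAC_UNITS = 2

# Assumption: the two FMAC units pair up to execute one 256-bit AVX operation,
# which would make each unit 128 bits wide.
ASSUMED_FMAC_WIDTH_BITS = AVX_WIDTH_BITS // FMAC_UNITS


def peak_flops_per_cycle(element_bits):
    """Peak FLOPs per cycle for a given element size (32 or 64 bits)."""
    lanes_per_unit = ASSUMED_FMAC_WIDTH_BITS // element_bits
    # A fused multiply-accumulate counts as one multiply plus one add.
    return FMAC_UNITS * lanes_per_unit * 2


if __name__ == "__main__":
    print("single precision FLOPs/cycle:", peak_flops_per_cycle(32))
    print("double precision FLOPs/cycle:", peak_flops_per_cycle(64))
```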

AMD's First Microarchitecture To Feature Simultaneous Multithreading

AMD has done away with the CMT - clustered multi-threading - concept that was introduced with the Bulldozer family of cores in 2011 in favor of a more traditional SMT - simultaneous multi-threading - design. This means that each Zen core will be able to execute two threads simultaneously: a principal, very high throughput thread and a secondary thread that can be used opportunistically.

In contrast, each Bulldozer module can execute two identical threads. This is achieved through two separate integer clusters with a single front-end. This approach saves area versus building two separate cores and delivers two high throughput threads. However, there are advantages that Zen's SMT implementation holds over the Bulldozer CMT implementation. For one it allows AMD to build a single larger integer cluster with significantly higher single threaded performance. Another advantage with this approach is that it leaves a lot of wiggle room for clever savings in area and power.

A Drive For Power Efficiency

| CPU Microarchitecture | AMD Phenom II / K10 | AMD BD/PD | AMD SR/XV | AMD Zen | Intel Skylake |
|---|---|---|---|---|---|
| Instruction Decode Width | 3-wide | 4-wide | 8-wide | 4-wide | 4-wide |
| Single Core Peak Decode Rate | 3 instructions | 4 instructions | 8 instructions | 4 instructions | 4 instructions |
| Dual Core Peak Decode Rate | 6 instructions | 4 instructions | 8 instructions | 8 instructions | 8 instructions |
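The dual core figures in the table above follow from one simple rule: when two cores share a front-end, as in the CMT modules of the Bulldozer family, the pair cannot decode more than a single core can, whereas two independent cores double the rate. A minimal Python sketch of that rule follows. The `shared_by_core_pair` flag is our own reading of the table, not an AMD specification.

```python
# Sketch: reproduce the "Dual Core Peak Decode Rate" row of the table.
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontEnd:
    name: str
    decode_width: int          # instructions per cycle, from the table
    shared_by_core_pair: bool  # assumption: True for CMT modules


FRONT_ENDS = [
    FrontEnd("AMD Phenom II / K10", 3, False),
    FrontEnd("AMD BD/PD", 4, True),
    FrontEnd("AMD SR/XV", 8, True),
    FrontEnd("AMD Zen", 4, False),
    FrontEnd("Intel Skylake", 4, False),
]


def dual_core_peak_decode(front_end):
    """Peak instructions decoded per cycle across two cores."""
    if front_end.shared_by_core_pair:
        return front_end.decode_width
    return 2 * front_end.decode_width


if __name__ == "__main__":
    for fe in FRONT_ENDS:
        print(fe.name, dual_core_peak_decode(fe))
```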

A lot of the engineering effort around Zen has also gone into addressing one of Bulldozer's major flaws. Bulldozer and Intel's Sandy Bridge - and subsequent Intel architectures including Skylake - had equally deep pipelines to achieve high clock speeds. The deeper the pipeline, the more latency the design will exhibit, particularly when it comes to branch mispredictions, which are quite common in such pipelines.

The latency that results from branch mispredicts is quite significant. To combat this issue, Intel introduced a micro-op cache with Sandy Bridge. It worked to a great extent in reducing mispredict penalties and was believed to be the principal reason behind Sandy Bridge's significant single threaded performance advantage over Bulldozer. The good news is that AMD has confirmed that it's finally introducing its own micro-op cache with Zen, another thing that we had pointed out in our architectural deep dive on Zen last year.
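To see why the mispredict penalty matters so much, consider a simple textbook-style model in which every mispredicted branch adds a fixed number of stall cycles on top of a baseline cycles-per-instruction figure. Every number in the sketch below is hypothetical and chosen only to illustrate the effect; none of them are measurements of Bulldozer, Sandy Bridge or Zen.

```python
# Sketch: first-order model of how branch mispredict penalties inflate CPI.
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Cycles per instruction including branch mispredict stalls.

    base_cpi        -- CPI with perfect branch prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches that are mispredicted
    penalty_cycles  -- cycles lost per mispredicted branch
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted.
    long_penalty = effective_cpi(0.5, 0.2, 0.05, 20)
    # Hypothetical shorter penalty, e.g. when refilling from a micro-op cache.
    short_penalty = effective_cpi(0.5, 0.2, 0.05, 15)
    print("CPI with 20-cycle penalty:", round(long_penalty, 3))
    print("CPI with 15-cycle penalty:", round(short_penalty, 3))
```

In this model, trimming the penalty lowers the effective CPI without touching the predictor itself, which is the kind of gain a micro-op cache is aiming for.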



On the front-end, each Zen core is capable of decoding four instructions per cycle, which are fed to the operations queue. The micro-op cache and the queue together have a throughput of six operations per cycle going into the schedulers.

The final result is similar overall throughput when we look at both threads of each SMT core vs both threads in each CMT core. This is evidenced by the leaked Zen benchmarks that we've seen recently. However, Zen delivers significantly higher single threaded performance. Furthermore, because each Bulldozer module houses two integer clusters and a single floating point unit, it was always very integer heavy. Each Zen core, on the other hand, includes one large integer cluster and one large floating point unit, making it a much more balanced design.

The Zen Microarchitecture In A Nutshell

AMD's brand new Zen core features a significantly wider execution engine than anything we've seen before from the company, leveraging simultaneous multi-threading and a micro-op queue to boost throughput and single threaded performance. This, combined with a brand new, low latency cache sub-system and a new set of pre-fetch algorithms, results in a dramatic instructions per clock improvement and a doubling of throughput per core compared to AMD's previous eight-core Piledriver FX 8300 series CPUs.

High Level View:

Two threads per core

8 MB shared L3 cache

Large, unified L2 cache

Micro-op Cache

Two AES units for security

14nm FinFET Transistors

AMD: Zen Outperforms Intel's High-End Broadwell-E CPUs

Zen Performance Demo vs Intel's $1000 Broadwell-E i7-6900K

During the event AMD treated the audience to the very first public, real-world performance showdown featuring Zen and a contemporary competing Intel high-end enthusiast class CPU. The demo involved two similarly specced PCs, one configured with an eight core, 16 thread Zen engineering sample clocked at 3.0 GHz and the other configured with an eight core, 16 thread Intel Broadwell-E processor also clocked at 3.0 GHz.

Both systems started the same Blender render session at the same time and the Zen CPU was actually able to finish first. This marks the very first time in more than a decade that we have seen an AMD CPU outshine an Intel CPU in instructions per clock, outperforming the Intel CPU core for core and clock for clock. That demo represents a truly historic moment for AMD.
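Because both chips ran the same workload with the same core count and the same clock speed, render time becomes a reasonable stand-in for per-clock throughput in this one test. The Python sketch below shows the arithmetic. AMD did not give us exact render times, so the 48 and 50 second figures are hypothetical placeholders rather than results from the demo.

```python
# Sketch: relative per-clock throughput from two render times.
def relative_per_clock_throughput(time_a_s, clock_a_ghz, time_b_s, clock_b_ghz):
    """Per-clock throughput of system A relative to system B.

    Assumes both systems render the same scene with the same core count.
    A value above 1.0 means A does more work per clock cycle than B.
    """
    cycles_a = time_a_s * clock_a_ghz
    cycles_b = time_b_s * clock_b_ghz
    return cycles_b / cycles_a


if __name__ == "__main__":
    # Hypothetical render times, both systems at 3.0 GHz as in the demo.
    ratio = relative_per_clock_throughput(48.0, 3.0, 50.0, 3.0)
    print("Relative per-clock throughput:", round(ratio, 3))
```

A single Blender run says nothing about other workloads, so a ratio like this is only a rough proxy for IPC.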

The key takeaways here are:

- Zen has better or equivalent IPC to Intel's highest performing desktop CPUs yet, Broadwell-E.

- AMD states that 8 core, 16 thread Zen CPUs will scale to frequencies beyond 3.0 GHz.

All New AM4 Socket & Platform

AMD is finally bringing all of its desktop products under one roof. The next generation AM4 socket will be compatible with both desktop APUs and CPUs. This includes the upcoming Excavator based Bristol Ridge APUs, which are set to arrive some time during the next four months, as well as the brand new Summit Ridge family of high-performance Zen CPUs.

All CPUs inside the Summit Ridge family will include the company's new platform security processor, PCIe 3.0 support, dual channel DDR4 memory controllers, copious amounts of L3 cache and updated storage and connectivity features, including USB 3.1 and NVMe.

AMD AM4 platform key technology features include:

DDR4 Memory

PCIe Gen 3

USB 3.1 Gen2 10Gbps

NVMe

SATA Express

Availability & Closing Comments

AMD confirmed that AM4 motherboards will begin shipping over the next few months. The first processors to be available on the new socket are Bristol Ridge APUs, which will begin shipping some time before the holiday season. The multi-core enthusiast Zen CPUs that have been the subject of most of this article will be available en masse in early 2017, CEO Lisa Su has confirmed. There were some hints that Zen could launch around CES, so January 2017 would be a good bet.

It seems that all the stars are lining up for AMD, which has seen its fair share of struggles over the past several years. All the components are there to make Zen a successful product. It has taken more than a decade, but it’s finally here: an AMD product that can actually challenge Intel’s highest performing Extreme Edition CPUs. It’s almost unfathomable at first thought; after all, we all grew accustomed to seeing Intel’s enthusiast CPUs go uncontested for the longest time.

Bringing this to a close, it's clear that AMD is doing a lot of things right with Zen: pushing IPC and power efficiency to where they need to be, building a comprehensive modern platform, bringing much needed updates to the feature-set and creating an attractive value proposition for desktops, servers and notebooks. All the ingredients to make Zen a success are here; all that's left is for AMD to execute and deliver. The mere prospect that enthusiasts may actually have AMD CPUs as a worthwhile option again for the first time in a decade come early 2017 is refreshing. And maybe, just maybe, we'll finally be able to say "AMD's back".