Zen 2 Missives - 2019

Matthew Dillon

(I) The Socket Didn't Have to Change

After languishing through the CPU dark ages (read: Intel trying real hard to keep people on 4 cores so they could charge an arm and a leg for more), the last two years have seen a veritable tsunami of advances in CPU technologies. AMD's introduction of an 8-core CPU (the 1700X) began the rat race, and with both AMD and Intel now pushing high-core-count CPUs, the big winners here are us! The consumer, the power user, the programmer, the technically-oriented enthusiast. All of us are the winners of this new race.
But all is not equal. The situation developing now is primarily related to power distribution and power consumption. Intel got caught with their pants down on multiple fronts... they got stuck on their 14nm node and could only produce minor improvements in power efficiency. And they also got stuck on a socket with power delivery capabilities a bit lower than they would have liked. This puts Intel in the unenviable situation of having to compete against AMD by introducing a higher core-count CPU (a 10-core) without a commensurate improvement in power efficiency, forcing a new socket and motherboard upgrade on the Intel world. Intel is taking power consumption way beyond what most people are actually going to be willing to push into their machines. It's a double-whammy with Intel on the losing end.
AMD thought ahead. Their AM4 socket can handle tons more power. More importantly, AMD is now delivering, on 7nm, performance efficiency that is nearly double that of Intel. This means that AMD not only does not have to change their socket, but their new CPUs will run just fine on just about ANY AM4 motherboard introduced in the last three years without even breaking a sweat. AMD is not going to have to impose a new socket until something major changes in the memory subsystem, such as a new memory standard that is incompatible with the DDR4 DIMM socket. Notice I didn't say electrical, I said socket, because the CPU is more or less directly wired to the DIMM slots.
It is a common misconception that getting the most out of one of these new Zen 2 CPUs requires a high-end X570 motherboard with beefier VRMs and other high-end features. As it turns out, this is not actually true. The reason is that power efficiency keeps even the high-end 3900X (and later the 3950X) within the power envelope that older motherboards (B350 and B450 mobos) can deliver, and in the case of the B450, with room to spare. A 3900X gets 95% of its performance with just 110W in the socket, and even low-end B450 motherboards can put 150W into the socket. Low-end B350 motherboards can put at least 100W in the socket, which is close enough. Did I mention that you can just pop into the BIOS and set the power cap for the socket to whatever you want? Poor airflow? Old motherboard? It doesn't matter. Well, sure, it does matter, just not as much as people seem to think.
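To make the headroom argument concrete, here is a rough sketch that simply subtracts the numbers quoted above. The wattages are the approximate figures from this section, not measurements, and the names (BOARD_SOCKET_W, headroom) are made up for illustration.

```python
# Approximate socket power figures quoted in the text (watts).
CPU_95_PERCENT_W = 110  # 3900X power for ~95% of its performance

# Rough socket delivery per board class, as quoted in the text.
# The B350 figure is a floor ("at least 100W"), so it is pessimistic.
BOARD_SOCKET_W = {
    "low-end B350": 100,
    "low-end B450": 150,
}


def headroom(board_w, cpu_w=CPU_95_PERCENT_W):
    """Watts left over after feeding the CPU (negative = shortfall)."""
    return board_w - cpu_w


def report():
    """Headroom for every board class in the table."""
    return {name: headroom(w) for name, w in BOARD_SOCKET_W.items()}


if __name__ == "__main__":
    for name, spare in report().items():
        print(f"{name}: {spare:+d} W relative to the 110 W point")
```

With these figures the B450 has about 40W to spare and the B350 comes up about 10W short in the worst case, which is the "close enough" above.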

(II) The Physics Has Changed with These Smaller Nodes

When you compare the TSMC 7nm node AMD's Zen 2 CPUs are currently on with Intel's 14nm+++ (many pluses) node, you will notice some major differences in how the physics of the node works. Intel's 14nm node is relatively temperature-agnostic. When you overclock a 9900K you can take temps right up to the limit, continuing to push more power into the socket to get those high frequencies. Zen 2 on 7nm doesn't work this way. On 7nm, temperature is directly tied to frequency. And so on a Zen 2 system, if you increase the voltage to push frequency you also wind up increasing the temperature, which lowers the maximum possible stable frequency. In other words, you can't just push power into a Zen 2 CPU to get the overclocks you want. It doesn't work. On a Zen 2 system the key to overclocking is lower temps, NOT higher power. Well, if you want to run the bleeding edge and you hit a hard stop on temps then sure, you can push more power into the socket (as long as you keep those temps hard-stopped), but this level of overkill just doesn't net a whole lot more in the performance department.
Strangely enough, this means that you don't actually need those beefy X570 motherboards with 10+ phases to get a decent overclock, you just need a good cooler. I'll bet a lot of people will start seriously thinking about going sub-ambient as well (at least down to +5C, since going negative has severe condensation issues). But for the rest of us mere mortals, a decent tower air cooler gets us almost as much overclock as a good water loop. When I say good, what I mean is that if you really want to overclock you can easily get to 4.2 GHz on air without pushing massive amounts of power. The absolute best overclock you will ever get on Zen 2 at non-destructive voltages (1.3V VCORE, approximately) will be around 4.4 GHz, all-cores. Higher than that and it won't be stable, or you will have to push too much voltage. The difference between 4.2 and 4.4 is only 5%. For most of us, 5% isn't worth the massive investment in time, effort, and cooling. Overclocking Zen 2 is easy... you just leave it at stock settings and maybe bump up the socket power envelope a little, run XMP on your memory, and you are done, and you get it on any motherboard, even the cheap ones.
A second major difference in the physics of the TSMC 7nm node versus Intel's 14nm is the single-core boost. Whereas you can push more volts into a 9900K (almost) regardless of how many cores are loaded, you cannot do that on the smaller 7nm node. High-current all-core boosts require a lower voltage to avoid destroying the chip through electromigration (electron momentum wearing away at the metal), whereas a low-current single-core boost allows a higher voltage (up to 1.45V). So while Intel overclockers can always look for ways to push more power into the socket (at least on 14nm+++*), AMD overclockers wind up between a rock and a hard place fairly quickly with ambient cooling. Heat matters. The physics are just different.
This isn't a bad thing, by the way. It means that very high performance systems can be built more cheaply. Intel will be in the same boat soon enough, because this difference in physics seems to be a side effect of getting smaller. Intel's 10nm node is rumored to be limited to 4.1 GHz or so. And later on we may see even more limitations on clocks. The only way to really scale from here on will be with IPC and by adding cores. Nobody is interested in pushing 200W into their socket, which puts Intel at a dead end on 14nm. They may try to push more cores on 14nm, but the power consumption makes it non-competitive. Remember that.
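For anyone who wants to check the arithmetic on the 4.2 versus 4.4 GHz all-core figures in this section, a minimal sketch follows. The helper name percent_gain is made up for illustration, and it only compares clocks, on the simplifying assumption that performance scales with frequency.

```python
def percent_gain(base_ghz, target_ghz):
    """Relative clock increase going from base to target, in percent."""
    return (target_ghz - base_ghz) / base_ghz * 100.0


if __name__ == "__main__":
    # Easy air-cooled overclock vs. the practical ceiling quoted above.
    gain = percent_gain(4.2, 4.4)
    print(f"4.2 -> 4.4 GHz is a {gain:.1f}% clock increase")
```

That comes out a little under 5%, and real workloads that wait on memory will see even less.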

(III) Power Density, Caches, and IPC

As nodes get smaller, power density increases. Radically. That is, the smaller size of the chips more than compensates for the improved power efficiency. From an absolute performance point of view we do get that power efficiency. But from a power-density point of view we do not. The transistors are packed more tightly. The power density is heading up and not down. Higher power densities mean higher temperature gradients going from the transistors to the socket to the cooling solution. No amount of cooling can completely compensate for this gradient. Only lower frequencies can help here, and much larger caches. Why the larger caches? Because CPU caches have relatively low power densities. They take up chip real estate but actually improve the overall power density problem.
So every new architecture from here forwards is going to have much larger CPU caches. We've seen this with Zen 2, where each CPU chiplet (8 cores) sports 32MB of L3 cache. This means that the 3900X and the 3950X both have 64MB of L3 cache. A 64-core TR3 or EPYC will have 256MB of L3 cache. Up until now, Intel has always sported small caches on their consumer chips. They dribble out 6MB here, 8MB there, and they reserve the large caches for their expensive Xeon behemoths... it's a really poor showing by Intel, frankly. Increasing IPC requires increasing the CPU cache sizes, and possibly even adding an even larger L4 cache to the die. This is no longer a knob that Intel can skimp on, not with AMD putting 64MB of L3 into its high-end consumer chips with Zen 2. As already indicated, increasing cache size is one of the most important ways a CPU can scale IPC up.
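The L3 totals above fall straight out of the per-chiplet figure. A small sketch, assuming a part uses the minimum number of 8-core chiplets needed for its core count (true of the parts named here, but not of every Zen 2 SKU, since some ship extra chiplets with cores disabled). The function names are made up for illustration.

```python
L3_PER_CHIPLET_MB = 32   # per-chiplet L3 quoted in the text
CORES_PER_CHIPLET = 8    # cores per Zen 2 chiplet quoted in the text


def chiplets_for_cores(cores):
    """Minimum chiplets needed for a core count (ceiling division).

    Assumption: the part uses no more chiplets than it has to.
    """
    return -(-cores // CORES_PER_CHIPLET)


def total_l3_mb(cores):
    """Total L3 in MB under the minimum-chiplet assumption."""
    return chiplets_for_cores(cores) * L3_PER_CHIPLET_MB


if __name__ == "__main__":
    for name, cores in [("3900X", 12), ("3950X", 16), ("64-core part", 64)]:
        print(f"{name}: {chiplets_for_cores(cores)} chiplets, "
              f"{total_l3_mb(cores)} MB L3")
```

Note that the 12-core 3900X still gets the full 64MB because it uses two chiplets with some cores turned off.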

(IV) I/O Infrastructure and Bandwidth in the New World

Forget PCIe-v4, or v5, or v6... well, no. Don't forget about them, they are important. Just not quite as important as people are probably thinking. The most important aspect of AMD's X570 chipset is not the PCIe-v4 support it has going into the PCIe connectors, it's the 4 lanes of (effectively) infinity-fabric (basically a rejiggered PCIe-v4) going from the CPU to the chipset. And it is the 20 lanes of PCIe-v4 heading out of the CPU just waiting to be fed into expanders. Why? Because very few PCIe cards actually need PCIe-v4's bandwidth. Even the fancy new PCIe-v4 M.2 SSDs... it's already overkill. What is important here is the lane expansion that is possible, not pushing 5 GBytes/sec from a single device. Paired with this vast new I/O capability is M.2 and U.2, embodying a wonderful new interface standard called NVMe that Intel couldn't intentionally hamstring like they did AHCI and SATA (in order to favor SAS). This plus SSDs could spell doom for consumer-vs-commercial separation of the SSD markets. For an SSD the only thing that matters in terms of market separation is its endurance... and endurance is a lot harder to gimmick than an interface standard. Rejoice folks! The age of incredible I/O bandwidth has arrived and we are already being overwhelmed by it!
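To put rough numbers on the bandwidth being discussed, here is a back-of-the-envelope sketch of raw PCIe link bandwidth. It uses the standard per-lane signalling rates (8 GT/s for v3, 16 GT/s for v4) and 128b/130b line encoding, and deliberately ignores packet and protocol overhead, so real devices will land somewhat lower. The function names are made up for illustration.

```python
# Per-lane signalling rate in GT/s, plus line encoding (payload, total bits).
PCIE_GEN = {
    3: (8.0, 128, 130),
    4: (16.0, 128, 130),
}


def lane_gbytes_per_sec(gen):
    """Raw one-direction bandwidth of a single lane in GBytes/sec."""
    gt_per_sec, payload_bits, total_bits = PCIE_GEN[gen]
    return gt_per_sec * payload_bits / total_bits / 8


def link_gbytes_per_sec(gen, lanes):
    """Raw one-direction bandwidth of a multi-lane link in GBytes/sec."""
    return lane_gbytes_per_sec(gen) * lanes


if __name__ == "__main__":
    print(f"v3 x4 (typical M.2 SSD): {link_gbytes_per_sec(3, 4):.2f} GB/s")
    print(f"v4 x4 (M.2 or chipset link): {link_gbytes_per_sec(4, 4):.2f} GB/s")
    print(f"v4 x20 (CPU lanes): {link_gbytes_per_sec(4, 20):.2f} GB/s")
```

A v4 x4 link works out to a bit under 8 GBytes/sec raw, so a single 5 GBytes/sec SSD does not even fill it. The 20 CPU lanes together are where the real capacity is.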

(V) Intel Will Catch Up, Consumers Will Still Win