ZenRipper

Notes on Ryzen and Threadripper Hardware

With the advent of high core-count consumer and prosumer CPUs, one must take a slightly different view of performance-tuning the hardware, otherwise known as overclocking. Ryzen and Threadripper CPUs are unlocked by default, but just overclocking the hell out of these systems is not necessarily your best choice.

General power/performance trade-off between memory fabric and cpu cores

The first thing to note is that there is a power trade-off between the memory fabric and the CPU cores. Most of your performance gain will come from overclocking the memory fabric, but the fabric also consumes power, which reduces the power budget available to the CPU cores.

On consumer Ryzen CPUs, which have only 2 memory channels, you want to run the memory fabric pretty much as fast as it will go, up to around 2933 MHz. On Zen the fabric speed follows the memory clock, which is usually set through the XMP profile in the BIOS. Faster than that and you quickly hit diminishing returns for steep increases in power consumption. You can then also enable XFR (CPU overclocking), and the system will overclock the CPU as far as your cooling allows, typically up to around 200W. If you want to reduce the wattage at the wall, you can probably do so without losing much performance; see "Reducing the power envelope" below.

On Threadripper CPUs there are 4 memory channels, but here the memory fabric consumes significantly more power when overclocked than on the consumer CPUs. You will generally not want to run the fabric faster than 2933 MHz, and in many cases you might want to run it even slower, such as at 2666 MHz, to give the CPU cores more power budget to clock higher. The right choice depends heavily on the type of workload. Threadripper CPUs are even more sensitive to memory fabric speed, to the point where you can often significantly reduce the power budget dedicated to the cores without losing much performance; see "Reducing the power envelope" below. The 2990WX with XFR enabled at stock settings will pull 330W from the wall at full load, which is a lot of power.

All Zen-based AMD CPUs support ECC, but not all motherboards do. Being able to run ECC memory on a consumer Ryzen is spectacularly useful for consumers and small businesses who care about memory corruption and reliability. High-density memory is prone to bit errors over time, and even if most consumers don't notice the corruption (e.g. Windows blue-screens get blamed on Microsoft), being able to use ECC memory is important for a significant subset of buyers. ECC memory usually cannot be overclocked to the same degree as non-ECC memory, and you will most likely be limited to 2133, 2400, or 2666 MHz. Under these circumstances the performance of many-core parts such as the 2700[X], the TR 2950X, and the 2990WX will often be limited by memory bandwidth rather than by CPU core frequency, so read the section below about reducing the power envelope. 2700[X] systems can be dropped to roughly 115W, and the 2990WX can be dropped from 330W to 225W, without losing too much performance. At those wattages, these parts are ridiculously power efficient.

Always be sure to enable XMP (Extreme Memory Profile) in the BIOS regardless of the machine setup; otherwise many BIOSes will run your memory at 2133 MHz even if you bought 3000 MHz memory. XMP is a separate feature from XFR (extended CPU core frequencies and power). You may also have to disable Cool'n'Quiet in the BIOS, which is usually under the Advanced->CPU_Configuration menu, because some BIOSes ignore other options while Cool'n'Quiet is enabled.

Overclocking

Overclocking is dangerous! You can crowbar your power supply, destroy your CPU or motherboard, and significantly shorten the life of your system. That said, if you want to learn to overclock a Zen-based AMD CPU, there is an easy way to do it: do not set the CPU frequency manually. Instead, use the XFR2 power envelope setting in the BIOS. This will be either in the O.C. menu or in the Advanced menu under the AMD CBS settings (usually Advanced->CBS->NBIO->XFR 2.0 Configuration).

In this menu you can enable XFR2 and then set the PPT (power, in watts), TDC and EDC (current) limits. The system will then operate under XFR2 rules: it still idles at a nice low level and ratchets itself up to the limits specified. The best way to set these values is to unlimit TDC and EDC (usually by setting them to either 30000 or 30, depending on whether the field is specified in mA or A) and use the PPT setting to govern the system.

The PPT setting is very dangerous and must be set with caution. PPT is usually 30-100W lower than the actual power consumption at the wall. Start at around 100W: enter 100000 if the BIOS specifies this field in mW, or 100 if it specifies it in W. Run your system at full load and check the actual power draw at the wall with a Kill A Watt meter, then adjust up or down according to your needs. Be very careful and only increase in small increments. Also note that if you specify too low a PPT, the system may fail to POST, and you will have to clear the CMOS and start over.
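Because some BIOSes take PPT in W and others in mW, it is easy to be off by a factor of 1000. A minimal Python sketch of the conversion and of a stepped ramp-up is below. `ppt_field_value` and `ppt_steps` are hypothetical helpers, and the 5W step is only an assumed value for "small increments", not a vendor recommendation.

```python
from __future__ import annotations


def ppt_field_value(watts: int, unit: str) -> int:
    """Convert a target PPT in watts into the number to type into the BIOS.

    Hypothetical helper: the BIOS field may be in W or in mW.
    """
    if unit == "W":
        return watts
    if unit == "mW":
        return watts * 1000
    raise ValueError(f"unknown unit: {unit!r}")


def ppt_steps(start_w: int, limit_w: int, step_w: int = 5) -> list[int]:
    """Candidate PPT values from start_w up to limit_w in small increments.

    step_w=5 is an assumed increment; limit_w should be what your cooling
    and PSU can safely handle. Measure at the wall after every step.
    """
    if step_w <= 0:
        raise ValueError("step_w must be positive")
    return list(range(start_w, limit_w + 1, step_w))


print(ppt_field_value(100, "mW"))  # 100000
print(ppt_field_value(100, "W"))   # 100
print(ppt_steps(100, 120))         # [100, 105, 110, 115, 120]
```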

When overclocking, only set a power envelope that both your cooling solution and your power supply can handle. ALWAYS USE A PSU RATED FOR AT LEAST TWICE THE ACTUAL POWER DRAW AT THE WALL! If you want to overclock a system to 400W, then you need an 800W power supply, period. You also need cooling that can handle it, meaning at least water cooling. Do not run the system power close to the limits of the PSU, and do not let CPU temperatures go past 70C or so. All PSUs and motherboards have overheat protection, but this is NOT a guarantee that such protection will save your system! If you do overheat the system and the BIOS stops POSTing reliably, clear the CMOS and let the system cool down unplugged for 30 minutes before resuming.
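The PSU rule is simple arithmetic, and it can be combined with the rough 30-100W gap between PPT and wall power to sanity-check a planned PPT before you have a meter reading. The sketch below assumes only those two rules of thumb from this page; the helper names are hypothetical, and an actual full-load reading at the wall should always take precedence over the estimate.

```python
from __future__ import annotations

# Rules of thumb from the text (not vendor specifications):
#  - PPT usually sits 30-100W below the power measured at the wall
#  - the PSU should be rated for at least twice the actual wall draw
PPT_TO_WALL_OVERHEAD_W = (30, 100)
PSU_HEADROOM_FACTOR = 2


def estimated_wall_draw(ppt_w: float) -> tuple[float, float]:
    """Rough (low, high) estimate of wall power for a given PPT setting."""
    lo, hi = PPT_TO_WALL_OVERHEAD_W
    return ppt_w + lo, ppt_w + hi


def min_psu_rating(wall_w: float) -> float:
    """Minimum PSU rating for a measured or estimated wall draw."""
    return wall_w * PSU_HEADROOM_FACTOR


print(min_psu_rating(400))  # 800

# Size for the pessimistic end of the estimate until you can measure.
_, worst_case = estimated_wall_draw(250)
print(worst_case, min_psu_rating(worst_case))  # 350 700
```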

Reducing the power envelope

The same XFR2 settings can be used to reduce the power envelope of the system. This is extremely useful in three situations: (1) when running Threadripper CPUs, (2) when running with slower memory, and (3) when you want to run servers at their most efficient power/performance point.

On systems where the memory is relatively slow (2133, 2400, 2666 MHz) you should test your nominal workload at different PPTs. You may be surprised at just how low a PPT you can specify without any significant loss in performance. This is particularly true for compile farms. Why would someone put slow memory into a Threadripper system? There are two reasons. First, if you need a ton of memory and use high-density sticks, you just won't be able to run the memory fabric at high speeds. That's just the way it goes: the memory chips put too much load on the traces. Second, if you use ECC memory for reliability, you probably won't be able to find relatively cheap high-speed ECC memory. 2666 MHz is just about the limit for unbuffered ECC.

On such systems you can almost certainly reduce the power envelope with only a small loss in performance, which can really save on the power bill. For my bulk compiles using ECC 2133 EUDIMMs, a Ryzen 2700X runs the jobs just as fast at 125W as it does at 180W. Similarly, I can run the Threadripper 2990WX at 250W instead of 330W with only minor losses in performance.

On CPUs with fewer threads the cores are better matched to the available memory bandwidth, so YMMV. Being able to test your particular workload against different PPT settings is important.
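One way to act on such tests is to time the same workload at several PPT settings and pick the lowest PPT whose runtime stays within some tolerance of the fastest run. The sketch below assumes you have collected those timings yourself; the 5% tolerance, the helper name `lowest_acceptable_ppt`, and the sample numbers are hypothetical placeholders, not results.

```python
from __future__ import annotations


def lowest_acceptable_ppt(runtimes: dict[int, float], tolerance: float = 0.05) -> int:
    """Return the lowest PPT (W) whose runtime is within `tolerance` of the fastest.

    `runtimes` maps a PPT setting in watts to the wall-clock seconds your own
    workload took at that setting. The 5% default tolerance is an assumption.
    """
    if not runtimes:
        raise ValueError("no measurements")
    best = min(runtimes.values())
    acceptable = [ppt for ppt, secs in runtimes.items() if secs <= best * (1 + tolerance)]
    return min(acceptable)


# Hypothetical placeholder numbers, NOT measurements: replace with your own.
sample_runtimes = {180: 600.0, 150: 603.0, 125: 612.0, 100: 690.0}
print(lowest_acceptable_ppt(sample_runtimes))  # 125
```

A relative tolerance rather than a fixed number of seconds keeps the same rule usable for short and long jobs alike.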

Severe reduction of power envelope

If you really want to limit the power envelope, you can use the PPT setting to force the system to operate near its idle power even with all CPUs loaded. A setting of 50W is about as low as I would go for a 2700X, and perhaps 100W for a Threadripper (maybe 150W for the 2990WX), but YMMV. Also set the memory to 2133 MHz when doing this. Note that setting a very low PPT will not reduce idle power consumption.

When you do this, the system will not be able to run cores any faster than around 2 GHz, and an all-cores load will drop them to around 1 GHz, if you can believe that. A typical parallel compile workload using all cores on a 2700X will take about 3x longer.

This approach to reducing power consumption is not really recommended because 'burst' performance for single-threaded workloads is severely impacted.

A better way to severely limit power consumption is to use powerd(8), either with its default settings or configured to force a very slow ramp-up of the CPU mask used by the scheduler. With default settings, incidental quick-burst workloads such as SMTP connections, web servers, cron jobs, and so forth are limited to a single CPU. If you want powerd to ramp up the CPU mask even more slowly, 'powerd -u 50' will accomplish that. Settings higher than -u 50 may prevent the scheduler's CPU mask from ramping up at all and are not recommended.

Reducing Idle power consumption

Reducing idle power consumption on AMD's Zen[+] architecture is fairly difficult. It is already optimized almost as low as it can go: typically 40W at the wall for a 2700X and 65W for a TR2 2990WX. You may be able to save a few more watts by turning on Cool'n'Quiet (usually under Advanced/CPU in the BIOS) and by running the memory at 2133 MHz.

About the only other way to reduce idle power consumption is to disable some of the cores in the BIOS, which we do not recommend (if idle power consumption is a concern, buy a lower-end CPU instead).