Analyzing World of Warcraft multi-core and frequency scaling

How many cores can World of Warcraft use? How many will you need to run the game alongside other apps like Discord, a browser or multiple addons? Let's take a look at how the game scales with core count and CPU frequency, and at how games can use multiple cores.

Some CPU tasks can spread across many cores. Use a 64-core Threadripper and you will see load on all cores. When you extract or compress files, or when you do a big WoW update, you can see such load on your CPU. When gaming this is hard to achieve, as a game is an application that does many different things. A compression/decompression algorithm may be implemented so that it sends batches of data to every core, and there is only one kind of task to do. A game has to talk to the GPU, load and manage assets, perform game logic calculations and more. There is no easy way to just spread all of this across many cores. That is why very few games scale well to large core counts, if at all. Ashes of the Singularity: Escalation is currently likely the only game that can scale to 16 cores if enough GPU performance is available - but that game's primary objective is to implement such scaling and performance, to showcase what's possible and deliver massive battles alongside it.
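
One way to put numbers on this difference is Amdahl's law: if only a fraction p of the work can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n). The sketch below is only an illustration. The parallel fractions in it are made-up assumptions, not measurements of WoW or of any compression tool.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup on `cores` cores when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical fractions, chosen only for illustration.
WORKLOADS = {
    "file compression (assumed 95% parallel)": 0.95,
    "game frame (assumed 40% parallel)": 0.40,
}

if __name__ == "__main__":
    for name, fraction in WORKLOADS.items():
        row = ", ".join(
            "{} cores: {:.2f}x".format(n, amdahl_speedup(fraction, n))
            for n in (2, 4, 6, 16, 64)
        )
        print(name + " -> " + row)
```

With the assumed 40% parallel fraction the speedup stays below 1.67x no matter how many cores are added, because the remaining 60% always has to run on one core.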

Game multi-core scaling can be divided into two components:

Game-GPU communication

Non-GPU game logic

A game talks to the GPU via a provided API (interface). On Windows this is usually DirectX, with Vulkan and OpenGL being the other, cross-platform interfaces. In DX9 and DX11 games only one thread at a time can talk to the GPU, so a game has to have a dedicated rendering thread. That thread sends draw calls to the GPU and runs on only one CPU core.

Threads, as per the Wikipedia definition, are a way for a program to divide (split) itself into two or more simultaneously (or pseudo-simultaneously) running tasks. Threads and processes (like a running WoW.exe) differ from one operating system to another but, in general, a thread is contained inside a process and different threads in the same process share the same resources, while different processes in the same multitasking operating system do not.

Threads can end up running on multiple cores if the developer designs the program that way, although which core a thread actually runs on is decided by the operating system scheduler. Some tasks must be executed in a specific order or wait for others to finish, so making a multi-threaded game isn't as straightforward as with compute tasks (like compressing files). It requires an explicit design, which is harder to do for MMORPGs than for single-player offline games.
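
The ordering problem can be shown with a small sketch. All function names below are hypothetical and the "frame" is heavily simplified: audio mixing is independent and can run alongside everything else, while the draw list cannot be built until the game state update has finished. Note that in standard CPython builds threads do not run CPU-bound code in parallel, so this shows the dependency structure only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def update_game_state(frame):
    # Must finish before rendering can start: the draw list reads its result.
    return {"frame": frame, "actors": ["player", "npc_1", "npc_2"]}


def build_draw_list(state):
    # Depends on the state produced by update_game_state().
    return ["draw " + actor for actor in state["actors"]]


def mix_audio(frame):
    # Independent of the two steps above, so it can run alongside them.
    return "audio for frame " + str(frame)


def run_frame(frame):
    with ThreadPoolExecutor(max_workers=2) as pool:
        audio_future = pool.submit(mix_audio, frame)
        state_future = pool.submit(update_game_state, frame)
        # Ordering constraint: block until the state is ready, then build the list.
        draw_list = build_draw_list(state_future.result())
        return draw_list, audio_future.result()


if __name__ == "__main__":
    print(run_frame(1))
```

The more steps look like build_draw_list (waiting on a previous result), the less there is to gain from extra cores.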

Game-GPU communication is done via the DX driver. In the era of DX9 and DX11 games the problem was that one thread had to do all of the communication. On top of that, the high-level nature of the API meant a lot of work was done outside the game developer's control, which gave each draw call a relatively high CPU cost. That's why older games are so dependent on single-core performance - even if they offload work to other cores, they need one rendering thread to do a lot of work, and that thread will be the first to max out a CPU core and become the bottleneck. Some less advanced DX11 games or old DX9 games may use just one or two cores at most. That's a big performance scaling problem that had to be solved.
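
A rough back-of-the-envelope sketch of that bottleneck: if a single thread issues every draw call, the CPU time per frame is at least the number of draw calls times the CPU cost of one call, and that caps the frame rate regardless of the GPU. The draw call counts and the per-call cost below are invented for illustration and are not measured values for any API or game.

```python
def max_fps_for_render_thread(draw_calls_per_frame, cpu_us_per_draw_call):
    """Upper bound on FPS when one thread issues every draw call.

    cpu_us_per_draw_call is the CPU time of one call in microseconds.
    """
    if draw_calls_per_frame <= 0 or cpu_us_per_draw_call <= 0:
        raise ValueError("both arguments must be positive")
    frame_time_us = draw_calls_per_frame * cpu_us_per_draw_call
    return 1_000_000 / frame_time_us


if __name__ == "__main__":
    # Hypothetical scenes: a quiet zone versus a crowded city.
    for calls in (2_000, 5_000, 10_000):
        fps = max_fps_for_render_thread(calls, cpu_us_per_draw_call=2.0)
        print("{} draw calls per frame -> at most {:.0f} FPS".format(calls, fps))
```

Under these assumptions, doubling the number of draw calls halves the FPS ceiling, and only a faster core or a cheaper draw call can raise it.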

DX11 rendering pipeline

DX12 and Vulkan APIs were designed differently - very low level, with as little overhead as possible and as much freedom for the developers as possible. Such a low-level API also has disadvantages - it puts more responsibility in the hands of the game developer, requires more knowledge and is quite different from existing solutions, meaning that game engines pretty much have to be re-implemented from scratch.

In DX12 the main rendering thread does much less work. Depending on how things are implemented, the work for the GPU is split into lists of commands. Worker threads (which can run on many cores) prepare those lists in parallel, and the finished lists are then queued for the GPU to execute. There is no single thread/core responsible for all of this.
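
The idea can be sketched in a few lines. This is not the Direct3D 12 API: every name below is hypothetical and the "GPU queue" is just a Python list. It only mirrors the general pattern in which worker threads each prepare a command list for a part of the scene and the main thread merely collects and submits the finished lists.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into chunk_count nearly equal, order-preserving chunks."""
    size, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def record_command_list(chunk):
    """Worker: turn a chunk of scene objects into a list of draw commands."""
    return ["draw({})".format(obj) for obj in chunk]


def build_frame(scene_objects, worker_count=4):
    chunks = split_into_chunks(scene_objects, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # map() returns results in chunk order, so submission stays deterministic.
        command_lists = list(pool.map(record_command_list, chunks))
    # The main thread only submits finished lists to the simulated GPU queue.
    gpu_queue = []
    for command_list in command_lists:
        gpu_queue.extend(command_list)
    return gpu_queue


if __name__ == "__main__":
    scene = ["building_{}".format(i) for i in range(10)]
    print(build_frame(scene, worker_count=4))
```

The main thread's share of the work shrinks to splitting the scene and submitting lists, which is why the per-frame cost no longer piles up on one core.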

DX12 rendering pipeline

3DMark API overhead benchmark for GTX 1070 and i5-9400F

As you can see, DX12 allows (in this synthetic benchmark) a much higher number of draw calls to be processed, and the performance scales well with CPU cores. DX11 multithreaded also uses multiple threads but is partially limited by the main rendering thread.

WoW, like many other games, also has a lot of non-GPU logic inside. For one, it's an online game that relies on servers for its game state, and then it has all of its game mechanics. You may remember the spell batching created for Classic WoW, where they explicitly used a slower queue to handle spell casting. All of this creates CPU load that can to some extent be split across threads, and those across cores. However, as the game state must be consistent and in sync with the server, it's easy to end up with an expensive master thread that is responsible for that state management, and thus with the one CPU core running it being the limiting factor.

In patch 8.1 Blizzard started implementing multithreading for DX12. This was expanded and enabled by default in subsequent patches. Apart from that, some work was done on locations like Boralus to optimize the flow of draw calls and thus allow rendering large structures like the whole city of Boralus.

This is responsible for multithreading GPU communication but not for other types of logic present in the game.

This introduction is quite important, as the game has two sides of CPU performance. One is the main thread, which has the highest load and is limited by single-core performance. The other is the DX12 multithreaded GPU communication, which is offloaded to other cores and results in a lower but constant load on at least a few of them.

Below you will find a set of benchmarks made by me on patch 8.3.0.33941 using GTX 1070 and Vega 64 paired with Intel i5-9400F and 16GB of RAM at 3200MHz.

Racing around Dalaran gives fluctuating FPS and some I/O activity along the way

127 average FPS in Dazar'alor yet GPU isn't used that much

235 FPS in Stonard - less game logic to process, just draw calls, so more can be done per unit of time

In a previous benchmark I tested WoW in 6, 4 and 2 core configurations using the Intel i5-9400F. The results showed that for the most part the game reaches max performance at 4 cores, with only some scenarios benefiting from the 6-core configuration. That benchmark was done at 1080p, mode 7, with a GTX 1070:

Dazar'alor shows the most significant change in FPS when going to lower core configurations. There are a lot of structures, players and NPCs, which means a lot of draw calls and tasks that can be spread across cores. Just note that this spot has some randomness to it.

I've re-run those benchmarks to verify the results and to plot a chart of relative FPS (FPS at 6 cores was assigned 100%). I've also added the ultrawide 3440x1440 resolution to see how it scales, although some scenarios were GPU bound at that resolution.
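
For clarity, this is how such a relative value can be calculated. The FPS numbers in the sketch are placeholders invented for the example, not the results from these benchmarks.

```python
def relative_fps(fps_by_cores, baseline_cores=6):
    """Express each FPS result as a percentage of the baseline configuration."""
    baseline = fps_by_cores[baseline_cores]
    if baseline <= 0:
        raise ValueError("baseline FPS must be positive")
    return {
        cores: round(100.0 * fps / baseline, 1)
        for cores, fps in fps_by_cores.items()
    }


if __name__ == "__main__":
    # Hypothetical average FPS for one test scene at 6, 4 and 2 active cores.
    example = {6: 120.0, 4: 121.0, 2: 78.0}
    for cores, percent in relative_fps(example).items():
        print("{} cores: {}% of the 6-core result".format(cores, percent))
```

A value slightly above 100% for a smaller configuration simply means that run was a little faster than the 6-core baseline.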

As you can see, for the most part there is no difference between 4 and 6 cores, while there is a clear drop at the 2-core configuration, especially for 1% low FPS, which can be a deciding factor in whether the game feels fluid.
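
Benchmarking tools do not all define 1% low FPS in the same way. The sketch below assumes one common definition, the average FPS over the slowest 1% of frames, and uses made-up frame times. It is meant to show why this metric reacts to stutter much more strongly than the average does, not to reproduce how the numbers in this article were captured.

```python
def average_fps(frame_times_ms):
    """Average FPS over the whole capture."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def one_percent_low_fps(frame_times_ms):
    """Average FPS over the slowest 1% of frames (at least one frame)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    slowest_first = sorted(frame_times_ms, reverse=True)
    # Integer ceiling of 1% of the frame count, never less than one frame.
    count = max(1, (len(slowest_first) + 99) // 100)
    worst = slowest_first[:count]
    return 1000.0 * len(worst) / sum(worst)


if __name__ == "__main__":
    # Hypothetical capture: 99 smooth frames at 10 ms and one 50 ms stutter.
    frame_times = [10.0] * 99 + [50.0]
    print("average FPS: {:.1f}".format(average_fps(frame_times)))
    print("1% low FPS:  {:.1f}".format(one_percent_low_fps(frame_times)))
```

A single long frame barely moves the average but dominates the 1% low value, which matches how a stutter feels in game.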

The relative FPS slightly exceeds 100% at the 4-core configuration in some cases. This can be due to slight variance in test results as well as the CPU boosting from 3902 MHz to 4006 MHz when fewer cores are active.

Is a quad core enough for WoW then? The short answer is: not exactly. You can treat it as a minimum requirement for acceptable performance. For actual use there is one thing that is being overlooked, and that is other applications. Those results were acquired when no other apps like web browsers, the Discord client, media players, WoW addons and the like were running, and when Windows wasn't performing any expensive background tasks.

If the game doesn't scale with core count, maybe there is room for those apps on a quad core? No, there isn't. Check these CPU usage charts:

Resource usage in Dazar'alor

Resource usage in Karazhan during mass combat

As you can see, standing still and looking at the entrance to the Great Seal in Dazar'alor puts a relatively high load on all cores. Pulling a large number of mobs in Karazhan and having all of them in the field of view produces a lower load on the other cores, which may indicate that there are fewer draw calls to spread across them, while game logic is much less capable of being offloaded.

Moving draw calls off the main thread freed some CPU cycles for the game logic, resulting in better performance. However, as most of the logic is still on a single core, the scaling will be limited. In addition, other apps running on the system may start to starve the rendering cores and thus decrease performance. Pushing more load across the CPU can also change the boost clock. Intel CPUs have a higher single-core boost clock than all-core boost clock, so more load across the CPU can downclock the core running the main game thread, which would lower performance as well. Ryzen CPUs keep the boost clock on all cores, but their clocks are also tied to temperature, so cooling will decide how they behave under long-lasting load.

Quad core with almost full all core load during Boralus flyby

Slightly lower load in Dazar'alor

Much lower draw call load while in mass actor combat in Karazhan

WoW running at 912MHz

Intel unlocked CPUs can also have their clock multiplier freely lowered, so they can be forced to run at much lower frequencies. By lowering the max clock speed we can check how the game behaves - by how much the performance decreases and whether more tasks are offloaded to other threads.

I used 1512 and 912 MHz as my test frequencies. They are low enough to be sure the game is CPU bound, although at 1080p this wasn't a problem in the first place.

Average FPS versus CPU all core frequency

1% low FPS versus CPU all core frequency

As you can see, the game's FPS decreases nearly linearly with decreasing CPU clock frequency, with some gains in Dazar'alor. Those charts show that the game is still driven by the main thread working on one core, and only in some edge-case scenarios, when there is more GPU work than other logic, can it scale a bit better. Single-core frequency and efficiency (IPC) are king, while a stronger GPU comes into play only if you want better looks after you have provided the CPU power to achieve good FPS.
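
A simple way to check how close a result is to linear scaling is to compare the share of FPS that was kept with the share of clock speed that was kept. The sketch below uses the frequencies mentioned in this article (3902, 1512 and 912 MHz), but the FPS values are invented placeholders, and the function names are my own.

```python
def predicted_fps_linear(fps_at_base, base_mhz, target_mhz):
    """FPS expected at target_mhz if FPS scaled exactly with clock speed."""
    return fps_at_base * target_mhz / base_mhz


def scaling_efficiency(fps_at_base, fps_at_target, base_mhz, target_mhz):
    """Share of FPS kept divided by share of clock speed kept.

    1.0 means FPS fell exactly in proportion to the clock.
    Above 1.0 means FPS fell less than the clock did.
    """
    return (fps_at_target / fps_at_base) / (target_mhz / base_mhz)


if __name__ == "__main__":
    base_mhz = 3902
    fps_at_base = 120.0  # hypothetical FPS at the stock all-core clock
    # Hypothetical measured FPS at the two lowered clocks.
    measured = {1512: 50.0, 912: 32.0}
    for mhz, fps in measured.items():
        expected = predicted_fps_linear(fps_at_base, base_mhz, mhz)
        ratio = scaling_efficiency(fps_at_base, fps, base_mhz, mhz)
        print("{} MHz: linear prediction {:.1f} FPS, measured {:.1f} FPS, "
              "efficiency {:.2f}".format(mhz, expected, fps, ratio))
```

A scene that keeps an efficiency well above 1.0 at low clocks is one where work offloaded to other cores, or a GPU limit at the higher clock, softens the drop.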

3440 x 1440 all core frequency performance scaling

On an ultrawide display Dazar'alor didn't scale as well as the Boralus flyby. The bigger field of view at this resolution puts more of Boralus in view, which gives more draw calls that can to some extent be handled by multiple cores, counterbalancing the problem of lowered clock speeds.

This is also very important for laptops, where different CPUs can have different clock speeds available. 10 nm Intel parts have fewer cores and lower clock speeds but better integrated graphics. Intel 14 nm parts have more cores and higher clocks but a worse iGPU (and are often paired with Nvidia MX cards). AMD third gen mobile parts (4000 U-series) will be stronger, competing only with upcoming Intel chips. If you want an ultrabook capable of running WoW, check what clock speeds it has, check reviews for thermal throttling and then look at the iGPU/dGPU.

Nvidia, with their Turing architecture, greatly improved the performance of their cards when running in DX12 mode. This was mainly due to how optimally different tasks can be scheduled in parallel. Previous GPUs often performed better in DX11 mode than in DX12.

AMD cards were favoring DX12 before it even was a thing (AMD even made a showcase API called Mantle that then led to DX12 and Vulkan development). Yet due to a low budget they had GCN, a general-purpose architecture for both compute and gaming tasks. This resulted in a design that had a lot of compute power but was hard to use efficiently in gaming, where there is a mix of instructions. With Navi they implemented the RDNA architecture with smaller waves (collections of threads with the same instruction type) that will be more efficient in gaming, as games tend to schedule smaller chunks of one specific instruction - thus the compute unit will be idle for fewer clock cycles. With RDNA2 it's expected that there will be even smaller wave support to keep the compute units saturated as much as possible.

These features of the latest GPU generations help improve latency, the speed at which commands are executed, thus increasing the efficiency of the DX12 worker threads on the other CPU cores. More VRAM can also allow the game to cache more assets on the GPU instead of sending them more often. Improved data compression algorithms also improve how that data is handled on the GPU itself. All of this improves GPU performance but may also have a positive effect on the worker threads and, to a lesser degree, on the amount of work the main thread has to do. The latest GPU won't solve WoW scaling but may help you with 1%/0.1% low FPS and the like.

WoW performance is still driven by single-core performance, but it also needs a few additional cores to offload draw calls to. With some BfA zones starting to scale beyond 4-core CPUs, we may see even more benefits in Shadowlands, which will be released well after the multithreaded DX12 mode was added to the game.

A modern 4-core CPU can be a bare minimum for acceptable performance.

A 6-core CPU will have some additional headroom for upcoming zones and for some light activity from other apps. The latest Ryzen and Intel chips will be better than older multi-core parts.

CPU multithreading (SMT, Hyper-Threading on Intel CPUs) usually has no effect on gaming performance but affects some other non-gaming tasks and may help in some less likely edge scenarios (check the R5 3500X vs R5 3600 benchmarks done not so long ago).

If you want to stream you may be looking at 8-core or bigger CPUs. Video editing and other expensive tasks alongside WoW will require more cores but also good cooling to keep the highest possible clocks.

Game performance is nearly linear with CPU frequency - when picking mobile parts, check their boost clocks and thermal performance in the given device.

RkBlog