One of the greatest challenges facing the designers of many-core processors is resource contention. The chart below visually lays out the problem of resource contention, but for most of us the idea is intuitively easy to grasp: more cores and more simultaneous threads mean more contention for shared resources, specifically cache space and memory bandwidth.

As Moore's Law increases the amount of parallel hardware and the number of threads that can access a single, shared resource, this problem will continue to grow. Indeed, resource contention challenges scale right along with increases in core and thread counts, so chip multiprocessor (CMP) designers have been working on ways to address this issue since the very start of the dual-core era.

In the present article, I'll take a look at the issue of resource contention and at one of Intel's proposed methods for overcoming this challenge: the quality-of-service-aware memory hierarchy.

Making the problem worse: virtualization and heterogeneous multiprocessing

Virtualization has the potential to exacerbate the resource contention problem tremendously, since different workloads running in different virtual machines—workloads managed by OS instances that are mutually unaware of one another—can create a kind of "perfect storm" and pull overall system performance down through the floor.

Or, to put this in plainer English by using an analogy, the VM contention problem is kind of like a two-family condo where both floors are on the same (aging) breaker box. On a Friday night in the dead of summer, if the undergrad girls in the top unit are running eight lights, two air conditioners, four curling irons, and an entertainment system, and the couple down below is running an air conditioner, a washer and dryer, and a dishwasher, then something's bound to blow. Within each unit, the inhabitants may be carefully monitoring their electricity usage so as not to overtax the aging circuits, but the lack of cross-unit coordination is a recipe for periodic blackouts.

(To unpack the analogy briefly, the two condo units would be two virtual machines, and the person keeping track of electricity usage in each unit would be the OS.)

Virtualization-related resource contention is already a real problem, and there have been a number of attempts by Intel and others in the academic community to profile how different types of workloads interact (cf. "Virtualization in the Enterprise," Intel Technology Journal, v10, issue 3). What these studies have found is that degenerate cases, where resource contention causes performance degradation across all the VMs in a system, are not uncommon in enterprise workloads.

The other major contributing factor to the looming resource contention crisis is the heterogeneous nature of future multiprocessors. As I described in my previous article on Terascale, Intel's vision of the future of the "processor" is a heterogeneous network on a chip (NoC). This NoC may contain cores of different types, some of which are highly application-specific. These cores will run threads with resource needs and usage patterns very different from those of the more general-purpose x86 cores, with the result that so many highly diverse thread types drawing on one memory hierarchy could create undreamed-of levels of contention. In other words, the individual members in this crowd of threads will be very different from one another, and when you cram them together into one memory hierarchy you can expect something like a bar-room brawl to break out.

QoS and the memory hierarchy

Intel's solution to the resource contention problem—a solution with a very long pedigree in everything from networks to the power grid—is to build a framework for enforcing priority-based quality of service (QoS) in the memory hierarchy. This solution was described and evaluated in a recent ACM SIG presentation entitled "QoS Policy and Architecture for Cache/Memory in CMP Platforms."

The basic idea goes as follows: A hardware/software mechanism is provided so that a user or administrator can assign running threads one of two priority levels, either low (1) or high (0). High-priority threads, which are presumably more critical than lower-priority threads, are then given more cache space and more memory bandwidth than low-priority threads. A special hardware unit tracks the resources that each thread is using in real time and adjusts the cache and memory unit accordingly, so that the overall result is that high-priority threads keep the upper hand over low-priority threads.
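To make the priority scheme a little more concrete, here is a toy software sketch of the general idea, written for this article rather than taken from Intel's paper: a small cache model whose replacement policy caps how many lines low-priority (1) threads may occupy, so high-priority (0) threads keep the upper hand. The class name, the cap parameter, and the eviction details are all illustrative assumptions, not Intel's actual hardware mechanism.

```python
# Toy illustration of priority-based cache QoS (an assumption-laden sketch,
# not Intel's real design): low-priority threads are limited to a fixed
# number of cache lines; high-priority threads use ordinary LRU replacement.

class QoSCache:
    def __init__(self, ways=8, low_priority_cap=2):
        self.ways = ways                      # total lines in this set
        self.low_priority_cap = low_priority_cap
        self.lines = []                       # (tag, priority), MRU last

    def access(self, tag, priority):
        """priority: 0 = high, 1 = low. Returns 'hit' or 'miss'."""
        for i, (t, _) in enumerate(self.lines):
            if t == tag:
                self.lines.append(self.lines.pop(i))  # promote to MRU
                return "hit"
        self._insert(tag, priority)
        return "miss"

    def _insert(self, tag, priority):
        low_idx = [i for i, (_, p) in enumerate(self.lines) if p == 1]
        if priority == 1 and len(low_idx) >= self.low_priority_cap:
            # Low-priority thread is at its quota: recycle its own
            # oldest line instead of stealing space from anyone else.
            self.lines.pop(low_idx[0])
        elif len(self.lines) >= self.ways:
            self.lines.pop(0)                 # plain LRU eviction
        self.lines.append((tag, priority))
```

In this sketch a flood of low-priority misses can never push the low-priority footprint past its cap, so a high-priority thread's working set stays resident; a real QoS-aware hierarchy would enforce the same kind of quota in hardware, and for memory bandwidth as well as cache space.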

So the idea behind a QoS-aware memory hierarchy is relatively straightforward (though I have oversimplified a bit here). But the devil, as always, is in the details. Let's take a look.