Thank you to all who provided feedback on my last post; the code definitely benefited. Since then I have added a handful of different mutex implementations to my GitHub repository, each with their own (dis)advantages. I have also tried to measure the performance of each under varying circumstances. The results are presented here†.

There are four mutexes tested in this post:

std : This is std::mutex provided by the standard library.

spin : A pure spinning mutex; it will never block while trying to acquire the lock.

adaptive spin : This is the spin/block hybrid mutex described in the previous post.

adaptive block : This mutex measures the amount of time it remains locked, and (if it cannot lock immediately) blocks itself for a duration of time proportionate to the critical section it protects. Admittedly, this is an experimental behavior.

Test Parameters

Time Measurement

There are two timing methods used in this post that each shed light on the behavior of these mutexes:

Wall time : The amount of time it takes to accomplish a task in real time. It is so named because it uses a high-precision variant of the clock on your wall.

CPU time : The cumulative amount of processor time dedicated to the program. This value is telling in a multithreaded environment, as multiple CPUs can be given to the program during the same period of wall-clock time. As such, it is expected to be larger.

Thread Subscription

I also tested three levels of thread subscription relative to the number of concurrent threads supported by the computer††:

Under : Used 6/12 worker threads

Exact : Used 12/12 worker threads

Over : Used 24/12 worker threads

Each thread is given the same amount of work to perform, so the times are expected to scale near-proportionally with the number of threads.

Thread Work

The two main operations I tested were inserting a randomly-generated key/value pair into a map, and searching for a random key in the same map.

Charting the Results

The charts contain eight bars, two for each mutex type (the first for wall time, the second for CPU time). Each test is performed multiple times, yielding an approximately normal distribution of the time it took to complete each task. The red dots in the bars are the average time taken (in milliseconds). The max and min values for each bar represent the 3-sigma range; in other words, each bar contains ~99.7% of the times recorded for each operation.

Red dots (and bars) closer to zero are better. Shorter bars suggest a more stable/predictable locking behavior.

Map Insertion

Map insertion is intensive, involving traversing and manipulating the map as well as allocating memory for the value type being inserted.

Under-subscribed