There are several interesting observations here!

First, we reproduce the result that the variance of spinlocks on Linux with default scheduling settings can be huge:

    parking_lot::Mutex  min 6ms  max 11ms
    AmdSpinlock         min 6ms  max 123ms

Note that these are the extremes over 100 runs, where each run does 32 * 10_000 lock operations. That is, individual lock/unlock operations probably have an even higher spread.
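To make the shape of such a measurement concrete, here is a minimal sketch of a harness that times one run and summarizes many runs. It is not the benchmark used for the numbers above: `run_once`, `summarize`, and `xorshift` are hypothetical names, `std::sync::Mutex` stands in for the lock under test, and picking a random lock per operation is an assumption about the workload.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Tiny xorshift generator so the sketch needs no external crates.
/// The seed must be nonzero.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// One run: `n_threads` threads each perform `n_ops` lock/unlock pairs on
/// locks picked pseudo-randomly out of `n_locks`. Returns wall-clock time,
/// which in this sketch also includes spawning and joining the threads.
fn run_once(n_threads: usize, n_locks: usize, n_ops: usize) -> Duration {
    let locks: Arc<Vec<Mutex<u64>>> =
        Arc::new((0..n_locks).map(|_| Mutex::new(0)).collect());
    let start = Instant::now();
    let handles: Vec<_> = (0..n_threads)
        .map(|t| {
            let locks = Arc::clone(&locks);
            thread::spawn(move || {
                let mut rng = t as u64 + 1; // distinct nonzero seed per thread
                for _ in 0..n_ops {
                    let i = (xorshift(&mut rng) % locks.len() as u64) as usize;
                    *locks[i].lock().unwrap() += 1; // very short critical section
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}

/// Returns (min, avg, max) over all runs. Panics on an empty slice.
fn summarize(runs: &[Duration]) -> (Duration, Duration, Duration) {
    let min = *runs.iter().min().unwrap();
    let max = *runs.iter().max().unwrap();
    let avg = runs.iter().sum::<Duration>() / runs.len() as u32;
    (min, avg, max)
}
```

Collecting `run_once(32, n_locks, 10_000)` a hundred times into a `Vec` and passing it to `summarize` gives the min/avg/max triple in the same form as the listings here. A small `n_locks` models heavy contention and a large one models the uncontended case.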

Second, the uncontended case looks as I expected: mutexes and spinlocks are not that different, because they essentially use the same code.

    parking_lot::Mutex  avg 6ms  min 4ms  max 9ms
    spin::Mutex         avg 5ms  min 4ms  max 7ms

Third, under heavy contention mutexes annihilate spinlocks:

    parking_lot::Mutex  avg 10ms  max 11ms
    spin::Mutex         avg 55ms  max 161ms

Now, this is the opposite of what I would naively expect. Even in a heavily contended state, the critical section is still extremely short, so for each thread, the most efficient strategy seems to be to spin for a couple of iterations.

But I think I can explain why mutexes are so much better in this case. One reason is that, with spinlocks, a thread can get unlucky and be preempted in the critical section, leaving every other contender spinning until it is scheduled again. The other, more important reason is that, at any given moment, there are many threads trying to enter the same critical section. With spinlocks, all cores can be occupied by threads that compete for the same lock. With mutexes, there is a queue of sleeping threads for each lock, and the kernel generally tries to make sure that only one thread from the group is awake.

This is a funny example of a mechanical race to the bottom. Because the critical section is so short, each individual thread would spend fewer CPU cycles in total if it were spinning, but when every thread spins, the overall cost goes up.

EDIT: a simpler and more plausible explanation, from the author of Rust's parking_lot, is that it does exponential backoff when spinning, unlike the two spinlock implementations.
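For illustration, here is a minimal sketch of what exponential backoff while spinning can look like. This is not parking_lot's code: `BackoffSpinlock`, `with_lock`, and `MAX_SPINS` are hypothetical names, the cap of 64 is an arbitrary assumption, and once the cap is reached this sketch merely yields to the scheduler instead of parking the thread in a queue.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Upper bound on pause hints per failed attempt (arbitrary, for illustration).
const MAX_SPINS: u32 = 64;

pub struct BackoffSpinlock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialized by the `locked` flag.
unsafe impl<T: Send> Sync for BackoffSpinlock<T> {}

/// Releases the lock when dropped, even if the critical section panics.
struct Unlock<'a>(&'a AtomicBool);

impl Drop for Unlock<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

impl<T> BackoffSpinlock<T> {
    pub fn new(value: T) -> Self {
        Self { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Runs `f` with exclusive access to the protected value.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut spins = 1u32;
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Back off: issue `spins` pause hints before retrying, so that
            // waiting threads hammer the contended cache line less often.
            for _ in 0..spins {
                hint::spin_loop();
            }
            if spins < MAX_SPINS {
                spins *= 2; // double the wait after every failed attempt
            } else {
                thread::yield_now(); // cap reached: let another thread run
            }
        }
        let _unlock = Unlock(&self.locked);
        // Safety: the successful compare-exchange gives us exclusive access.
        f(unsafe { &mut *self.value.get() })
    }
}
```

The closure-based API keeps the sketch short by avoiding a public guard type. A spinlock without backoff would retry the compare-exchange in a tight loop instead of waiting progressively longer between attempts.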

Fourth, even under heavy contention, spinlocks can luck out and finish almost as fast as mutexes:

    parking_lot::Mutex  avg 10ms  min 6ms
    spin::Mutex         avg 55ms  min 7ms

This again shows that a good mutex is roughly equivalent to a spinlock in the best case.

Fifth, the amount of contention required to disrupt spinlocks seems to be small. Even if 32 threads compete for 1,000 locks, spinlocks are still considerably slower:

    parking_lot::Mutex  avg 6ms   min 3ms  max 8ms
    spin::Mutex         avg 37ms  min 4ms  max 115ms