A week ago, I woke up having put two and two together.

Six months ago, while explaining perishable collections, I mentioned that you could traverse a particular linked list without holding a lock (despite the list being modified concurrently). The idea was that, because nodes’ Next pointers were always valid before being inserted into the list, the traversal would always ‘escape’. (I was wrong.)

Five months ago, I watched a talk by Herb Sutter called Atomic Weapons: The C++ Memory Model and Modern Hardware (Part 1, Part 2). Notably, the talk mentions a rule of thumb: every write-release barrier should be paired with a read-acquire barrier.

Wait. What? Argh! My code breaks the rule of thumb. In a way that’s definitely wrong.

Double-Checked Locking

A classic example of the dangers of writing concurrent code is double-checked locking, which is a technique for doing thread-safe lazy initialization (i.e. avoid computing a value until it’s requested, and then cache it).

The simplest way to do thread-safe lazy initialization is to, while holding a lock, check if the value needs to be created and then create it if necessary. The issue people have with this approach is that it requires acquiring and releasing a lock (which is relatively expensive) even after the result has been cached. Thus the idea of double-checking, where you do an extra unsynchronized “is the value cached?” check in the hopes of being able to just return the cached value without acquiring the lock.

The problem with double-checked locking is that it’s easy to get wrong. For example:

private static Point s = null; public double getLazyX_WRONG() { if (s == null) { lock { if (s == null) { Point p = new Point(); p.X = 1; p.Y = 1; s = p; } } } return s.X; }

The above code is wrong, and in a non-obvious (but well publicized) way. When a thread does the first s == null check, sees that s has already been cached, and proceeds to read s.X , there’s no acquire barrier ensuring sequential consistency. Consequently, it’s possible for threads to see a half-constructed s .

To make it a bit clearer what the problem is, I’ve made an animated diagram of the bug occurring:

What’s happening in the above diagram:

We start just after a “producer” thread has seen that s is null, acquired the lock, and confirmed that s is null and needs to be initialized. The producer creates a point, initializes it, and assigns it to s . The producer releases the lock. This creates a release barrier, forcing its cached values to be written to memory. A separate “consumer” thread enters the method. The consumer happens to update its cached value of s from memory, but it doesn’t update the memory that s points to. (An acquire barrier would have forced the other values to also be updated, but the consumer isn’t encountering any acquire barriers.) The consumer notes that s is not null, assumes it’s been initialized, and proceeds to return uninitialized garbage from inside s .

Fixing the bug in the above code requires introducing an acquire barrier when reading the cached value. For example, in Java or C#, you could declare s as volatile (they guarantee all volatile reads have an acquire barrier). Alternatively, you could include an explicit memory barrier or hold a lock while reading s .

Same Mistake, Different Pile

I made the same mistake, failing to include an acquire barrier, in the PerishableCollection class (before fix). Here’s slightly simplified code containing the problem:

public IEnumerable<T> CurrentItems_WRONG() { for (var n = _head.Next; n != _head; n = n.Next) yield return n.Value; } public void Add(T item) { var itemNode = new Node { Value = max, Next = _head }; lock (_head) { itemNode.Prev = _head.Prev; itemNode.Prev.Next = itemNode; _head.Prev = itemNode; } }

Notice that there’s no acquire barrier when iterating the items in the above code. Reading the Value and Next fields might result in garbage, even if the producer did everything right.

The above code can be fixed (in C#) by declaring the Next field to be volatile , which introduces acquire barriers during the enumeration. It also introduces equally-important paired release barriers when initializing a new node and inserting it before _head . Alternatively, we can hold the lock while reading the Next pointer or during the entire enumeration.

This sort of bug is particularly nasty because (and someone please correct me if I’m wrong) you can’t reproduce it on an x86 machine. x86 doesn’t allow the reorderings (see Section 8.2 from the Intel® 64 and IA-32 Architectures Software Developer’s Manual) that cause the bug. Even on machines where the bug can occur, actually catching it can be difficult. Who knows how long it would have gone undetected if I hadn’t caught it in my sleep.

Summary

If you haven’t paired every write-release barrier with a read-acquire barrier, you’ve probably introduced a bug. Apparent re-orderings can happen both on the producer side and on the consumer side.

Update: I should really point out that I’m in no way an expert at writing concurrent code, and that concurrency has this nasty habit of biting non-experts. A comment on Reddit explains a few omissions / errors.

—

—

I’m now on Twitter: @CraigGidney

—



Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.



Archive