I have read a couple of posts on memory models over the couple of weeks: one from Jeremy Manson on What Volatile Means in Java, and one from Bartosz Milewski entitled Who ordered sequential consistency?. Both of these cover a Sequentially Consistent memory model — in Jeremy's case because sequential consistency is required by the Java Memory Model, and in Bartosz' case because he's explaining what it means to be sequentially consistent, and why we would want that.

In a sequentially consistent memory model, there is a single total order of all atomic operations which is the same across all processors in the system. You might not know what the order is in advance, and it may change from execution to execution, but there is always a total order.

This is the default for the new C++0x atomics, and required for Java's volatile , for good reason — it is considerably easier to reason about the behaviour of code that uses sequentially consistent orderings than code that uses a more relaxed ordering.

The thing is, C++0x atomics are only sequentially consistent by default — they also support more relaxed orderings.

Relaxed Atomics and Inconsistent Orderings

I briefly touched on the properties of relaxed atomic operations in my presentation on The Future of Concurrency in C++ at ACCU 2008 (see the slides). The key point is that relaxed operations are unordered. Consider this simple example with two threads:

#include <thread> #include <cstdatomic> std::atomic<int> x(0),y(0); void thread1() { x.store(1,std::memory_order_relaxed); y.store(1,std::memory_order_relaxed); } void thread2() { int a=y.load(std::memory_order_relaxed); int b=x.load(std::memory_order_relaxed); if(a==1) assert(b==1); } std::thread t1(thread1); std::thread t2(thread2);

All the atomic operations here are using memory_order_relaxed , so there is no enforced ordering. Therefore, even though thread1 stores x before y , there is no guarantee that the writes will reach thread2 in that order: even if a==1 (implying thread2 has seen the result of the store to y ), there is no guarantee that b==1 , and the assert may fire.

If we add more variables and more threads, then each thread may see a different order for the writes. Some of the results can be even more surprising than that, even with two threads. The C++0x working paper features the following example:

void thread1() { int r1=y.load(std::memory_order_relaxed); x.store(r1,std::memory_order_relaxed); } void thread2() { int r2=x.load(std::memory_order_relaxed); y.store(42,std::memory_order_relaxed); assert(r2==42); }

There's no ordering between threads, so thread1 might see the store to y from thread2 , and thus store the value 42 in x . The fun part comes because the load from x in thread2 can be reordered after everything else (even the store that occurs after it in the same thread) and thus load the value 42! Of course, there's no guarantee about this, so the assert may or may not fire — we just don't know.

Acquire and Release Ordering

Now you've seen quite how scary life can be with relaxed operations, it's time to look at acquire and release ordering. This provides pairwise synchronization between threads — the thread doing a load sees all the changes made before the corresponding store in another thread. Most of the time, this is actually all you need — you still get the "two cones" effect described in Jeremy's blog post.

With acquire-release ordering, independent reads of variables written independently can still give different orders in different threads, so if you do that sort of thing then you still need to think carefully. e.g.

std::atomic x(0),y(0); void thread1() { x.store(1,std::memory_order_release); } void thread2() { y.store(1,std::memory_order_release); } void thread3() { int a=x.load(std::memory_order_acquire); int b=y.load(std::memory_order_acquire); } void thread4() { int c=x.load(std::memory_order_acquire); int d=y.load(std::memory_order_acquire); }

Yes, thread3 and thread4 have the same code, but I separated them out to make it clear we've got two separate threads. In this example, the stores are on separate threads, so there is no ordering between them. Consequently the reader threads may see the writes in either order, and you might get a==1 and b==0 or vice versa, or both 1 or both 0. The fun part is that the two reader threads might see opposite orders, so you have a==1 and b==0 , but c==0 and d==1 ! With sequentially consistent code, both threads must see consistent orderings, so this would be disallowed.

Summary

The details of relaxed memory models can be confusing, even for experts. If you're writing code that uses bare atomics, stick to sequential consistency until you can demonstrate that this is causing an undesirable impact on performance.

There's a lot more to the C++0x memory model and atomic operations than I can cover in a blog post — I go into much more depth in the chapter on atomics in my book.

Posted by Anthony Williams

[/ threading /] permanent link

Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.