ace conditions are an inherent part of parallel programming. A race condition exists any time a program's behavior may depend on the relative ordering of events on separate threads. In the vast majority of cases, race conditions are harmlessthe program works regardless of which thread gets a lock first, or which thread processes a chunk of data. In some cases, however, race conditions can cause problems.

The main danger with race conditions is that by their very nature they are timing dependent. This becomes problematic when one thread executes a particular piece of code while another thread is executing a different piece of code. If the pieces of code in question are very small (only one or two CPU instructions, for example) and occur very rarely, then the race condition might not show up very often, and you may miss it entirely during testing. In fact, the conditions necessary for the problem to occur may not manifest at all during testing.

For example, if your test system has only one CPU, then threads cannot really execute in parallel. You must interleave them. This lack of true concurrency means that some potential race condition problems just cannot occur. The same problem exists to a lesser extent if you test code on a system with a small number of CPUs (for example, on a dual-core desktop machine) when the problematic conditions can happen only with a higher level of parallelism (for example, on a 64-CPU server machine).

This article demonstrates how race conditions in general and C++0x data races in particular can cause real problems in parallel code and offers some tips for preventing them.

Locks Cannot Prevent Race Conditions

Protecting your data with a mutex lock does not guarantee that your code will be free from problematic race conditions, even if you obsessively ensure that the data is accessed only while the lock is held. If the lock is at the wrong level of granularity, or the scope of the lock is wrong, then problematic race conditions can still occur.

For example, consider a simple data structure that contains a list of items and a count of the items in the list. If you protect each part of the data structure with its own mutex, you can still get race conditions even though everything is nominally synchronized. Because the parts are protected with individual mutexes, you must update them separately. This means that at certain points you will have updated one and not the other (for example, you have added a new item to the list but have not yet updated the count). Thus, when another thread accesses the data structure, it will see the two parts of the data structure as out of sync with each other.

The solution in this case is obvious: use a single mutex to protect the entire data structure. In more complex cases, it can be much harder to identify scenarios where there may be a race condition, and eliminating the race condition may require more extensive changes, such as changes to the interface.