This is the story how I have (not) solved a race condition that impacts QWaitCondition and is also present on every other condition variable implementations (pthread, boost, std::condition_variable). bool QWaitCondition::wait(int timeout) is supposed to return true if the condition variable was met and false if it timed out. The race is that it may return false (for timeout) even if it was actually woken up. The problem was already reported in 2012. But I only came to look at it when David Faure was trying to fix another bug in QThreadPool that was caused by this race.

The problem in QThreadPool

When starting a task, QThreadPool did something along the lines of:

QMutexLocker locker (& mutex ); taskQueue .append( task ); // Place the task on the task queue if ( waitingThreads > 0 ) { // there are already running idle thread. They are waiting on the 'runnableReady' // QWaitCondition. Wake one up them up. waitingThreads --; runnableReady . wakeOne (); } else if ( runningThreadCount < maxThreadCount ) { startNewThread ( task ); }

And the the thread's main loop looks like this:

void QThreadPoolThread :: run () { QMutexLocker locker (& manager -> mutex ); while (true) { /* ... */ if ( manager -> taskQueue .isEmpty()) { // no pending task, wait for one. bool expired = ! manager -> runnableReady . wait ( locker .mutex(), manager -> expiryTimeout ); if ( expired ) { manager -> runningThreadCount --; return; } else { continue; } } QRunnable * r = manager -> taskQueue .takeFirst(); // run the task locker .unlock(); r ->run(); locker .relock(); } }

The idea is that the thread will wait for a given amount of second for a task, but if no task was added in a given amount of time, the thread expires and is terminated. The problem here is that we rely on the return value of runnableReady . If there is a task that is scheduled at exactly the same time as the thread expires, then the thread will see false and will expire. But the main thread will not restart any other thread. That might let the application hang as the task will never be run.

The Race

Many of the implementations of a condition variable have the same issue.

It is even documented in the POSIX documentation:

[W]hen pthread_cond_timedwait() returns with the timeout error, the associated predicate may be true due to an unavoidable race between the expiration of the timeout and the predicate state change.

pthread documentation describes it as an unavoidable race. But is it so? The wait condition is associated with a mutex, which is locked by the user when calling wake() and that is also passed locked to wait() . The implementation is supposed to unlock and wait atomically.

The C++11 standard library's condition_variable even has an enum ( cv_status ) for the return code. The C++ standard does not document the race, but all the implementations I have tried suffer from the race. (No implementations are therefore conform.)

Let me try to explain the race better: this code show a typical use of QWaitCondition

Thread 1 Thread 2 mutex .lock(); if(! ready ) { ready = true; condition . wakeOne (); } mutex .unlock(); mutex .lock(); ready = false; bool success = condition . wait (& mutex , timeout ); assert ( success == ready ); mutex .unlock();

The race is that the wait condition in Thread2 timeout and returns false, but at the same time, Thread1 wakes the condition. One could expect that since everything is protected by a mutex, this should not happen. Internally, the wait condition unlocks the internal mutex, but does not check that it has not been woken up once the user mutex is locked again.

QWaitCondition has internal state that counts the number of waiting QWaitCondition and the number of QWaitCondition that are waiting to be woken up.

Let's review the actual code of QWaitCondition (edited for readability)

bool QWaitCondition:: wait (QMutex * mutex , unsigned long time ) { // [...] pthread_mutex_lock(&d-> mutex ); ++d-> waiters ; mutex ->unlock(); // (simplified for briefty) int code = 0 ; do { code = d-> wait_relative ( time ); // calls pthread_cond_timedwait } while ( code == 0 && d-> wakeups == 0 ); --d-> waiters ; if ( code == 0 ) --d-> wakeups ; // [!!] pthread_mutex_unlock(&d-> mutex ); mutex ->lock(); return code == 0 ; } void QWaitCondition:: wakeOne () { pthread_mutex_lock(&d-> mutex ); d-> wakeups = qMin(d-> wakeups + 1 , d-> waiters ); pthread_cond_signal(&d-> cond ); pthread_mutex_unlock(&d-> mutex ); }

Notice that d->mutex is a native pthread mutex, while the local variable mutex is the user mutex. In the line marked with [!!] we effectively take the right to wake up. But we do that before locking the user's mutex. What if we checked again for waiters under the user's lock?

Attempt 1: check again under the user's lock

bool QWaitCondition:: wait (QMutex * mutex , unsigned long time ) { // Same as before: pthread_mutex_lock(&d-> mutex ); ++d-> waiters ; mutex ->unlock(); int code = 0 ; do { code = d-> wait_relative ( time ); // calls pthread_cond_timedwait } while ( code == 0 && d-> wakeups == 0 ); // --d->waiters; // Moved bellow if ( code == 0 ) --d-> wakeups ; pthread_mutex_unlock(&d-> mutex ); mutex ->lock(); // Now check the wakeups again: pthread_mutex_lock(&d-> mutex ); --d-> waiters ; if ( code != 0 && d-> wakeups ) { // The race is detected, and corrected --d-> wakeups ; code = 0 ; } pthread_mutex_unlock(&d-> mutex ); return code == 0 ; }

And there we have fixed the race! We just had to lock the internal mutex again because d->waiters and d->wakeups need to be protected by it. We needed to unlock it because locking the user's mutex with the internal mutex locked would potentially cause deadlock as lock order would not be respected.

However, we now have introduced another problem: If there are three threads, a thread may be woken up before

// Thread 1 // Thread 2 // Thread 3 mutex->lock() cond->wait(mutex); mutex->lock() cond->wake(); mutex->unlock() mutex->lock() cond->wait(mutex, 0);

We don't want that the Thread 3 steal the signal from the Thread 1. But that can happen if the Thread 1 is sleeping a bit too long and do not manage to lock the internal mutex in time before Thread 3 expires.

The only way to solve this problem would be if we could order the thread by the time they started to wait.

Inspired by the bitcoin's blockchain, I created a linked list of nodes on the thread's stack that represent the order. When a thread is starting to wait, it adds itself at the end of the double linked list. When a thread is waking other thread, it marks the last node of the linked list. (by incrementing a woken counter inside the node). When a thread is timing out, it checks if it was marked, or any other thread after him in the linked list. We only solve the race in that case, otherwise we consider it is a timeout.

You can see the patch on the code review tool.

Performance

This patch adds quite a bit of code to add and remove nodes in the linked list, and also to go over the list to check if we were indeed woken up. The linked list is bound by the number of waiting thread. I was expecting that this linked list handling would be negligible compared to the other cost of QWaitCondition

However, the results of the QWaitCondition benchmark show that, with 10 threads and high contention, we have a ~10% penalty. With 5 threads there is ~5% penalty.

Is it worth it to pay this penalty to solve the race? So far, we decided not to merge the patch and keep the race.

Conclusion

Fixing the race is possible, but has a small performance impact. None of the implementations attempt to fix the race. I wonder why there is even a returned status at all if you cannot rely on it.