I've tried to implement a version of Haskell's Control.Concurrent.MVar that resides in shared memory and uses POSIX functionality to let multiple independent processes/programs communicate. But I have run into lots of deadlocks.

The problem is that pthread_cond_timedwait sometimes does not return when called through the GHC FFI (whether the import is interruptible or unsafe). After a few days of desperate attempts to resolve the problem, I decided to minimize the code and ask the community for help. Unfortunately, I could not condense the problem into a few lines of code that could be pasted here. Therefore, I stored the code (as small as I could make it) on GitHub together with instructions on how to replicate the problem; here is a permalink to its current state (mvar-fail branch).

In essence, the functions that take and put the MVar look like this:

    int mvar_take(MVar *mvar, ...) {
      pthread_mutex_timedlock(&(mvar->statePtr->mvMut), &timeToWait);
      while ( !(mvar->statePtr->isFull) ) {
        pthread_cond_signal(&(mvar->statePtr->canPutC));
        pthread_cond_timedwait(&(mvar->statePtr->canTakeC), &(mvar->statePtr->mvMut), &timeToWait);
      }
      memcpy(localDataPtr, mvar->dataPtr, mvar->statePtr->dataSize);
      mvar->statePtr->isFull = 0;
      pthread_mutex_unlock(&(mvar->statePtr->mvMut));
    }

    int mvar_put(MVar *mvar, ...) {
      pthread_mutex_timedlock(&(mvar->statePtr->mvMut), &timeToWait);
      while ( mvar->statePtr->isFull ) {
        pthread_cond_signal(&(mvar->statePtr->canTakeC));
        pthread_cond_timedwait(&(mvar->statePtr->canPutC), &(mvar->statePtr->mvMut), &timeToWait);
      }
      memcpy(mvar->dataPtr, localDataPtr, mvar->statePtr->dataSize);
      mvar->statePtr->isFull = 1;
      pthread_mutex_unlock(&(mvar->statePtr->mvMut));
    }
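The snippet above does not show how timeToWait is built. For reference, both pthread_mutex_timedlock and pthread_cond_timedwait take an absolute point in time rather than a duration, and a tv_nsec outside the range [0, 1e9) is rejected with EINVAL. A minimal sketch of computing such a deadline follows; the helper name deadline_after_ms is hypothetical and not taken from the linked code, and it assumes the condition variable uses its default clock (CLOCK_REALTIME).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper (not from the linked repository): returns the absolute
   CLOCK_REALTIME time that lies `ms` milliseconds in the future. */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    assert(ms >= 0);
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {   /* normalise: tv_nsec must stay below 1e9 */
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}
```

A caller would then write something like `struct timespec timeToWait = deadline_after_ms(1000);` once before locking, and compare the return value of each timed call against ETIMEDOUT.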

(Plus error checking and printfs after every call.) Full code for mvar_take. The initialization happens as follows:

    pthread_mutexattr_init(&(s.mvMAttr));
    pthread_mutexattr_settype(&(s.mvMAttr), PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutexattr_setpshared(&(s.mvMAttr), PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&(s.mvMut), &(s.mvMAttr));
    pthread_condattr_init(&(s.condAttr));
    pthread_condattr_setpshared(&(s.condAttr), PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&(s.canPutC), &(s.condAttr));
    pthread_cond_init(&(s.canTakeC), &(s.condAttr));
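To separate the POSIX part from the GHC part, it may help to exercise the same initialization from plain C. The sketch below is an assumption-laden stand-in, not the linked code: SharedState, shared_state_new and roundtrip_across_fork are hypothetical names, the state lives in an anonymous MAP_SHARED mapping instead of a named shared-memory object, the payload is a single int, the waits are untimed, and each side signals after changing isFull (the conventional pattern) rather than before waiting. It assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the shared state; field names follow the question. */
typedef struct {
    pthread_mutex_t mvMut;
    pthread_cond_t  canPutC, canTakeC;
    int             isFull;
    int             value;
} SharedState;

/* Map the state into memory shared with forked children and initialise it. */
static SharedState *shared_state_new(void)
{
    SharedState *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED) return NULL;

    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;
    /* Every pthread call returns 0 on success, so any failure short-circuits. */
    if (pthread_mutexattr_init(&ma)
        || pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK)
        || pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED)
        || pthread_mutex_init(&s->mvMut, &ma)
        || pthread_condattr_init(&ca)
        || pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED)
        || pthread_cond_init(&s->canPutC, &ca)
        || pthread_cond_init(&s->canTakeC, &ca)) {
        munmap(s, sizeof *s);
        return NULL;
    }
    s->isFull = 0;
    return s;
}

/* Fork a child that puts `v`; the parent takes it.
   Returns the taken value, or -1 on setup failure. */
static int roundtrip_across_fork(int v)
{
    assert(v >= 0);                     /* -1 is reserved as the error value */
    SharedState *s = shared_state_new();
    if (!s) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                     /* child: put */
        pthread_mutex_lock(&s->mvMut);
        while (s->isFull) pthread_cond_wait(&s->canPutC, &s->mvMut);
        s->value = v;
        s->isFull = 1;
        pthread_cond_signal(&s->canTakeC);
        pthread_mutex_unlock(&s->mvMut);
        _exit(0);
    }
    pthread_mutex_lock(&s->mvMut);      /* parent: take */
    while (!s->isFull) pthread_cond_wait(&s->canTakeC, &s->mvMut);
    int out = s->value;
    s->isFull = 0;
    pthread_cond_signal(&s->canPutC);
    pthread_mutex_unlock(&s->mvMut);
    waitpid(pid, NULL, 0);
    munmap(s, sizeof *s);
    return out;
}
```

Calling `roundtrip_across_fork(42)` in a loop from a C main gives a baseline without the GHC runtime involved; this is only a comparison point, not a diagnosis of the deadlock.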

Full code. The Haskell part looks like this:

    foreign import ccall interruptible "mvar_take"
      mvar_take :: Ptr StoredMVarT -> Ptr a -> CInt -> IO CInt
    foreign import ccall interruptible "mvar_put"
      mvar_put :: Ptr StoredMVarT -> Ptr a -> CInt -> IO CInt

    takeMVar :: Storable a => StoredMVar a -> IO a
    takeMVar (StoredMVar _ fp) = withForeignPtr fp $ \p -> alloca $ \lp -> do
      r <- mvar_take p lp
      if r == 0
        then peek lp
        else throwErrno $ "takeMVar failed with code " ++ show r

    putMVar :: Storable a => StoredMVar a -> a -> IO ()
    putMVar (StoredMVar _ fp) x = withForeignPtr fp $ \p -> alloca $ \lp -> do
      poke lp x
      r <- mvar_put p lp
      unless (r == 0)
        $ throwErrno $ "putMVar failed with code " ++ show r

Full code. Changing the FFI import from interruptible to unsafe does not prevent the deadlock. Sometimes the deadlock happens every second run; sometimes it happens only after 50 runs (the other runs complete as expected).

My guess is that GHC might interfere with the POSIX mutexes through some OS signal handling, but I don't know GHC internals well enough to verify this.

Am I doing something stupidly wrong, or do I need some special tricks to make this work inside the GHC FFI?

P.S.: the latest version of the README with my investigations is available at interprocess mvar-fail.

UPDATE 13.06.2018: I tried to temporarily block all OS signals by surrounding the function code with the following:

    sigset_t mask, omask;
    sigfillset(&mask);
    sigprocmask(SIG_SETMASK, &mask, &omask);
    ...
    sigprocmask(SIG_SETMASK, &omask, NULL);

This did not help.
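One caveat about that experiment: POSIX leaves the behaviour of sigprocmask unspecified in a multithreaded process, and pthread_sigmask is the call defined to change the mask of the calling thread. If the program is linked against GHC's threaded runtime, the per-thread variant may be the more meaningful test. A minimal sketch follows; with_signals_blocked and is_sigusr1_blocked are hypothetical names introduced here for illustration, and nothing in it is claimed to fix the deadlock.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical wrapper: run body(arg) with every blockable signal masked for
   the calling thread only, then restore the previous mask.
   Returns 0 or a pthread error code; the body's return value goes to *result. */
static int with_signals_blocked(int (*body)(void *), void *arg, int *result)
{
    sigset_t all, old;
    assert(body != NULL && result != NULL);
    sigfillset(&all);
    int rc = pthread_sigmask(SIG_SETMASK, &all, &old);
    if (rc != 0) return rc;
    *result = body(arg);
    return pthread_sigmask(SIG_SETMASK, &old, NULL);
}

/* Demo body: reports whether SIGUSR1 is blocked in the calling thread. */
static int is_sigusr1_blocked(void *unused)
{
    sigset_t cur;
    (void)unused;
    /* With a NULL new set, the call only queries the current mask. */
    pthread_sigmask(SIG_SETMASK, NULL, &cur);
    return sigismember(&cur, SIGUSR1);
}
```

In the question's setting, the body would be a small function that calls mvar_take or mvar_put, so the mask is in place for the whole lock, wait and unlock sequence on that thread.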