Threads, Interrupts, Transactions

hacking, April 30th 2007

I've recently been working on SBCL thread and interrupt safety. The good news is that at least on x86(-64)/Linux my trust in the overall stability of threaded SBCL has increased vastly: a number of apparently small fixes that have gone in make a big difference — and the work currently under way should smooth the remaining known issues.

Thread safety is simple, if tricky to get right. Any half-way decent CS introduction can tell you about it, and plenty of literature exists.

Interrupt safety is another matter. There is (or at least I have been unable to find any, hints welcome!) almost no prior art on making systems that are both interrupt safe and responsive to interrupts. For the purposes of this discussion I define an interrupt as an asynchronous event that can cause execution of arbitrary code in the context of the running thread, including unwinding.

The problem is that interrupt safety doesn't compose. Even though any halfway sane Lisp system will have key internals interrupt-proofed (allocation, etc.), this doesn't mean user code can rest on the merits of the underlying system:

(defun call-with-foo (function)
  (let (foo)
    (unwind-protect
         (progn
           (setf foo (get-foo))    ; 1
           (funcall function foo))
      (when foo                    ; 2
        (release-foo foo)))))

Assume that GET-FOO and RELEASE-FOO are interrupt safe. Even so, CALL-WITH-FOO is not interrupt safe.

If an interrupt arrives and causes an unwind after GET-FOO has returned, but before the variable has been assigned, we have a leak. If an interrupt arrives and causes an unwind during the execution of the cleanup forms, we have a leak.
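The first of those windows can be reproduced deterministically. The following is only a sketch: the asynchronous unwind is simulated with a synchronous THROW, and `*outstanding*`, the toy GET-FOO and RELEASE-FOO, and CALL-WITH-FOO-INTERRUPTED are hypothetical stand-ins invented for illustration.

```lisp
;; Hypothetical bookkeeping: number of foos acquired but not yet released.
(defvar *outstanding* 0)

(defun get-foo ()
  (incf *outstanding*)
  :foo)

(defun release-foo (foo)
  (declare (ignore foo))
  (decf *outstanding*))

(defun call-with-foo-interrupted (function)
  (let (foo)
    (unwind-protect
         (progn
           ;; PROG1 lets GET-FOO return, then unwinds before SETF has
           ;; stored the value: the window marked 1 above.
           (setf foo (prog1 (get-foo)
                       (throw 'interrupted nil)))
           (funcall function foo))
      ;; FOO is still NIL here, so the cleanup form releases nothing.
      (when foo
        (release-foo foo)))))

(catch 'interrupted
  (call-with-foo-interrupted #'identity))
```

After this runs, `*outstanding*` is 1: the foo was acquired, but the cleanup form never saw it.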

Being the clever sort, we fix our code:

(defun call-with-foo (function)
  (let (foo)
    (without-interrupts
      (unwind-protect
           (progn
             ;; For simplicity's sake call GET-FOO with
             ;; interrupts disabled. There are safe ways around
             ;; this, but let's not go there now.
             (setf foo (get-foo))
             (with-interrupts
               (funcall function foo)))
        (when foo
          (release-foo foo))))))

We so rock. ...but that is not the end of the story: any caller of CALL-WITH-FOO still needs to worry about interrupts:

(defun queue-result ()
  (let ((result (call-with-foo
                 (lambda (foo)
                   (pop-result foo)))))
    (enqueue result)
    result))

Again, assume that all the individual components are interrupt safe. QUEUE-RESULT isn't, at least not if we assume a destructive POP-RESULT. Should an interrupt arrive and unwind us at any point between the pop and the ENQUEUE call, we have lost the result and cannot ever recover it. ...so, being the clever sort, we interrupt-proof our code. Can you guess where this story goes?
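To make the lost result concrete, here is a deterministic toy version. It is a sketch under assumptions: the interrupt is simulated with a THROW placed exactly between the pop and the enqueue, and `*source*`, `*queue*`, the `:interrupt-p` argument, and the simplified POP-RESULT and ENQUEUE (no FOO argument) are all made up for this illustration.

```lisp
;; Hypothetical stand-ins: *SOURCE* holds the pending results,
;; *QUEUE* is where ENQUEUE puts them.
(defvar *source* (list 1 2 3))
(defvar *queue* '())

(defun pop-result ()
  ;; Destructive: the result is gone from *SOURCE* once this returns.
  (pop *source*))

(defun enqueue (result)
  (push result *queue*)
  result)

(defun queue-result (&key interrupt-p)
  (let ((result (pop-result)))
    ;; Stand-in for an asynchronous unwind arriving between the steps.
    (when interrupt-p
      (throw 'interrupted nil))
    (enqueue result)
    result))

(catch 'interrupted
  (queue-result :interrupt-p t))
```

Afterwards `*source*` is `(2 3)` and `*queue*` is still empty: the 1 lived only in a local variable of a frame that no longer exists.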

Safe points are not an answer. A safe point is analogous to the WITH-INTERRUPTS in CALL-WITH-FOO: a place where we know that it is locally safe to receive an interrupt. Things still do not compose: if an interrupt arrives after the FUNCALL there, the result may still be lost. "Oh, but let's, like, put safe points only in, you know, safe places. Like before the FUNCALL." Nice effort, but no cookie: (1) the system is no longer responsive, as the function call in CALL-WITH-FOO now doesn't respond to interrupts. (2) QUEUE-RESULT still needs to ensure its own interrupt safety, so composability has not been achieved.

So, we have established that interrupts are nasty, horrible, and need to be dealt with all over the place. If you don't trust this informal exploration, Fare Rideau's thesis (not submitted, AFAIK) has a more formal approach to this.

One way around this is to do what Windows does: conclude that since interrupts are so horrible we have no truck with them. This is certainly valid, and I have plans to make this approach possible in SBCL (make a knob that says "no asynch events, please", so that those wanting to bullet-proof their production systems can.)

This is not a good solution overall, though. We sort of like the fact that C-c at the terminal gives us the debugger, and synchronizing events means you can't program in an interrupt-driven style.

Another solution is to do away with side-effects. If there is no state we cannot mess it up. Fine, but not exactly feasible for Common Lisp.

Lisp is supposed to be able to do better than this, really. Gabor Melis had an idea that makes sense to me: get with the program and provide software transactional memory. Don't ask me about the details, I don't have any yet, but the basic idea is obvious: unwinding from a transaction restores state, so why not use them to provide interrupt safety in addition to thread safety...
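For a feel of what "unwinding restores state" could mean, here is a toy sketch. It is emphatically not software transactional memory and not anything SBCL provides: WITH-ROLLBACK is a hypothetical macro that snapshots a few variables and puts the old values back on a non-local exit. It is single-threaded, only undoes rebinding (not in-place mutation of the objects themselves), and the restore step is not itself interrupt safe. The interrupt is again simulated with a THROW.

```lisp
(defmacro with-rollback ((&rest variables) &body body)
  "Run BODY. If it exits non-locally, restore VARIABLES to their entry values."
  (let ((saved (loop for v in variables
                     collect (gensym (symbol-name v))))
        (committed (gensym "COMMITTED")))
    `(let (,@(mapcar #'list saved variables)
           (,committed nil))
       (unwind-protect
            (multiple-value-prog1 (progn ,@body)
              ;; Reached only on a normal return from BODY.
              (setf ,committed t))
         (unless ,committed
           (setf ,@(mapcan #'list variables saved)))))))

;; Hypothetical state: results waiting to be moved from one list to another.
(defvar *pending* (list 1 2 3))
(defvar *done* '())

(catch 'interrupted
  (with-rollback (*pending* *done*)
    (let ((result (pop *pending*)))
      (throw 'interrupted nil)          ; simulated asynchronous unwind
      (push result *done*))))
```

After the simulated unwind `*pending*` is back to `(1 2 3)` and `*done*` is empty, so nothing was lost. A real design would have to handle shared mutable objects, other threads, and interrupts arriving during the restore, which is exactly where the missing details are.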