Do other async libraries give any useful hints on what to do? Not really, unfortunately.

Twisted by default registers a signal handler for control-C that triggers a clean shutdown of its event loop. This means that control-C won't work if your Twisted program runs away in an infinite loop that never yields to the event loop, and even if it does work, any callback chains or coroutines that are in progress will be abruptly abandoned – but it will at least run any registered shutdown callbacks. It's not bad, and it can be made to work, but doing so is tricky and there are limitations. Trio's motto is "make it easy to get things right", so we'd like to do better.

I also looked at tornado, asyncio, curio, and gevent, but (as of April 2017) they're even less sophisticated than Twisted: by default they don't do any special handling for keyboard interrupts at all, so hitting control-C may or may not blow up their event loop internals in a graceless fashion; in particular, any callback chains or coroutines you have running are likely to be abruptly abandoned, with no chance to even run their finally blocks, and it's entirely possible that you'll hit a deadlock or something, who knows. As an additional wrinkle, at least asyncio has some problems handling control-C on Windows (checked with asyncio in CPython 3.6.1; I didn't check the other projects at all) – for example, if you run an asyncio program that's just sitting in an idle event loop on Windows, be prepared to kill it with the task manager or something, because your control-C has no power here.

You can implement the Twisted-style behavior on these systems by manually registering your own signal handler that triggers some graceful shutdown logic, but all in all it's not very user friendly, and it has the same limitations. (The asyncio developers have even considered making the Twisted-style behavior the default, but are unhappy about the side-effects and haven't reached consensus on a solution.)

We do have one example of a program that implements the semantics we want: the Python interpreter itself. How does it work? Let's walk through it.

Control-C handling starts when the operating system detects a control-C and informs the interpreter. The way it does this is by running whatever signal handler was previously registered to handle the SIGINT signal. Conceptually, this is similar to how signal.signal works, but technically it's very different because signal.signal takes a Python function to be run when a signal arrives, and the operating system APIs only let you register a C function to be run when a signal arrives. (Note that here we're talking about "C" the language – that it uses the same letter as control-C is just a coincidence.) So if you're implementing a Python interpreter, that's your challenge: write a function in C that causes the Python signal handler function to be run. Once you've done that, you're basically done; to get Python's default behavior you just have to install a default handler that looks like:

def default_sigint_handler(*args):
    raise KeyboardInterrupt

and then if the user wants to override that with something fancier, they can.
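Concretely, here's what that swap looks like from Python. The "fancier" handler below is a made-up example; the only real APIs used are signal.signal, signal.getsignal, and signal.default_int_handler (which is the stock default handler and simply raises KeyboardInterrupt):

```python
import signal

# Overriding the default Python-level SIGINT handler, as described
# above. This particular "fancier" handler is a made-up example.
def fancy_sigint_handler(signum, frame):
    print("control-C! running some cleanup, then raising...")
    raise KeyboardInterrupt

previous = signal.signal(signal.SIGINT, fancy_sigint_handler)
assert signal.getsignal(signal.SIGINT) is fancy_sigint_handler

# The stock default is also exposed as signal.default_int_handler,
# which just raises KeyboardInterrupt; restore it when done:
signal.signal(signal.SIGINT, signal.default_int_handler)
```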

But implementing the C-level handler turns out to be trickier than you might think, for the same basic reason we keep running into: control-C can happen at any moment. On Unix, signal delivery is done by hijacking a thread: essentially, pausing it in between two assembly instructions and inserting a call to the C function that was registered as a signal handler. (What if the thread isn't running any assembly instructions, because it's blocked in a syscall inside the kernel? Then the kernel unceremoniously cancels that syscall – making it return the special error code EINTR – and this forces the thread back into userspace so it can be hijacked. Remember that stick we mentioned above? The kernel has a very big stick. This design is historically somewhat controversial.) On Windows, things are a bit more civilized and also more annoying: when the user hits control-C, a new thread spontaneously materializes inside our process and runs the C signal handler. On the one hand, this is an elegant re-use of an existing concept, and it avoids the whole weird hijacking thing. On the other hand, if you want to somehow poke the main thread to wake it up, then you're on your own – you have to build your own stick from scratch.

In any case, the end result of all this is that the C-level signal handler will get run, but this might happen at a time when the interpreter is in some messy and inconsistent state. In particular, this means that you can't simply have the C-level signal handler run the Python-level signal handler, because the interpreter might not be in a state where it can safely run Python code.

To see why this is a problem, let's look at an example from inside CPython. When raising an exception, Python keeps track of three things: the exception's type, value, and traceback. Here's the code from PyErr_SetExcInfo that CPython uses to record these (comments are mine; original is here):

/* Save the old exc_info values in temporary variables */
oldtype = tstate->exc_type;
oldvalue = tstate->exc_value;
oldtraceback = tstate->exc_traceback;
/* Assign the new exc_info values */
tstate->exc_type = p_type;
tstate->exc_value = p_value;
tstate->exc_traceback = p_traceback;
/* Drop the references to the old values */
Py_XDECREF(oldtype);
Py_XDECREF(oldvalue);
Py_XDECREF(oldtraceback);

You'll notice this is written in a slightly complicated way, where instead of simply overwriting the old values, they get saved in temporaries etc. There are two reasons for this. First, we can't just overwrite the old values because we need to decrement their reference counts, or else we'll cause a memory leak. But we can't decrement them one by one as we assign each field, because Py_XDECREF can potentially end up causing an object to be deallocated, at which point its __del__ method might run, which is arbitrary Python code, and as you can imagine you don't want to start running Python code at a moment when an exception is only half raised. Before it's raised is okay, after it's raised is okay, but half-way raised, with sys.exc_info() only partially filled in? That's not going to end well. The CPython developers of course are aware of this, so they carefully wrote this function so that it assigns all of the values and puts the interpreter back into a sensible state before it decrements any of the reference counts.
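To see concretely why those deferred Py_XDECREFs matter, here's a small demonstration that in CPython, dropping the last reference to an object runs its __del__ immediately and synchronously – arbitrary Python code, at whatever moment the refcount hits zero:

```python
# In CPython, refcounting is deterministic: when the last reference
# disappears, __del__ runs right there, not at some later GC pass.
events = []

class Noisy:
    def __del__(self):
        # This could be any Python code at all -- including code that
        # raises exceptions or inspects sys.exc_info().
        events.append("__del__ ran")

obj = Noisy()
del obj  # refcount hits zero; __del__ runs right here (in CPython)
assert events == ["__del__ ran"]
```

This is exactly why PyErr_SetExcInfo finishes assigning all three fields before decrementing any reference counts.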

But now imagine that a user is annoying (as users sometimes are) and hits control-C right in the middle of this, so that just as we're half-way through assigning the new values, the operating system pauses our code and runs the C signal handler. What happens? If the C-level signal handler runs the Python-level signal handler directly, then we have the same problem that we just so carefully avoided: we're running arbitrary Python code with an exception only half-raised. Even worse, this Python function probably wants to raise KeyboardInterrupt , which means that we end up calling PyErr_SetExcInfo to raise a second exception while we're half-way through raising the first. Effectively the code would end up looking something like:

/******************************************************************/
/* Raising the first exception, like a RuntimeError or whatever   */

/* Save the old exc_info values in temporary variables */
oldtype1 = tstate->exc_type;
oldvalue1 = tstate->exc_value;
oldtraceback1 = tstate->exc_traceback;
/* Assign the new exc_info values */
tstate->exc_type = p_type1;

/******************************************************************/
/* Surprise! Signal handler suddenly runs here, and calls this    */
/* code again to raise a KeyboardInterrupt or something           */

/* Save the old exc_info values in temporary variables */
oldtype2 = tstate->exc_type;
oldvalue2 = tstate->exc_value;
oldtraceback2 = tstate->exc_traceback;
/* Assign the new exc_info values */
tstate->exc_type = p_type2;
tstate->exc_value = p_value2;
tstate->exc_traceback = p_traceback2;
/* Drop the references to the old values */
Py_XDECREF(oldtype2);
Py_XDECREF(oldvalue2);
Py_XDECREF(oldtraceback2);

/******************************************************************/
/* Back to the original call */
tstate->exc_value = p_value1;
tstate->exc_traceback = p_traceback1;
/* Drop the references to the old values */
Py_XDECREF(oldtype1);
Py_XDECREF(oldvalue1);
Py_XDECREF(oldtraceback1);

This would cause all kinds of chaos: notice that p_type2 overwrites p_type1 , but p_value1 overwrites p_value2 , so we might end up with a sys.exc_info() where the type is KeyboardInterrupt but the exception object is an instance of RuntimeError . The oldvalue1 and oldvalue2 temporaries end up referring to the same object, so we end up decrementing its reference count twice, even though we only had one reference; this probably leads to some kind of nasty memory corruption.

Clearly this isn't gonna work. The C-level signal handler cannot call the Python-level signal handler directly. Instead, it needs to use the same trick we discussed above: the C-level handler sets a flag, and the interpreter makes sure to check this flag regularly at moments when it knows that it can safely run arbitrary Python code.

Specifically, the way CPython does this is that in its core bytecode evaluation loop, just before executing each bytecode instruction, it checks to see if the C-level handler's flag was set, and if so then it pauses and invokes the appropriate Python handler. (After all, the moment when you're about to run an arbitrary opcode is by definition a moment when you can run some arbitrary Python code.) And then, if the Python-level handler raises an exception, the evaluation loop lets this exception propagate instead of running the next instruction. So a more complete picture of our chain of custody looks like this, with two branches depending on which kind of Python-level handler is currently set. (These correspond to the two strategies we described at the beginning.):

C-level handler   -->   bytecode eval loop
  sets flag               checks flag & runs Python-level handler
                            |                        \
                            |                         \
                default Python-level handler       custom Python-level handler --> main loop
                  raises KeyboardInterrupt           sets another flag              checks flag
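The set-a-flag/check-a-flag dance can be sketched in pure Python. This is a simplification with made-up names – the real version lives in C inside CPython's bytecode eval loop:

```python
# A pure-Python sketch of the flag dance. All names here are ours.
pending_signal = False

def c_level_handler():
    # Must do almost nothing: just record that a signal arrived. It
    # can't run Python code, because it may fire at any moment.
    global pending_signal
    pending_signal = True

def python_level_handler():
    # The default Python-level handler just raises KeyboardInterrupt.
    raise KeyboardInterrupt

def eval_one_opcode(opcode):
    # The eval loop checks the flag just before each instruction -- a
    # moment when running arbitrary Python code is known to be safe.
    global pending_signal
    if pending_signal:
        pending_signal = False
        python_level_handler()  # any exception propagates normally
    opcode()

c_level_handler()  # pretend control-C arrived mid-execution
try:
    eval_one_opcode(lambda: None)
except KeyboardInterrupt:
    print("KeyboardInterrupt raised at a safe point")
```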

But what if the eval loop isn't actually... looping? What if it's sitting inside a call to time.sleep or select.select or something? On Unix this is mostly taken care of automatically by the kernel – though at the cost of the interpreter needing annoying boilerplate every time it does an operating system call. On Windows, we're on our own. And unfortunately, there is no general solution, because, well, it's Windows, and the Windows low-level APIs wouldn't recognize "general" if it showed up in a uniform with stars on the shoulder. Windows has at least 4 qualitatively different methods for interrupting a blocking call, and any given API might respond to one, several, or none of them.

In practice CPython compromises and uses two mechanisms: the C-level handler can be configured to write to a file descriptor (which is useful for waking up calls that wait for a file descriptor to have data, like select), and on Windows it unconditionally fires an "event" object, which is a Windows-specific synchronization primitive. And some parts of CPython are written to check for this – for example the Windows implementation of time.sleep is written to wake up early if the event gets fired and check for signals. And that's why on Windows you can do time.sleep(99999) and then hit control-C to cancel it. But this is a bit hit-and-miss: for example, Python's implementation of select.select doesn't have any similar early-exit code, so if you run this code on Windows and hit control-C, then it will raise KeyboardInterrupt ... a month from now, give or take:

# If you run this on Windows, have the task manager ready
import select
import socket

sock = socket.socket()
select.select([sock], [], [], 2500000)

The C-level signal handler runs and sets its flag, but the interpreter doesn't notice until the select call has finished. This explains why asyncio has problems – it blocks in select.select , not time.sleep . Which, I mean, that's what you want in an event loop, I'm not saying it should block in time.sleep instead, but if you're using select.select then Python's normal guarantees break down and asyncio isn't compensating for that.
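For what it's worth, the fd-writing mechanism mentioned above is exposed to Python code as signal.set_wakeup_fd, and an event loop can use it to compensate. Here's a Unix-only sketch with our own names – this is not asyncio's actual implementation, just the pattern:

```python
import os
import select
import signal
import socket

# Wire up a socketpair so that an arriving signal wakes up select().
# (Unix-only sketch; signal.set_wakeup_fd, signal.signal, os.kill and
# select.select are real APIs, the arrangement is ours.)
wakeup_w, wakeup_r = socket.socketpair()
wakeup_w.setblocking(False)
wakeup_r.setblocking(False)
signal.set_wakeup_fd(wakeup_w.fileno())

# Register a no-op Python-level handler so SIGUSR1 doesn't kill us.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

# Simulate a signal arriving; the C-level handler writes one byte
# (the signal number) to the wakeup fd.
os.kill(os.getpid(), signal.SIGUSR1)

# Because the wakeup socket is in the read set, select wakes up
# promptly instead of sitting there for a month:
readable, _, _ = select.select([wakeup_r], [], [], 5)
assert wakeup_r in readable
wakeup_r.recv(1)  # drain the byte

signal.set_wakeup_fd(-1)  # restore
```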

So here's the final version of our chain-of-custody diagram for control-C in a generic Python program:

C-level handler                      -->   bytecode eval loop
  sets flag                                  checks flag & runs Python-level handler
  & writes to fd (if enabled)                  |                        \
  & fires an event (if on Windows)             |                         \
                                   default Python-level handler       custom Python-level handler --> main loop
                                     raises KeyboardInterrupt           sets another flag              checks flag

And now you know how the Python runtime handles control-C (usually) promptly and reliably, while protecting itself from getting into a broken state.

Of course, this doesn't really help the code that's running on top – if your Python code wants to avoid getting wedged in a broken state, it's on its own.

...Mostly. It turns out that there are some details that can sometimes make our Python code a little more robust to KeyboardInterrupts. There's no guarantee – remember, this is the 99% solution we're trying to implement – but if the interpreter can make it 99.9% instead of 99.0% without any extra work for users, then it's a nice thing to do (and we probably want to do the same thing in trio, if we can). So let's look at how these tricks work.

Let's start with our example from above, of some code that isn't quite KeyboardInterrupt safe:

lock.acquire()
try:
    ...
finally:
    lock.release()

First, what happens if KeyboardInterrupt is raised when we're half-way through running lock.acquire or lock.release ? Can we end up with our lock object in an inconsistent state where it's only "half-locked" (whatever that would even mean)?

Well, if our lock is an instance of the standard library's threading.Lock class, then it turns out we're safe! threading.Lock is implemented in C code, so its methods get the same kind of protection that PyErr_SetExcInfo does: you can get a KeyboardInterrupt before or after the call, but not during the call. Sweet.

What about a KeyboardInterrupt that happens between calling acquire and entering the try block, or between entering the finally block and calling release ? Well, in current CPython there's no way to eliminate this entirely, but it turns out that the bytecode eval loop has some tricks up its sleeve to make things less risky.

The first trick we'll examine is also the oldest, and probably the least useful. To see how this works, we need to look at how our example gets compiled down to bytecode instructions that run on CPython's virtual machine. (If you aren't familiar with CPython's bytecode, this is a great talk and will give you a good introduction.) Running this code:

import dis

def f():
    lock.acquire()
    try:
        pass
    finally:
        lock.release()

dis.dis(f)

prints a chunk of disassembled bytecode. I won't paste the whole thing, but it starts like:

  2           0 LOAD_GLOBAL              0 (lock)
              3 LOAD_ATTR                1 (acquire)
              6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              9 POP_TOP

  3          10 SETUP_FINALLY            4 (to 17)

The first four lines of bytecode correspond to the first line of our Python code, the call to lock.acquire(). Then SETUP_FINALLY marks the beginning of the try block. So the danger here would be a KeyboardInterrupt arriving in between the CALL_FUNCTION (where we actually acquire the lock) and the SETUP_FINALLY. Since signal handlers run in between opcodes, there are two places this could happen: between CALL_FUNCTION and POP_TOP, and between POP_TOP and SETUP_FINALLY.

Well, it turns out that way back in 2003, Guido added a bit of code to the bytecode eval loop to skip running signal handlers if the next opcode is SETUP_FINALLY, and it's still there today. This means that we can't get a KeyboardInterrupt in between POP_TOP and SETUP_FINALLY. It's... mostly useless? We can still get a KeyboardInterrupt in between CALL_FUNCTION and POP_TOP, and in fact the CALL_FUNCTION → POP_TOP case is much more likely to cause problems than the POP_TOP → SETUP_FINALLY case. The check after CALL_FUNCTION notices any signals that arrived during CALL_FUNCTION, which can take an arbitrarily long time; the check after POP_TOP only notices signals that arrived during POP_TOP, and POP_TOP is an extremely fast opcode – basically just a few machine instructions. In fact it's so fast that the interpreter usually doesn't bother to check for signals after it anyway, because the check would add substantial overhead, so in our example this special case doesn't really accomplish anything at all.

The one case I can think of where the SETUP_FINALLY special case might be useful is in code like:

SOME_VAR = True
try:
    ...
finally:
    SOME_VAR = False

because if you look at how this compiles to bytecode, the assignment ends up being a single opcode that comes right before the SETUP_FINALLY. But fundamentally, this strategy can't really work: there's generally going to be some sort of logically atomic operation before each try/finally pair that shouldn't be interrupted by signals, but there's no way for the interpreter to figure out where the start of that logical operation is. That information just isn't recorded in the source code.
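You can check the compiled form of that example yourself with dis. (Bytecode details vary by CPython version; the SETUP_FINALLY special case applies to 3.6-era bytecode, and modern CPython has reworked try/finally handling entirely.)

```python
import dis

# Inspect the bytecode for the SOME_VAR example above.
def f():
    SOME_VAR = True
    try:
        pass
    finally:
        SOME_VAR = False

opnames = [instr.opname for instr in dis.get_instructions(f)]
print(opnames)
# On CPython 3.6 the STORE_FAST for `SOME_VAR = True` is the opcode
# immediately before SETUP_FINALLY, so the special case protects it.
```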

Except... sometimes it is, which leads to another trick the interpreter pulls. Back in 2003 try / finally was all we had, but in modern Python, a nicer way to write our example would be:

with lock:
    ...

Of course it's well documented that this is just syntactic sugar for something like:

# simplified but gives the idea, see PEP 343 for the full details
lock.__enter__()
try:
    ...
finally:
    lock.__exit__(...)

This looks pretty similar to our problematic code above, so one would think that the with version has the same problems. But it turns out this is not quite true – not only is the with version nicer to look at than the try/finally version, it actually makes stronger guarantees about KeyboardInterrupt safety!
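In everyday code, the safer pattern is just a C-implemented lock plus with, which behaves as you'd hope:

```python
import threading

# The safe pattern in practice: a C-implemented lock plus `with`.
lock = threading.Lock()
with lock:
    # __enter__ has run, and the invisible try block is in place.
    assert lock.locked()
# __exit__ has released the lock -- even if the body had raised.
assert not lock.locked()
```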

Again, let's look at the bytecode:

import dis

def f():
    with lock:
        pass

dis.dis(f)

  2           0 LOAD_GLOBAL              0 (lock)
              3 SETUP_WITH               5 (to 11)
              6 POP_TOP

  3           7 POP_BLOCK
              8 LOAD_CONST               0 (None)
        >>   11 WITH_CLEANUP_START
             12 WITH_CLEANUP_FINISH
             13 END_FINALLY

The key thing we learn here is that entering a with block is done via SETUP_WITH and exiting is done via WITH_CLEANUP_START. If we consult Python/ceval.c in the CPython source, it turns out that SETUP_WITH is a single opcode that both calls lock.__enter__ and also sets up the invisible try block, and WITH_CLEANUP_START is a single opcode that both marks the beginning of the invisible finally block and also calls lock.__exit__ . And the crucial thing for us is that since the interpreter only runs Python-level signal handlers in between opcodes, this means it's now impossible for a KeyboardInterrupt to arrive in between calling lock.__enter__ and entering the try block, or in between entering the finally block and calling lock.__exit__ .

Basically, the key thing about with blocks is that they tell the interpreter where the boundaries of the critical operations are (they're whatever __enter__ and __exit__ do), so a solution becomes possible in principle; then threading.Lock.__enter__ is implemented in C so it's atomic itself, and the design of the with opcodes rules out the two remaining problematic cases: a KeyboardInterrupt after acquiring the lock but before entering the try, and a KeyboardInterrupt after entering the finally but before releasing the lock. Hooray, we're safe!

...almost. Now we can't have a KeyboardInterrupt between entering the finally block and releasing the lock. But that's not really what we want. We want to make sure we can't have a KeyboardInterrupt between exiting the try block and releasing the lock. But wait, you might think. This is really splitting hairs – just look at the source code, the end of the try block and the start of the finally block are the same thing!

Well, yeah, that would make sense... but if we look at the bytecode, we can see that this isn't quite true: the POP_BLOCK instruction at offset 7 is the end of the try block, and then we do a LOAD_CONST before we reach the WITH_CLEANUP_START at offset 11, which is where the finally block starts.

The reason the bytecode is written like this is that when the interpreter gets to the finally block hidden inside WITH_CLEANUP_START, it needs to know whether it arrived there because an exception was thrown or because the try block finished normally. The LOAD_CONST leaves a special value on the stack that tells WITH_CLEANUP_START that we're in the latter case. But for present purposes the reason doesn't really matter... the end result is that there's this gap, where if we get a KeyboardInterrupt raised in between the POP_BLOCK and LOAD_CONST, or in between the LOAD_CONST and WITH_CLEANUP_START, then it will propagate out of the with block without calling __exit__ at all. Oops!