Observation:

Co-routines are to state machines what recursion is to stacks

When you have to traverse some sort of a nested data structure (say, a binary tree), one approach is to create a stack that remembers where in the tree you are. Another, much more elegant approach, is to write the function recursively. A recursive function employs the machine stack used to implicitly implement function calls - you get the benefits of the stack without paying the cost of reduced readability.

In this article I'll try to show, using a simple, yet very realistic example why co-routines do the same to state machines.

The problem - serial framing I've written a detailed article about framing earlier this month. The simple summary is: we have an endless incoming stream of bytes, from which we need to deduce structured data frames. That is, we have to find where a frame starts, where it ends and what is the data it carries. For this purpose we use a special header value, footer value and an escape byte (DLE). A complete Python implementation is described here, but in this article I will present the solution in a simplified manner, keeping all irrelevant details out.

The state machine Given a stream and receiving one byte at a time, here is the state machine that describes the framing process: Only inputs and state transitions are shown. The framing process outputs complete frames when moving from the IN_MSG state to the WAIT_HEADER stage (this happens when a footer is received)

Implementing the state machine Here's an implementation of this state machine in Python. The internal state is kept in an object: class ProtocolWrapper ( object ): def __init__ ( self , header= '\x61' , footer= '\x62' , dle= '\xAB' , after_dle_func= lambda x: x): self .header = header self .footer = footer self .dle = dle self .after_dle_func = after_dle_func self .state = self .WAIT_HEADER self .frame = '' # internal state (WAIT_HEADER, IN_MSG, AFTER_DLE) = range ( 3 ) def input ( self , byte): """ Receive a byte. If this byte completes a frame, the frame is returned. Otherwise, None is returned. """ if self .state == self .WAIT_HEADER: if byte == self .header: self .state = self .IN_MSG self .frame = '' return None elif self .state == self .IN_MSG: if byte == self .footer: self .state = self .WAIT_HEADER return self .frame elif byte == self .dle: self .state = self .AFTER_DLE else : self .frame += byte return None elif self .state == self .AFTER_DLE: self .frame += self .after_dle_func(byte) self .state = self .IN_MSG return None else : raise AssertionError() Note that the code of the input method closely follows the state diagram. This is how implementations of state machines are - it's generally difficult to understand what's going on in the code without having some sort of a state diagram in front of your eyes. In this case the state machine has just 3 states, but it can be easily 20 for more complex needs. Understanding such a state function with 20 states is impossible without a diagram. Anyhow, here's some test code that simulates a stream of data with a couple of frames and invalid data in between: bytes = '' .join( chr (b) for b in [ 0 x70, 0 x24, 0 x61, 0 x99, 0 xAF, 0 xD1, 0 x62, 0 x56, 0 x62, 0 x61, 0 xAB, 0 xAB, 0 x14, 0 x62, 0 x7 ]) pw = ProtocolWrapper() for byte in bytes: frame = pw.input(byte) if frame: print 'Got frame:' , frame.encode( 'hex' ) This prints: Got frame: 99afd1 Got frame: ab14

Co-routines I don't intend to teach the theory behind co-routines here, and I'll assume at least a basic familiarity with the concept. My goal is to show a real-life, relevant example that demonstrates how co-routines relate to state machines. This link is a good tutorial on co-routines (in C, of all languages), and there's of course Wikipedia and C2. But the absolutely best tutorial, with focus on Python, is David Beazley's presentation from this year's PyCon: A curious course on coroutines and concurrency. It is while reading this tutorial that the connection finally 'clicked' in my head. It is most highly recommended . If there's one description of co-routines you should remember while reading this article and later, it is that co-routines save the control state of a function between calls. Kinda like recursion - you know exactly where are you going to return after a function call. When you call a co-routine, it doesn't start all over from the beginning. Rather, it starts from right after where it returned (yielded control) the previous time it was called. This also explains why co-routines can replace state machines. The input method of ProtocolWrapper is invoked multiple times. Since it's a "normal" function, it begins running from its first line for each invocation. This is why it needs to keep a state machine - to know it's current "place in the world" when the next byte is received. With co-routines this isn't necessary - co-routines start exactly where they stopped the previous time they were called - so no state keeping is required!

Using co-routines for framing Without further ado, here is the co-routine implementation of the framing problem: @coroutine def unwrap_protocol (header= '\x61' , footer= '\x62' , dle= '\xAB' , after_dle_func= lambda x: x, target= None ): """ Simplified framing (protocol unwrapping) co-routine. """ # Outer loop looking for a frame header # while True : byte = ( yield ) frame = '' if byte == header: # Capture the full frame # while True : byte = ( yield ) if byte == footer: target.send(frame) break elif byte == dle: byte = ( yield ) frame += after_dle_func(byte) else : frame += byte Look how simple and elegant it is. You can tell immediately what it does just by looking at the source code - no state diagrams are needed. We loop over frames. A frame starts with a header byte. After a header byte has been received, we accumulate the bytes of the frame until a footer is encountered. The (yield) calls is where the magic is. The function suspends at these points until it is called again . Then, the value passed in the new call is returned from (yield) and the co-routine proceeds from the same place. Note how the state machine is implicitly embedded in this code. It's there, but you don't see it - it's hiding in the control structures (the IFs, ELSEs and the WHILEs) of the function. When a complete frame is received, it is sent to the target of the co-routine, which may process it at will. After executing send , the co-routine breaks out of the inner loop and suspends waiting for a new header in the outer loop. The @coroutine decorator is a simple utility required for Python co-routines: def coroutine (func): def start (*args,**kwargs): cr = func(*args,**kwargs) cr.next() return cr return start This is needed to bring a co-routine to its first yield and suspend there. You can just use this decorator without worrying about the details, until you become more comfortable with the concept to understand the exact inner workings described in PEP 342. To test this co-routine implementation we also need a simple "sink" co-routine (using Dave Beazley's terminology from his presentation). This will be the receiver of the send calls made by our co-routine: @coroutine def frame_receiver (): """ A simple co-routine "sink" for receiving full frames. """ while True : frame = ( yield ) print 'Got frame:' , frame.encode( 'hex' ) bytes = '' .join( chr (b) for b in [ 0 x70, 0 x24, 0 x61, 0 x99, 0 xAF, 0 xD1, 0 x62, 0 x56, 0 x62, 0 x61, 0 xAB, 0 xAB, 0 x14, 0 x62, 0 x7 ]) unwrapper = unwrap_protocol( target=frame_receiver()) for byte in bytes: unwrapper.send(byte) Prints: Got frame: 99afd1 Got frame: ab14