posted on September 08, 2009 - tagged as: python

One of the many things I like about Python is generators. They allow for the creation of iterators without the boilerplate imposed by Java or PHP. Furthermore, iterators can be thought of as streams, and many times we don’t really want to default to eager behavior (maybe eager behavior should be something we explicitly demand). For example, consider a class implementing a very simple binary tree:

    class BinaryTree:
        def __init__(self, data, left=None, right=None):
            self.data = data
            self.left = left
            self.right = right

        def __unicode__(self):
            return '%s' % self.data

More of a struct, this simple class has slots (instance variables) for the left sub-node, the right sub-node, and the data. When we first encounter a structure like this in, say, a data structures course, our inclination is to traverse the thing. Remember, we can do this any number of ways: depth-first, breadth-first, pre-order, post-order. (For the traversals in this article, I will only be concerned with node data, but all the algorithms can easily be modified to yield the nodes themselves.) If we want to write a very simple, eager, depth-first (and also pre-order) traversal of a tree like this, we can write something like the following:

    def recursive_dfs(tree):
        nodes = []
        if tree is not None:
            nodes.append(tree.data)
            nodes.extend(recursive_dfs(tree.left))
            nodes.extend(recursive_dfs(tree.right))
        return nodes
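To make the eager behavior concrete, here is a small, self-contained sketch that runs the function on a sample tree. The tree and the names `sample_tree` and `preorder` are made up for illustration; they are not part of the original article.

```python
class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def recursive_dfs(tree):
    nodes = []
    if tree is not None:
        nodes.append(tree.data)
        nodes.extend(recursive_dfs(tree.left))
        nodes.extend(recursive_dfs(tree.right))
    return nodes


# Hypothetical sample tree (it also happens to be a valid BST):
#         4
#       /   \
#      2     6
#     / \   / \
#    1   3 5   7
sample_tree = BinaryTree(
    4,
    BinaryTree(2, BinaryTree(1), BinaryTree(3)),
    BinaryTree(6, BinaryTree(5), BinaryTree(7)),
)

# The whole list is built before we see the first element.
preorder = recursive_dfs(sample_tree)
print(preorder)  # [4, 2, 1, 3, 6, 5, 7]
```

Even if we only wanted the first node, every node in the tree has been visited by the time the call returns.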

This is a great first step, but, as already mentioned, it is eager. By calling this function, we get a complete list of all the nodes in the tree whether we need them or not. With a few simple modifications, however, we can pull nodes out of a tree on demand in the same pre-order fashion by using Python generators. We simply start the traversal, yield the node data, yield all nodes in the left subtree, and then yield all nodes in the right subtree:

    def basic_dfs(tree):
        if tree is not None:
            yield tree.data
            for node_data in basic_dfs(tree.left):
                yield node_data
            for node_data in basic_dfs(tree.right):
                yield node_data
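One way to see the "on demand" part is to instrument the generator and take only a few items from it. The sketch below adds a `visited` list purely for demonstration (it is not part of the traversal itself) and uses `itertools.islice` to pull the first three values from a made-up sample tree.

```python
from itertools import islice


class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


visited = []  # demonstration only: records which nodes were actually touched


def basic_dfs(tree):
    if tree is not None:
        visited.append(tree.data)  # instrumentation, not part of the algorithm
        yield tree.data
        for node_data in basic_dfs(tree.left):
            yield node_data
        for node_data in basic_dfs(tree.right):
            yield node_data


# Hypothetical sample tree with seven nodes.
sample_tree = BinaryTree(
    4,
    BinaryTree(2, BinaryTree(1), BinaryTree(3)),
    BinaryTree(6, BinaryTree(5), BinaryTree(7)),
)

# Ask for three values only; the generator stops as soon as we stop asking.
first_three = list(islice(basic_dfs(sample_tree), 3))
print(first_three)  # [4, 2, 1]
print(visited)      # [4, 2, 1] -- the other four nodes were never touched
```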

If we wanted a (not-quite)-post-order traversal, we would yield the nodes in the right subtree first. We could do this by simply rewriting the function above. However, we can take this a step further and leave the decision of which nodes to yield first to another function entirely (I will call this the ‘chooser’ function):

    def left_then_right(tree):
        if tree is not None:
            yield tree.left
            yield tree.right

    def dfs(tree, chooser=left_then_right):
        if tree is not None:
            yield tree.data
            for immediate_child in chooser(tree):
                for node_data in dfs(immediate_child, chooser):
                    yield node_data
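As an aside for readers on newer interpreters: Python 3.3 and later provide `yield from`, which delegates to a sub-generator and removes the inner `for` loop. The following is a sketch of the same chooser-driven traversal written that way, run on a made-up sample tree; it is an alternative spelling, not the code this article was written with.

```python
class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def left_then_right(tree):
    if tree is not None:
        yield tree.left
        yield tree.right


def dfs(tree, chooser=left_then_right):
    if tree is not None:
        yield tree.data
        for immediate_child in chooser(tree):
            # Delegate to the sub-generator instead of looping over it by hand.
            yield from dfs(immediate_child, chooser)


# Hypothetical sample tree.
sample_tree = BinaryTree(
    4,
    BinaryTree(2, BinaryTree(1), BinaryTree(3)),
    BinaryTree(6, BinaryTree(5), BinaryTree(7)),
)

preorder = list(dfs(sample_tree))
print(preorder)  # [4, 2, 1, 3, 6, 5, 7]
```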

Thus dfs(sometree) will call the left_then_right() function by default, and perform a pre-order traversal. For our (not-quite)-post-order traversal, we define the right_then_left() function:

    def right_then_left(tree):
        if tree is not None:
            yield tree.right
            yield tree.left
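To compare the two choosers side by side, here is a self-contained sketch on a made-up sample tree. It also hints at why the traversal is only "not quite" post-order: visiting root, then right, then left produces the exact reverse of a true post-order (left, right, root) sequence.

```python
class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def left_then_right(tree):
    if tree is not None:
        yield tree.left
        yield tree.right


def right_then_left(tree):
    if tree is not None:
        yield tree.right
        yield tree.left


def dfs(tree, chooser=left_then_right):
    if tree is not None:
        yield tree.data
        for immediate_child in chooser(tree):
            for node_data in dfs(immediate_child, chooser):
                yield node_data


# Hypothetical sample tree:
#         4
#       /   \
#      2     6
#     / \   / \
#    1   3 5   7
sample_tree = BinaryTree(
    4,
    BinaryTree(2, BinaryTree(1), BinaryTree(3)),
    BinaryTree(6, BinaryTree(5), BinaryTree(7)),
)

left_first = list(dfs(sample_tree))
right_first = list(dfs(sample_tree, right_then_left))
print(left_first)   # [4, 2, 1, 3, 6, 5, 7]
print(right_first)  # [4, 6, 7, 5, 2, 3, 1]

# Reversing the right-first order gives the true post-order sequence.
print(right_first[::-1])  # [1, 3, 2, 5, 7, 6, 4]
```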

Passing this function as dfs(sometree, right_then_left) gives us our new traversal. To really see the benefit of our lazy traversals, though, we can go one step further and implement a binary search on top of our dfs() function as a chooser function. Instead of yielding both the left subtree and the right subtree, the chooser function will yield the left subtree if the value we’re searching for is less than the node data, the right subtree if the value is greater than the node data, and nothing at all if it is equal to the value, so that the traversal stops at the matching node:

    def binary_search_chooser(value):
        def binary_search_chooser_inner(tree):
            if tree is not None and tree.data is not None:
                if value < tree.data:
                    yield tree.left
                elif value > tree.data:
                    yield tree.right
                # On an exact match, yield nothing so the traversal stops here.
        return binary_search_chooser_inner

Don’t let the closure fool you; this is still simple stuff. A call to binary_search_chooser(5) returns a chooser function that will decide whether to go left or right down the tree based on a node’s value. So to search a BST for 5, we can just call dfs(tree, binary_search_chooser(5)) . This will give us a generator over the data of the nodes along the search path (from the root toward a leaf), the last of which will be 5 if that value is found.
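The search can be tried out with the sketch below. It assumes a chooser that stops descending once it hits an exact match, and the sample tree and the `find` helper are hypothetical names introduced only for this example.

```python
class BinaryTree:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def left_then_right(tree):
    if tree is not None:
        yield tree.left
        yield tree.right


def dfs(tree, chooser=left_then_right):
    if tree is not None:
        yield tree.data
        for immediate_child in chooser(tree):
            for node_data in dfs(immediate_child, chooser):
                yield node_data


def binary_search_chooser(value):
    def binary_search_chooser_inner(tree):
        if tree is not None and tree.data is not None:
            if value < tree.data:
                yield tree.left
            elif value > tree.data:
                yield tree.right
            # Assumption: on an exact match we yield nothing, ending the path.
    return binary_search_chooser_inner


def find(tree, value):
    """Hypothetical helper: True if the search path ends on `value`."""
    last = None
    for last in dfs(tree, binary_search_chooser(value)):
        pass
    return last == value


# Hypothetical sample BST:
#         4
#       /   \
#      2     6
#     / \   / \
#    1   3 5   7
bst = BinaryTree(
    4,
    BinaryTree(2, BinaryTree(1), BinaryTree(3)),
    BinaryTree(6, BinaryTree(5), BinaryTree(7)),
)

path_to_5 = list(dfs(bst, binary_search_chooser(5)))
path_to_6 = list(dfs(bst, binary_search_chooser(6)))
path_to_8 = list(dfs(bst, binary_search_chooser(8)))
print(path_to_5)  # [4, 6, 5]
print(path_to_6)  # [4, 6]
print(path_to_8)  # [4, 6, 7] -- 8 is absent, so the path ends on another value
print(find(bst, 5), find(bst, 8))  # True False
```

Only the nodes along the search path are ever visited; the left half of the tree is never touched when searching for 5.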

There are certainly more efficient ways to do these kinds of traversals (with pointer manipulation, etc.), but this serves as a fun exercise for fans of generators. The astute reader will also note that, by passing first-class functions around, we’ve implemented the strategy pattern in a functional-programming style.