Trees are wonderful data structures, but traversing them can often be a pain. You can use recursion, as you long as you are confident that you won't overflow your stack. But, if you want to do anything non trivial when you've found the nodes you care about, it can be hard to keep that work separate from the traversal code.

A few weeks ago, I ran into just this problem. I was using Ruby's Ripper library to generate ASTs from Ruby source (Ripper calls them s-expressions). Here is a simple example.

This call:

Ripper.sexp("class A; def a; @a = @b; end; end")



yields this AST for the code:

[:program, [[:class, [:const_ref, [:@const, "A", [1, 6]]], nil, [:bodystmt, [[:def, [:@ident, "a", [1, 13]], [:params, nil, nil, nil, nil, nil], [:bodystmt, [[:assign, [:var_field, [:@ivar, "@a", [1, 16]]], [:var_ref, [:@ivar, "@b", [1, 21]]]]], nil, nil, nil]]], nil, nil, nil]]]]

Essentially, it is a tree constructed from arrays showing the structure of the program along with some text locations.

If we wanted to find all of the nodes for classes, it's simply a matter of searching for the arrays that start with a with a :class symbol. To find uses of instance variables, we search for a :@ivar symbol. But, how do we traverse this?

Well, we could write that recursive routine, or we could take advantage of the fact that our trees are stored as arrays and use some of the functional goodness that we have on the Array class. In particular, the flatten function.

Flatten takes takes an array that may have elements that are arrays and merges those arrays into the top level.

For example,

[1,[2,3],[4,[5]].flatten

yields:

[1,2,3,4,5]

We can have more control if we provide an argument. When you pass an integer, it tells flatten how many levels to merge into the top level.

So,

[1,[2,3],[4,[5]]].flatten(1)

yields:

[1,2,3,4,[5]]

Sweet, eh?

Here's our strategy. Take the array that represents the tree and flatten it one level. Then, select all elements which are arrays that start with the symbol we care about. Then, flatten the array to the next level, and select sub-arrays at that level which start with the symbol. We can group all of these selections together and end up with an array of all of the subtrees regardless of the level at which they were located in the original tree.

If we have a sense of the maximum depth we can have in our tree, and we don't mind some inefficiency, we can do this using a side-effect free style:

def sexp_select sexp, symbols (1..(max_nesting_level = 20)).map {|n| sexp.flatten(n) .select {|e| nonempty_ary?(e) } .select {|e| symbols.include? e[0] }} .select {|e| nonempty_ary?(e) } .flatten(1) end

The code above takes a tree and produces a list of progressive flattenings of it. The first element contains the tree flattened once, the second twice, etc. It selects from each of them the arrays that have the symbol we care about as the first element. We then select all of the non-empty arrays from that array (after all, there may be some levels that do not have the symbol we are interested in). Afterward, we flatten that selection to bring all of the sub-trees from each of the flattenings to the top level.

This expression:

sexp_select(Ripper.sexp("class A; def a; @a = @b; end; end"), [:@ivar])

yields:

[[:@ivar, "@a", [1, 16]], [:@ivar, "@b", [1, 21]]]

which is an array of the subtrees that define the instance variable references in the AST.

Yes, that function is an eyeful, and there are things that can be done to make the code clearer, such as assigning the subexpressions to variables so that we can name them well, but if we don't mind the extra processing that we are doing, this is a rather elegant way to find subtrees within a larger tree represented in an array. As well, it is a decent example of functional style in Ruby.