Posted in clojure | Wednesday, September 1st, 2010 | Trackback

Clojure.zip is a library that lets you move around freely within trees and also create changed copies of them. This is a tutorial I wish I’d had when I started using it.



In this tutorial, I’ll use sequences as trees. You can create your own kind of trees if you want, but I won’t cover that.

Here’s what we’ll be working with:

user=> (def original [1 '(a b c) 2]) #'user/original

It’s a tree whose root is a vector with three children: 1, the subtree (a,b,c) and 2. We need to convert it into some sort of data structure that allows free movement. That’s done like this:

user=> (require '[clojure.zip :as zip]) nil user=> (def root-loc (zip/seq-zip (seq original))) #'user/root-loc

Notice that I used the alias zip . If you refer or use clojure.zip, you’ll find yourself overwriting useful functions like clojure.core/next .

Notice also that I explicitly wrapped the tree in a seq. If you use seq-zip on an unwrapped vector, you’ll get confusing results.

At this point, I’d describe root-loc as “the location (or loc) of the root of the tomorrow tree.” I say “tomorrow tree” because it’s not a tree itself, but something that can later be converted into a tree. In reality, root-loc names both the loc and the tomorrow tree, bundled up together, but I think it most straightforward to leave the tomorrow tree implicit.

(The actual data structure is called a “zipper”, which is a decent analogy for the actual implementation but didn’t help me understand how to use the library.)

Moving around inside a tomorrow tree

With the loc, we can move around:

user=> original [1 (a b c) 2] user=> (zip/node (zip/down root-loc)) 1

zip/down moves to the leftmost child of the current loc and returns that child’s loc. zip/node gives you the subtree of the original tree corresponding to its loc argument. It’s one of the main ways you get parts of a tree—regular lists, vectors, and the like—”out of” a tomorrow tree.

The -> macro makes movement in the tomorrow tree much easier to understand:

user=> (-> root-loc zip/down zip/right zip/node) (a b c) user=> (-> root-loc zip/down zip/right zip/down zip/right zip/node) b

Nevertheless, I always wrap anything but the simplest traversals in their own functions.

Here are some other movement functions for you to try: up , left , rightmost , and leftmost . Beware of (arguable) inconsistencies in the handling of edge cases. For example, right and rightmost behave differently when moving “off the end” of a list of siblings:

user=> original [1 (a b c) 2] user=> (def last-one (-> root-loc zip/down zip/right zip/right)) #'user/last-one user=> (zip/node last-one) 2 user=> (-> last-one zip/right) nil ; off into nothingness user=> (-> last-one zip/rightmost zip/node) 2 ; stays put

Parts of a tree

In addition to zip/node , there are other functions for recovering parts of the original tree. All work relative to the current loc.

user=> (def b (-> root-loc zip/down zip/right zip/down zip/right)) #'user/b user=> (zip/node b) b user=> (zip/lefts b) (a) user=> (zip/rights b) (c)

An interesting one shows all subtrees from the root of the tree down to just above the current loc:

user=> (zip/path b) [ (1 (a b c) 2) (a b c) ]

Changing the tree

A number of functions take a current loc (and associated tomorrow tree) and produce a new loc inside a different tomorrow tree. For example, let’s delete the ‘(a b c) subtree:

user=> (def loc-in-new-tree (zip/remove (zip/up b))) #'user/loc-in-new-tree

How does the tree represented by this new tomorrow tree differ from the original tree? We can see that with the root function, which applies zip/node to the root of the tomorrow tree:

user=> (zip/root loc-in-new-tree) (1 2) user=> original [1 (a b c) 2]

Where, exactly, is the new location?

user=> (zip/node loc-in-new-tree) 1

The new loc has “backed up” from its previous version. (That’s not an exact enough description, but it’ll do for a few more paragraphs.) The other editing functions return an “unchanged” loc (except in the sense that it’s pointing into a new tomorrow tree with a changed structure): insert-left , insert-right , replace , edit , insert-child , and append-child . Try them out.

In which I reveal I’m slow

At first, I easily forgot that these functions create a new tomorrow tree and don’t really “replace” or “insert” or “edit” parts of the old one. That is:

user=> (zip/root loc-in-new-tree) (1 2) ; I see that I've edited the tree. user=> (zip/root b) (1 (a b c) 2) ; Wait - I thought I changed the tree?!

“Well, duh”, you might say, “it’s a functional language with immutable state, so of course it doesn’t change the old tree.” You’re absolutely right, but it was surprisingly easy for old habits to sneak up on me. So, two rules:

If you ever fail to use the return value of one of these functions, you’re doing it wrong.

If you ever write code like this: (let [stashed-location (zip/whatever ...)] ... make "changes" ... ... use stashed location ...) you might be doing it wrong. Make sure that you’re not unthinkingly assuming that later changes to “the” tree are reflected in your stashed-location .

Whole-tree editing

Here’s an example of printing out a whole tree, one node at a time:

(defn print-tree [original] (loop [loc (zip/seq-zip (seq original))] (if (zip/end? loc) (zip/root loc) (recur (zip/next (do (println (zip/node loc)) loc))))))

This is an ordinary recursive loop. It visits each location in the tomorrow tree, stopping when zip/end? is true. zip/next returns a new current loc that is the next one in the tomorrow tree, where “next” means “in preorder depth-first order”. To see that, here’s what one run of the function prints:

user=> (print-tree [1 '(a (i ii iii) c) 2]) (1 (a (i ii iii) c) 2) 1 (a (i ii iii) c) a (i ii iii) i ii iii c 2

To make changes to the tree, add a cond . The default case should hand the current loc to zip/next . The other cases should yield a loc pointing into a changed copy of the tomorrow tree:

(loop [loc (zip/seq-zip original-tree)] (if (zip/end?> loc) (zip/root loc) (recur (zip/next (cond (subtree-to-change? loc) (modify-subtree loc) … :else loc)))))

The tricky bit is making sure that modify-subtree returns a loc just before the next loc of interest (in a depth-first traversal). (It has to be before so that zip/next takes you to the interesting loc.) To get to that loc, you can use any of the movement functions ( zip/next , zip/up , zip/rightmost , and so on). There’s also a zip/prev that returns the loc just before the current one.

To keep from confusing myself, I write little helper functions, each named by what it does and what loc it returns. So, for example, I have one function that glories in this name:

(defn wrap-with-expect__at-rightmost-wrapped-location [loc] (assert (start-of-arrow-sequence? loc)) (let [right-hand (-> loc zip/right zip/right) edited-loc (zip/edit loc (fn [loc] `(expect ~loc => ~(zip/node right-hand))))] (-> edited-loc zip/right zip/right zip/remove zip/remove)))

It takes a form like ... (f 1) => (+ 2 3) :next , with the current loc being (f 1) , and turns it into this:

... (expect (f 1) => (+ 2 3)) :next

… with the current loc being at the 3 so that the zip/next returns a loc at :next . This positioning works because I use zip/remove , which returns a loc that’s “backed up” to the previous loc in a depth-first traversal. (That’s the fix to my earlier imprecision about what zip/remove returns. It’s not the previous loc at the same level, which—for the sake of a simpler explanation—I earlier allowed you to assume.)

By building and testing these little functions first, my main cond-loop is easier to get right. You can see some more examples in my test package, Midje. You can look at both the tests and the code.