Finding a Connection

I’ve been having fun brushing up on basic graph theory lately. It’s amazing how many problems can be modeled with it. To that end, I did a code kata the other day that lent itself to a graph-based solution:

. . . the challenge is to build a chain of words, starting with one particular word and ending with another. Successive entries in the chain must all be real words, and each can differ from the previous word by just one letter.

One way to approach this is to think of all valid words as nodes in a graph, where words that differ from each other by one letter are connected. To find a path between one word, say “cat”, and another, “dog”, traverse the graph breadth-first starting at the “cat” node until you find the “dog” node.

Implementing this in Clojure is a cinch. First let’s create a dictionary of the words we’ll use:

(def dictionary (->> (slurp "/usr/share/dict/words") split-lines (map lower-case) (into #{})))

This takes in words from a file (OS X’s built-in dict here) and sticks them in a set. Having the words in a set gives us a fast and easy way to check whether a word is valid:

(filter dictionary ["cuspidor" "cromulent" "xebec"]) => ("cuspidor" "xebec")

Next we need a function to give us a word’s neighbors:

(def alphabet "abcdefghijklmnopqrstuvwxyz") (defn edits [^String word] "Returns words that differ from word by one letter. E.g., cat => fat, cut, can, etc." (->> word (map-indexed (fn [i c] (let [sb (StringBuilder. word)] (for [altc alphabet :when (not= altc c)] (str (doto sb (.setCharAt i altc))))))) (apply concat) (filter dictionary)))

For every letter in a word, replace it with every other letter in the alphabet; collect all these variations together and then keep only the legit ones.

Lastly, we need a function to actually perform the search:

(defn find-path [neighbors start end] "Return a path from start to end with the fewest hops (i.e. irrespective of edge weights), neighbors being a function that returns adjacent nodes" (loop [queue (conj clojure.lang.PersistentQueue/EMPTY start) preds {start nil}] (when-let [node (peek queue)] (let [nbrs (remove #(contains? preds %) (neighbors node))] (if (some #{end} nbrs) (reverse (cons end (take-while identity (iterate preds node)))) (recur (into (pop queue) nbrs) (reduce #(assoc %1 %2 node) preds nbrs)))))))

This is a fairly straight translation of the imperative algorithm. 1 We use a PersistentQueue to keep track of nodes to visit next. The preds map does double-duty: it keeps track of nodes already seen, and allows us to trace our path back to the beginning once we reach our destination.

Now we’re ready to actually run the search:

(find-path edits "cat" "dog") => ("cat" "cot" "dot" "dog") (find-path edits "four" "five") => ("four" "foud" "fold" "fole" "file" "five") (find-path edits "bleak" "bloke") => ("bleak" "bleat" "blest" "blast" "blase" "blake" "bloke")

Nice. The longest of these runs in just over 100ms on my machine — not too shabby (though we can certainly do better). There are about 200k nodes and 100k edges in the word-chain graph.

Seeing it Through

I’m a visually-oriented person. Getting a correct result is well and good, but I want to see the process. To do that, I shaved an enormous yak and wrote a graph library for Clojure.

This new library helped me create pretty diagrams like the one at the top of this article. It outsources most of the hard work to the awesome GraphViz and the also-awesome Ubigraph tool, which lets you visualize graph structures and algorithms in realtime. Like this:

Expanding our Horizons

Another place graph traversal comes in handy is finding solutions to certain types of games, like Towers of Hanoi. Think of each possible position in the game as a node. Nodes connect to each other via valid moves.

So to solve a game, instead of an edits function, we need a moves function that takes a game state and returns valid neighboring states. Let’s solve Towers of Hanoi:

As our game state, we’ll use vector with an entry for each peg. Each peg will contain a sorted set of disks, in order of smallest to biggest. The game state where all disks are on the leftmost peg would look like this (assuming three disks and three pegs):

[#{0 1 2} #{} #{}]

Here’s the moves function:

(defn moves [state] (for [[from-peg disk] (map-indexed #(vector %1 (first %2)) state) to-peg (range (count state)) :when (and disk (not= from-peg to-peg) (or (empty? (state to-peg)) (< disk (first (state to-peg)))))] (-> state (update-in [from-peg] disj disk) (update-in [to-peg] conj disk))))

For each topmost disk, see if we can move it to another peg. To do that, the other peg has to have a bigger top disk, or no disks at all.

Run the same find-path function on our new inputs…

(let [start [(sorted-set 1 2 3) (sorted-set) (sorted-set)] end [(sorted-set) (sorted-set) (sorted-set 1 2 3)]] (find-path moves start end)) => ([#{1 2 3} #{} #{}] [#{2 3} #{} #{1}] [#{3} #{2} #{1}] [#{3} #{1 2} #{}] [#{} #{1 2} #{3}] [#{1} #{2} #{3}] [#{1} #{} #{2 3}] [#{} #{} #{1 2 3}])

And voilà. We have…something not so pretty. Loom, GraphViz, and Ubigraph to the rescue:

Here are renderings of solutions to Towers of Hanoi with three pegs and 3, 4, 5, 6, 7, and 8 disks (the last pushed the limits of GraphViz):

(Look familiar?)

Code

Click here for all the code used to create this post (plus some extra bits)

To Be Continued

Next time we’ll play with bigger graphs and leverage Clojure’s state-management tools to create parallel search algorithms.

Footnotes