Trees, like lists, are powerful, ubiquitous data structures. Also, like lists, they are recursively defined.

A Naive Solution

At first blush, a reasonable way of defining a tree might be:

1: 2: 3: 4: type Tree < ' a > = { Tag : ' a Children : Tree < ' a > list }

This looks clean and elegant, but we immediately find it restrictive when we want to actually create a nested and branched structure. Unless we always build the tree up from the leaves towards the root, we need more sophisticated ways of walking the structure, and this leads to an interesting problem.

Specifically, if we need to introduce a reference to the parent of a given node, things get out of hand very quickly. This is the equivalent of trying to building a doubly-linked list with immutable data structures - which turns out to be a very difficult problem to solve.

An Object Lesson

F# is a multi-paradigm language, and we can easily sacrifice immutability to get bi-directional links.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: module Tree = type NodeId = | Id of string [< AutoOpen >] module internal Node = [< AbstractClass >] type NodeBase < ' a > () = class member val private _children : NodeBase < ' a > list = [] with get , set member internal this . Children with get () = this . _children and set ns = this . _children <- ns abstract member Level : int end type Root < ' a > () = class inherit NodeBase < ' a > () override this . Level = 0 end type Node < ' a > ( id : NodeId , value : ' a option , parent : NodeBase < ' a > ) = class inherit NodeBase < ' a > () member this . Id = id member this . Parent = parent member val Value = value with get , set override this . Level = parent . Level + 1 end type Tree < ' a > () = class let root = Root < ' a > () member private this . Root = root member val private this . Current = root // Modify the value of the current node member this . ModifyValue f = .. . // Push a child on to the current node and make it the current node member this . PushChild name = .. . // Add a sibling to the current node and make it the current node member this . AddSibling name = .. . // Pop to the parent of this node and make it the current node member this . Pop l = .. . // Other modification operations elided... // Visit this tree in pre-order starting at the root member this . VisitPreOrder f = .. . // Visit the path to the root from the current node member this . VisitToRoot f = .. . end

We note the following immediately:

This is a familiar coding pattern. It's entirely conceivable that you would see similar code in C# or Java. We are taking advantage of F#'s module system to encapsulate the Node and Tree data structures, hiding implementation details, and exposing just the operations we wish to provide We are full-blown object-oriented and mutable at this point, so we are obliged to address several concerns that immutable data structures obviate. The Current member is modified only whilst tree-building, and serves as the starting point for the VisitToRoot operation. VisitPreOrder and VisitToRoot must each have their own way of traversing the tree without modifying either Root or Current . Traversing the tree should necessarily be a read-only operation.

(Pre-Order Traversal is one way to walk a tree from its root - other traversals are also possible.)

This solution may suffice for some cases, but we're going to consider a situation where immutablity is actually something we need for the purposes of the domain. For example, let's say we're building the tree as part of an operation, and we want to ensure that the tree returns to its original state if that operation fails. Keeping track of the tree as it grows, and being able to roll-back to a given state, is not something that is pleasant to do correctly when mutability is in the picture - and doubly so when concurrency and mutability meet as part of the problem.

So we are faced with an interesting quandary - having the ability to VisitToRoot or Pop requires bi-directional linking - which is hard to do with immutable data structures; and having the ability to check-point and roll-back tree-modification operations is difficult to do correctly without immutable data structures! What do we do?

Painting By Numbers

What if, instead of actually creating and modifying a tree like we were taught in CS 101, we simply keep track of the list of tree-modification instructions as a kind of program? This list would have to support a limited form of mutability in that the only way to modify the list would be to append to it, but the existing contents of the list could never change.

When the tree needs to be visited, we take the list of instructions and interpret them to build a tree using the mutable approach, but since the contents of the list at this point is fixed, the tree that we create from it, is, in some sense, constant even though it contains mutable parts. Indeed, the only operations that the tree needs to support from that point on are (possibly repeated) traversals.

This approach is quite a powerful one, and can be applied to a variety of problems. We could, in fact, generalize the pattern completely in other languages that allow abstraction over types, and this forms the general principle behind what is known as the 'Free Monad'. However, since the concept is quite powerful, we are going to explore the concept concretely, and leave the abstraction of the pattern to Haskell and Scala programmers!

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: module Tree = // other members elided ... type internal ConstructOperation < ' a > = | PushChild of NodeId * ' a option | AddSibling of NodeId * ' a option | ModifyValue of ( ' a option -> ' a option ) | Pop of int option type Tree < ' a > () = class member val private ops : ConstructOperation < ' a > list = [] with get , set member this . PushChild x = this . ops <- PushChild x :: this . ops ; this member this . AddSibling x = this . ops <- AddSibling x :: this . ops ; this member this . ModifyValue x = this . ops <- ModifyValue x :: this . ops ; this member this . Pop ? x = this . ops <- ConstructOperation . Pop x :: this . ops ; this end

Of course, this is all well and good to build up a list of operations, but this doesn't actually build a tree - and we aren't really able to traverse the tree in any meaningful way.

One sneaky thing we have done is to build the list in reverse. This ensures that each operation is processed in constant-time.

In order to build the tree, we start with a single node, and fold over the list processing each node in turn. We want the result of the fold to be the tree with bi-directional links.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: let rec applyOp op ( node : NodeBase < ' a > ) : NodeBase < ' a > = match op with // Push a child on to the given node and return it | PushChild ( x , v ) -> let child = Node ( x , v , node ) node . Children <- upcast child :: node . Children upcast child // Add a sibling to the given node and return it | AddSibling ( x , v ) -> match node with | :? Node < ' a > as n -> let sibling = Node ( x , v , n . Parent ) n . Parent . Children <- upcast sibling :: n . Parent . Children upcast sibling | _ -> failwith "Cannot add sibling to root" // Modify the value of the given node and return it | ModifyValue f -> match node with | :? Node < ' a > as n -> n . Value <- f n . Value upcast n | _ -> failwith "Cannot modify value of root" // Pop (recursively) to an ancestor of this node and return it | ConstructOperation . Pop l -> match node with | :? Node < ' a > as n -> let level = l |> Option . defaultValue ( n . Level - 1 ) if ( n . Parent . Level = level ) then n . Parent elif ( n . Level > level ) then applyOp ( ConstructOperation . Pop ( Some level )) ( n . Parent ) else failwith "How did we get here?" | _ -> failwith "Cannot pop root"

This function takes an operation op and applies it to a given node, returning a result. The signature of the function has been chosen to align with one of the folding functions, so if we start with a list of operations and a root node, we should be able to build up a full tree from the list, and end up pointing to the current node.

1: let current = List . foldBack applyOp ops ( upcast ( Root ()))

Of course, we will want to also have a handle to the root of the tree, so we can do traversals like a pre-order walk. We can get that by recursively walking up from the current position until we hit a root node.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: let visitRoot start = let rec visit ( node : NodeBase < ' a > ) = match node with | :? Node < ' a > as n -> seq { yield node yield! visit n . Parent } | :? Root < ' a > as r -> seq { yield node } | _ -> Seq . empty visit start let last = visitRoot current |> Seq . last let root = last :?> Root < ' a >

Now, since we have started with a fixed list of operations, the current and root values represent a fixed tree. We can keep this pair in a structure that represents the "tree" version of the operations/

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: module Tree = // other members elided... type Tree < ' a > () = class // other members elided... member this . Build () = TreeCursor < ' a > ( this . ops ) end and TreeCursor < ' a > internal ( ops ) = class let rec applyOp op ( node : NodeBase < ' a > ) : NodeBase < ' a > = .. . let current = List . foldBack applyOp ops ( upcast ( Root ())) let visitRoot start = .. . let root = visitRoot current |> Seq . last :?> Root < ' a > end

While it might seem like a good idea to use a record for this, it might be better to use a class instead, because we don't want to expose the actual current and root members.

In fact, by using appropriate privacy modifiers on the constructor, we can make both the Tree<'a> and TreeCursor<'a> classes totally opaque - hiding the entire data structures within and only providing a clean programmatic interface to them.

Also, since a TreeCursor instance represents a Tree fixed at a given point, the only meaningful thing we can do to a TreeCursor is to traverse it, which leads to a very interesting observation. Since the tree is fixed, its traversals are also fixed. Which means we only have to traverse it once and build up a list of things we saw in the traversal, and then we can play back the traversal operations and process the tree in any way we choose.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: module Tree = // other members elided... type VisitOperation < ' a > = | VisitRoot | VisitChild of NodeId | ReadValue of NodeId * ' a option | Pop type TreeCursor < ' a > internal ( ops ) = class // other members elided... member this . PathToRoot = let readValue ( nb : NodeBase < ' a > ) = match nb with | :? Node < ' a > as n -> ReadValue ( n . Id , n . Value ) | _ -> VisitRoot visitRoot current |> Seq . map readValue member this . PreOrderPath = let rec visit ( node : NodeBase < ' a > ) = seq { match node with | :? Node < ' a > as n -> yield ReadValue ( n . Id , n . Value ) | _ -> yield! Seq . empty for child in node . Children |> List . rev do match child with | :? Node < ' a > as c -> yield! seq { yield VisitChild c . Id yield! visit child yield VisitOperation . Pop } | _ -> yield! Seq . empty } visit root end

In the code snippet above, we have defined two interesting traversals - one starts at the current node and walks back to the root, and the other starts at the root and traverses the whole tree "pre-order".

Each traversal results in a fixed sequence of VisitOperation<'a> for future use.

Tree traversals are best represented as folds. This is actually a much broader topic of discussion, but folding over trees can build all kinds of other data structures - including other trees, and allow for tree-rewriting.

In our case, we can traverse the tree, and then fold over it, as follows:

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: module Tree = // other members elided... type TreeCursor < ' a > internal ( ops ) = class // other members elided... member this . VisitRoot < ' o > ( processor : ' o -> VisitOperation < ' a > -> ' o ) ( seed : ' o ) = Seq . fold processor seed this . PathToRoot member this . VisitPreOrder < ' o > ( processor : ' o -> VisitOperation < ' a > -> ' o ) ( seed : ' o ) = Seq . fold processor seed this . PreOrderPath end

And there we have it.

We have implemented a traditional tree which affords the benefits of immutable data structures (like check-pointing), whilst allowing for efficient tree traversals using parent-pointers, and functionally separating out the traversal concerns from the tree-node processing concerns.

And in less than 130 lines of code!

Soup's Up!

Let's build an example to see how this can be used:

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: let t = Tree < string > () let t = t . PushChild ( Id "a" , None ) let t = t . PushChild ( Id "b" , None ) let t = t . PushChild ( Id "b1" , None ) let t = t . AddSibling ( Id "b2" , None ) let t = t . Pop () let t = t . AddSibling ( Id "c" , None ) let t = t . PushChild ( Id "c1" , None ) let t = t . AddSibling ( Id "c2" , None ) let t = t . Pop () let t = t . Pop () let t = t . PushChild ( Id "d" , None )

At this point, we have represented the building of a nested structure in an idiomatic manner, but the internal representation is simply a list of operations describing the building of the structure, rather than the structure itself.

We can then create the tree structure - with bi-directional links - at a fixed point in time, allowing us to traverse the tree.

1: let tc = t . Build ()

Now let's write a function to process each node as we encounter it in the traversal.

The signature of this function matches the signature used by a folding function, which allows us to fold over the list of visit operations and build up a composite value.

In our case, we want to build up a string containing a textual representation of the path in the traversal.

1: 2: 3: 4: 5: 6: 7: 8: let printNode res curr = let c = match curr with | ReadValue ( id , vo ) -> sprintf "%s%s" id . unapply ( vo |> Option . map ( sprintf " (%A)" ) |> Option . defaultValue "" ) | VisitRoot -> "|" | VisitChild id -> "↓" | VisitOperation . Pop -> "↑" sprintf "%s %s" res c

For a given visit operation, we compute a glyph describing the traversal ('up' and 'down' for Pop and Push ), or the node's 'id' and 'value'.

We tack this value at the end of the string which represents the path taken so far.

Finally, we pass the printing function to the visitor methods instance

1: 2: printfn "Path to root : %s" <| tc . VisitRoot printNode "" printfn "Pre-Order walk: %s" <| tc . VisitPreOrder printNode ""

Conclusion

This method of description and deferred interpretation is a very powerful technique in functional programming. In our case, it allowed us to separate out concerns between tree creation and tree traversal, and appropriate the benefits of immutability (for tree creation) and mutability (for traversals) without sacrificing cleanliness or readability. In fact, we have hoisted all the mechanics of traversal away from the user, and visiting the tree is reduced to simply providing a folding function.

The concept is well worth learning, as in other languages with higher-kinded types, a lot of mechanical work is lifted by these abstractions. For example, the IO monad in Haskell, and the Free Monad in Scala and Haskell both use and amplify this concept.

All the code for this article is available at Fun With Trees

Keep typing!

type Tree<'a> =

{Tag: 'a;

Children: Tree<'a> list;}



Full name: funwithtrees.Tree<_>

Tree.Tag: 'a

Tree.Children: Tree<'a> list

type 'T list = List<'T>



Full name: Microsoft.FSharp.Collections.list<_>

type NodeId = | Id of string



Full name: funwithtrees.Tree.NodeId

union case NodeId.Id: string -> NodeId

Multiple items

val string : value:'T -> string



Full name: Microsoft.FSharp.Core.Operators.string



--------------------

type string = System.String



Full name: Microsoft.FSharp.Core.string

Multiple items

type AutoOpenAttribute =

inherit Attribute

new : unit -> AutoOpenAttribute

new : path:string -> AutoOpenAttribute

member Path : string



Full name: Microsoft.FSharp.Core.AutoOpenAttribute



--------------------

new : unit -> AutoOpenAttribute

new : path:string -> AutoOpenAttribute

Multiple items

type AbstractClassAttribute =

inherit Attribute

new : unit -> AbstractClassAttribute



Full name: Microsoft.FSharp.Core.AbstractClassAttribute



--------------------

new : unit -> AbstractClassAttribute

Multiple items

type internal NodeBase<'a> =

new : unit -> NodeBase<'a>

abstract member Level : int

member Children : NodeBase<'a> list

member private _children : NodeBase<'a> list

member Children : NodeBase<'a> list with set

member private _children : NodeBase<'a> list with set



Full name: funwithtrees.Tree.Node.NodeBase<_>



--------------------

internal new : unit -> NodeBase<'a>

val set : elements:seq<'T> -> Set<'T> (requires comparison)



Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.set

val this : NodeBase<'a>

member internal NodeBase.Children : NodeBase<'a> list with set



Full name: funwithtrees.Tree.Node.NodeBase`1.Children

property NodeBase._children: NodeBase<'a> list

val ns : NodeBase<'a> list

abstract member internal NodeBase.Level : int



Full name: funwithtrees.Tree.Node.NodeBase`1.Level

Multiple items

val int : value:'T -> int (requires member op_Explicit)



Full name: Microsoft.FSharp.Core.Operators.int



--------------------

type int = int32



Full name: Microsoft.FSharp.Core.int



--------------------

type int<'Measure> = int



Full name: Microsoft.FSharp.Core.int<_>

Multiple items

type internal Root<'a> =

inherit NodeBase<'a>

new : unit -> Root<'a>

override Level : int



Full name: funwithtrees.Tree.Node.Root<_>



--------------------

internal new : unit -> Root<'a>

val this : Root<'a>

override internal Root.Level : int



Full name: funwithtrees.Tree.Node.Root`1.Level

Multiple items

type internal Node<'a> =

inherit NodeBase<'a>

new : id:NodeId * value:'a option * parent:NodeBase<'a> -> Node<'a>

member Id : NodeId

override Level : int

member Parent : NodeBase<'a>

member Value : 'a option

member Value : 'a option with set



Full name: funwithtrees.Tree.Node.Node<_>



--------------------

internal new : id:NodeId * value:'a option * parent:NodeBase<'a> -> Node<'a>

val id : NodeId

val value : 'a option

type 'T option = Option<'T>



Full name: Microsoft.FSharp.Core.option<_>

val parent : NodeBase<'a>

val this : Node<'a>

member internal Node.Id : NodeId



Full name: funwithtrees.Tree.Node.Node`1.Id

member internal Node.Parent : NodeBase<'a>



Full name: funwithtrees.Tree.Node.Node`1.Parent

override internal Node.Level : int



Full name: funwithtrees.Tree.Node.Node`1.Level

property NodeBase.Level: int

Multiple items

type Tree<'a> =

new : unit -> Tree<'a>

member AddSibling : x:'a0 -> Tree<'a>

member ModifyValue : x:'a0 -> Tree<'a>

member Pop : ?x:'a0 -> Tree<'a>

member PushChild : x:'a0 -> Tree<'a>

member private Root : Root<'a>

member private ops : obj list

member private ops : obj list with set



Full name: funwithtrees.Tree.Tree<_>



--------------------

new : unit -> Tree<'a>

val root : Root<'a>

val this : Tree<'a>

Multiple items

member private Tree.Root : Root<'a>



Full name: funwithtrees.Tree.Tree`1.Root



--------------------

type internal Root<'a> =

inherit NodeBase<'a>

new : unit -> Root<'a>

override Level : int



Full name: funwithtrees.Tree.Node.Root<_>



--------------------

internal new : unit -> Root<'a>

member Tree.ModifyValue : x:'a0 -> Tree<'a>

member Tree.PushChild : x:'a0 -> Tree<'a>

member Tree.AddSibling : x:'a0 -> Tree<'a>

member Tree.Pop : ?x:'a0 -> Tree<'a>

member Tree.PushChild : x:'a0 -> Tree<'a>



Full name: funwithtrees.Tree.Tree`1.PushChild

val x : 'a

property Tree.ops: obj list

member Tree.AddSibling : x:'a0 -> Tree<'a>



Full name: funwithtrees.Tree.Tree`1.AddSibling

member Tree.ModifyValue : x:'a0 -> Tree<'a>



Full name: funwithtrees.Tree.Tree`1.ModifyValue

member Tree.Pop : ?x:'a0 -> Tree<'a>



Full name: funwithtrees.Tree.Tree`1.Pop

val x : 'a option

val applyOp : op:'a -> node:'b -> 'c



Full name: funwithtrees.applyOp

val op : 'a

val node : 'b

val failwith : message:string -> 'T



Full name: Microsoft.FSharp.Core.Operators.failwith

module Option



from Microsoft.FSharp.Core

union case Option.Some: Value: 'T -> Option<'T>

val current : obj



Full name: funwithtrees.current

Multiple items

module List



from Microsoft.FSharp.Collections



--------------------

type List<'T> =

| ( [] )

| ( :: ) of Head: 'T * Tail: 'T list

interface IEnumerable

interface IEnumerable<'T>

member GetSlice : startIndex:int option * endIndex:int option -> 'T list

member Head : 'T

member IsEmpty : bool

member Item : index:int -> 'T with get

member Length : int

member Tail : 'T list

static member Cons : head:'T * tail:'T list -> 'T list

static member Empty : 'T list



Full name: Microsoft.FSharp.Collections.List<_>

val foldBack : folder:('T -> 'State -> 'State) -> list:'T list -> state:'State -> 'State



Full name: Microsoft.FSharp.Collections.List.foldBack

val visitRoot : start:'a -> 'b



Full name: funwithtrees.visitRoot

val start : 'a

val visit : ('c -> 'd)

val node : 'c

Multiple items

val seq : sequence:seq<'T> -> seq<'T>



Full name: Microsoft.FSharp.Core.Operators.seq



--------------------

type seq<'T> = System.Collections.Generic.IEnumerable<'T>



Full name: Microsoft.FSharp.Collections.seq<_>

module Seq



from Microsoft.FSharp.Collections

val empty<'T> : seq<'T>



Full name: Microsoft.FSharp.Collections.Seq.empty

val last : obj



Full name: funwithtrees.last

val last : source:seq<'T> -> 'T



Full name: Microsoft.FSharp.Collections.Seq.last

val root : obj



Full name: funwithtrees.root

Multiple items

module Tree



from funwithtrees



--------------------

type Tree<'a> =

{Tag: 'a;

Children: Tree<'a> list;}



Full name: funwithtrees.Tree<_>

val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>



Full name: Microsoft.FSharp.Collections.Seq.map

val rev : list:'T list -> 'T list



Full name: Microsoft.FSharp.Collections.List.rev

val fold : folder:('State -> 'T -> 'State) -> state:'State -> source:seq<'T> -> 'State



Full name: Microsoft.FSharp.Collections.Seq.fold

union case Option.None: Option<'T>

val id : x:'T -> 'T



Full name: Microsoft.FSharp.Core.Operators.id

val sprintf : format:Printf.StringFormat<'T> -> 'T



Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.sprintf

val map : mapping:('T -> 'U) -> option:'T option -> 'U option



Full name: Microsoft.FSharp.Core.Option.map

val printfn : format:Printf.TextWriterFormat<'T> -> 'T



Full name: Microsoft.FSharp.Core.ExtraTopLevelOperators.printfn