I was the first guest on Drew Neil’s Peer to Peer interview series. Drew gave me a programming problem and sat with me as I solved it. I had lots of fun; you can watch the episode here.

Wow, it’s difficult to think, type and talk at the same time! As I worked on the problem I had lots of ideas that I almost verbalised, but in the end I kept them to myself. That was probably for the best (we ran out of time anyway), but it’s also a shame, because thinking aloud might’ve illuminated some aspects of what I was doing.

So I’m writing those thoughts here instead. They stand alone, but may make more sense in the context of the episode.

Before I get my hands dirty with the actual programming stuff, I’d like to note down a couple of things that occurred to me about how I work. (Feel free to skip ahead if you’re not interested in process; you won’t miss anything essential.)

When I’m writing unit tests I typically use test doubles (almost exclusively stubs and mocks) to isolate the code under test so that I can explore the problem in small, controlled pieces. That usually works well, but for some reason I didn’t feel comfortable doing it while working on this problem, and I’d like to better understand why.

It seems likely that my notion of what “isolate” means is too narrow-minded, too fine-grained, or just flat-out inappropriate for problems like this.

I mostly work on programs that can naturally be thought of as systems of collaborating objects, so it often makes sense to aim for “collaborator isolation”: treat a single object (and sometimes a single method) as the unit that’s being tested, and avoid exercising the rest of the system by using test doubles to stand in for any other objects that would otherwise be involved.

But in the case of the #descendant_count and #leaf_count methods I implemented in the episode, my instinct is that it’s less useful to think of isolation in those terms. That might mean that I need to change how I think about isolation, or it might mean that unit testing is less appropriate than other approaches (namely integration and acceptance testing) for this kind of code.

Off the top of my head, I can think of three possible reasons for that:

The methods are written in a purely functional style that doesn’t rely on mutating the internal state of objects, and it feels weird to use mocks to verify interactions that have no effect.

Perhaps that’s a result of the tension between object-oriented and functional programming: OO implicitly favours the “tell, don’t ask” style in which objects avoid asking other objects for information, but functional programming is all about functions calling other functions and making decisions based on the values they return.

So, alright, mocks are a bad fit for stateless functional code. But that doesn’t necessarily mean that other kinds of test double are a bad fit too; I’ve used stubs successfully in larger functional programs.

The methods are defined recursively: when they collaborate with another object, they’re calling themselves, just in a different context. In a sense the method is already “isolated” without replacing those collaborators with doubles, because the recursive calls don’t lead the flow of control outside of the code under test. Trying to force more isolation than this might be an unhelpful, dogmatic step too far.

The behaviour of one invocation of #descendant_count or #leaf_count is so simple as to be uninteresting. What I really wanted to know was whether the behaviour across many recursive invocations was correct, which invited more of a classical (vs mockist) unit test, even straying into integration test territory.

I generally find this kind of test less useful for designing programs and preventing bugs, although its black-box flavour does make it easier to refactor method internals without breaking the tests (this comes up in “Catamorphisms” below).

I don’t know if any of this is right; I’m just thinking out loud. There must be clever people out there who’ve already thought about this and come up with something more concrete than my vague speculations; if you’re one of them, or know someone who is, please tell me!
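As a rough illustration of the classical, double-free style of test discussed in this section, here is a minimal sketch. Only the method names #leaf_count and #descendant_count come from the text; the LeafCountNode struct and the method bodies are assumptions made for illustration, not the code written in the episode.

```ruby
# Hypothetical stand-in for the episode's tree nodes (an assumption).
LeafCountNode = Struct.new(:children) do
  def leaf?
    children.empty?
  end

  # A leaf counts as one; otherwise add up the leaves below each child.
  def leaf_count
    leaf? ? 1 : children.sum(&:leaf_count)
  end

  # Every child counts as one descendant, plus its own descendants.
  def descendant_count
    children.sum { |child| 1 + child.descendant_count }
  end
end

# A real tree, built from real nodes, with no test doubles involved.
leaf = -> { LeafCountNode.new([]) }
sample_tree = LeafCountNode.new([
  leaf.call,
  LeafCountNode.new([leaf.call, leaf.call])
])

leaf_total = sample_tree.leaf_count             # => 3
descendant_total = sample_tree.descendant_count # => 4
```

The assertions exercise many recursive invocations at once, which is the behaviour the section says is actually interesting.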

Programming

Alright, that’s enough hand-waving about process. Let’s get stuck into some actual code.

Immutable trees

I didn’t explain why I wanted the tree structure to be immutable, beyond some vague suggestion that I disliked the idea of clients messing with it. Like my decision to use Cucumber, this was really just an instinct that I didn’t feel the immediate need to discuss, but I now think some elaboration would’ve been useful.

Immutability is generally desirable for various reasons, but here it elegantly imposes an important constraint: a tree shouldn’t contain cycles.

Let’s say that we have a MutableNode class which exposes an array of its children:

    >> class MutableNode
         attr_accessor :children

         def initialize
           self.children = []
         end
       end
    => …
    >> tree = MutableNode.new
    => #<MutableNode @children=[]>
    >> tree.children = 2.times.map { MutableNode.new }
    => [#<MutableNode @children=[]>, #<MutableNode @children=[]>]
    >> tree.children.last.children = [MutableNode.new]
    => [#<MutableNode @children=[]>]
    >> tree
    => #<MutableNode @children=[
         #<MutableNode @children=[]>,
         #<MutableNode @children=[
           #<MutableNode @children=[]>
         ]>
       ]>

It’s easy to write a #height method that recursively walks down a tree to find the number of nodes (versus edges, per Drew’s problem statement) in the longest path from its root to a leaf:

    >> module Parent
         def leaf?
           children.none?
         end

         def height
           if leaf?
             1
           else
             children.map(&:height).max + 1
           end
         end
       end
    => …
    >> MutableNode.include(Parent)
    => …
    >> tree.height
    => 3

But there’s nothing to stop us creating (perhaps accidentally) an infinitely-deep “tree” which contains itself:

    >> tree.children.push(tree)
    => [
         #<MutableNode @children=[]>,
         #<MutableNode @children=[
           #<MutableNode @children=[]>
         ]>,
         #<MutableNode @children=[...]>
       ]

Now #height will try to walk down the tree forever, eventually exhausting the Ruby interpreter’s call stack:

    >> tree.height
    SystemStackError: stack level too deep

This crash is symptomatic of a larger problem.
We’d like to write straightforward code that makes reasonable assumptions about the structures it manipulates, but our MutableNode implementation allows one such assumption to be violated.

An easy way of avoiding this is to make it more difficult to change the children of a node after it’s been created:

    >> class ImmutableNode
         include Parent

         def initialize(children)
           self.children = children.dup
         end

         private

         attr_accessor :children
       end
    => …

(This isn’t true immutability, but it’s as close as we can conveniently get in such a permissive language; “Ruby is not a language to keep people away from horror”.)

Immutability solves the problem by forcing a particular order of operations. To build a tree out of immutable pieces, we must start by creating the leaves, then use those leaves to construct their parent nodes, and so on until we finish by creating the root node:

    >> a, b = 2.times.map { ImmutableNode.new([]) }
    => [#<ImmutableNode @children=[]>, #<ImmutableNode @children=[]>]
    >> c = ImmutableNode.new([b])
    => #<ImmutableNode @children=[#<ImmutableNode @children=[]>]>
    >> tree = ImmutableNode.new([a, c])
    => #<ImmutableNode @children=[
         #<ImmutableNode @children=[]>,
         #<ImmutableNode @children=[
           #<ImmutableNode @children=[]>
         ]>
       ]>
    >> tree.height
    => 3

There’s no way to pass an ImmutableNode or any of its ancestors into its own constructor, because neither the node nor its ancestors can possibly exist at the moment when the constructor is being called. The resulting tree is cycle-free by construction; the arrow of time prevents us from tying the knot.
Note that our #height implementation treats the children collection as Enumerable, but of course Enumerable makes no guarantees about finiteness, so we can still break #height by supplying a collection of children that goes on forever:

    >> tree = ImmutableNode.new([ImmutableNode.new([])].cycle)
    => #<ImmutableNode @children=#<Enumerator: [#<ImmutableNode @children=[]>]:cycle>>
    >> require 'timeout'
    => …
    >> Timeout.timeout(60) { tree.height }
    Timeout::Error: execution expired

I don’t know an elegant way of avoiding that problem in Ruby, although it seems less likely to be a problem in practice.

In other situations it may also be important to ensure that no children are shared between nodes (this is what differentiates trees from polytrees, multitrees and other kinds of directed acyclic graph), but that didn’t matter for any of the code I wrote this time.
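For comparison with the immutable approach in this section, a mutable tree could instead guard against cycles at traversal time. The sketch below is only an illustration of that alternative: GuardedNode, its ancestors parameter and the choice of ArgumentError are all assumptions, not anything from the episode.

```ruby
# Hypothetical mutable node, mirroring MutableNode but with a guarded #height.
class GuardedNode
  attr_accessor :children

  def initialize
    self.children = []
  end

  # Counts the nodes on the longest root-to-leaf path, raising if the walk
  # reaches a node that is already on the current path (i.e. a cycle).
  def height(ancestors = [])
    if ancestors.any? { |ancestor| ancestor.equal?(self) }
      raise ArgumentError, 'cycle detected'
    end

    path = ancestors + [self]
    child_heights = children.map { |child| child.height(path) }
    (child_heights.max || 0) + 1
  end
end

tree = GuardedNode.new
tree.children = [GuardedNode.new, GuardedNode.new]
tree.children.last.children = [GuardedNode.new]
height_before = tree.height # => 3

# Introduce the same self-containing cycle as in the MutableNode example.
tree.children.push(tree)
cycle_message =
  begin
    tree.height
  rescue ArgumentError => error
    error.message
  end
cycle_message # => "cycle detected"
```

This turns the stack overflow into an explicit error, but every traversal method has to remember to carry the guard, which is the kind of burden that cycle-freedom by construction avoids.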

Enumerators

The problem with ImmutableNode is that it doesn’t let you see its children at all! We could fix that by adding a simple #children getter, but then a client of an ImmutableNode would be able to get hold of its children array and mutate it directly. We could fix that by freezing children, but freezing is a clumsy solution which introduces its own problems. A more expressive alternative is to expose the children as an Enumerator.

An Enumerator lets us generate a collection dynamically with code:

    >> fruits = Enumerator.new do |yielder|
         yielder.yield 'apple'
         yielder.yield 'banana'
         yielder.yield 'cherry'
       end
    => #<Enumerator: #<Enumerator::Generator>:each>
    >> fruits.each do |fruit|
         puts fruit
       end
    apple
    banana
    cherry
    => …

Because the Enumerator class has the Enumerable module mixed in, it supports all the usual Ruby collection methods:

    >> fruits.count
    => 3
    >> fruits.take(2)
    => ["apple", "banana"]
    >> fruits.map(&:upcase)
    => ["APPLE", "BANANA", "CHERRY"]

We can wrap any Enumerable object (e.g. an array) with an Enumerator by writing a block that explicitly iterates over it…

    >> vegetables_array = %w(asparagus broccoli carrot)
    => ["asparagus", "broccoli", "carrot"]
    >> vegetables_enumerator = Enumerator.new do |yielder|
         vegetables_array.each do |vegetable|
           yielder.yield vegetable
         end
       end
    => #<Enumerator: #<Enumerator::Generator>:each>
    >> vegetables_enumerator.map(&:chop)
    => ["asparagu", "broccol", "carro"]

…but it’s much easier to use Ruby’s built-in #to_enum method to achieve the same thing:

    >> vegetables_enumerator = vegetables_array.to_enum
    => #<Enumerator: ["asparagus", "broccoli", "carrot"]:each>
    >> vegetables_enumerator.map(&:reverse)
    => ["sugarapsa", "iloccorb", "torrac"]

The point of all this is that an Enumerator provides no way for a caller to modify its contents.
Exposing an Enumerator sends a strong message: “you may iterate over these objects, but you may not remove, add to or reorder them”:

    >> vegetables_enumerator.push('daikon')
    NoMethodError: undefined method `push' for #<Enumerator: ["asparagus", "broccoli", "carrot"]:each>
    >> vegetables_enumerator.delete('broccoli')
    NoMethodError: undefined method `delete' for #<Enumerator: ["asparagus", "broccoli", "carrot"]:each>

(Note, however, that anyone with access to vegetables_array can still mutate it, thereby indirectly changing the contents of vegetables_enumerator. Again, Ruby’s permissiveness makes it hard to entirely prevent this sort of thing, so we have to content ourselves with designing interfaces that encourage the behaviour we want.)

A convenient and idiomatic way of exposing a node’s children is to provide an #each_child method that takes a block argument. #each_child calls #to_enum on children to create an Enumerator, then immediately passes the block to the Enumerator’s #each method:

    >> class EnumerableNode
         def initialize(children)
           @children = children.dup
         end

         def each_child(&block)
           @children.to_enum.each(&block)
         end
       end
    => …

When #each_child is called with a block, it’ll iterate over the children in the obvious way:

    >> tree = EnumerableNode.new([
         EnumerableNode.new([]),
         EnumerableNode.new([
           EnumerableNode.new([])
         ])
       ])
    => #<EnumerableNode …>
    >> tree.each_child do |child|
         puts child.class
       end
    EnumerableNode
    EnumerableNode
    => …

But if we don’t provide a block, the #each inside #each_child will just return the Enumerator to the caller untouched:

    >> tree.each_child
    => #<Enumerator: …>

Once we have the Enumerator containing all the children, we can call any Enumerable method on it:

    >> tree.each_child.count
    => 2

For our #height method to work we only need children to respond to #none? and #map, both of which are from Enumerable, so we can implement #children to simply return the Enumerator and everything will be fine:

    >> class EnumerableNode
         include Parent

         def children
           each_child
         end
       end
    => …
    >> tree.children
    => #<Enumerator: …:each>
    >> tree.height
    => 3
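Another common way to get the same block-or-Enumerator behaviour is to check block_given? and fall back to to_enum with the method’s own name. The sketch below shows that variant; the YieldingNode class name is hypothetical and chosen only to keep it separate from EnumerableNode above.

```ruby
# Hypothetical variant of EnumerableNode using the block_given? idiom.
class YieldingNode
  def initialize(children)
    @children = children.dup
  end

  # With a block: yields each child in turn and returns self.
  # Without a block: returns an Enumerator that replays this method.
  def each_child
    return to_enum(:each_child) unless block_given?

    @children.each { |child| yield child }
    self
  end
end

tree = YieldingNode.new([
  YieldingNode.new([]),
  YieldingNode.new([YieldingNode.new([])])
])

children = tree.each_child
children.class              # => Enumerator
children.count              # => 2
children.respond_to?(:push) # => false

# The block form still works as before.
seen = []
tree.each_child { |child| seen << child.class }
seen # => [YieldingNode, YieldingNode]
```

One difference from the version above is that this Enumerator wraps the node’s method rather than the underlying array, so its inspect output doesn’t reveal the array.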

Adding and removing children

Although Enumerators are immutable, we can still use one as the basis for a modified collection by creating another Enumerator:

    >> more_fruits = Enumerator.new do |yielder|
         fruits.each do |fruit|
           yielder.yield fruit
         end
         yielder.yield 'damson'
       end
    => #<Enumerator: #<Enumerator::Generator>:each>

This new Enumerator iterates over fruits, yielding every value it finds, and afterwards yields the extra value 'damson'. As a result, more_fruits behaves like fruits with an extra member:

    >> fruits.include?('damson')
    => false
    >> more_fruits.include?('damson')
    => true
    >> more_fruits.count
    => 4
    >> more_fruits.drop(2).take(2)
    => ["cherry", "damson"]

We can do something similar to remove a member from a collection:

    >> fewer_fruits = Enumerator.new do |yielder|
         more_fruits.each do |fruit|
           yielder.yield fruit unless fruit == 'banana'
         end
       end
    => #<Enumerator: #<Enumerator::Generator>:each>

This Enumerator iterates over more_fruits and yields all of its members except for 'banana'. So fewer_fruits behaves like more_fruits with 'banana' removed:

    >> more_fruits.include?('banana')
    => true
    >> fewer_fruits.include?('banana')
    => false
    >> fewer_fruits.count
    => 3
    >> fewer_fruits.each do |fruit|
         puts fruit
       end
    apple
    cherry
    damson
    => …

Here the 'apple' and 'cherry' members are coming from the original fruits collection, and 'damson' is coming from more_fruits, though we’re not getting its 'banana'.
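Newer Ruby versions have built-in shorthands for the same layering. The sketch below assumes Ruby 2.6 or later, where Enumerator#+ chains enumerables together, and uses a lazy enumerator’s #reject for the removal. It is an alternative spelling of the hand-written Enumerator.new blocks above, with the variable names reused purely for comparison.

```ruby
fruits = %w[apple banana cherry].to_enum

# Enumerator#+ (assumes Ruby 2.6+) yields everything from fruits,
# then everything from the array, without copying either collection.
more_fruits = fruits + ['damson']

# A lazy enumerator filters members as they are requested,
# rather than building a new array up front.
fewer_fruits = more_fruits.lazy.reject { |fruit| fruit == 'banana' }

fruits.to_a       # => ["apple", "banana", "cherry"]
more_fruits.to_a  # => ["apple", "banana", "cherry", "damson"]
fewer_fruits.to_a # => ["apple", "cherry", "damson"]
```

As with the explicit version, each layer reads through to the one beneath it, so the original fruits collection is left untouched.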
We can use this idea to support adding a child to an immutable node:

    >> class EnumerableNode
         def add_child(extra_child)
           children = Enumerator.new do |yielder|
             each_child do |child|
               yielder.yield child
             end
             yielder.yield extra_child
           end

           self.class.new(children)
         end
       end
    => …
    >> extra_child = EnumerableNode.new([
         EnumerableNode.new([
           EnumerableNode.new([])
         ])
       ])
    => #<EnumerableNode …>
    >> bigger_tree = tree.add_child(extra_child)
    => #<EnumerableNode @children=#<Enumerator: #<Enumerator::Generator>:each>>
    >> tree.children.include?(extra_child)
    => false
    >> bigger_tree.children.include?(extra_child)
    => true
    >> bigger_tree.height
    => 4

And similarly, we can remove a child too:

    >> class EnumerableNode
         def remove_child(unwanted_child)
           children = Enumerator.new do |yielder|
             each_child do |child|
               yielder.yield child unless child == unwanted_child
             end
           end

           self.class.new(children)
         end
       end
    => …
    >> unwanted_child = bigger_tree.children.drop(1).first
    => #<EnumerableNode @children=[#<EnumerableNode @children=[]>]>
    >> smaller_tree = bigger_tree.remove_child(unwanted_child)
    => #<EnumerableNode @children=#<Enumerator: #<Enumerator::Generator>:each>>
    >> bigger_tree.children.include?(unwanted_child)
    => true
    >> smaller_tree.children.include?(unwanted_child)
    => false
    >> bigger_tree.children.count
    => 3
    >> smaller_tree.children.count
    => 2

Whether this technique is actually more sensible or efficient than a more conventional approach (e.g. calling #entries on the Enumerator and mutating the resulting array) depends on the particular application, but it’s worth knowing about, because it’s one way of achieving the “structural sharing” used by many persistent data structures.
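For contrast, the more conventional approach mentioned above might look something like the following sketch, which copies the children into a fresh array with #entries and mutates only that copy. The ArrayBackedNode class is hypothetical and exists just to make the comparison concrete.

```ruby
# Hypothetical node that copies its children instead of layering Enumerators.
class ArrayBackedNode
  def initialize(children)
    @children = children.to_a.dup
  end

  def each_child(&block)
    @children.to_enum.each(&block)
  end

  # Builds a new node from a copied array with the extra child appended.
  def add_child(extra_child)
    copy = each_child.entries
    copy.push(extra_child)
    self.class.new(copy)
  end

  # Builds a new node from a copied array with the unwanted child deleted.
  def remove_child(unwanted_child)
    copy = each_child.entries
    copy.delete(unwanted_child)
    self.class.new(copy)
  end
end

first, second, third = 3.times.map { ArrayBackedNode.new([]) }
tree = ArrayBackedNode.new([first, second])
bigger_tree = tree.add_child(third)
smaller_tree = bigger_tree.remove_child(first)

tree.each_child.count         # => 2
bigger_tree.each_child.count  # => 3
smaller_tree.each_child.count # => 2
```

Each new node here pays for a full copy of the children array, whereas the Enumerator version defers that work and shares the existing collection.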