Introduction

The other night, I came across the problem of implementing autovivication in Ruby hashes. The solution I devised exemplifies an elegant and practical use of the Y combinator, and it seemed worth sharing.

For those of you who just want the solution, here it is:

module YCombinator def y_comb(&generator) proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) end end class Object include(YCombinator) end hash = y_comb { |callback| proc { Hash.new { |h, k| h[k] = callback.call } } }.call hash['a']['b']['c']['d']['e']['f']['g'] = 1 hash

In Perl, the following line will successfully run:

$hash{'a'}{'b'}{'c'} = 1;

%hash

%hash

$hash{'a'}

even ifwas previously undefined. Perl will automatically setto be an empty hash, it will assignto be a reference to an empty hash, and so on, all automatically. This is called autovivication in Perl: hash and array variables automatically “come to life” as necessary. This is incredibly convenient for working with multidimensional arrays, for example.

Ruby, sadly, does not have autovivication (though not surprisingly, given the structure of the language). It does, however, provide a facility for default values of hashes. For example, one could write:

hash = Hash.new { |hash, key| hash[key] = {} } hash['a']['b'] = 3

hash['a']

'b'

hash['a']['b']['c'] = 3

hash['a']['b']

nil

['c']

in which case Ruby will create an empty hash, assign it to, and set the keyin that new hash to 3. However, this only works for two levels. This would not work:becausereturns, causing the attempt to accessto fail.

One solution would be to make an even more complicated default procedure:

hash = Hash.new { |hash, key| hash[key] = Hash.new { |hash, key| hash[key] = {} } }

But again, that will fail for even higher dimensions of access.

What is needed is a recursive definition of the default function, one that continues to assign auto-generating hashes to each higher dimension. A standard recursive approach will work:

def make_hash return Hash.new { |hash, key| hash[key] = make_hash } end

make_hash

But this requires that you define and store thefunction somewhere. This is risky: what if someone redefines or deletes the method? Can it be done without defining any methods, to prevent this problem?

Luckily, there is a solution made just for this situation: the Y combinator! There are several good web pages that derive and explain the Y combinator in greater detail (e.g., Wikipedia, and a blog post that claims that the Y combinator has “no practical use”!), so this will be a brief explanation of how it works.

Say you have a recursive Ruby procedure:

factorial = proc { |arg| if arg.zero? then 1 else arg * factorial.call(arg - 1) end }

factorial

Right now this procedure relies on the variableto contain the procedure.

We can remove this dependence by passing the recursive call's procedure as an argument. Instead of writing a function that calculates factorials, we write a function that generates another function that calculates factorials:

factorial_generator = proc { |callback| proc { |arg| if arg.zero? then 1 else arg * callback.call(arg - 1) end } }

callback

Now we just need to pass the rightto this procedure to implement the recursion.

Imagine what an appropriate callback procedure would look like. When the Y combinator is applied to generator , it should return a procedure that can be called to perform the recursive function (here, the factorial function). Call this function recursive . In the midst of executing recursive , the callback procedure may be executed. If callback is identical to recursive , then we will have successfully implemented a recursive function. So the goal is to design a Y combinator that accepts a generator and produces a procedure that, when executed, will set the generator's callback to be identical to that same procedure.

Here is the Y combinator implemented in Ruby:

y = proc { |generator| proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) }

proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }

y = proc { |generator| “ycombinator-function”.call(“ycombinator-function”) }

It is difficult to explain how this works, but here is how I think of it at least. Think of the procedure that reads:as the “ycombinator-function”. Now the Y combinator looks like:

Now we pass the generator function to the Y combinator:

recursive = y.call(factorial_generator)

generator

factorial_generator

recursive = proc { |*args| generator.call(x.call(x)).call(*args) }

generator

factorial_generator

x

and the following things happen. First, thevariable is bound to. Then, the “ycombinator-function” is executed with itself as the argument. All that the function does is return another procedure, so the result of calling the Y combinator is the procedure:withbound toandbound to the “ycombinator-function”.

Now what happens when we call this resulting function with some arguments (e.g., recursive.call(5) )? First, the argument to generator.call is evaluated. But since that argument is x.call(x) , and x is bound to the “ycombinator-function”, the result is a procedure identical to recursive ! Hence, we have successfully set the callback to what we wanted from above.

Next, the generator procedure is executed, setting recursive to be the callback procedure. This produces the procedure specified by the generator . That procedure is run with the passed arguments. If that generated procedure needs to make a recursive call, it executes the callback procedure, and because the callback procedure is the recursive procedure, this whole cycle repeats itself.

Whew! Hopefully some of that makes sense. The main point, though, is that:

factorial = y.call(proc { |callback| proc { |arg| if arg.zero? then 1 else arg * callback.call(arg - 1) end } }) factorial.call(5) # => 120

will produce a proper factorial procedure, without ever having to store a name for that procedure.

For autovivifying hashes, we start with our regular recursive implementation:

make_hash = proc { Hash.new { |hash, key| hash[key] = make_hash.call } }

proc { |callback| proc { Hash.new { |hash, key| hash[key] = callback.call } } }

module YCombinator def y_comb(&generator) proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) end end class Object include YCombinator end

hash = y_comb { |callback| proc { Hash.new { |h, k| h[k] = callback.call } } }.call # Now this will work: hash['a']['b']['c']['d']['e']['f']['g'] = 1

Now we separate out the named procedure call to produce a generator function:The next step is to implement the Y combinator. Unlike the above explanation, I will make it a method available to all objects rather than a procedure.The autovivifying hash, now, is created by:Congratulations! You have successfully implemented hash autovivication, without needing to define any external methods or variables. (Other than the Y combinator, of course. It is possible to do that, though I’ll leave that as an exercise to the reader.)

In writing this, I discovered an unusual “feature” of ruby. If you run the code:

x = 1 y = proc { |x| x + 3 } y.call(5)

x

y

it turns out thatis not local to the procedure stored in, but rather is actually identical to the variable outside. I’m not sure I like this aspect of Ruby, but there it is.

It is possible to implement autovivication of hashes without using the Y combinator but still retaining the safety of having no global variables. Just take advantage of scoping inside procedures:

hash = proc { make_hash = proc { Hash.new { |h, k| h[k] = make_hash.call } } make_hash.call }.call

make_hash

This works fine as long as there is no variablealready defined. But this approach uses 33 lexical tokens, and the Y combinator one only uses 31 (not including the Y combinator itself). So there!

Ryan Davis sent me an object-oriented implementation of autovivication:

class AVHash < Hash def initialize super { |h,k| h[k] = AVHash.new } end end

This is probably the best approach if you need to make many autovivifying hashes. This is what I like about Ruby: there are so many ways to do things, and they are all surprisingly elegant.

A comment on this article on Hacker News observed that my implementation was vulnerable to someone overwriting the Y combinator method that I defined in Object . This is partially correct: if someone overwrites the y_comb method, then uses of that method afterwards may fail, although already created (“combinated”?) procedures will work fine. But, more importantly, notice that defining of the Y combinator is actually unnecessary to define the Y combinator at all. You could just define and run it all in one go:

hash = proc { |generator| proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }.call(proc { |x| proc { |*args| generator.call(x.call(x)).call(*args) } }) }.call(proc { |callback| proc { Hash.new { |h, k| h[k] = callback.call } } }).call

The commenter is right, though, in that the Y combinator shouldn't really be necessary in most modern production languages used by coders of reasonable skill—though I see no reason not to use it if the opportunity presents itself and it is documented properly (consider linking to this article!).

The commenter also expressed difficulty in understanding this article, as well as the Y combinator in general. If it is any consolation, it took me three full days before I managed to wrap my head around it. This is tricky stuff, but that’s what makes the best brain exercise.