




Languages like Haskell support lazy evaluation. In principle, you only compute what you actually need; everything else just goes away. If you usually need everything you compute, this may seem like a frill: elegant, even interesting, but having little practical importance. I find it is very much like Tail Call Optimization: if you don’t have it, you code around it. Often it makes no difference, but from time to time there is a case where your code will be clearer and more maintainable if you express yourself succinctly and let the compiler do the work of making it efficient.

This post is a switch from mumbling about Y Combinators and using trivial cases to explain interesting ideas: I’m going to show some actual code I’m writing. I apologise in advance if this adds so much background that it obscures the message about lazy evaluation: extremely simplistic examples sometimes work against the argument, because it is too easy to think of other ways to accomplish the needed results.

I don’t apologise at all for the unpolished nature of the code. This isn’t a textbook. And besides:

Anybody can say you can’t write. Let no one say you don’t.

—Ken Rand, courtesy of Chalain







In computer programming, a hacker is a software designer and programmer who builds elegant, beautiful programs and systems… For some, “hacker” has a negative connotation and refers to a person who “hacks” or uses kludges to accomplish programming tasks that are ugly, inelegant, and inefficient.



—Wikipedia

I’ve been hacking some naïve code to cluster data sets.

The curves being generated are paths composed of cubic Béziers. Each segment on the path requires specifications for four different control points.


The clustering algorithm requires a very large, fixed set of curves. I wrote the initial curve generation by building a gigantic list of parameter tuples, and then processing the list into records. Once the “search space” grew beyond a trivial size, the program began to eat enormous amounts of memory. The problem was that I was trying to write the generation code as clearly as possible, and that created a massive thicket of objects that all resided in memory simultaneously.

I rejected the idea of a rewrite into looping, imperative form: I wanted to separate the “list comprehension” code from the “make it run in less than a gigabyte” code. Instead, I chose lazy evaluation. The principle is that although the code defines a huge data structure that would take up gigabytes, we only actually evaluate things as we need them, and we discard them when we are done, so the total memory footprint of the lazy form is about the same as the memory footprint of an imperative form.

The easiest way to perform the refactoring (besides a rewrite in Haskell) was to switch all of the arrays I used for a lazy list data structure. A lazy list is a linked list composed of head and tail tuples (known as cons cells). Where a normal linked list holds an element and the rest of the list, a lazy list holds an element and a function for computing the rest of the list. (SICP calls them “streams.”)

Any sufficiently complicated Lisp or Ruby program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Haskell.

If a modicum of care is taken not to pin objects down, you can build fantastically large lazy lists and process them at will. In fact, lazy lists can be infinitely large if you provide the appropriate function for generating the list.

For that reason, you should never append one lazy list onto another lazy list. Consider:

    odd = LazyList.unfoldr(1) { |n| n + 2 }

    even = LazyList.unfoldr(0) { |n| n + 2 }

If you append odd onto even, the resulting list in theory has both even and odd numbers. But in practice you can never reach the odd numbers, because there is an infinite quantity of even numbers at the front of the list. Instead, merge the two lists together to produce a lazy union:

    even.merge(odd) => (0 1 2 3 ...)

Merge interleaves alternating elements from each list, so the resulting lazy list contains all the elements of each list, and if you take a finite sample, such as the first 1,000 elements, you get 500 of each. (Other interesting operations that work with infinite lists include cons, pairwise, and product.)

My current version of a LazyList class in Ruby is here. As soon as I switched from the “eager” code using arrays to the “lazy” code using lazy lists, memory use went down and performance went up. Not as much as a switch to imperative style (and not even close to writing a better clustering algorithm), but enough that I can move forward.

Here is an example of a function using arrays to enumerate choices (distribute(2, 1..4) -> [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]):

    def distribute choices, range
      spots = (range.end - range.begin) + 1
      return [] if choices <= 0 || choices > spots
      remaining_choices = choices - 1
      if remaining_choices == 0
        return range.to_a.map { |spot| [spot] }
      else
        (range.begin..(range.end - remaining_choices)).to_a.inject([]) { |acc, first_spot|
          acc + distribute(remaining_choices, (first_spot + 1)..range.end).map { |remaining_distribution|
            remaining_distribution.unshift(first_spot)
          }
        }
      end
    end

And rewritten with lazy lists:

    def distribute choices, range
      spots = (range.end - range.begin) + 1
      return LazyList.new() if choices <= 0 || choices > spots
      remaining_choices = choices - 1
      if remaining_choices == 0
        LazyList.from_range(range).map { |spot| LazyList.from_element(spot) }
      else
        LazyList.from_range(range.begin..(range.end - remaining_choices)).mapr(:merge) do |first_spot|
          distribute(remaining_choices, (first_spot + 1)..range.end).map do |remaining_distribution|
            LazyList.cons(first_spot, remaining_distribution)
          end
        end
      end
    end

It’s almost identical. LazyLists replace arrays, mapr(:merge) replaces accumulating a list with inject, and cons replaces unshift, but otherwise it’s the same code. Mission accomplished: we’ve changed the behaviour without having to change the way we express our algorithms. Had we wanted to, we could have supported enough array syntax that lazy lists look exactly like arrays, but that isn’t necessary.

And when running the entire generator, the memory footprint is dramatically lower. For example, the small routine quoted above no longer generates an array of arrays: it generates a structure that can generate the lists when needed. Switching the data structure changes the evaluation behaviour of the generator code: it gets us 80% of the benefit of an imperative structure without having to tie the code to the implementation.

This is exactly what separation of concerns is all about: green code here, yellow code there. It’s great to make a change to the code’s behaviour without changing the way we think about the solution to our problem. But you know what? It isn’t interesting. In fact, if lazy optimization were something we needed on a regular basis, we ought to switch to a language that does it for us. Implementation details ought to be left to libraries, compilers, and virtual machines. That’s what they’re for.

A language that doesn’t affect the way you think about programming, is not worth knowing.

—Alan Perlis
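The cons-cell-with-a-thunk structure described above can be made concrete in a few lines of plain Ruby. This is an illustrative sketch only, not the LazyList library the post uses; the class name Cell and its methods are invented for the example:

```ruby
# A minimal lazy list. Each cell holds a head and a thunk (a block)
# that computes the tail only when it is first asked for.
# Illustrative sketch only; not the post's LazyList library.
class Cell
  attr_reader :head

  def initialize(head, &tail_thunk)
    @head = head
    @tail_thunk = tail_thunk
  end

  # Force and memoize the tail, so it is computed at most once.
  def tail
    @tail ||= @tail_thunk.call
  end

  # An infinite list built by repeatedly applying a block to a seed.
  def self.unfoldr(seed, &block)
    new(seed) { unfoldr(block.call(seed), &block) }
  end

  # Interleave two (possibly infinite) lists; appending would bury
  # one list behind the other forever.
  def merge(other)
    Cell.new(head) { other.merge(tail) }
  end

  # Take a finite sample off the front.
  def take(n)
    n <= 0 ? [] : [head] + tail.take(n - 1)
  end
end

odd  = Cell.unfoldr(1) { |n| n + 2 }
even = Cell.unfoldr(0) { |n| n + 2 }
even.merge(odd).take(6)  # => [0, 1, 2, 3, 4, 5]
```

Because the tail is a thunk, defining odd and even allocates a single cell each; further cells come into existence only as take forces them.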

Now that we hold in our hands a tool for making infinitely long lists, the question to ask is, “How does that affect the way we think about generating sample curves?” The first thing that comes to mind is to ask why the set of sample curves we generate is finite. It isn’t really finite: there are an infinite number of curves. (And not just any infinity: it’s Aleph One, as the children’s book One Two Three… Infinity explained forty years ago.)

What we really want is a finite sample of that infinite set of curves. Now let’s zoom in and think about a single parameter, the y value for one of the control points on a curve. Let’s say it’s a rational number between 0.0 and 1.0 inclusive. It’s relatively easy to generate a list of rationals in that range. Say we generate 0.0000...00001 (with jillions of zeroes elided: we want the smallest number our computer can represent), followed by 0.0000...00002 (the second smallest), and so on. If we leave the computer running for a few months or years, we should get all of the numbers between 0.0 and 1.0 that our computer can represent. That shouldn’t be hard. But even taking into account the finite limitations on a computer’s representation, that list is infinitely long.

We need to take a finite sample of all of the infinite possibilities in that list. When working with an infinite list, what we want is for the front part of the list, the part we can access before the heat death of the Universe, to contain a useful sample. If we were sampling a list like (0.0000...00001 0.0000...00002 0.0000...00003 0.0000...00004 ...) for control points, the first values would not be useful at all: we’d just get a line along the x axis.

There are various strategies for generating a more useful list. We could create an infinite list of random values. I’ll outline another method below. But the take-away idea is this: we need lists where the most useful values come first. Working with infinite lists encourages us to think of infinitely sized sets of samples, ordered by usefulness.

Let’s say that when we generate y, we devise a list like (0.0 1.0 0.5 0.75 0.25 ...). We’re successively bisecting the intervals (which is why the library method is called binary_search). We can stop at any time and we have a decent distribution. (And we can pipe-dream that in the future, we can train our generator to favour points more likely to be useful to us.) That is much more useful!

Here’s some lazy list code that does just that: given two seeds and an optional bisection block, it generates an infinite list of values:

    def binary_search seed1, seed2, &block
      bisection_block = block || lambda { |x, y| (x - y) / 2 + y }
      LazyList.cons(
        seed1,
        LazyList.new(
          seed2,
          delay { bisect(seed1, seed2, &bisection_block) }
        )
      )
    end

    # infinitely bisects a space, exclusive of the seed values
    def bisect seed1, seed2, &block
      return EMPTY if seed1 == seed2
      half = block.call(seed1, seed2)
      LazyList.new(
        half,
        delay {
          merge(
            bisect(seed1, half, &block),
            bisect(half, seed2, &block)
          )
        }
      )
    end

And indeed:

    LazyList.binary_search(0.0, 1.0) -> (0.0 1.0 0.5 0.75 0.25 ...)

Satan, Cantor, and Infinity and Other Mind-boggling Puzzles is a five-star introduction to Cantor’s work on infinity, including a special treat: a completely different proof that Aleph One is greater than Aleph Zero based on games, very much in the style of Conway’s Surreal Numbers. Currently out of print, but please get yourself a used copy from a bookseller: you won’t be disappointed!

What happens when we try to combine two infinite lists? For example, we want a list of (x, y) tuples, generated as the Cartesian product of our binary search list with itself. If we use a typical breadth-first or depth-first algorithm, we’re sunk. We get ( (0.0 0.0) (1.0 0.0) (0.5 0.0) (0.75 0.0) (0.25 0.0) ... ). That has a nice distribution along one axis but is still useless. What we need is a way of combining two infinite lists such that they give us a useful sample when we take a finite number of elements off the front.

Here’s a table describing how the ‘breadth-first’ algorithm for generating the product of two lists works. Consider two lists of three elements each. In the table below, columns represent one list and rows the other. The numbers represent the order in which the products of the lists are generated:

    Breadth-first mapping from elements to integers

    1 2 3
    4 5 6
    7 8 9

This works fine for finite lists, but as we saw above it is useless for infinite lists. Still, looking at it in table form is useful. The product of two lists is tabular; the problem with the traditional algorithm is the order in which we select elements from the table, not that the elements are tabular. Just as we solved the problem of how to sample y magnitudes by changing the order in which we selected rationals between 0.0 and 1.0, with Cartesian products we need to select the products in an order that provides useful results.

As it happens, there’s an order that generates useful finite samples when dealing with the product of two infinite lists: instead of generating rows and appending them to each other, it generates diagonals and merges them with each other. The first diagonal is 1, 5, 9. The second is 4, 8. The third is 2, 6. And so on. The advantage of this algorithm is that, given two infinite lists, it starts in the upper left-hand corner and supplies values, working its way right and down.

Here’re the first four values of the product of two binary searches with each other:

    [{:x=>0.0, :y=>0.0}, {:x=>0.0, :y=>1.0}, {:x=>1.0, :y=>0.0}, {:x=>0.0, :y=>0.5}]

And the next four:

    [{:x=>1.0, :y=>1.0}, {:x=>1.0, :y=>0.5}, {:x=>0.5, :y=>0.0}, {:x=>0.0, :y=>0.25}]

And the next eight:

    [{:x=>0.5, :y=>0.5}, {:x=>0.5, :y=>0.25}, {:x=>0.5, :y=>1.0}, {:x=>1.0, :y=>0.25}, {:x=>0.25, :y=>0.25}, {:x=>0.25, :y=>0.75}, {:x=>0.25, :y=>0.0}, {:x=>0.0, :y=>0.75}]

We can take as many samples as we want, and more samples give us more “resolution.” We now have the tools we need to get rid of all of the limits and finite combinations, and focus on describing what we’re trying to generate without intermingling a lot of generating code.

And I’m thinking about eternity
Some kind of ecstasy got a hold on me

—Bruce Cockburn, Wondering Where The Lions Are
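The successive-bisection stream can also be sketched with Ruby’s built-in Enumerator, independent of any lazy list library. This is a sketch under assumptions: it uses a breadth-first queue of intervals instead of the merge-based bisect shown above, so the interleaving within each level differs slightly, but the most-useful-values-first distribution is the same:

```ruby
# Sketch: an infinite stream of magnitudes that successively bisects
# [lo, hi], most useful values first. Uses a breadth-first queue of
# intervals rather than the post's merge-based bisect, so the order
# within each level differs slightly from (0.0 1.0 0.5 0.75 0.25 ...).
def magnitudes(lo, hi)
  Enumerator.new do |out|
    out << lo
    out << hi
    queue = [[lo, hi]]
    loop do
      a, b = queue.shift
      mid = (b - a) / 2.0 + a
      out << mid
      queue << [a, mid] << [mid, b]
    end
  end
end

magnitudes(0.0, 1.0).first(5)  # => [0.0, 1.0, 0.5, 0.25, 0.75]
```

Because the enumerator is only pulled on demand, first(5) does five steps of work no matter how "infinite" the stream is.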

Now that we can generate an infinite list of useful magnitudes, and we can combine infinite lists in a useful way, we can define an infinite list of sample curves. Here’s the exact definition of an infinite list of partial Béziers (there are only three control points because the origin of each Bézier is the terminus of the previous Bézier in the curve we are generating):

    def sample_cubic_beziers
      magnitudes = LazyList.binary_search(0.0, 1.0)
      control_points = LazyList.cartesian_product(magnitudes, magnitudes) do |x, y|
        { :x => x, :y => y }
      end
      LazyList.cartesian_product(control_points, control_points, control_points) do |p1, p2, p3|
        { :p1 => p1, :p2 => p2, :p3 => p3 }
      end
    end
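The cartesian_product used above relies on the diagonal traversal described earlier. Here is a minimal sketch of that dovetailing idea using plain Procs as infinite sequences. It is not the post’s implementation, and its diagonal order is one of several that work, but it shows why every pair is reached after finitely many steps:

```ruby
# Sketch of a diagonal-order Cartesian product of two infinite
# sequences, each represented as a Proc from index to value.
# Diagonal d yields the pairs (i, d - i), so any pair (i, j) appears
# after finitely many steps, even though both inputs are infinite.
def diagonal_product(xs, ys)
  Enumerator.new do |out|
    d = 0
    loop do
      (0..d).each { |i| out << [xs.call(i), ys.call(d - i)] }
      d += 1
    end
  end
end

naturals = lambda { |i| i }
diagonal_product(naturals, naturals).first(6)
# => [[0, 0], [0, 1], [1, 0], [0, 2], [1, 1], [2, 0]]
```

A breadth-first (row-by-row) product of the same two sequences would never leave row zero; the diagonal walk samples both axes evenly from the start.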





Lazy evaluation solved a performance problem without needing an extensive rewrite of our generation code;

Taking the red pill, infinite lists allowed us to radically simplify the original code.



The data is a set of (x,y) tuples where x is time and y is a magnitude. The effort remaining in a sprint backlog is an example of this kind of data set. I need a function that computes the distance between any two data sets. I already have a function that computes the distance between a curve through the xy space and a data set, so if I build a lot of curves and then take the distance between each curve and each data set, I can find the distance between pairs of data sets by searching for the curve with the lowest total distance.
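The brute-force search described above can be sketched in a few lines. Everything here is hypothetical scaffolding: dataset_distance and the curve_distance block are invented names standing in for the post’s actual curve-to-data-set metric and ActiveRecord-backed curves:

```ruby
# Hypothetical sketch of the brute-force pairing described above:
# the distance between two data sets is the smallest combined
# distance from any single curve to both sets. `curve_distance` is
# an invented stand-in for the existing curve-to-data-set metric.
def dataset_distance(set_a, set_b, curves, &curve_distance)
  curves.map { |curve|
    curve_distance.call(curve, set_a) + curve_distance.call(curve, set_b)
  }.min
end
```

Called once per pair of data sets, this is quadratic in the number of data sets, which is why the note below reaches for distributed processing.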



This operation is O(n²), but that’s why we invented distributed processing. Also, this operation is done infrequently relative to operations making use of the clusters, much as Google’s indexing is far less frequent than searching Google. And of course, when my MacBook overheats and burns its way through my desk, I can come back and replace the brute-force operation with something more intelligent.



Well, all this requires generating the sample curves. I have code that makes curves out of Béziers, provided I supply coördinates for the control points. The curves are actually ActiveRecord models. So all I needed was some simple code that generates all of the possible combinations and then writes them to the database. It’s the kind of thing candidates write in job interviews where employers actually care about their stone-cutting skills.

This problem is congruent to the proof that the set of all points on a plane is the same size as the set of all points in a line.

And it could be simpler yet: a “list comprehensions” DSL for lazy lists is on the wish list; it would make things far more readable.



That’s really simple, and really easy to understand. The domain-specific code that defines a cubic Bézier is concise and readable, and the infrastructure code that handles the combinatorics is shunted away out of sight, where it doesn’t clutter up our thinking.

So what have we seen? Are there some opportunities to steal an idea like lazy evaluation from other languages and use them to simplify your code?

Labels: lispy, ruby