Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program take advantage of laziness to express algorithms in clear succinct terms, turning them from recipes to declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x2 + y2 = z2. Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

void printNTriples(int n) { int i = 0; for (int z = 1; ; ++z) for (int x = 1; x <= z; ++x) for (int y = x; y <= z; ++y) if (x*x + y*y == z*z) { printf("%d, %d, %d

", x, y, z); if (++i == n) return; } }

Here, a single C function serves three distinct purposes: It

Generates Pythagorean triples, Prints them, Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

Generate the list of Pythagorean triples, Take the first ten of them, and Print them.

And that’s exactly how you’d write this program in Haskell:

main = print (take 10 triples) triples = [(x, y, z) | z <- [1..] , x <- [1..z] , y <- [x..z] , x^2 + y^2 == z^2]

This program reads: take 10 triples and print them. It declares triples as a list (square brackets mean a list) of triples (x, y, z) , where (the vertical bar reads “where”) z is an element of the list of integers from 1 to infinity, x is from 1 to z , y is from x to z , and the sum of squares of x and y is equal to the square of z . This notation is called “list comprehension” and is characteristic of Haskell terseness.

You see the difference? Haskell let’s you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal, aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to print that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not a number more.

Suspension

We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

template<class T> class Susp { public: explicit Susp(std::function<T()> f) : _f(f) {} T get() { return _f(); } private: std::function<T()> _f; };

We often create suspensions using lambda functions, as in:

int x = 2; int y = 3; Susp<int> sum([x, y]() { return x + y; }); ... int z = sum.get();

Notice that the suspended lambda may capture variables from its environment: here x and y . A lambda, and therefore a suspension, is a closure .

The trouble with this implementation is that the function is re-executed every time we call get . There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.

Here’s the idea: The first time the client calls get we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.

A thunk is a pointer to a free function taking a suspension (the this pointer) and returning a value (by const reference). The get method simply calls this thunk, passing it the this pointer.

Initially, the thunk is set to thunkForce , which calls the method setMemo . This method evaluates the function, stores the result in _memo , switches the thunk to thunkGet , and returns the memoized value. On subsequent calls get goes through the getMemo thunk which simply returns the memoized value.

template<class T> class Susp { // thunk static T const & thunkForce(Susp * susp) { return susp->setMemo(); } // thunk static T const & thunkGet(Susp * susp) { return susp->getMemo(); } T const & getMemo() { return _memo; } T const & setMemo() { _memo = _f(); _thunk = &thunkGet; return getMemo(); } public: explicit Susp(std::function<T()> f) : _f(f), _thunk(&thunkForce), _memo(T()) {} T const & get() { return _thunk(this); } private: T const & (*_thunk)(Susp *); mutable T _memo; std::function<T()> _f; };

(By the way, the function pointer declaration of _thunk looks pretty scary in C++, doesn’t it?)

[Edit: I decided to remove the discussion of the thread safe implementation since it wasn’t ready for publication. The current implementation is not thread safe.]

You can find a lot more detail about the Haskell implementation of suspended functions in the paper by Tim Harris, Simon Marlow, and Simon Peyton Jones, Haskell on a Shared-Memory Multiprocessor.

Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog post on persistent data structures). It means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the const method pop_front .

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream. We can therefore use a shared pointer:

template<class T> class Stream { private: std::shared_ptr <Susp<Cell<T>>> _lazyCell; };

Of course, because of reference counting and memoization, the stream is only conceptually immutable and, in the current implementation, not thread safe.

So what’s in the Cell ? Remember, we want to be able to generate infinite sequences, so Stream must contain the DNA for not only producing the value of type T but also for producing the offspring — another (lazy) Stream of values. The Cell is just that: A value and a stream.

template<class T> class Cell { public: Cell() {} // need default constructor for memoization Cell(T v, Stream<T> const & tail) : _v(v), _tail(tail) {} explicit Cell(T v) : _v(v) {} T val() const { return _v; } Stream<T> pop_front() const { return _tail; } private: T _v; Stream<T> _tail; };

This mutually recursive pair of data structures works together amazingly well.

template<class T> class Stream { private: std::shared_ptr <Susp<Cell<T>>> _lazyCell; public: Stream() {} Stream(std::function<Cell<T>()> f) : _lazyCell(std::make_shared<Susp<Cell<T>>>(f)) {} Stream(Stream && stm) : _lazyCell(std::move(stm._lazyCell)) {} Stream & operator=(Stream && stm) { _lazyCell = std::move(stm._lazyCell); return *this; } bool isEmpty() const { return !_lazyCell; } T get() const { return _lazyCell->get().val(); } Stream<T> pop_front() const { return _lazyCell->get().pop_front(); } };

There are several things worth pointing out. The two constructors follow our formal definition of the Stream : one constructs an empty stream, the other constructs a suspended Cell . A suspension is created from a function returning Cell .

I also added a move constructor and a move assignment operator for efficiency. We’ll see it used in the implementation of forEach .

The magic happens when we call get for the first time. That’s when the suspended Cell comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to pop_front . Notice that pop_front is a const method — the Stream itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from n to infinity. The constructor of a Stream takes a function that returns a Cell . We’ll use a lambda that captures the value of n . It creates a Cell with that value and a tail, which it obtains by calling intsFrom with n+1 :

Stream<int> intsFrom(int n) { return Stream<int>([n]() { return Cell<int>(n, intsFrom(n + 1)); }); }

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner intsFrom is not made from the outer intsFrom . Instead it’s made the first time get is called on the emerging Stream .

Of course, we can also create finite streams, like this one, which produces integers from n to m :

Stream<int> ints(int n, int m) { if (n > m) return Stream<int>(); return Stream<int>([n, m]() { return Cell<int>(n, ints(n + 1, m)); }); }

The trick is to capture the limit m as well as the recursion variable n . When the limit is reached, we simply return an empty Stream .

We’ll also need the method take , which creates a Stream containing the first n elements of the original stream:

Stream take(int n) const { if (n == 0 || isEmpty()) return Stream(); auto cell = _lazyCell; return Stream([cell, n]() { auto v = cell->get().val(); auto t = cell->get().pop_front(); return Cell<T>(v, t.take(n - 1)); }); }

Here we are capturing the suspended cell and use it to lazily generate the elements of the new, truncated, Stream . Again, the key to understanding why this works is to keep in mind that Stream s and Cell s are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of take forces the evaluation of the first n elements — e.g., by passing them through the consuming forEach below — these elements will appear miraculously memoized in the original Stream .

Finally, we’ll need some way to iterate through streams. Here’s an implementation of forEach that consumes the stream while enumerating it and feeding its elements to a function.

template<class T, class F> void forEach(Stream<T> strm, F f) { while (!strm.isEmpty()) { f(strm.get()); strm = strm.pop_front(); } }

It’s the assignment:

strm = strm.pop_front();

which consumes the stream by decreasing the reference count of the head of the Stream . In particular, if you pass an rvalue Stream to forEach , its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the Stream . What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and shared_ptr .

Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

triples = do z <- [1..] x <- [1..z] y <- [x..z] guard (x^2 + y^2 == z^2) return (x, y, z)

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it z — create a stream from 1 to z . For each of those — call them x — create a stream from x to z . Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples (x, y, z) .

So far we’ve learned how to create a stream of integers — we have the function intsFrom . But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our Stream is a functor.

Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the std::transform algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it fmap — generic, we have to parameterize it over types. The algorithm starts with a Stream of elements of type T and a function from T to some other type U . The result should be a stream of U .

We don’t want to make U the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type F to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

template<class T, class F> auto fmap(Stream<T> stm, F f)

Without the use of concepts, we have no way of enforcing, or even specifying, that F be a type of a function from T to U . The best we can do is to statically assert it inside the function:

static_assert(std::is_convertible<F, std::function<U(T)>>::value, "fmap requires a function type U(T)");

But what is U ? We can get at it using decltype :

decltype(f(stm.get()));

Notice that decltype takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of f . We also need a dummy argument for this function: we use the result of stm.get() . The argument to decltype is never evaluated, but it is type-checked at compile time.

One final problem is how to specify the return type of fmap . It’s supposed to be Stream<U> , but we don’t know U until we apply decltype to the arguments of fmap . We have to use the new auto function declaration syntax of C++11. So here are all the type-related preliminaries:

template<class T, class F> auto fmap(Stream<T> stm, F f)->Stream<decltype(f(stm.get()))> { using U = decltype(f(stm.get())); static_assert(std::is_convertible<F, std::function<U(T)>>::value, "fmap requires a function type U(T)"); ... }

Compared to that, the actual implementation of fmap seems rather straightforward:

if (stm.isEmpty()) return Stream<U>(); return Stream<U>([stm, f]() { return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f)); });

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function f , and returns a Cell . That cell contains the value of f acting on the first element of the original stream, and a tail. The tail is created with fmap acting on the rest of the original stream.

Equipped with fmap , we can now attempt to take the first step towards generating our triples: apply the function ints(1, z) to each element of the stream intsFrom(1) :

fmap(intsFrom(1), [](int z) { return ints(1, z); });

The result is a Stream of Stream s of integers of the shape:

1 1 2 1 2 3 1 2 3 4 1 2 3 4 5 ...

But now we are stuck. We’d like to apply ints(x, z) to each element of that sequence, but we don’t know how to get through two levels of Stream . Our fmap can only get through one layer. We need a way to flatten a Stream of Stream s. That functionality is part of what functional programmers call a monad. So let me show you that Stream is indeed a monad.

Stream as a Monad

If you think of a Stream as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two Stream s (we’ll get to it later) and let’s implement a function mjoin that concatenates a whole Stream of Stream s.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a Stream of Stream s, so we have to test two levels deep. This way we’ll ensure that the concatenation of a Stream of empty Stream s immediately returns an empty Stream .

The recursive step in mjoin creates a Cell whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of mjoin of the rest of the streams:

template<class T> Stream<T> mjoin(Stream<Stream<T>> stm) { while (!stm.isEmpty() && stm.get().isEmpty()) { stm = stm.pop_front(); } if (stm.isEmpty()) return Stream<T>(); return Stream<T>([stm]() { Stream<T> hd = stm.get(); return Cell<T>( hd.get() , concat(hd.pop_front(), mjoin(stm.pop_front()))); }); }

The combination of fmap and mjoin lets us compose function like intsFrom or ints that return Stream s. In fact, this combination is so common that it deserves its own function, which we’ll call mbind :

template<class T, class F> auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get())) { return mjoin(fmap(stm, f)); }

If we use mbind in place of fmap :

mbind(intsFrom(1), [](int z) { return ints(1, z); });

we can produce a flattened list:

1 1 2 1 2 3 1 2 3 4...

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable z . We can keep chaining calls to mbind and capture more variables in the process:

mbind(intsFrom(1), [](int z) { return mbind(ints(1, z), [z](int x) { return mbind(ints(x, z), [x, z](int y) { ... } } }

At this point we have captured the triples x , y , z , and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is mthen which is a version of mbind that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, mthen will return an empty stream.

We implement mthen using a slightly modified version of fmap that takes a function f of no arguments:

template<class T, class F> auto fmapv(Stream<T> stm, F f)->Stream<decltype(f())> { using U = decltype(f()); static_assert(std::is_convertible<F, std::function<U()>>::value, "fmapv requires a function type U()"); if (stm.isEmpty()) return Stream<U>(); return Stream<U>([stm, f]() { return Cell<U>(f(), fmapv(stm.pop_front(), f)); }); }

We plug it into the definition of mthen the same way fmap was used in mbind :

template<class T, class F> auto mthen(Stream<T> stm, F f) -> decltype(f()) { return mjoin(fmapv(stm, f)); }

The second useful function is mreturn , which simply turns a value of any type into a one-element Stream :

template<class T> Stream<T> mreturn(T v) { return Stream<T>([v]() { return Cell<T>(v); }); }

We’ll need mreturn to turn our triples into Stream s.

It so happens that a parameterized type equipped with mbind and mreturn is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy Stream is indeed a monad.

Stream as a Monoid and a Monad Plus

When implementing mjoin we used the function concat to lazily concatenate two Stream s. Its implementation follows the same recursive pattern we’ve seen so many times:

template<class T> Stream<T> concat( Stream<T> lft , Stream<T> rgt) { if (lft.isEmpty()) return rgt; return Stream<T>([=]() { return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt)); }); }

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like concat , which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of Stream s is indeed associative, and that the neutral element is an empty Stream . Concatenating an empty Stream , whether in front or in the back of any other Stream , doesn’t change the original Stream .

What’s even more interesting is that being a combination of a monoid and a monad makes Stream into a monad plus, and every monad plus defines a guard function — exactly what we need for the filtering of our triples. This function takes a Boolean argument and outputs a Stream . If the Boolean is false, the Stream is empty (the unit element of monad plus!), otherwise it’s a singleton Stream . We really don’t care what value sits in this Stream — we never use the result of guard for anything but the flow of control. In Haskell, there is a special “unit” value () — here I use a nullptr as its closest C++ analog.

Stream<void*> guard(bool b) { if (b) return Stream<void*>(nullptr); else return Stream<void*>(); }

We can now pipe the result of guard into mthen , which will ignore the content of the Stream but won’t fire when the Stream is empty. When the Stream is not empty, we will call mreturn to output a singleton Stream with the result tuple:

Stream<std::tuple<int, int, int>> triples() { return mbind( intsFrom(1) , [](int z ) { return mbind( ints(1, z) , [z](int x ) { return mbind( ints(x, z) , [x, z](int y ) { return mthen( guard(x*x + y*y == z*z) , [x, y, z]() { return mreturn( std::make_tuple(x, y, z) ); }); }); }); }); }

These singletons will then be concatenated by the three levels of mbind to create one continuous lazy Stream of Pythagorean triples.

Compare this function with its Haskell counterpart:

triples = do z <- [1..] x <- [1..z] y <- [x..z] guard (x^2 + y^2 == z^2) return (x, y, z)

Now, the client can take 10 of those triples from the Stream — and the triples still won’t be evaluated!. It’s the consuming forEach that finally forces the evaluation:

void test() { auto strm = triples().take(10); forEach(std::move(strm), [](std::tuple<int, int, int> const & t) { std::cout << std::get<0>(t) << ", " << std::get<1>(t) << ", " << std::get<2>(t) << std::endl; }); }

Conclusion

The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code in order to make it more reusable. You can use the same function triples to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite set of approximation to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite set of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement the solution to the eight-queen problem and a conference scheduling program. Once they made thread safe, the combinators that bind them are thread safe too. This is, in general, the property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation is a serious performance bottleneck. There are many situation, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.