Implicit memoization is, however, a trade-off: we trade the space to store an expression's result for the time gained by avoiding recomputation. The trade-off is particularly worthwhile when the result takes much less memory than the closure (thunk) of the expression, which is often the case in numeric code. Non-deterministic and probabilistic programming and general AI search problems are the opposite case. A non-deterministic expression is typically represented as a lazy search tree, which is often huge even for a small expression. There it becomes a better trade-off to re-evaluate the expression than to fill all of memory with results.

Alas, GHC is designed for the opposite trade-off. Using Haskell even for simple search problems is therefore quite a challenge, since memoization gets in the way. Preventing memoization is surprisingly hard: GHC is very good at finding opportunities for it, even within thunks. This article uses a typical example of non-deterministic search to illustrate the problem posed by lazy evaluation and describes a few tricks to prevent memoization. Some of them are unexpected.

Our running example computes and prints the first n elements of the infinite stream of Pythagorean triples pyth, built from three infinite streams of integers counting up from 1. As is typical of non-deterministic programs, the example generates candidate solutions and rejects most of them.

from :: MonadPlus m => Int -> m Int
from i = return i `mplus` from (i+1)

pyth :: MonadPlus m => m (Int,Int,Int)
pyth = do
  x <- from 1
  y <- from 1
  z <- from 1
  if x*x + y*y == z*z then return (x,y,z) else mzero
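For intuition, instantiating m with the ordinary list monad performs depth-first search -- and diverges on the unbounded pyth, since for x = y = 1 it chases the infinite stream of z candidates forever. A bounded variant terminates; the sketch below is ours, and the name fromTo is our assumption, not part of the article's code:

```haskell
import Control.Monad

-- Bounded counterpart of `from' (hypothetical helper, for illustration).
fromTo :: MonadPlus m => Int -> Int -> m Int
fromTo i n | i > n     = mzero
           | otherwise = return i `mplus` fromTo (i+1) n

-- Bounded pyth, run in the list monad: depth-first search over
-- candidates with all components at most n.
pythTo :: Int -> [(Int,Int,Int)]
pythTo n = do
  x <- fromTo 1 n
  y <- fromTo 1 n
  z <- fromTo 1 n
  guard (x*x + y*y == z*z)
  return (x,y,z)

main :: IO ()
main = print (take 4 (pythTo 20))
-- prints [(3,4,5),(4,3,5),(5,12,13),(6,8,10)]
```

The bound makes DFS usable, but only because we know in advance how far to look; the article's point is to search the genuinely infinite stream.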

data Tree1 a = Fail1 | Val1 a | Node1 (Tree1 a) (Tree1 a)

The type Tree1 represents the search tree: Fail1 is a failed branch, Val1 a successful result, and Node1 a choice point. The interesting clauses of its MonadPlus instance are:

Node1 e1 e2 >>= f = Node1 (e1 >>= f) (e2 >>= f)
mplus             = Node1
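For completeness, the full instance might read as follows; the Functor and Applicative boilerplate is ours (modern GHC requires it), and dfsList is a hypothetical helper for flattening finite trees:

```haskell
import Control.Applicative (Alternative(..))
import Control.Monad

data Tree1 a = Fail1 | Val1 a | Node1 (Tree1 a) (Tree1 a)

instance Functor Tree1 where fmap = liftM
instance Applicative Tree1 where { pure = Val1; (<*>) = ap }
instance Monad Tree1 where
  Fail1       >>= _ = Fail1       -- failure propagates
  Val1 a      >>= f = f a         -- a result feeds the continuation
  Node1 e1 e2 >>= f = Node1 (e1 >>= f) (e2 >>= f)  -- bind maps over choices
instance Alternative Tree1 where { empty = Fail1; (<|>) = Node1 }
instance MonadPlus Tree1 where { mzero = Fail1; mplus = Node1 }

-- Flatten a *finite* tree depth-first, for illustration only.
dfsList :: Tree1 a -> [a]
dfsList Fail1       = []
dfsList (Val1 a)    = [a]
dfsList (Node1 l r) = dfsList l ++ dfsList r

main :: IO ()
main = print (dfsList ((Val1 1 `mplus` Val1 2) >>= \x -> return (x*10 :: Int)))
-- prints [10,20]
```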

To `run' the non-deterministic computation and produce the stream of triples, we traverse the Tree1, extract the successfully produced results from the Val1 leaves and return them as a lazy list. Different tree traversals correspond to different non-deterministic search strategies. Depth-first traversal (DFS) is the most efficient, needing only O(d) space to examine a node at depth d. Alas, an infinite branch in the tree traps DFS. In our pyth tree, DFS gets stuck chasing an infinite chain of Fail1. Breadth-first traversal (BFS), in contrast, will visit any node in the tree, given enough time. BFS is a complete strategy: if a solution (a Val1 leaf) exists, BFS will find it. Alas, BFS needs a lot of space to maintain the job queue, the frontier of the search: at search depth d the frontier may take O(2^d) space. Iterative deepening is a hybrid method, complete like BFS yet needing as little working space as DFS. It explores progressively longer `prefixes' of the tree with DFS; each new exploration phase repeats all the work of the previous explorations of shallower prefixes. Iterative deepening thus clearly trades time for space. Despite its gross wastefulness, the method is quite popular, for example in automated theorem proving, where its trade-off has proved worthwhile.
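The article does not show its runners; one way they might look is sketched below. The names bfs and iterDeep, the queue representation, and the depth-window scheme (each deepening pass reports only the leaves in its own depth band, so no result is reported twice) are all our assumptions:

```haskell
import Control.Applicative (Alternative(..))
import Control.Monad

data Tree1 a = Fail1 | Val1 a | Node1 (Tree1 a) (Tree1 a)

instance Functor Tree1 where fmap = liftM
instance Applicative Tree1 where { pure = Val1; (<*>) = ap }
instance Monad Tree1 where
  Fail1       >>= _ = Fail1
  Val1 a      >>= f = f a
  Node1 e1 e2 >>= f = Node1 (e1 >>= f) (e2 >>= f)
instance Alternative Tree1 where { empty = Fail1; (<|>) = Node1 }
instance MonadPlus Tree1 where { mzero = Fail1; mplus = Node1 }

from :: MonadPlus m => Int -> m Int
from i = return i `mplus` from (i+1)

pyth :: MonadPlus m => m (Int,Int,Int)
pyth = do
  x <- from 1; y <- from 1; z <- from 1
  if x*x + y*y == z*z then return (x,y,z) else mzero

-- Breadth-first traversal: a FIFO queue of subtrees, the frontier.
bfs :: Tree1 a -> [a]
bfs t = go [t]
  where go []              = []
        go (Fail1     : q) = go q
        go (Val1 a    : q) = a : go q
        go (Node1 l r : q) = go (q ++ [l, r])

-- Iterative deepening: DFS passes over ever-deeper prefixes.
-- Pass `lo' reports only Val1 leaves at depths [lo, lo+step).
iterDeep :: Tree1 a -> [a]
iterDeep t = concat [ windowFrom d | d <- [0, step ..] ]
  where
    step = 10
    windowFrom lo = go 0 t
      where
        hi = lo + step
        go _ Fail1    = []
        go d (Val1 a) = [a | d >= lo]
        go d (Node1 l r)
          | d + 1 >= hi = []                       -- depth bound reached
          | otherwise   = go (d+1) l ++ go (d+1) r

main :: IO ()
main = print (take 2 (iterDeep (pyth :: Tree1 (Int,Int,Int))))
```

In the pyth tree a solution (x,y,z) sits at depth roughly x+y+z, so both runners find (3,4,5) and (4,3,5) first. The naive `q ++ [l, r]` queue is quadratic; a real BFS would use a two-list or Data.Sequence queue.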

Here are the results of computing and printing the first n Pythagorean triples. The code was compiled by GHC 7.0.4 with optimization -O2 .

                     Mutator time, sec   GC time, sec   Memory in use, MB   Average residency, KB
  BFS, n=30                13.0               5.0                3                   465
  Iter Deep, n=30           0.15              0.06               5                  1506
  Iter Deep, n=100          4.8               1.3               56                 20832

Recall that iterative deepening keeps re-traversing the tree: each exploration cycle redoes all the previous explorations. Lazy evaluation helps, it seems. When we first reach Node1 e1 e2, we evaluate e1 and e2, which were stored in the node unevaluated (otherwise we would have diverged constructing the tree, which is infinite). Lazy evaluation then replaces e1 and e2 with their results, so when iterative deepening comes across the same node in a new cycle, it gets the results of e1 and e2 right away. That seems like a good thing -- until we look at the space. As iterative deepening explores the Tree1, it needs more and more memory to store the explored prefix, which is about twice the size of the BFS frontier. Lazy evaluation thus defeats the very purpose of iterative deepening: recomputing revisited tree nodes so as to avoid storing them. Lazy evaluation does exactly the wrong thing.

In a strict language, we would use explicit thunks to represent infinite trees. If tree nodes store thunks, lazy evaluation can only memoize the thunks themselves -- and a thunk, being a function, evaluates to itself rather than to a tree. It seems, therefore, that the following modification should stop lazy evaluation's meddling in iterative deepening.

data Tree2 a = Fail2 | Val2 a | Node2 (() -> Tree2 a) (() -> Tree2 a)

Node2 e1 e2 >>= f = Node2 (\() -> e1 () >>= f) (\() -> e2 () >>= f)
mplus e1 e2       = Node2 (\() -> e1) (\() -> e2)

Alas, hiding the children of a Node2 behind unit thunks does not help, as the new benchmarks show.

                     Mutator time, sec   GC time, sec   Memory in use, MB   Average residency, KB
  BFS, n=30                13.0               5.0                3                   509
  Iter Deep, n=30           0.3               0.1                8                  2964
  Iter Deep, n=100         10.6               1.7               96                 39244

Such an unexpected result was quite a puzzle. It seems GHC is just too smart: apparently it notices that a thunk (\() -> e) can only ever be applied to the same argument. Therefore, the first time the thunk is forced by applying it to (), the result can justifiably be memoized: the next time around, the thunk will be applied to the same (), and hence will give the same result anyway.
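The sharing GHC introduces here is essentially its full-laziness transformation: a subexpression under \() -> ... that does not mention the argument may be floated out of the lambda and thus computed only once. The tiny experiment below (ours, not from the article) makes the effect observable with Debug.Trace; in our understanding, compiled with -O2 the trace message typically appears once, while without optimization it appears on every call -- though the exact behaviour depends on the GHC version and flags:

```haskell
import Debug.Trace (trace)

-- A unit-thunk whose body does not mention its argument: a candidate
-- for being floated out of the lambda and shared between calls.
shared :: () -> Int
shared = \() -> trace "evaluating body" (2 + 3)

main :: IO ()
main = print (shared () + shared ())  -- prints 10; trace count varies
```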

The new fix is to deliberately confuse GHC: we obfuscate the tree-construction operations (>>=) and mplus with the auxiliary functions app and app1, marked NOINLINE so that GHC cannot see through them.

data Tree3 a = Fail3 | Val3 a | Node3 (() -> Tree3 a) (() -> Tree3 a)

Node3 e1 e2 >>= f = Node3 (app1 e1 f) (app1 e2 f)
mplus e1 e2       = Node3 (app e1) (app e2)

{-# NOINLINE app #-}
app e () = e

{-# NOINLINE app1 #-}
app1 e f () = e () >>= f

                     Mutator time, sec   GC time, sec   Memory in use, MB   Average residency, KB
  BFS, n=30                13.2               4.7                3                   413
  Iter Deep, n=30           0.4               0.03               2                    78
  Iter Deep, n=100         13.4               0.9                2                   413
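Putting the pieces together, the final version might be assembled into a self-contained sketch as follows. The class boilerplate, the runner iterDeep3, and its depth-window scheme are our additions for illustration; whether NOINLINE suffices to defeat the sharing may depend on the GHC version:

```haskell
import Control.Applicative (Alternative(..))
import Control.Monad

-- Children hidden behind unit thunks that GHC may not inline through.
data Tree3 a = Fail3 | Val3 a | Node3 (() -> Tree3 a) (() -> Tree3 a)

{-# NOINLINE app #-}
app :: Tree3 a -> () -> Tree3 a
app e () = e

{-# NOINLINE app1 #-}
app1 :: (() -> Tree3 a) -> (a -> Tree3 b) -> () -> Tree3 b
app1 e f () = e () >>= f

instance Functor Tree3 where fmap = liftM
instance Applicative Tree3 where { pure = Val3; (<*>) = ap }
instance Monad Tree3 where
  Fail3       >>= _ = Fail3
  Val3 a      >>= f = f a
  Node3 e1 e2 >>= f = Node3 (app1 e1 f) (app1 e2 f)
instance Alternative Tree3 where
  empty     = Fail3
  e1 <|> e2 = Node3 (app e1) (app e2)
instance MonadPlus Tree3 where { mzero = Fail3; mplus = (<|>) }

from :: MonadPlus m => Int -> m Int
from i = return i `mplus` from (i+1)

pyth :: MonadPlus m => m (Int,Int,Int)
pyth = do
  x <- from 1; y <- from 1; z <- from 1
  if x*x + y*y == z*z then return (x,y,z) else mzero

-- Iterative deepening over Tree3: forcing the thunks on every pass.
-- Pass `lo' reports only Val3 leaves at depths [lo, lo+step).
iterDeep3 :: Tree3 a -> [a]
iterDeep3 t = concat [ windowFrom d | d <- [0, step ..] ]
  where
    step = 10
    windowFrom lo = go 0 t
      where
        hi = lo + step
        go _ Fail3    = []
        go d (Val3 a) = [a | d >= lo]
        go d (Node3 l r)
          | d + 1 >= hi = []
          | otherwise   = go (d+1) (l ()) ++ go (d+1) (r ())

main :: IO ()
main = print (take 3 (iterDeep3 (pyth :: Tree3 (Int,Int,Int))))
```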

We have seen that lazy evaluation is a trade-off, which can hurt in some cases -- in particular, in search problems over huge data structures, where it is often better to recompute a result than to store it. Preventing lazy evaluation is possible, but surprisingly tricky.