13th November 2008, 10:20 pm

I read Max Rabkin’s recent post Beautiful folding with great excitement. He shows how to make combine multiple folds over the same list into a single pass, which can then drastically reduce memory requirements of a lazy functional program. Max’s trick is giving folds a data representation and a way to combine representations that corresponds to combining the folds.

Peeking out from behind Max’s definitions is a lovely pattern I’ve been noticing more and more over the last couple of years, namely type class morphisms.

Folds as data

Max gives a data representation of folds and adds on an post-fold step, which makes them composable.

data Fold b c = forall a. F (a -> b -> a) a (a -> c)

The components of a Fold are a (strict left) fold’s combiner function and initial value, plus a post-fold step. This interpretation is done by a function cfoldl' , which turns these data folds into function folds:

cfoldl' :: Fold b c -> [b] -> c cfoldl' (F op e k) = k . foldl' op e

where foldl' is the standard strict, left-fold functional:

foldl' :: (a -> b -> a) -> a -> [b] -> a foldl' op a [] = a foldl' op a (b:bs) = let a' = a `op` b in a' `seq` foldl' f a' bs

Standard classes

As Twan van Laarhoven pointed out in a comment on Max’s post, Fold b is a functor and an applicative functor, so some of Max’s Fold -manipulating functions can be replaced by standard vocabulary.

The Functor instance is pretty simple:

instance Functor (Fold b) where fmap h (F op e k) = F op e (h . k)

The Applicative instance is a bit trickier. For strictness, Max used used a type of strict pairs:

data Pair c c' = P !c !c'

The instance:

instance Applicative (Fold b) where pure a = F (error "no op") (error "no e") (const a) F op e k <*> F op' e' k' = F op'' e'' k'' where P a a' `op''` b = P (a `op` b) (a' `op'` b) e'' = P e e' k'' (P a a') = (k a) (k' a')

Given that Fold b is an applicative functor, Max’s bothWith function is then liftA2 . Max’s multi-cfoldl' rule then becomes:

forall c f g xs. h (cfoldl' f xs) (cfoldl' g xs) == cfoldl' (liftA2 h f g) xs

thus replacing two passes with one pass.

Beautiful properties

Now here’s the fun part. Looking at the Applicative instance for ((->) a) , the rule above is equivalent to

forall c f g. liftA2 h (cfoldl' f) (cfoldl' g) == cfoldl' (liftA2 h f g)

Flipped around, this rule says that liftA2 distributes over cfoldl' . Or, “the meaning of liftA2 is liftA2 “. Neat, huh?

Moreover, this liftA2 property is equivalent to the following:

forall f g. cfoldl' f <*> cfoldl' g == cfoldl' (f <*> g)

This form is one of the two Applicative morphism laws (which I usually write in the reverse direction):

For more about these morphisms, see Simplifying semantics with type class morphisms. That post suggests that semantic functions in particular ought to be type class morphisms (and if not, then you’d have an abstraction leak). And cfoldl' is a semantic function, in that it gives meaning to a Fold .

The other type class morphisms in this case are

cfoldl' (pure a ) == pure a cfoldl' (fmap h f) == fmap h (cfoldl' f)

Given the Functor and Applicative instances of ((->) a) , these two properties are equivalent to

cfoldl' (pure a ) == const a cfoldl' (fmap h f) == h . cfoldl' f

or

cfoldl' (pure a ) xs == a cfoldl' (fmap h f) xs == h (cfoldl' f xs)

Rewrite rules

Max pointed out that GHC does not handle his original multi-cfoldl' rule. The reason is that the head of the LHS (left-hand side) is a variable. However, the type class morphism laws have constant (known) functions at the head, so I expect they could usefully act as fusion rewrite rules.

Inevitable instances

Given the implementations (instances) of Functor and Applicative for Fold , I’d like to verify that the morphism laws for cfoldl' (above) hold.

Functor

Start with fmap . The morphism law:

cfoldl' (fmap h f) == fmap h (cfoldl' f)

First, give the Fold argument more structure, so that (without loss of generality) the law becomes

cfoldl' (fmap h (F op e k)) == fmap h (cfoldl' (F op e k))

The game is to work backward from this law to the definition of fmap for Fold . I’ll do so by massaging the RHS (right-hand side) into the form cfoldl' (...) , where “ ... ” is the definition fmap h (F op e k) .

fmap h (cfoldl' (F op e k)) == {- inline cfoldl' -} fmap h (k . foldl' op e) == {- inline fmap on functions -} h . (k . foldl' op e) == {- associativity of (.) -} (h . k) . foldl' op e == {- uninline cfoldl' -} cfoldl' (F op e (h . k)) == {- uninline fmap on Fold -} cfoldl' (fmap h (F op e k))

This proof show why Max had to add the post-fold function k to his Fold type. If k weren’t there, we couldn’t have buried the h in it.

More usefully, this proof suggests how we could have discovered the fmap definition. For instance, we might have tried with a simpler and more obvious Fold representation:

data FoldS a b = FS (a -> b -> a) a

Getting into the fmap derivation, we’d come to

h . foldsl' op e

and then we’d be stuck. But not really, because the awkward extra bit ( h . ) beckons us to generalize by adding Max’s post-fold function.

Applicative

Next pure:

cfoldl' (pure a) == pure a

Reason as before, starting with the RHS

pure a == {- inline pure on functions -} const a == {- property of const -} const a . foldl op e == {- uninline cfoldl' -} cfoldl' (F (const a) op e)

The imaginative step was inventing structure to match the definition of cfoldl' . This definition is true for any values of op and e , so we can use bottom in the definition:

instance Applicative (Fold b) where pure a = F undefined undefined (const a)

As Twan noticed, the existential ( forall ) also lets us pick defined values for op and e . He chose (_ _ -> ()) and () .

The derivation of (<*>) is trickier and is the heart of the problem of fusing folds to reduce multiple traversals to a single one. Why the heart? Because (<*>) is all about combining two things into one.

Intermission

I’m taking a break here. While fiddling with a proof of the (<*>) morphism law, I realized a simpler way to structure these folds, which will be the topic of an upcoming post.

I read Max Rabkin’s recent post Beautiful folding with great excitement. He shows how to make combine multiple folds over the same list into a single pass, which can then...