1st March 2011, 02:33 pm

The post Deriving list scans gave a simple specification of the list-scanning functions scanl and scanr , and then transformed those specifications into the standard optimized implementations. Next, the post Deriving parallel tree scans adapted the specifications and derivations to a type of binary trees. The resulting implementations are parallel-friendly, but not work-efficient, in that they perform n log n work vs linear work as in the best-known sequential algorithm.

Besides the work-inefficiency, I don’t know how to extend the critical initTs and tailTs functions (analogs of inits and tails on lists) to depth-typed, perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The difficulty I encounter is that the functions initTs and tailTs make unbalanced trees out of balanced ones, so I don’t know how to adapt the specifications when types prevent the existence of unbalanced trees.

This new post explores an approach to generalized scanning via type classes. After defining the classes and giving a simple example, I’ll give a simple & general framework based on composing functor combinators.

Edits:

2011-03-02: Fixed typo. "constant functor is easiest" (instead of "identity functor"). Thanks, frguybob.

2011-03-05: Removed final unfinished sentence.

2011-07-28: Replace " assocL " with " assocR " in prefixScan derivation for g ∘ f .

Generalizing list scans

The left and right scan functions on lists have an awkward feature. The output list has one more element than the input list, corresponding to the fact that the number of prefixes ( inits ) of a list is one more than the number of elements, and similarly for suffixes ( tails ).

While it’s easy to extend a list by adding one more element, it’s not easy with other functors. In Deriving parallel tree scans, I simply removed the ∅ element from the scan. In this post, I’ll instead change the interface to produce an output of exactly the same shape, plus one extra element. The extra element will equal a fold over the complete input. If you recall, we had to search for that complete fold in an input subtree in order to adjust the other subtree. (See headT and lastT and their generalizations in Deriving parallel tree scans.) Separating out this value eliminates the search.

Define a class with methods for prefix and suffix scan:

class Scan f where

prefixScan, suffixScan ∷ Monoid m ⇒ f m → (m, f m)

Prefix scans ( prefixScan ) accumulate moving left-to-right, while suffix scans ( suffixScan ) accumulate moving right-to-left.

A simple example: pairs

To get a first sense of generalized scans, let’s use see how to scan over a pair functor.

data Pair a = a :# a deriving ( Eq , Ord , Show )

With GHC’s DeriveFunctor option, we could also derive a Functor instance, but for clarity, define it explicitly:

instance Functor Pair where

fmap f (a :# b) = (f a :# f b)

The scans:

instance Scan Pair where

prefixScan (a :# b) = (a ⊕ b, (∅ :# a))

suffixScan (a :# b) = (a ⊕ b, (b :# ∅))

As you can see, if we eliminated the ∅ elements, we could shift to the left or right and forgo the extra result.

Naturally, there is also a Fold instance, and the scans produce the fold results as well sub-folds:

instance Foldable Pair where

fold (a :# b) = a ⊕ b

The Pair functor also has unsurprising instances for Applicative and Traversable .

instance Applicative Pair where

pure a = a :# a

(f :# g) <*> (x :# y) = (f x :# g y)



instance Traversable Pair where

sequenceA (fa :# fb) = ( :# ) <$> fa <*> fb

We don’t really have to figure out how to define scans for every functor separately. We can instead look at how functors are are composed out of their essential building blocks.

Scans for functor combinators

To see how to scan over a broad range of functors, let’s look at each of the functor combinators, e.g., as in Elegant memoization with higher-order types.

Constant

The constant functor is easiest.

newtype Const x a = Const x

There are no values to accumulate, so the final result (fold) is ∅ .

instance Scan ( Const x) where

prefixScan ( Const x) = (∅, Const x)

suffixScan = prefixScan

Identity

The identity functor is nearly as easy.

newtype Id a = Id a

instance Scan Id where

prefixScan ( Id m) = (m, Id ∅)

suffixScan = prefixScan

Sum

Scanning in a sum is just scanning in a summand:

data (f + g) a = InL (f a) | InR (g a)

instance ( Scan f, Scan g) ⇒ Scan (f + g) where

prefixScan ( InL fa) = second InL (prefixScan fa)

prefixScan ( InR ga) = second InR (prefixScan ga)



suffixScan ( InL fa) = second InL (suffixScan fa)

suffixScan ( InR ga) = second InR (suffixScan ga)

These definitions correspond to simple "commutative diagram" properties, e.g.,

prefixScan ∘ InL ≡ second InL ∘ prefixScan

Product

Product scannning is a little trickier.

data (f × g) a = f a × g a

Scan each of the two parts separately, and then combine the final ( fold ) part of one result with each of the non-final elements of the other.

instance ( Scan f, Scan g, Functor f, Functor g) ⇒ Scan (f × g) where

prefixScan (fa × ga) = (af ⊕ ag, fa' × ((af ⊕) <$> ga'))

where (af,fa') = prefixScan fa

(ag,ga') = prefixScan ga



suffixScan (fa × ga) = (af ⊕ ag, ((⊕ ag) <$> fa') × ga')

where (af,fa') = suffixScan fa

(ag,ga') = suffixScan ga

Composition

Finally, composition is the trickiest.

newtype (g ∘ f) a = O (g (f a))

The target signatures:

prefixScan, suffixScan ∷ Monoid m ⇒ (g ∘ f) m → (m, (g ∘ f) m)

To find the prefix and suffix scan definitions, fiddle with types beginning at the domain type for prefixScan or suffixScan and arriving at the range type.

Some helpers:

zip ∷ Applicative g ⇒ (g a, g b) → g (a,b)

zip = uncurry (liftA2 (,))



unzip ∷ Functor g ⇒ g (a,b) → (g a, g b)

unzip = fmap fst &&& fmap snd

assocR ∷ ((a,b),c) → (a,(b,c))

assocR ((a,b),c) = (a,(b,c))

adjustL ∷ ( Functor f, Monoid m) ⇒ (m, f m) → f m

adjustL (m, ms) = (m ⊕) <$> ms



adjustR ∷ ( Functor f, Monoid m) ⇒ (m, f m) → f m

adjustR (m, ms) = (⊕ m) <$> ms

First prefixScan :

gofm ∷ (g ∘ f) m

unO '' ∷ g (f m)

fmap prefixScan '' ∷ g (m, f m)

unzip '' ∷ (g m, g (f m))

first prefixScan '' ∷ ((m, g m), g (f m))

assocR '' ∷ (m, (g m, g (f m)))

second zip '' ∷ (m, g (m, f m))

second ( fmap adjustL) '' ∷ (m, g (f m))

second O '' ∷ (m, (g ∘ f) m)

Then suffixScan :

gofm ∷ (g ∘ f) m

unO '' ∷ g (f m)

fmap suffixScan '' ∷ g (m, f m)

unzip '' ∷ (g m, g (f m))

first suffixScan '' ∷ ((m, g m), g (f m))

assocR '' ∷ (m, (g m, g (f m)))

second zip '' ∷ (m, (g (m, f m)))

second ( fmap adjustR) '' ∷ (m, (g (f m)))

second O '' ∷ (m, ((g ∘ f) m))

Putting together the pieces and simplifying just a bit leads to the method definitions:

instance ( Scan g, Scan f, Functor f, Applicative g) ⇒ Scan (g ∘ f) where

prefixScan = second ( O ∘ fmap adjustL ∘ zip )

∘ assocR

∘ first prefixScan

∘ unzip

∘ fmap prefixScan

∘ unO



suffixScan = second ( O ∘ fmap adjustR ∘ zip )

∘ assocR

∘ first suffixScan

∘ unzip

∘ fmap suffixScan

∘ unO

What’s coming up?

What might not easy to spot at this point is that the prefixScan and suffixScan methods given in this post do essentially the same job as in Deriving parallel tree scans, when the binary tree type is deconstructed into functor combinators. A future post will show this connection.

and methods given in this post do essentially the same job as in Deriving parallel tree scans, when the binary tree type is deconstructed into functor combinators. A future post will show this connection. Switch from standard (right-folded) trees to left-folded trees (in the sense of A trie for length-typed vectors and From tries to trees), which reduces the running time from Θ ( n log n ) to Θ n .

to . Scanning in place, i.e., destructively replacing the values in the input structure rather than allocating a new structure.

The post Deriving list scans gave a simple specification of the list-scanning functions scanl and scanr, and then transformed those specifications into the standard optimized implementations. Next, the post Deriving...