There was a recent question on haskell-cafe about how to parallelize a simple “rule engine”. I spent a while investigating it, and the solution is both pretty and fast. My response on haskell-cafe came out really messy because Gmail is silly, so here it is, cleaned up, in literate Haskell:

> {-# LANGUAGE RankNTypes #-}
>
> import qualified Data.MemoCombinators as Memo
> import qualified Data.Set as Set
> import Control.Parallel (par)
> import qualified Control.Parallel.Strategies as Par
> import Data.Monoid (Monoid(..))
> import Control.Monad.State
> import qualified Data.DList as DList

First, I want to capture the idea of a generative set like you’re doing. GenSet is like a set, with the constructor “genset x xs” which says “if x is in the set, then so are xs”.

I’ll represent it as a stateful computation of the list of things in the set, threading the set of things we’ve seen so far. It’s redundant information, but sets can’t be consumed lazily, thus the list (the set will follow along lazily :-).

Remember that State s a is just the function (s -> (a,s)). So we’re taking the set of things we’ve seen so far, and returning the new elements added together with the set unioned with those elements.
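To make that shape concrete, here is a minimal standalone sketch (not part of the literate program). The helper `visit` is a hypothetical name of mine: it reports an element only the first time it is seen and threads the seen-set along, which is the same idea GenSet builds on.

```haskell
import qualified Data.Set as Set
import Control.Monad.State (State, runState, state)

-- Hypothetical helper: return [x] the first time x is seen, [] afterwards.
-- The lambda is exactly the (s -> (a, s)) function that State wraps.
visit :: Int -> State (Set.Set Int) [Int]
visit x = state $ \seen ->
  if x `Set.member` seen
    then ([], seen)
    else ([x], Set.insert x seen)

main :: IO ()
main = do
  let (new, seen) = runState (mapM visit [3, 1, 3, 2, 1]) Set.empty
  print (concat new)       -- [3,1,2]
  print (Set.toList seen)  -- [1,2,3]
```

The list of new elements comes out in discovery order, while the set only answers membership queries.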

> newtype GenSet a
>     = GenSet (State (Set.Set a) (DList.DList a))
>
> genset :: (Ord a) => a -> GenSet a -> GenSet a
> genset x (GenSet f) = GenSet $ do
>     seen <- gets (x `Set.member`)
>     if seen
>         then return mempty
>         else fmap (DList.cons x) $
>                  modify (Set.insert x) >> f
>
> toList :: GenSet a -> [a]
> toList (GenSet f) = DList.toList $ evalState f Set.empty

GenSet is a monoid, where mappend is just union.

> instance (Ord a) => Monoid (GenSet a) where
>     mempty = GenSet (return mempty)
>     mappend (GenSet a) (GenSet b) =
>         GenSet (liftM2 mappend a b)
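As a quick sanity check of how this behaves, here is a stripped-down standalone sketch. It is a simplification of the code above, not the same code: it uses plain lists instead of DList, and the standalone functions `union` and `emptySet` (names I made up for this sketch) stand in for the Monoid instance.

```haskell
import qualified Data.Set as Set
import Control.Monad.State (State, evalState, gets, modify)

-- Simplified GenSet: plain lists instead of DList (an assumption of this sketch).
newtype GenSet a = GenSet (State (Set.Set a) [a])

genset :: Ord a => a -> GenSet a -> GenSet a
genset x (GenSet rest) = GenSet $ do
  seen <- gets (Set.member x)
  if seen
    then return []                      -- already visited: contribute nothing
    else modify (Set.insert x) >> fmap (x :) rest

-- Hypothetical stand-ins for mappend and mempty.
union :: GenSet a -> GenSet a -> GenSet a
union (GenSet a) (GenSet b) = GenSet ((++) <$> a <*> b)

emptySet :: GenSet a
emptySet = GenSet (return [])

toList :: GenSet a -> [a]
toList (GenSet f) = evalState f Set.empty

-- A self-referential set: 1 implies 2, and 2 implies 1 again.
cyclic :: GenSet Int
cyclic = genset 1 (genset 2 cyclic)

main :: IO ()
main = do
  -- 2 appears under both branches but is only listed once.
  print (toList (genset 1 (genset 2 emptySet `union` genset 3 (genset 2 emptySet))))  -- [1,2,3]
  -- The seen-set cuts the cycle, so this terminates.
  print (toList cyclic)  -- [1,2]
```

The second case is the important one: because each element's consequences are only explored the first time it is met, shared or cyclic rule graphs are traversed once per element.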

Okay, so that’s how we avoid exponential behavior when traversing the tree. We can now just toss around GenSets like they’re sets and everything will be peachy.

Here’s the heart of the algorithm: the reduce function. To avoid recomputation of rules, we could just memoize the rule function. But we’ll do something a little more clever. The function we’ll memoize (“parf”) first sparks a thread computing its *last* child. Because the search is depth-first, it will typically be a while until we get to the last one, so we benefit from the spark (you don’t want to spark a thread computing something you’re about to compute anyway).

> reduce :: (Ord a) => Memo.Memo a -> (a -> [a]) -> a -> [a]
> reduce memo f x = toList (makeSet x)
>     where
>     makeSet x = genset x . mconcat . map makeSet . f' $ x
>     f' = memo parf
>     parf a = let ch = f a in
>              ch `seq` (f' (last ch) `par` ch)

The ch `seq` is there so that the evaluation of ch and last ch aren’t competing with each other.
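The "spark the last child" idea can be seen in isolation in the sketch below. This is a simplified illustration rather than the reduce function itself: `mapSparkLast` is a hypothetical helper of mine that has no memoization, and unlike `parf` it also handles an empty list. It assumes the `parallel` package for `par`.

```haskell
import Control.Parallel (par)

-- Hypothetical helper: map g over a list, but spark the evaluation of the
-- *last* result before walking the list from the front. By the time a
-- front-to-back consumer reaches the end, another core may already have
-- evaluated it (to weak head normal form).
mapSparkLast :: (a -> b) -> [a] -> [b]
mapSparkLast _ [] = []
mapSparkLast g xs = lastResult `par` (map g (init xs) ++ [lastResult])
  where
    lastResult = g (last xs)

main :: IO ()
main = print (mapSparkLast (* 2) [1, 2, 3 :: Int])  -- [2,4,6]
```

The result is always the same as `map g xs`; `par` only hints that `lastResult` is worth evaluating in parallel, and sparks only actually run in parallel when the program is built with -threaded and run with more than one capability.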

Your example had a few problems. You said the rule was supposed to be expensive, but yours was cheap. Also, [x-1,x-2,x-3] are all very near each other, so it’s hard to go do unrelated stuff. I made a fake expensive function before computing the neighbors, and tossed around some prime numbers to scatter the space more.

> rule :: Int -> [Int]
> rule n = expensive `seq`
>              [next 311 4, next 109 577, next 919 353]
>     where
>     next x y = (x * n + y) `mod` 5000
>     expensive = sum [1 .. 50 * n]
>
> main :: IO ()
> main = do
>     let r = reduce Memo.integral rule 1
>     print (length r)

The results are quite promising:

    % ghc --make -O2 rules2 -threaded
    % time ./rules2
    5000
    ./rules2  13.25s user 0.08s system 99% cpu 13.396 total
    % time ./rules2 +RTS -N2
    5000
    ./rules2 +RTS -N2  12.52s user 0.30s system 159% cpu 8.015 total

That’s a 40% decrease in wall-clock time (13.4s down to 8.0s)! Woot! I’d love to see what it does on a machine with more than 2 cores.