The other day I came across the paper Parallelizing Complex Scans and Reductions lying on a colleague's desk. The first part of the paper discussed how to make a certain family of algorithms run faster on parallel machines, and the second half went on to show how, with some work, the method could be stretched to a wider class of algorithm. What the authors seemed to miss was that the extra work really wasn't necessary: the methods of the first half apply, with no change, to the second half. But don't take this as a criticism! I learnt a whole new way to approach algorithm design, and the trick to making the second half easy uses methods that have become popular in more recent years. Doing a web search I found lots of papers describing something similar to what I did.

This is also a nice example of how the notion of abstraction in computing and the notion of abstraction in mathematics are exactly the same thing. But I'm getting ahead of myself.

So here's a direct translation, from the paper, of some procedural code to find the largest sum that can be made from a subsequence of a sequence. This will be our first implementation of the problem examined in the second half of the paper:
> f1 [] (sofar,max) = (sofar,max)
> f1 (b:bs) (sofar,max) =
>   let sofar' = if sofar+b<0
>                then 0
>                else sofar+b
>       max' = if max<sofar'
>              then sofar'
>              else max
>   in f1 bs (sofar',max')





sofar is a running sum that is reset each time it dips below zero, and max keeps track of the best sum so far. We initialise sofar and max with 0 and -infinity. Here's an example of how to use it:



> b :: [Double]

> b = [1..5] ++ [5,4..(-10)] ++ [(-2)..6]



> infinity :: Double

> infinity = 1/0

> test1 b = snd $ f1 b (0,-infinity)
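Before going further, a quick sanity check of my own (the spec function and the sample lists are not from the post): on non-empty lists, f1 agrees with a brute-force specification that tries every contiguous segment, the empty segment included.

```haskell
import Data.List (inits, tails)

-- the scan from the post (max renamed to best to avoid shadowing Prelude.max)
f1 :: [Double] -> (Double, Double) -> (Double, Double)
f1 [] acc = acc
f1 (b:bs) (sofar, best) =
  let sofar' = if sofar + b < 0 then 0 else sofar + b
      best'  = if best < sofar' then sofar' else best
  in f1 bs (sofar', best')

test1 :: [Double] -> Double
test1 b = snd $ f1 b (0, -1/0)

-- O(n^3) brute force: the best sum over every contiguous segment
-- (the empty segment contributes 0)
spec :: [Double] -> Double
spec xs = maximum [sum seg | t <- tails xs, seg <- inits t]
```

Both test1 [2,-1,2,3,-9,4] and spec [2,-1,2,3,-9,4] give 6, from the segment [2,-1,2,3].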





Notice how we prime the algorithm with a starting vector. The 0 corresponds to the fact that at the start we've summed over 0 elements, and the -infinity corresponds to the fact that we want the first sum computed to be the highest so far at that point. Test the code with test1 b. We'll use a similar pattern all the way through this code.

You may see the problem with making this parallelisable: we are maintaining running sums, so the final values of sofar and max both depend on what was computed earlier. It's not obvious that we can break this up into pieces. sofar computes sums of subsequences between resets, but chopping the array b into pieces might split such subsequences between processors. How can we handle this cleanly?

The first step is to write version two of the above function using max instead of conditionals:


> f2 [] (sofar,max) = (sofar,max)
> f2 (b:bs) (sofar,max) =
>   let sofar' = Prelude.max (sofar+b) 0
>       max' = Prelude.max max sofar'
>   in f2 bs (sofar',max')



> test2 b = snd $ f2 b (0,-infinity)





But that doesn't appear to make things any easier: we've just buried the conditionals inside max. It doesn't make the serial dependency go away.

So let's solve another problem instead. In f2 I'll replace max with addition, and addition with multiplication. 0 is the identity for addition, so we should replace it with the identity for multiplication, 1. Similarly, -infinity is the identity for max, so we should replace that with 0. We get:



> f3 [] (sofar,max) = (sofar,max)
> f3 (b:bs) (sofar,max) =
>   let sofar' = sofar*b+1
>       max' = max+sofar'
>   in f3 bs (sofar',max')



> test3 b = snd $ f3 b (1,0)





That's all very well, but (1) it looks no more parallelisable and (2) it's solving the wrong problem. Let's ignore problem (2) for now.

The thing that makes f3 easier to work with is that it's almost a linear function acting on the vector (sofar,max). Linear functions have one very nice property. If f and g are linear then we can compute f(g(x)) by acting with g first, and then applying f. But we can also compose f and g without reference to x, giving us another linear function. We only have to know how f and g act on basis elements and we can immediately compute how f . g acts on basis elements. This is usually expressed by writing f and g as matrices. So let's tweak f3 so it's linear in its last argument:



> f4 [] (sofar,max,i) = (sofar,max,i)
> f4 (b:bs) (sofar,max,i) =
>   let sofar' = (sofar * b) + i
>       max' = max + sofar'
>       i' = i
>   in f4 bs (sofar',max',i')



> test4 b = let (_,max,_) = f4 b (1,0,1) in max
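As a check of my own that the tweak changed nothing: f4 primed with (1,0,1) gives the same answer as f3 primed with (1,0), since the third coordinate just carries the constant 1 along.

```haskell
f3 :: Num a => [a] -> (a, a) -> (a, a)
f3 [] acc = acc
f3 (b:bs) (sofar, m) =
  let sofar' = sofar * b + 1
      m'     = m + sofar'
  in f3 bs (sofar', m')

-- same recurrence, but the constant 1 is now the third coordinate i
f4 :: Num a => [a] -> (a, a, a) -> (a, a, a)
f4 [] acc = acc
f4 (b:bs) (sofar, m, i) =
  let sofar' = sofar * b + i
      m'     = m + sofar'
  in f4 bs (sofar', m', i)

test3 :: [Double] -> Double
test3 b = snd $ f3 b (1, 0)

test4 :: [Double] -> Double
test4 b = let (_, m, _) = f4 b (1, 0, 1) in m
```

For example, test3 [1,2] and test4 [1,2] both give 7.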

So now I need to write some code to work with linear functions. I'll do it in a very direct style. Here are some tuples representing basis vectors. (I could have written some fancy vector/matrix code, but I don't want to distract from the problem in hand.)

> x,y,z :: Num a => (a,a,a)

> x = (1,0,0)

> y = (0,1,0)

> z = (0,0,1)





And here's some code that computes how a function acts on a basis, in effect finding the matrix for our function with respect to the basis x,y,z:



> matrix f = (f x,f y,f z)

Some simple operations on vectors:

> (a,b,c) .+ (d,e,f) = (a + d,b + e,c + f)

> a .* (b,c,d) = (a * b,a * c,a * d)





And now a little function that, given how f acts on basis elements, can apply f to any vector:



> apply (mx,my,mz) (sofar,max,i) = (sofar .* mx) .+ (max .* my) .+ (i .* mz)
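Here's a small check of my own that matrix and apply really do reconstruct a linear function: for f4 bs, applying its matrix to a vector agrees with applying the function directly.

```haskell
x, y, z :: Num a => (a, a, a)
x = (1, 0, 0)
y = (0, 1, 0)
z = (0, 0, 1)

-- the matrix of f with respect to the basis x, y, z
matrix f = (f x, f y, f z)

-- pointwise vector sum and scalar multiple
(a, b, c) .+ (d, e, f) = (a + d, b + e, c + f)
s .* (b, c, d) = (s * b, s * c, s * d)

-- rebuild f's action on any vector from its action on the basis
apply (mx, my, mz) (sofar, m, i) = (sofar .* mx) .+ (m .* my) .+ (i .* mz)

-- the linear scan from the post
f4 [] acc = acc
f4 (b:bs) (sofar, m, i) =
  let sofar' = sofar * b + i
  in f4 bs (sofar', m + sofar', i)
```

For instance, both apply (matrix (f4 [2,3])) (1,5,1) and f4 [2,3] (1,5,1) give (10,18,1) over Double.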





Now we can redo the calculation with f4, by first making the matrix for f4 and then applying that to our starting vector:



> test5 b = let m = matrix (f4 b)
>               (_,max,_) = apply m (1,0,1)
>           in max





Note how by the time we've computed m we've done almost all of the work, even though the code hasn't yet touched (1,0,1).

But now comes the first bit of magic. We can split our list b into pieces, compute the corresponding matrix for each piece on a separate processor, and then apply the resulting matrices to our starting vector. Let's chop our list of reals into pieces of size n:



> chop n [] = []

> chop n l = let (a,b) = splitAt n l in a : chop n b
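For example (my own illustration), chopping a ten-element list into pieces of three leaves a short final piece:

```haskell
chop :: Int -> [a] -> [[a]]
chop _ [] = []
chop n l = let (a, b) = splitAt n l in a : chop n b

-- chop 3 [1..10] gives [[1,2,3],[4,5,6],[7,8,9],[10]]
```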

We'll use pieces of size 10:

> test6 b = max where
>   (_,max,_) = foldr ($) (1,0,1) (reverse b_functions) where
>     b_pieces = chop 10 b





The following maps should be replaced with a parallel version. It's easy to do this.



>     b_matrices = map (matrix . f4) b_pieces
>     b_functions = map apply b_matrices
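The post leaves that parallel version as an exercise. One minimal sketch uses the spark primitives par and pseq from base's GHC.Conc; parMap' here is my own stand-in for the parMap combinator from the parallel package's Control.Parallel.Strategies, which is what you'd really use.

```haskell
import GHC.Conc (par, pseq)

-- Spark the evaluation of each element in parallel. Note that par only
-- forces to weak head normal form; a production version would use a
-- deepseq-style strategy so each matrix is fully evaluated in its spark.
parMap' :: (a -> b) -> [a] -> [b]
parMap' _ [] = []
parMap' f (x:xs) =
  let y  = f x
      ys = parMap' f xs
  in y `par` (ys `pseq` (y : ys))
```

With this, b_matrices = parMap' (matrix . f4) b_pieces computes the matrices for the pieces in parallel sparks; on a single-threaded runtime the sparks simply fizzle and it behaves like map.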





Great, we've successfully parallelised the code, but it's the wrong algorithm. How can we use this to solve the correct problem? Remember how we replaced max with addition, and addition with multiplication. We just have to undo that. That's all! Everything required to prove that the above parallelisation is valid applies over any semiring. At no point did we divide or subtract, and we only used elementary properties of numbers like a*(b+c) = a*b+a*c. That property holds for max and addition: a+max(b,c) = max(a+b,a+c). We don't even have to modify the code. We can just define the max-plus semiring as follows:



> newtype MaxPlus = M Double deriving (Eq,Show)



> instance Num MaxPlus where
>   fromInteger 0 = M (-infinity)
>   fromInteger 1 = M 0
>   fromInteger _ = error "no conversion from general integer"
>   M a + M b = M (max a b)
>   M a * M b = M (a+b)
>   abs _ = error "no abs"
>   signum _ = error "no signum"
>   negate _ = error "no negate"
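Two checks of my own that this instance behaves as claimed: the distributive law a*(b+c) = a*b + a*c holds in max-plus, and running the "wrong problem" scan f3 over MaxPlus recovers exactly the max-of-sums answer that f2 computes over the reals.

```haskell
infinity :: Double
infinity = 1/0

newtype MaxPlus = M Double deriving (Eq, Show)

instance Num MaxPlus where
  fromInteger 0 = M (-infinity)   -- max-plus zero: the identity for max
  fromInteger 1 = M 0             -- max-plus one: the identity for +
  fromInteger _ = error "no conversion from general integer"
  M a + M b = M (max a b)
  M a * M b = M (a + b)
  abs _    = error "no abs"
  signum _ = error "no signum"
  negate _ = error "no negate"

-- the "wrong problem" scan, now polymorphic over any Num
f3 :: Num a => [a] -> (a, a) -> (a, a)
f3 [] acc = acc
f3 (b:bs) (sofar, m) =
  let sofar' = sofar * b + 1
      m'     = m + sofar'
  in f3 bs (sofar', m')

-- the max-based scan it should agree with in max-plus
f2 :: [Double] -> (Double, Double) -> (Double, Double)
f2 [] acc = acc
f2 (b:bs) (sofar, m) =
  let sofar' = max (sofar + b) 0
      m'     = max m sofar'
  in f2 bs (sofar', m')
```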

And now all we need is:

> test7 b = test6 (fmap M b)
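And a final end-to-end check of my own (the sample lists are mine): the chopped, matrix-based max-plus pipeline agrees with the very first sequential implementation.

```haskell
infinity :: Double
infinity = 1/0

-- the original sequential scan (max renamed to best to avoid shadowing)
f1 :: [Double] -> (Double, Double) -> (Double, Double)
f1 [] acc = acc
f1 (b:bs) (sofar, best) =
  let sofar' = if sofar + b < 0 then 0 else sofar + b
      best'  = if best < sofar' then sofar' else best
  in f1 bs (sofar', best')

test1 :: [Double] -> Double
test1 b = snd $ f1 b (0, -infinity)

-- the linear scan and the matrix machinery, over any Num
f4 :: Num a => [a] -> (a, a, a) -> (a, a, a)
f4 [] acc = acc
f4 (b:bs) (sofar, m, i) =
  let sofar' = sofar * b + i
  in f4 bs (sofar', m + sofar', i)

x, y, z :: Num a => (a, a, a)
x = (1, 0, 0)
y = (0, 1, 0)
z = (0, 0, 1)

matrix f = (f x, f y, f z)
(a, b, c) .+ (d, e, f) = (a + d, b + e, c + f)
s .* (b, c, d) = (s * b, s * c, s * d)
apply (mx, my, mz) (sofar, m, i) = (sofar .* mx) .+ (m .* my) .+ (i .* mz)

chop :: Int -> [a] -> [[a]]
chop _ [] = []
chop n l = let (a, b) = splitAt n l in a : chop n b

test6 :: Num a => [a] -> a
test6 b = m
  where (_, m, _)   = foldr ($) (1, 0, 1) (reverse b_functions)
        b_pieces    = chop 10 b
        b_matrices  = map (matrix . f4) b_pieces
        b_functions = map apply b_matrices

newtype MaxPlus = M Double deriving (Eq, Show)

instance Num MaxPlus where
  fromInteger 0 = M (-infinity)
  fromInteger 1 = M 0
  fromInteger _ = error "no conversion from general integer"
  M a + M b = M (max a b)
  M a * M b = M (a + b)
  abs _    = error "no abs"
  signum _ = error "no signum"
  negate _ = error "no negate"

test7 :: [Double] -> MaxPlus
test7 b = test6 (fmap M b)
```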





(I wonder if ghc is smart enough to completely eliminate that fmap M; after all, on a newtype, M should do zero work.)

And that's a completely parallelised version of the original algorithm.

There is a ton of optimisation that can be performed here. In particular, matrix applies a function to a fixed basis. For the particular function we're using here there's a big win from constant folding, and the same constant folding applies in any semiring.

And back to the point I made at the beginning. By abstracting from the reals to a general semiring we are able to make the same code perform multiple duties: it can work on functions linear over many semirings, not just the reals. Mathematicians don't work with abstractions just to make life hell for students - they do so because working with abstract entities allows the same words to be reused in a valid way in many different contexts. This benefits both mathematicians and computer scientists.

Here's a link you can use to find out more on this technique.

Note that in real world usage you wouldn't use lists: chop would take longer than the rest of the algorithm.

PS A curious aside. I spent ages trying to get ghc to compile this code and getting my homebrew HTML markup code to work reliably on it. But eventually I located the problem. I've just returned from Hawai`i, where I wrote most of this code. Just for fun I'd put my keyboard into Hawai`ian mode and forgot about it. When I did that, the single quote key started generating the unicode symbol for the Hawai`ian glottal stop. It looks just like a regular single quote in my current terminal font, so it was hard to see anything wrong with the code. But, quite reasonably, ghc and many other tools barf if you try to use one where a regular quote is expected.