jim@quickthreadprogramming.com

This article on parallel programming examines one of those elusive algorithms that, at first glance, seem to be neither vectorizable nor parallelizable. The intent is not to address this specific algorithm, but rather to give you an approach to problems that share similarities with it. The elusive algorithm for this article is the inclusive scan:

In: 1 2 3 4 5 6 7 8 …

Out: 1 3 6 10 15 21 28 36 …
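For reference, the scan shown above can be written as a plain serial loop. The following is only a minimal sketch, not the article's own code; the function name `inclusive_scan_serial` and the 64-bit integer element type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Serial inclusive scan: out[i] = in[0] + in[1] + ... + in[i].
// Hypothetical helper for illustration; name and element type are assumptions.
std::vector<long long> inclusive_scan_serial(const std::vector<long long>& in) {
    std::vector<long long> out(in.size());
    long long running = 0;  // the "prior output" is taken as 0 for the first element
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Each iteration needs the running sum left by the previous iteration,
        // so the iterations cannot simply be reordered or executed independently.
        running += in[i];
        out[i] = running;
    }
    return out;
}
```

Calling it with the input 1 2 3 4 5 6 7 8 yields 1 3 6 10 15 21 28 36, matching the example above.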

Each output is the sum of the prior output (or 0 for the first element) and the current input value. This loop has a temporal dependency, with each iteration relying on the result of the one before it, that at first inspection defies both vectorization and parallelization. This article will describe how you can attain both vectorization and parallelization, with results like this: