Thinking functionally with performance in mind

The Sieve Eratosthenes is one of the oldest and perhaps simplest ways known to compute the prime numbers. The algorithm is as follows:

Start with a prime number and eliminate all multiples of it Find the first number not eliminated, mark it as prime Eliminate all multiples of that number Repeat 2–3 until there are no multiples of the first number left to eliminate. The remaining numbers are the primes.

We can visualize it like this for the first 120 primes:

A Naive First Attempt with Streams

For those familiar with functional programming this appears to be just a series of filters. We filter by multiples of two. Then we filter by multiples of three, then five and so on. In effect we’re chaining filters like so:

stream.filter(_ % 2 != 0)

.filter(_ % 3 != 0)

.filter(_ % 5 != 0)

...

.filter(_ % n != 0)

Here 3 and 5 are just the first elements of our stream after our prior filter, as is n. With this knowledge we know we’re just filtering the tail of the list by elements divisible by the head over and over like so:

stream.tail.filter(_ % stream.head != 0)

This fits quite naturally into a recursive definition. With this we can build up an infinite stream of prime numbers in just one line of code:

Here we just start with the head of our stream and append a recursive call with the filtered tail of our stream. We start with only the odd numbers (Stream.from(3, 2)) as a minor performance enhancement. We can use this to calculate the first few prime numbers like so:

primeStream().take(5).toList == 3, 5, 7, 11, 13

This is very elegant and works, but is painfully slow and susceptible to stack overflows. Generating the primes up to 1 million takes several minutes and requires increasing the Java stack space.

The other big problem with what we’ve implemented is it is not the actual sieve!

This is Not the Sieve You’re Looking for

Those who have expertise in this problem will point out what we have implemented isn’t actually the Sieve of Eratosthenes. What we have actually implemented is trial division. The reason this isn’t a true sieve is when we do a filter we have to consider all values in the stream, not just the multiples of a prime.

When we filter by 3 for example, we have to check 5%3, 7%3, 8%3 and so on. This actually gives us a very inefficient algorithm. We can avoid these extra comparisons by only considering multiples of prime numbers while still using streams. Thus directly eliminating 6, 9, 12 … n * 3 and creating a true sieve.

A Functional Sieve without Recursion

What if instead of filtering the numbers that aren’t prime, we instead just kept track of them? Then, once we know all the numbers that aren’t prime, it’s a simple set difference to extract the primes.

To do this let’s create a stream of numbers which we’ll need to examine. We’ll create a stream of all the odd numbers up to our highest desired prime. Note here we only go up the the square root of our last number as a performance optimization.

val odds = Stream.from(3, 2).takeWhile(_ <= Math.sqrt(end).toInt)

Now for each element in the stream we just created we’ll need to keep track of each number that is a multiple of it. Assuming we have a number i we can keep track of all its multiples like so:

Stream.from(i * i, 2 * i).takeWhile(_ <= end)

Here we start at the square of i, which is a common performance optimization. We can do this because any multiple below the square will already have been generated.

We want to apply this for every element in our odd number stream. We can do this with a simple flat map and thus we’ve generated a stream of all the composite numbers.

odds.flatMap(i => Stream.from(i * i, 2 * i).takeWhile(_ <= end))

Given we have a stream of all the composite numbers it’s just a matter of taking the difference between all numbers and our composites to give us all the primes, resulting in a final algorithm like so:

This algorithm is much faster than our original trial division sieve. On my 2.3 GHz core i7 MacBook, this can calculate the first primes up to 1 million in about 800 milliseconds as opposed to minutes.

Comparing with Iteration

The question that comes to mind is how well do these functional sieves compare performance-wise to iterative implementations? Let’s rewrite our stream sieve to use iteration instead:

Here instead of generating composites we “mark” them in our primeIndices array. Then once we’ve marked all the composites, we loop through and generate the primes.

It’s a few more lines of code and relies on mutation, but is much faster than our stream implementation. On my machine this can calculate the primes up to 1 million in about 28 milliseconds, about a 30x speedup.

Can Functional be as Fast as Iterative?

It is left as an exercise to the reader to see if you can create a functional-based approach that can come close to, or match, the iterative version.

The answer is you can likely come close, but this comes at a cost of much greater complexity. Sped up versions will generally rely on structures such as heaps or other more complicated techniques. Links describing these techniques are presented below.

Generally speaking, I prefer simplicity. Functional programming is great, but may not always be the appropriate solution when a speedup is necessary and keeping things functional will add unneeded complexity.

Code with tests is available on GitHub:

Further Reading

The Genuine Sieve of Eratosthenes — Examines the complexity of trial division and techniques to generate an efficient algorithm.

Haskell Prime Number Wiki — Examines several techniques to generate an efficient sieve.