Adventures in Engineering

The wanderings of a modern ronin.



After

listPrimes :: Integer -> [Integer] -> Integer -> [Integer] listPrimes n knownPrimes m | m > n = knownPrimes | not (any (== 0) (map (m `mod`) knownPrimes)) = listPrimes n (m:knownPrimes) (m+2) | otherwise = listPrimes n knownPrimes (m+2)

This generates all primes less than the number n, starting at the number m, and using the initial list of primes given in the "knownPrimes" list. m is assumed to be odd and greater than 2, and only odd numbers less than n are checked for primality. The full list of generated primes is returned.



For example, listPrimes 100 [2] 3 gives [97,89,83,79,73,71,67,61,59,53,47,43,41,3 7,31,29,23,19,17,13,11,7,5,3,2] . (1 is not on the list, because we're assuming that you already know that every number is evenly divisible by one.)





The (map (m `mod`) knownPrimes)) bit is probably a little baffling to anyone without experience in a functional language. map is one of those things that is very common in the functional paradigm, but imperative programmers have rarely seen anything similiar. In functional languages, functions are first-class objects, which means you can use them in many of the same ways you use data values. In particular, you can create a temporary function, and pass it into a subroutine, just like you would with an int or pointer. The subroutine can then use the passed in function however it wants. (If you've ever used C's qsort(), you may have seen this in action.) map makes extensive use of the ability to pass in a function as a parameter to a function. Map takes two parameters: a function, and a list. It applies the function to each item in the list, thus computing a new value for each item. Then the new values are put into a list and returned. So in imperative syntax, what it's saying is: "for each item on the knownPrimes list, compute m mod list item, and return me a list of resulting values."



Now, you may ask, how does this help us? Well, the whole idea behind the Seive of Erastosthenes is that a number M is prime if and only if it is NOT evenly divisible by any prime number smaller than sqrt(M). So, if M mod (any item in knownPrimes) == 0, then M is evenly divisible by one of the primes on the list, and therefore M is not prime. And finally, the (any (== 0) ) thing is similiar in concept to map . It uses the given function ("equals 0"), upon each item in the list (each one an "M mod knownPrime" value) and sees if any of the values on the list are 0. If so, the number is NOT prime. So in order to see if the number IS prime, we need to not the value of the any "M mod knownPrime" == 0 computation. If the number M is prime, we add it to our list of primes with the "newNum : previousList" syntax, and go on the the next M. Otherwise, we just go on to the next M without adding anything to the list of knownPrimes.





The Sieve of Erastosthenes is supposed to be a very efficient way to generate prime numbers, as prime number generation algorithms go. So I was surprised at how slow it was when I tested the new code. You may remember that after a few elementary optimizations, the old, naive, "check possible prime against every odd number < sqrt(possible prime)" code ran pretty quickly on numbers less than 15 digits. The new code - which in theory is much smarter for generating all primes less than sqrt(possible prime), and then testing the possible prime only against those - is actually about an order of magnitude slower.



And I suppose that's not too surprising, if you think about it. "Check every number < sqrt(n)" has roughly a O(sqrt(n)) runtime. But generating the entire list of primes less than sqrt(n) with the Sieve of Erastosthenes means that we have to check every odd number from 3 to sqrt(n) against ln(n) items that are already on the prime list. So it turns out to be about O(sqrt(n) * ln(n)) runtime, and with much larger constant factors on every computation due to running down the list, cons'ing a new item onto the list occasionally, and so on. Here are the actual performance tests:





62615533 - 7 secs confirm non-prime. (Previously - before my finger came off the enter key)



15485863 - 2 secs confirm prime. (Previously - before my finger came off the enter key)



101364623 - 7 secs confirm non-prime. (Not previously tested.)



275604541 - 16 secs confirm prime. (Previously - maybe 200 ms.)



6092426891 - 5 mins confirm non-prime. (Not previously tested. Big jump in perceived time!)



5915587277 - 7 mins confirm prime. (Not previously tested.)



10967535067 - 10 mins confirm non-prime. (Previously - 4 seconds.)



10113352331 - 9 mins confirm prime. (Not previously tested.)





All in all, very interesting results.



Also, I'd like to say thanks to all the smart folks over at irc.freenode.net

After my last adventure in prime testing, I decided that I should implement the Sieve of Erastosthenes and give that a whirl. There's full code at the Haskell pastebin if you're morbidly curious. But most of it is the Newton-Rapheson approximate square root routines. The entire Sieve of Erastosthenes function is only 5 lines long:This generates all primes less than the number n, starting at the number m, and using the initial list of primes given in the "knownPrimes" list. m is assumed to be odd and greater than 2, and only odd numbers less than n are checked for primality. The full list of generated primes is returned.For example,gives. (1 is not on the list, because we're assuming that you already know that every number is evenly divisible by one.)Thebit is probably a little baffling to anyone without experience in a functional language.is one of those things that is very common in the functional paradigm, but imperative programmers have rarely seen anything similiar. In functional languages, functions are first-class objects, which means you can use them in many of the same ways you use data values. In particular, you can create a temporary function, and pass it into a subroutine, just like you would with an int or pointer. The subroutine can then use the passed in function however it wants. (If you've ever used C's qsort(), you may have seen this in action.)makes extensive use of the ability to pass in a function as a parameter to a function. Map takes two parameters: a function, and a list. It applies the function to each item in the list, thus computing a new value for each item. Then the new values are put into a list and returned. So in imperative syntax, what it's saying is: "for each item on the knownPrimes list, compute m mod list item, and return me a list of resulting values."Now, you may ask, how does this help us? Well, the whole idea behind the Seive of Erastosthenes is that a number M is prime if and only if it is NOT evenly divisible by any prime number smaller than sqrt(M). So, if M mod (any item in knownPrimes) == 0, then M is evenly divisible by one of the primes on the list, and therefore M is not prime. And finally, thething is similiar in concept to. It uses the given function ("equals 0"), upon each item in the list (each one an "M mod knownPrime" value) and sees if any of the values on the list are 0. If so, the number is NOT prime. So in order to see if the number IS prime, we need tothe value of thecomputation. If the number M is prime, we add it to our list of primes with the "newNum : previousList" syntax, and go on the the next M. Otherwise, we just go on to the next M without adding anything to the list of knownPrimes.The Sieve of Erastosthenes is supposed to be a very efficient way to generate prime numbers, as prime number generation algorithms go. So I was surprised at how slow it was when I tested the new code. You may remember that after a few elementary optimizations, the old, naive, "check possible prime against every odd number < sqrt(possible prime)" code ran pretty quickly on numbers less than 15 digits. The new code - which in theory is much smarter for generating all primes less than sqrt(possible prime), and then testing the possible prime only against those - is actually about an order of magnitude slower.And I suppose that's not too surprising, if you think about it. "Check every number < sqrt(n)" has roughly a O(sqrt(n)) runtime. But generating the entire list of primes less than sqrt(n) with the Sieve of Erastosthenes means that we have to check every odd number from 3 to sqrt(n) against ln(n) items that are already on the prime list. So it turns out to be about O(sqrt(n) * ln(n)) runtime, and with much larger constant factors on every computation due to running down the list, cons'ing a new item onto the list occasionally, and so on. Here are the actual performance tests:62615533 - 7 secs confirm non-prime. (Previously - before my finger came off the enter key)15485863 - 2 secs confirm prime. (Previously - before my finger came off the enter key)101364623 - 7 secs confirm non-prime. (Not previously tested.)275604541 - 16 secs confirm prime. (Previously - maybe 200 ms.)6092426891 - 5 mins confirm non-prime. (Not previously tested. Big jump in perceived time!)5915587277 - 7 mins confirm prime. (Not previously tested.)10967535067 - 10 mins confirm non-prime. (Previously - 4 seconds.)10113352331 - 9 mins confirm prime. (Not previously tested.)All in all, very interesting results.Also, I'd like to say thanks to all the smart folks over at irc.freenode.net #haskell for their help, support and patience with my clueless n00b questions. Post A Comment | 8 Comments | Share | Flag | Link









You might want to check the primes benchmark, http://www.cse.unsw.edu.au/~dons/code/nobench/spectral/primes/

for some ideas for other sieves.



-- dons Reply | Thread | Link





Thanks, I'll check that out. Be interesting to see how other people have done it.



I suspect, however, that I'm done with prime testing... about time I started playing with monads and IO. Reply | Parent | Thread | Link





I was sceptical that you had somehow unearthed a flaw in one of the most ancient algorithms so I did some poking around and found this[1] paper on how people who think that they are implementing the Sieve in Haskell typically are not. I typed in the algorithm from the paper and got the speedup I expected.



http://www.mail-archive.com/haskell-cafe@haskell.org/msg20899.html

Reply | Parent | Thread | Link





Excellent link, thanks! As soon as I'm done moving (probably monday) I'll go back and make the changes. Still not sure if I need to make a list with all the numbers 4-sqrt(n) and then sieve it or what, but the paper will probably tell me... Reply | Parent | Thread | Link





That was me. Looks like LJ lost my cookie. Reply | Parent | Thread | Link





You can probably gain a bit of time by not calculating the square root. I imagine your condition looks something like

filter (< (sqrt x)) [1..]

or so, but you could use something like this instead:

filter (\y -> y * y < x) [1..]

That way, you're doing one fast operation -- multiplication -- instead of a square root approximation. Saves some on the code, too.



~d Reply | Thread | Link





I only calculate the sqrt once per invocation of isPrime, and The Newton-Rapheson approximate sqrt I coded up is extremely light weight. approximateSqrt only runs for numDigits N `div` 2 iterations, and each iter is only three ops: an Integer div, an Integer add, and an Integer div by 2 (which I assume GHCI is optimizing to a shift). So, on a 15 digit prime, I'm doing something on the order of 7 * 3 integer ops to get my estimated sqrt.



And no, I'm not using sqrt in a filter anywhere. In fact, my code doesn't even use filter. Someone told me that I could potentially break out of a list traversal way early if I used not . any instead of filter. So that's exactly what I did. Reply | Parent | Thread | Link





Old code in the pastebin! I'll annotate that soon... Reply | Parent | Thread | Link





browse May 2015