I just got through creating the release candidates for the next release of Haskell Platform for Mac OS X. This release will come in both 32-bit (i386) and 64-bit (x86_64) versions.

tl;dr: Unless you are writing programs that need to address more than 2 GB of data at once, install the 32-bit version of the upcoming Haskell Platform.

Given that I had both, it seemed like a good idea to benchmark them and give people some guidance as to which to install. I chose Johan Tibell’s unordered-containers package, as it has a nice benchmark built with criterion that tests both it and the common Map and IntMap data types. (And also because he was sitting at the desk next to me at work today, so I could pester him with questions!)
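For a rough sense of the container APIs the benchmark exercises (this is a minimal sketch of my own, not the actual benchmark code), here is a small program using Data.Map and Data.IntMap from the containers package:

```haskell
import qualified Data.Map as Map
import qualified Data.IntMap as IntMap

main :: IO ()
main = do
  -- Build both maps from the same 1000 key/value pairs.
  let pairs = [(k, k * 2) | k <- [1 .. 1000 :: Int]]
      m     = Map.fromList pairs
      im    = IntMap.fromList pairs
  -- Map is a size-balanced binary tree; IntMap is a big-endian Patricia
  -- trie specialized to Int keys, which is what the benchmark compares
  -- against the hash-based structures in unordered-containers.
  print (Map.lookup 500 m)     -- Just 1000
  print (IntMap.lookup 500 im) -- Just 1000
```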

The results were a little surprising: the 64-bit version ran between 0% and 139% slower on most benchmarks, averaging 27% slower. On a few, it ran slightly faster (0% to 15%, averaging 8%). Details and discussion after the break.

Details of what I tested:

- Haskell Platform 2011.4.0.0 RC2 (final is out in a week or two)
- GHC 7.0.4
- containers-0.4.0.0
- unordered-containers-0.1.4.3
- Hardware: MacBook Pro (2010) Intel Core i5 2.4 GHz, and MacBook Air (2011) Intel Core i7 1.8 GHz, both with 4 GB RAM and SSD disks
- Command line: ./benchmark -g -u output.csv +RTS -H

Johan and I discussed this outcome, and I also consulted my resident HW expert (my husband used to be a CPU architect for Intel). On the one hand, code compiled for the 64-bit instruction set should make more efficient use of HW resources and be smaller in code size. However, on any given processor the underlying CPU resources are the same, so these effects should be moderate (tens of percentage points). On the other hand, compiling for the 64-bit data model (where Int, Word, and pointers are all 64 bits) effectively doubles the size of most Haskell data in memory, increasing memory demand, doubling the scanning time of GC, and making the CPU’s data caches half as effective. Our suspicion is that GHC must not be taking as much advantage of the 64-bit instruction architecture as it could, and that in data-heavy (but not overly large) benchmarks like these the memory and cache disadvantages dominate.
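The doubling effect is easy to observe directly. On a 32-bit GHC, Int and pointers are 4 bytes each; on a 64-bit GHC they are 8, and since nearly every field of a boxed Haskell value is a word or a pointer, heap data roughly doubles. A quick sketch:

```haskell
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (sizeOf)

main :: IO ()
main = do
  -- Both print 4 on a 32-bit GHC and 8 on a 64-bit GHC;
  -- Int is always the native word size.
  putStrLn $ "sizeOf Int: " ++ show (sizeOf (undefined :: Int)) ++ " bytes"
  putStrLn $ "sizeOf Ptr: " ++ show (sizeOf (nullPtr :: Ptr ())) ++ " bytes"
```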

It would be interesting to see if these results hold on later versions of GHC, on other OSes, and with other benchmarks.

For those who want the numbers: this chart shows the delta in running time of 64-bit over 32-bit as a percentage (positive is slower). MBP and MBA are the two machines I ran the benchmarks on.