Benchmarking ray tracing, Haskell vs. OCaml

On his web site about OCaml Jon Harrop has a benchmark for a simple ray tracing program written in a number of languages. When I saw it I wondered why Haskell was doing so badly, especially since this benchmark was taken as some kind of proof that Haskell performs poorly in "real life". So I have rerun the benchmarks. I've also rewritten the Haskell versions of the programs. The Haskell versions on Jon's web site were OK, but they were too far from the OCaml versions for my taste. I prefer to keep the programs very similar in a situation like this. My rewrite of the benchmarks from OCaml to Haskell was done with a minimum of intelligence. Here are the only things I did that needed creative thought:

Use the type Vector to represent vectors instead of a tuple. This allows the components to be strict.

Use the type Scene instead of a tuple to represent a scene. The tuple used in the OCaml code uses the dubious feature of equi-recursive types (even Xavier thinks it's strange enough to have a flag to enable it).

Rewrite the loop that computes a pixel's value using an accumulating updatable variable into a list comprehension that sums the list.

Finally, the compiler flags needed a bit of tweaking to get good performance, even though "-O3 -funbox-strict-fields -fexcess-precision -optc-ffast-math" were pretty obvious.

In addition to this I made the code look a little more Haskellish, e.g., using overloading to allow + and - on vectors. These are really just minor syntactic changes, but they make the code more readable.
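The changes listed above can be illustrated with a minimal sketch. This is not the benchmark source: the constructor names, the shape of Scene, and the helper pixelValue (with its sample function argument) are hypothetical and chosen only to show strict vector fields, overloaded + and - via a Num instance, and a pixel value computed by summing a list comprehension.

```haskell
-- Hypothetical sketch; names are illustrative, not taken from the benchmark code.

-- Strict fields (the ! annotations) let GHC unbox the components
-- when compiling with -funbox-strict-fields.
data Vector = V !Double !Double !Double
  deriving (Show, Eq)

-- An ordinary recursive data type instead of a (recursive) tuple.
-- The exact constructors here are an assumption.
data Scene = Sphere Vector Double
           | Group Vector Double [Scene]

-- Overloading + and - on vectors through the Num class.
-- The remaining methods are defined componentwise for completeness.
instance Num Vector where
  V x1 y1 z1 + V x2 y2 z2 = V (x1 + x2) (y1 + y2) (z1 + z2)
  V x1 y1 z1 - V x2 y2 z2 = V (x1 - x2) (y1 - y2) (z1 - z2)
  V x1 y1 z1 * V x2 y2 z2 = V (x1 * x2) (y1 * y2) (z1 * z2)
  negate (V x y z)        = V (negate x) (negate y) (negate z)
  abs (V x y z)           = V (abs x) (abs y) (abs z)
  signum (V x y z)        = V (signum x) (signum y) (signum z)
  fromInteger n           = V d d d where d = fromInteger n

-- Dot product, a typical helper in a ray tracer.
dot :: Vector -> Vector -> Double
dot (V x1 y1 z1) (V x2 y2 z2) = x1 * x2 + y1 * y2 + z1 * z2

-- Value of one pixel with ss*ss supersampling: rather than updating an
-- accumulator in a loop, sum a list comprehension over the sub-pixel samples.
-- `sample` stands in for the ray-casting function.
pixelValue :: Int -> (Double -> Double -> Double) -> Double -> Double -> Double
pixelValue ss sample x y =
  sum [ sample (x + fromIntegral dx / s) (y + fromIntegral dy / s)
      | dx <- [0 .. ss - 1], dy <- [0 .. ss - 1] ]
  where s = fromIntegral ss
```

With these definitions, vector arithmetic reads the same as scalar arithmetic, e.g. V 1 2 3 + V 4 5 6, which is the kind of readability gain described above.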

To make the program size comparison fair I removed some dead code from the OCaml code.

I then reran the benchmarks using Haskell, OCaml, and C++.

The benchmark consists of five programs that start from a very simple ray tracer and specialize the program more and more to speed it up.

The numbers are in the tables below. The time is execution time in seconds, the characters are the non-white characters in the file, and the lines are the number of lines in the file. To ease comparison I also include the numbers relative to OCaml (smaller numbers are better).
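For concreteness, the size and relative metrics described above could be computed along the following lines. This is only a sketch under the assumption that "non-white characters" means every character that is not whitespace; the function names are made up for illustration and are not part of the benchmark.

```haskell
import Data.Char (isSpace)

-- Number of non-white characters in a source text
-- (assumption: anything that is not whitespace counts).
nonWhiteChars :: String -> Int
nonWhiteChars = length . filter (not . isSpace)

-- Number of lines in a source text.
lineCount :: String -> Int
lineCount = length . lines

-- A measurement relative to a baseline (here: the OCaml number).
-- Values below 1 mean smaller or faster than the baseline.
relative :: Double -> Double -> Double
relative baseline x = x / baseline
```

For example, relative 1288 1275 is about 0.990, which is the Rel C entry for the Haskell version of ray1.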

Interestingly, and unlike Jon's benchmark, the Haskell code is always smaller than the OCaml code (measured in non-white characters). Furthermore, the Haskell code ranges from much faster to slightly faster than the OCaml code. Again, this is very unlike Jon's benchmark. I find the unoptimized version of the benchmark especially interesting since Haskell is 5 times(!) faster than OCaml. I've not investigated why, but I suspect laziness.

Results

Haskell: My Haskell code compiled with ghc 6.8.1

Haskell, old: Jon's Haskell code, compiled with ghc 6.8.1

Haskell, old 6.6: Jon's Haskell code, compiled with ghc 6.6.1

OCaml: Jon's OCaml code

C++: Jon's C++ code

Time: execution time in seconds

Chars: number of non-white characters in the program

Lines: number of lines in the program

Rel T: execution time relative to OCaml

Rel C: non-white characters relative to OCaml

Rel L: lines relative to OCaml

Mem: Maximum resident memory

ray1              Time  Chars  Lines  Rel T  Rel C  Rel L  Mem
Haskell           15.3   1275     51  0.202  0.990  1.020   5M
Haskell, old      15.8   1946     88  0.208  1.511  1.760   9M
Haskell, old 6.6  28.1   1946     88  0.370  1.511  1.760   9M
OCaml             75.9   1288     50  1.000  1.000  1.000  18M
C++                8.1   2633    122  0.106  2.044  2.440   8M

The programs, ray1-ray5, are variations on the ray tracer as given on Jon's web site. I've used the same size metrics as Jon does.

ray2              Time  Chars  Lines  Rel T  Rel C  Rel L  Mem
Haskell           11.5   1457     50  0.206  0.912  0.943  12M
Haskell, old      12.0   2173     99  0.215  1.360  1.868  35M
Haskell, old 6.6  21.1   2173     99  0.379  1.360  1.868  35M
OCaml             55.8   1598     53  1.000  1.000  1.000  15M
C++                6.1   3032    115  0.108  1.897  2.170   8M

ray3              Time  Chars  Lines  Rel T  Rel C  Rel L  Mem
Haskell            9.7   1794     62  0.970  0.919  0.939  12M
Haskell, old      11.1   2312    103  1.112  1.184  1.561  35M
Haskell, old 6.6  19.7   2312    103  1.984  1.184  1.561  35M
OCaml             10.0   1953     66  1.000  1.000  1.000  15M
C++                5.4   3306    143  0.545  1.693  2.167   8M

ray4              Time  Chars  Lines  Rel T  Rel C  Rel L  Mem
Haskell            8.5   1772     66  0.985  0.867  0.957  12M
Haskell, old      11.7   2387    110  1.360  1.168  1.594  36M
Haskell, old 6.6  19.2   2387    110  2.235  1.168  1.594  35M
OCaml              8.6   2043     69  1.000  1.000  1.000  11M
C++                5.0   3348    149  0.584  1.639  2.159   8M

ray5              Time  Chars  Lines  Rel T  Rel C  Rel L
Haskell            7.0   2246     95  0.999  0.878  0.950
OCaml              7.0   2559    100  1.000  1.000  1.000
C++                4.7   3579    142  0.674  1.399  1.420

The source code is available in a Darcs repository.

Software and hardware details

Hardware: MacBook, Intel Core Duo 2GHz, 2MB L2 Cache, 1GB 667MHz DRAM

Software:

Haskell compiler: ghc-6.8.1

OCaml compiler: 3.10.0

g++: gcc version 4.0.1 (Apple Computer, Inc. build 5367)

Compilation commands:

ghc: -O3 -fvia-C -funbox-strict-fields -optc-O3 -fexcess-precision -optc-ffast-math -funfolding-keeness-factor=10

OCaml: -rectypes -inline 100 -ffast-math -ccopt -O3

g++: -O3 -ffast-math

Target architecture is x86 (even though the processor is x86_64 capable).

Some observations

Haskell should really have the definitions of infinity and epsilon_float in a library. They are quite useful. Also, having them in a library would have made the Haskell code somewhat shorter and faster.

Converting these programs from OCaml to Haskell was very mechanical; it could almost be done with just sed.

I'm glad version 5 of the benchmark didn't show much improvement, because it's a really ugly rewrite. :)

Note that the code is all Haskell98, no strange extensions (even though -funbox-strict-fields deviates subtly from H98).
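As an illustration of the infinity and epsilon_float observation above, here is one way such definitions could be written using only the Haskell 98 Prelude. This is a sketch, not necessarily how the benchmark programs define them; the names simply mirror the OCaml ones, and delta is a hypothetical derived constant of the kind a ray tracer might use as an offset.

```haskell
-- Positive infinity for Double, mirroring OCaml's `infinity`.
infinity :: Double
infinity = 1 / 0

-- The gap between 1.0 and the next representable Double,
-- mirroring OCaml's `epsilon_float`. For IEEE doubles floatDigits
-- is 53, so this evaluates to 2^-52.
epsilon_float :: Double
epsilon_float = encodeFloat 1 (1 - floatDigits (1 :: Double))

-- Hypothetical small offset derived from epsilon_float.
delta :: Double
delta = sqrt epsilon_float
```

Both values are computed from the RealFloat class methods rather than written as literals, so the definitions do not hard-code the floating point format.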

Conclusion

Benchmarking is tricky. I'm not sure why my numbers and Jon's are so different. Possible causes are different hardware, slightly different programs, and different software.

Haskell is doing just fine against OCaml on this benchmark; the Haskell programs are always smaller and faster.

Edit: Updated tables with more numbers.

PS: Phil Armstrong wrote the Haskell code on Jon's web site and I took some code from his original.
