Why Ocaml?

In Fall 2002, I started a project which involved a good mix of string processing, simple statistics and some simple data structures like hash tables and trees. A preliminary prototype in Matlab was very slow, so I looked for a more suitable language to implement it in. My desiderata are listed below.

The language should

Have an interpreter for rapid prototyping, ease of debugging, and maximum fun.

Have a native code (not just byte code) compiler that produces fast code that can be run stand-alone or called from the interactive environment.

Have good support for vectors, multi-dimensional arrays, strings, hash tables, etc. in the standard library.

Have a free implementation.

Work under Linux and Windows (so I can transfer code easily between my desktop and my laptop).

ML vs Lisp

Popularity/familiarity. Lisp is more widely known/used than ML (especially in AI). There is a lot of code already written in Lisp.

Type checking. ML is statically type checked (unlike Lisp), which reduces errors and improves efficiency. Although Lisp allows one to declare types to improve efficiency, it is a bit ugly and not as powerful as ML. In addition, ML has type inference, which means it is not necessary to explicitly declare types. (The CMU CL compiler also does some type inference.)

Compilers. For ML, there are two free compilers: The Standard ML of New Jersey and Ocaml. The Ocaml compiler is somewhat more efficient than the SML/NJ compiler (see "Do you blow SML/NJ's socks off?"). In addition, Ocaml comes with some excellent libraries, and support for objects, making it preferable to SML in my opinion. For Lisp, there are several compilers. The Allegro Lisp compiler is expensive. The GNU Clisp compiler is free and portable, but has poor floating point performance. CMU Common Lisp is free and has good floating point performance, but only has a Unix port.

Speed. According to The great computer language shootout (see also the newer Computer language shootout benchmarks), Ocaml is the second fastest language - slower than C, but faster than C++. No matter how I changed the weights reflecting relative importance of speed, memory usage, lines of code, mathematical vs string processing, etc., it always came out in the top 3. I was skeptical, but the same results hold true in the Win32 version of the shootout, implemented independently.

Syntax. I have not yet gotten used to Lisp syntax (it is said that Lisp stands for "lost in superfluous parentheses"). On the other hand, Ocaml also has a few quirks, e.g., one must remember to write +. for real addition and + for integer addition. However, this seems quite natural. More importantly, people claim Lisp's macros can be used to define fancy syntactic sugar. Ocaml also has a preprocessor, but I haven't learned how to use it yet.

Speed of OCaml

The benchmarks above suggest the Ocaml compiler generates the second fastest code of any of the currently available compilers (gcc and the Intel C compilers being first). Given that Ocaml is also a beautiful language to program in, this is pretty compelling. But maybe the benchmarks are unreliable? See, e.g., Ocaml is only fast if used imperatively, Slashdot 14 March 2005. This is possible. However, I found several other favorable reports on Ocaml's performance, e.g., this example, which implements the Sieve of Eratosthenes for computing primes in Ocaml and C. The Ocaml code is faster, even though the C code is well-written.
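For readers who have not seen the Sieve of Eratosthenes written in Ocaml, here is a minimal imperative sketch. It is not the code from the example mentioned above, and it makes no claim about relative speed; the names primes_upto, sum_ints and mean are hypothetical and chosen only for illustration. The last few lines also show the + versus +. (and /.) distinction between integer and floating point arithmetic.

```ocaml
(* Sieve of Eratosthenes: return all primes <= n, in increasing order.
   Illustrative sketch only; not the code from the comparison cited in the text. *)
let primes_upto n =
  if n < 2 then []
  else begin
    let is_prime = Array.make (n + 1) true in
    is_prime.(0) <- false;
    is_prime.(1) <- false;
    let i = ref 2 in
    while !i * !i <= n do
      if is_prime.(!i) then begin
        (* Cross off the multiples of i, starting at i * i. *)
        let j = ref (!i * !i) in
        while !j <= n do
          is_prime.(!j) <- false;
          j := !j + !i
        done
      end;
      incr i
    done;
    (* Collect the surviving indices, scanning downward so the list is ascending. *)
    let acc = ref [] in
    for k = n downto 2 do
      if is_prime.(k) then acc := k :: !acc
    done;
    !acc
  end

(* Integer arithmetic uses +, floating point arithmetic uses +. and /. *)
let sum_ints xs = List.fold_left (+) 0 xs

let mean xs =
  float_of_int (sum_ints xs) /. float_of_int (List.length xs)

let () =
  let ps = primes_upto 30 in
  List.iter (Printf.printf "%d ") ps;
  print_newline ();
  Printf.printf "mean = %.2f\n" (mean ps)
```

The same file can be loaded into the interactive toplevel, or compiled with ocamlc (bytecode) or ocamlopt (native code).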

In addition, I found this quote from Doug McClain, on a detailed comparative study of C++, IDL, Fortran, SML, Ocaml, Dylan, Erlang, Clean, Haskell, Lisp, Mathematica for scientific computing: "And most importantly, the CAML version works, and it works properly every time. I am assured, having monitored its runtime behavior that there are no memory leaks. Furthermore, the quality of code generated by the CAML compiler has been analyzed by the Intel VTune system and it show no pipeline stalls, maximum parallelism between integer and floating point units, and machine assembly code that is as good or better than can be achieved by hand coding."

So I did my own experiment. It involved a lot of simple floating point arithmetic, plus some string matching. I found the following speedups relative to interpreted Matlab 6.1: Ocaml native code compiler: 10 times faster; Ocaml bytecode compiler: 2 times faster; Matlab mcc compiler: 1.4 times faster. (The Matlab code has 670 lines of code; the equivalent Ocaml code has 989 lines.)
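A comparison like this could be timed roughly as in the sketch below. It is not the code used in the experiment: the workload (float_work, count_char) is a hypothetical stand-in for "floating point arithmetic plus string matching", and time_it is an assumed helper built on Sys.time from the standard library, which reports processor time in seconds.

```ocaml
(* Run a thunk and return its result together with the processor time used. *)
let time_it f =
  let t0 = Sys.time () in
  let result = f () in
  (result, Sys.time () -. t0)

(* Hypothetical floating point workload: sum of sqrt(i) for i = 1..n. *)
let float_work n =
  let s = ref 0.0 in
  for i = 1 to n do
    s := !s +. sqrt (float_of_int i)
  done;
  !s

(* Hypothetical string workload: count occurrences of character c in str. *)
let count_char c str =
  let n = ref 0 in
  String.iter (fun ch -> if ch = c then incr n) str;
  !n

let () =
  let (total, secs) = time_it (fun () -> float_work 1_000_000) in
  Printf.printf "sum = %.3f in %.3f s\n" total secs;
  Printf.printf "count = %d\n" (count_char 'a' "banana")
```

Building the same file once with ocamlc and once with ocamlopt gives a bytecode and a native code executable whose timings can then be compared.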

Ocaml links

Comparison of other languages

Going back to Matlab...