Erlang and the Great Computer Language Shootout

Brief Adventures Optimizing Code in a Very High Level Language

James Hague October 23-28, 2002 james.hague@gmail.com

Doug Bagley's Great Computer Language Shootout is a neat idea: a collection of little benchmarks written in several dozen languages for the purposes of comparing speed, memory usage, and lines of code. The concurrent functional language Erlang wasn't doing well in the rankings, so I decided to try to improve the Erlang benchmark performance. Now micro-benchmarks can be misleading or even worthless. Why bother computing the primes between 2 and 8192? We already know what they are! And isn't the point to write clean code, and not to trade increased performance for complexity and obscurity?

The "clean code" aspect is what made this interesting for me. Some of the programs written in other very high level languages didn't look at all like they would if the author were presenting them in an introductory textbook. A pretty version of the code might use higher order functions, for example, but the speed-is-everything code uses unchecked array types and a programming style that makes ML look like C. To me, the point of higher level languages is to leave as much to the computer as possible. Dynamic typing? You bet. Garbage collection? Definitely. No worrying about how to represent an integer, even if that integer is too large to fit in a 32-bit word? Exactly. So I don't see it as a failing that Erlang isn't in the top spot. I see it as an amazing achievement that such an expressive language fares so well.

The following notes are about how I approached some of the benchmarks. Not all of these improvements have been submitted to the shootout yet. Note that Doug stopped maintaining his version of the shootout, but new version lives on.

Array Access Benchmark

This program didn't exist in Erlang, so I wrote one. Array handling is one of the weaker aspects of the current Erlang implementation. Tuples can be indexed and updated like arrays, but the updating is not in place. Changing element 50 of a 1,000,000 element tuple results in a new 1,000,000 element tuple. I suspect this is why no one attempted the array access benchmark in Erlang.

My first attempt was to use the process dictionary, which is just a hash table. The benchmark uses two arrays, x and y, so I chose to represent an array element as {x,5}, for example. This put the benchmark on the map, but unfortunately the performance was down at the bottom of the list with TCL.

The benchmark works like this: fill array x with integers, fill array y with zeroes, then add array x to array y one element at a time. Using the process dictionary, the array addition step required three hashes per element: one to read a value from x, one to read a value from y, and one to write the sum back into y. By switching from the process dictionary to an Erlang Term Storage (ETS) table, I was able to use the ets:update_counter function to reduce this to two hashes per element. This resulted in a 30% speedup, moving Erlang up past Ruby and a few other languages.

Finally, I noticed that the x array was never updated after being initialized. If it were a tuple, then it could be indexed directly with element/2 , dropping to only one hash per element. This reduced the y = y + x loop to:

calc(X, 0) -> ok; calc(X, N) -> ets:update_counter(y, N, element(N, X)), calc(X, N - 1).

I waffled a bit over whether or not this met the spirit of the benchmark, but I think it does. The critical portion of the benchmark, the above loop, is implemented with indexing, not using lists. This version took only 1/3 the execution time of the original, putting Erlang ahead of Guile, Perl, Icon, and Python, which is respectable for a language with poor array handling.

Hash Access Benchmark

This is an odd benchmark, in that half of the program is a format_hex function, and there are list_to_atom calls in both loops. It took me a while realize that the list_to_atom calls were only there because an atom is presumably the closest analog to a hash key in languages like Perl. But in Erlang you can use any data object as the key for an ETS table. It's faster to use a list like "1234" as a key directly, than it is to convert it to the atom '1234' first, at least in this benchmark. Getting rid of all the atom construction sped up the program by 400%!

Hashes, Part II

The key to this benchmark turned out to be, surprisingly, the following expression:

lists:append("foo_", integer_to_list(I))

lists:append , if you peek into the lists module, is simply defined as:

append(L1, L2) -> L1 ++ L2.

This is because the "++" operator was a later addition to the language. The first expression above can be written directly as:

"foo_" ++ integer_to_list(I)

Though I wouldn't have guessed it, switching to the latter version significantly speeds up the program. In addition to avoiding an inter-module function call, I can only assume that the compiler generates special code to handle appending to a constant list.

Sieve of Erathostenes

This is a particularly interesting benchmark in a functional language, because it is written using lists rather than an array of booleans. Instead of marking all multiples of a number as not prime in an array, a new list is created which simply does not contain those multiples. Not surprisingly, all of the heavy list rewriting makes this benchmark very dependent on the speed of memory management and garbage collection. Other than removing a bit of unnecessary consing, I initially didn't make much progress on speeding this benchmark up. Even some cheating, like just counting primes instead of building a list of them and getting the length of that list at the end, didn't make a noticeable difference. Neither did caching the initial list of 8191 values between iterations. Run times varied widely, too.

A neat realization, though, was that the six line sieve function which removes all multiples of a number from a list, can be replaced with a concise list comprehension: