How to Crash Erlang

Now that's a loaded title, and I know some people will immediately see it as a personal slam on Erlang or ammunition for berating the language in various forums. I mean neither of these. Crashing a particular language, even so-called safe interpreted implementations, is not particularly challenging. Running out of memory or stack space are two easy options that work for most languages. There are pathological cases for regular expressions that may not truly crash, but result in such an extended period of unresponsiveness on large data sets that the difference is moot. In any language that allows directly linking to arbitrary operating system functions...well, that's just too easy.

Erlang, offering more complex features than many languages, has some particularly interesting edge cases.

Run out of atoms. Atoms in Erlang are analogous to symbols in Lisp--that is, symbolic, non-string identifiers that make code more readable, like green or unknown_value --with one exception. Atoms in Erlang are not garbage collected. Once an atom has been created, it lives as long as the Erlang node is running. An easy way to crash the Erlang virtual machine is to loop from 1 to some large number, calling integer_to_list and then list_to_atom on the current loop index. The atom table will fill up with unused entries, eventually bringing the runtime system to halt.

Why is this is allowed? Because garbage collecting atoms would involve a pass over all data in all processes, something the garbage collector was specifically designed to avoid. And in practice, running out of atoms will only happen if you write code that's generating new atoms on the fly.

Run out of processes. Or similarly, "run out of memory because you've spawned so many processes." While the sequential core of Erlang leans toward being purely functional, the concurrent side is decidedly imperative. If you spawn a non-terminating, unlinked process, and manage to lose the process id for it, then it will just sit there, waiting forever. You've got a process leak.

Flood the mailbox for a process. This is something that most new Erlang programmers do sooner or later. One process sends messages to another process without waiting for a reply, and a missing or incorrect pattern in the receive statement causes the receiver to ignore all messages...so they keep piling up until the mailbox fills all available memory, and that's that. Another reminder that concurrency in Erlang is imperative.

Create too many large binaries in a single process. Large--greater than 64 byte--binaries are allocated outside of the per-process heap and are reference counted. The catch is that the reference count indicates how many processes have access to the binary, not how many different pointers there are to it within a process. That makes the runtime system simpler, but it's not bulletproof. When garbage collection occurs for a process, unreferenced binaries are deleted, but that's only when garbage collection occurs. It's possible to create a large process with a slowly growing heap, and create so much binary garbage that the system runs out of memory before garbage collection occurs. Unlikely, yes, but possible.

permalink June 15, 2009

previously