A Ramble Through Erlang IO Lists

The IO List is a handy data type in Erlang, but not one that's often discussed in tutorials. It's any binary. Or any list containing integers between 0 and 255. Or any arbitrarily nested list containing either of those two things. Like this:

[10, 20, "hello", <<"hello",65>>, [<<1,2,3>>, 0, 255]]

The key to IO lists is that you never flatten them. They get passed directly into low-level runtime functions (such as file:write_file ), and the flattening happens without eating up any space in your Erlang process. Take advantage of that! Instead of appending values to lists, use nesting instead. For example, here's a function to put a string in quotes:

quote(String) -> $" ++ String ++ $".

If you're working with IO lists, you can avoid the append operations completely (and the second "++" above results in an entirely new version of String being created). This version uses nesting instead:

quote(String) -> [$", String, $"].

This creates three list elements no matter how long the initial string is. The first version creates length(String) + 2 elements. It's also easy to go backward and un-quote the string: just take the second list element. Once you get used to nesting you can avoid most append operations completely.

One thing that nested list trick is handy for is manipulating filenames. Want to add a directory name and ".png" extension to a filename? Just do this:

[Directory, $/, Filename, ".png"]

Unfortunately, filenames in the file module are not true IO lists. You can pass in deep lists, but they get flattened by an Erlang function ( file:file_name/1 ), not the runtime system. That means you can still dodge appending lists in your own code, but things aren't as efficient behind the scenes as they could be. And "deep lists" in this case means only lists, not binaries. Strangely, these deep lists can also contain atoms, which get expanded via atom_to_list .

Ideally filenames would be IO lists, but for compatibility reasons there's still the need to support atoms in filenames. That brings up an interesting idea: why not allow atoms as part of the general IO list specification? It makes sense, as the runtime system has access to the atom table, and there's a simple correspondence between an atom and how it gets encoded in a binary; 'atom' is treated the same as "atom". I find I'm often calling atom_to_list before sending data to external ports, and that would no longer be necessary.

permalink June 13, 2010

previously