A Deeper Look at Tail Recursion in Erlang

The standard "why tail recursion is good" paragraph talks of reducing stack usage and subroutine calls turning into jumps. An Erlang process must be tail recursive or else the stack will endlessly grow. But there's more to how this works behind the scenes, and it directly affects how much code is generated by similar looking tail recursive function calls.

A long history of looking at disassemblies of C code has made me cringe when I see function calls containing many parameters. Yet it's common to see ferocious functions in Erlang, like this example from erl_parse (to be fair, it's code generated by yecc):

yeccpars2(29, __Cat, __Ss, __Stack, __T, __Ts, __Tzr) -> yeccpars2(yeccgoto(expr_700, hd(__Ss)), __Cat, __Ss, __Stack, __T, __Ts, __Tzr);

It's tail recursive, yes, but seven parameters? Surely this turns into a lot of pushing and popping behind the scenes, even if stack usage is constant. The good news is that, no, it doesn't. It's more verbose at the language level than what's really going on.

There's a simple rule about tail recursive calls in Erlang: If a parameter is passed to another function, unchanged, in exactly the same position it was passed in, then no virtual machine instructions are generated. Here's an example:

loop(Total, X, Size, Flip) -> loop(Total, X - 1, Size, Flip).

Let's number the parameters from left to right, so Total is 1, X is 2, and so on. Total enters the function in parameter position 1 and exits, unchanged, in position 1 of the tail call. The value just rides along in a parameter register. Ditto for Size in position 3 and Flip in position 4. In fact, the only change at all is X, so the virtual machine instructions for this function look more or less like:

parameter[2]-- goto loop

Perhaps less intuitively, the same rule applies if the number of parameters increases in the tail call. This idiom is common in functions with accumulators:

count_pairs(List, Limit) -> count_pairs(List, Limit, 0).

The first two parameters are passed through unchanged. A third parameter--zero--is tacked onto the end of the parameter list, the only one of the three that involves any virtual machine instructions.

In fact, just about the worst thing you can do to violate the "keep parameters in the same positions" rule is to insert a new parameter before the others, or to randomly shuffle parameters. This code results in a whole bunch of "move parameter" instructions:

super_deluxe(A, B, C, D, E, F) -> super_deluxe(F, E, D, C, B, A).

while this code turns into a single jump:

super_deluxe(A, B, C, D, E ,F) -> super_deluxe(A, B, C, D, E, F).

These implementation techniques, used in the Erlang BEAM virtual machine, were part of the Warren Abstract Machine developed for Prolog.

permalink November 2, 2007