Tim Bray is spreading more misinformation about tail recursion. He describes it this way:

It looks like a subroutine call, but in the case where it occurs as the last thing in the routine, it magically, silently, and automatically gets turned into, now how did I put it? “A highly controlled and structured GOTO.”

A tail-call is a subroutine call. The efficient implementation does not magically transformed into something else; if it doesn't create a stack frame on such a call, it's because one simply isn't relevant.

The essential observation behind the efficient-tail-call implementation (not "optimization"—more on which in a moment) is as follows: For most programming languages, a stack frame is needed not for a subroutine call but only for an argument evaluation, that is, an evaluation whose result is temporary and needs further processing. Calls in the middle of a procedure are "argument" evaluations, because their results need further processing. It's really the temporary, non-final natural of the result that forces us to do the book-keeping that remembers where to come back to.

Another confusion is between semantics and cost model:

Call me old-fashioned, but it just seems wrong when the semantics of two identical function invocations vary wildly as a function of whether there’s an executable statement between the invocation and the function exit.

The semantics of the call doesn't change; the result and the side-effects are the same (that's what's usually meant by "semantics" anyway). The cost, on the other hand, might be quite different depending on whether a stack frame is needed.

Unfortunately, efficient tail recursion has often been described as a transparent "optimization," so that it might or might not be efficient and the programmer can't tell in advance.

Efficient tail calls (or space-consuming ones) really should be part of the official "cost model" of the language, something that goes along with the semantics, as a peer to the semantics, in fact. The cost model tells you how expensive you can expect things to be, but should be a little less binding than the semantics, since language implementors should have some freedom to do additional optimizations.

The idea that stack frames should correspond directly to the call structure is just odd. Maybe we want to know the call structure at runtime; in that case, we should capture that as debugging information, or as a reflection feature of the runtime system, but not as a core language design. Let the language implementation use the stack as efficiently as possible!

To summarize: