On the Madness of Optimizing Compilers

There's a misconception that the purpose of a compiler is to generate the fastest possible code. Really, it's to generate working code--going from a source language to something that actually runs and gives results. That's not a trivial task, mapping any JavaScript or Ruby or C++ program to machine code, and in a reliable manner.

That word "any" cannot be emphasized enough. If you take an existing program and disassemble the generated code, then it's easy to think "It could have been optimized like this and like that," but it's not a compiler designed for your program only. It has to work for all programs written by all these different people working on entirely different problems.

For the compiler author, the pressure to make the resultant programs run faster is easy to succumb to. There are moments, looking at the compiled output of test programs, where if only some assumptions could be made, then some of these instructions could be removed. Those assumptions, as assumptions tend to be, may look correct in a specific case, but don't generalize.
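
One illustration of such an assumption (my own pick, not one named above) is treating floating-point addition as if it were ordinary algebra. Rewriting (big + small) - big into (big - big) + small looks harmless and holds for most values you'd think to test, but it changes the answer once the operands differ enough in magnitude. A minimal sketch, with made-up variable names:

```python
# Both expressions are algebraically identical; only the order of operations differs.
big = 1e17    # adjacent doubles are 16 apart at this magnitude
small = 1.0

as_written = (big + small) - big      # small is rounded away, then big cancels
reassociated = (big - big) + small    # big cancels first, so small survives

print(as_written)     # 0.0
print(reassociated)   # 1.0
```

A compiler that reorders these additions by default hands a different result to a valid program, which is exactly the kind of failure that matters here.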

To give a concrete example, it may be obvious that an object could be allocated on the stack instead of the heap. To make that work in the general case, though, you need to verify that the pointer to the object isn't saved anywhere--like inside another object--where it would outlive the data on the stack. You can trace through the current routine looking for pointer stores. You can trace down into local functions called from the current routine. There may be cases where the store happens in one branch of a conditional, but not the other. As soon as that pointer is passed into a function outside of the current module, all bets are off. You can't tell what's happening, and have to assume the pointer is saved somewhere. If you get any of this wrong, even in an edge case, the user is presented with non-working code for a valid program, and the compiler writer has failed at his or her one task.
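
The checks described above can be sketched as a toy analysis over a made-up instruction format. Everything here (the opcode names, the tuple layout, the `escapes` function) is hypothetical and invented for illustration. A real compiler would also have to track copies of the pointer, pointers stored inside the object itself, multiple arguments, and much more. The point is only the shape of the logic: every case that can't be proven safe has to come back as "escapes."

```python
# Toy instruction format (all names are assumptions for this sketch):
#   ("use", ptr)                 read through ptr; harmless
#   ("store", dst_obj, ptr)      save ptr inside another object
#   ("return", ptr)              hand ptr back to the caller
#   ("call", func_name, ptr)     pass ptr as the single argument
#   ("if", then_body, else_body) conditional; either branch may run

def escapes(body, ptr, local_funcs, seen=frozenset()):
    """Return True unless ptr provably never outlives the current frame."""
    for instr in body:
        op = instr[0]
        if op == "if":
            # A store in only one branch is still a store.
            if any(escapes(branch, ptr, local_funcs, seen) for branch in instr[1:]):
                return True
        elif op == "use":
            pass
        elif op == "store":
            if instr[2] == ptr:
                return True
        elif op == "return":
            if instr[1] == ptr:
                return True
        elif op == "call":
            if instr[2] != ptr:
                continue
            name = instr[1]
            if name not in local_funcs:
                return True  # outside the module: all bets are off
            if name in seen:
                return True  # recursion: give up rather than guess
            param, callee_body = local_funcs[name]
            if escapes(callee_body, param, local_funcs, seen | {name}):
                return True
        else:
            return True  # unknown instruction: assume the worst
    return False

# Functions visible in the current module: name -> (parameter, body).
local_funcs = {
    "log_size": ("obj", [("use", "obj")]),
    "remember": ("obj", [("if", [("store", "cache", "obj")], [])]),
}

safe = [("use", "p"), ("call", "log_size", "p")]
one_branch = [("call", "remember", "p")]
external = [("call", "some_library_fn", "p")]

print(escapes(safe, "p", local_funcs))        # False
print(escapes(one_branch, "p", local_funcs))  # True
print(escapes(external, "p", local_funcs))    # True
```

Only the first routine qualifies for stack allocation. Note how many of the branches return True simply because the analysis can't see far enough, not because the pointer is known to be saved.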

So it goes: there are continual, tantalizing cases for optimization (like the escape analysis example above), many reliant on a handful of restrictions that are hard to prove or tempting to overlook. And the only right thing to do is ignore most of them.

The straightforward "every program all the time" compiler is likely within 2-3x of the fully optimized version (for most things), and that's not a bad place to be. A few easy improvements close the gap. A few slightly tricky but still safe methods make up a little more. But the remainder, even if there's the potential for 50% faster performance, flat out isn't worth it. Anything that ventures into "well, maybe not 100% reliable..." territory is madness.

I've seen arguments that some people desperately need every last bit of performance, and that even a few cycles inside a loop are the difference between a viable product and failure. Assuming that's true, then they should be crafting assembly code by hand, or they should be writing a custom code generator with domain-specific knowledge built in. Trying to have a compiler that's stable and reliable and also meets the needs of these few people with extreme, possibly misguided, performance needs is a mistake.

(If you liked this, you might enjoy A Forgotten Principle of Compiler Design.)

March 7, 2016
