This is an attempt to throw a valuable debugging heuristic into the ether where future Google searches will see it.

Yesterday, my friend and regular A&D commenter Jay Maynard called me about a bug in Hercules, an IBM 360 emulator that he maintains. It was segfaulting while interpreting a particular 360 assembler instruction. But building the emulator with either -g for symbolic debugging or with its own internal trace facility enabled made the bug go away.

This is thus a classic example of a heisenbug, a bug that goes away when you try to observe or probe it. When he first called, I couldn’t think of anything helpful. But there was a tickle in the back of my brain, some insight trying to break into full consciousness, and a few minutes later it succeeded.

I called Jay back and said “Turn off your compiler’s optimizer”.

Compiler optimizers take the output stream from some compiler stage and transform it to use fewer instructions. They may operate at the level of serialized expression trees, or of a compiler intermediate representation at a slightly later stage, or on the stream of assembler instructions emitted very late (just before assembly and linking). They look for patterns in the output and rewrite them into more economical patterns.
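To make the pattern-rewriting idea concrete, here is a minimal sketch of a late-stage ("peephole") rewriter. The instruction set, the tuple format, and the three rules are all invented for illustration; a real optimizer has far more rules and far more context to check before applying each one.

```python
# Toy peephole optimizer over a made-up instruction set (illustrative only).
# Each instruction is a tuple: (opcode, register, operand).

def peephole(instructions):
    """Rewrite a list of toy instructions into a cheaper equivalent list."""
    out = []
    for ins in instructions:
        op, reg, arg = ins
        # Rule 1: adding zero has no effect, so drop the instruction.
        if op == "add" and arg == 0:
            continue
        # Rule 2: multiplying by 2 is the same as shifting left by 1 bit.
        if op == "mul" and arg == 2:
            ins = ("shl", reg, 1)
        # Rule 3: a load that immediately follows a store of the same
        # register to the same slot is redundant; keep only the store.
        if op == "load" and out and out[-1] == ("store", reg, arg):
            continue
        out.append(ins)
    return out


program = [
    ("store", "r1", 8),
    ("load", "r1", 8),
    ("add", "r1", 0),
    ("mul", "r1", 2),
]
optimized = peephole(program)
print(optimized)  # [('store', 'r1', 8), ('shl', 'r1', 1)]
```

Each rule is only safe if its hidden assumptions hold. Rule 1, for example, would be wrong on a machine where an add also sets condition flags that later code reads.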

Optimizer pattern rewrites aren’t supposed to change the behavior of the code in any way other than making it faster and smaller. Unfortunately, proving the correctness of an optimization is excruciatingly difficult and mistakes are easy. Mistaken optimizations that almost always work are, though rare in absolute terms, among the most common compiler bugs.

Optimization bugs have a strong tendency to be heisenbugs. Enabling debugging symbols with -g can change the output stream just enough that the optimizer no longer sees the pattern that triggers the defective rule. So can enabling the conditioned-out code for a trace facility.

When I told Jay this, he reported that Hercules normally builds with -O3, which under GCC is a very aggressive (that is to say, somewhat risky) optimization level.

“OK, set your optimizer to -O0,” I told Jay, “and test. If it fails to segfault, you have an optimizer bug. Walk the optimization level upwards until the bug reproduces, then back off one.”
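The walk-upwards procedure is easy to script. What follows is a rough sketch, not what Jay actually ran: the `make` invocation and the `./run-test` reproducer in `build_and_test` are hypothetical stand-ins, and you would substitute whatever builds your project and triggers your bug. The only parts taken as given are GCC's `-O0` through `-O3` flags.

```python
import subprocess

# GCC optimization flags, from least to most aggressive.
LEVELS = ["-O0", "-O1", "-O2", "-O3"]


def first_failing_level(levels, passes):
    """Walk the levels upward and stop at the first one where the bug shows.

    `passes` is a callable taking a level and returning True if the test
    ran cleanly at that level. Returns (last_good, first_bad); first_bad
    is None if no level reproduces the bug, and last_good is None if even
    the lowest level fails (in which case suspect your own code first).
    """
    last_good = None
    for level in levels:
        if passes(level):
            last_good = level
        else:
            return last_good, level
    return last_good, None


def build_and_test(level):
    """Hypothetical build-and-test step; adapt both commands to your project."""
    # Assumption: the Makefile honors CFLAGS passed on the command line.
    build = subprocess.run(["make", "clean", "all", f"CFLAGS={level}"])
    if build.returncode != 0:
        raise RuntimeError(f"build failed at {level}")
    # Assumption: ./run-test is a reproducer that exits 0 on success.
    # On POSIX, a process killed by a signal (such as a segfault) yields
    # a negative returncode, so it counts as a failure here.
    test = subprocess.run(["./run-test"])
    return test.returncode == 0


# Usage, once the two commands above match your project:
#   last_good, first_bad = first_failing_level(LEVELS, build_and_test)
```

The level to "back off" to is `last_good`. Keeping the search separate from the build step means the same loop works with any compiler or test harness.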

I knew of this technique because I’ve been in this kind of mess myself more than once – most recently in the code for interpreting IS-GPS-200, the low-level bit-serial protocol used on GPS satellite-to-ground radio links. It was compromised by an optimizer heisenbug that was later fixed in GCC 4.0.

This morning Jay left a message in my voicemail confirming that my diagnosis was correct.

I said above that optimizer bugs have a strong tendency to be heisenbugs. If you are coding with an optimizing compiler, the reverse implication is also true, especially of segfault heisenbugs. The first thing to try when you trip over one of these is to turn off your optimizer.

You won’t hit this failure case very often — I’ve seen it maybe three or four times in nearly thirty years of C programming. But when you do, knowing this heuristic can save you many, many hours of grief.