A gentle introduction to jump threading optimizations

As part of the GCC developers’ on-demand range work for GCC 10, I’ve been experimenting with improving the backward jump threader so it can thread range-dependent paths. This, in turn, had me looking at the jump threader as a whole, a part of the compiler I’ve been carefully avoiding for years. If, like me, you’re curious about compiler optimizations but are jump-threading-agnostic, perhaps you’ll be interested in this short introduction.

At the highest level, jump threading’s major goal is to reduce the number of dynamically executed jumps on different paths through the program’s control flow graph. This often improves performance, both directly, by removing conditional branches, and indirectly, because the simplified control flow enables further optimizations. Typically, for every runtime branch eliminated by jump threading, two or three other runtime instructions are eliminated.

Simplifying the control flow also dramatically reduces the false-positive rates of warnings such as -Wuninitialized. False positives from -Wuninitialized typically occur because the control flow graph contains paths that can never be taken at runtime but still remain in the compiler’s internal representation of the code.

GCC developers have found a strong correlation between false positives from -Wuninitialized and missed optimization opportunities. Thus, the GCC developers are keenly interested in any false-positive report for -Wuninitialized.
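Here is a minimal sketch of the kind of code that produces such false positives; the names get_value, do_something and check_get_value are hypothetical. The variable x is read only when cond is true, and it is always written in that case, yet before threading the control flow graph still contains a path that skips the assignment and later reaches return x. Whether a particular GCC version actually warns on this exact snippet depends on the version and options; the point is the shape of the code. Threading the second test of cond through the first (duplicating do_something() onto both paths) removes the impossible path entirely.

```c
#include <assert.h>

/* Hypothetical stand-in for any code between the two tests.
   It cannot see or modify cond.  */
static int work_done;

static void do_something (void)
{
  work_done++;
}

/* x is written whenever cond is true and read only when cond is true,
   so the function is correct.  The CFG, however, still has an edge
   sequence "cond false -> skip assignment -> cond true -> return x"
   until the second test is threaded through the first.  */
int get_value (int cond)
{
  int x;

  if (cond)
    x = 42;
  do_something ();
  if (cond)
    return x;
  return 0;
}

/* Sanity check of the two feasible paths.  */
void check_get_value (void)
{
  assert (get_value (1) == 42);
  assert (get_value (0) == 0);
}
```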

The classic jump threading example is a simple jump-to-jump optimization. For instance, it can transform the following:

if (a > 5)
  goto j;
stuff ();
stuff ();
j:
  goto somewhere;

into the more optimized sequence below:

if (a > 5)
  goto somewhere;
stuff ();
stuff ();
j:
  goto somewhere;
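To make the reduction in dynamically executed jumps concrete, here is a small C sketch of the two sequences. The counter jumps_taken, the stub stuff() and the function names original_seq and threaded_seq are hypothetical instrumentation, and somewhere is placed directly after the last jump only so the code compiles; in real code it would be some distant block. When a > 5, the original sequence takes two jumps to reach somewhere, while the threaded one takes a single jump; on the fall-through path both take one.

```c
#include <assert.h>

/* Hypothetical counter of taken jumps.  */
static int jumps_taken;

static void stuff (void)
{
  /* Placeholder work.  */
}

/* Before threading: for a > 5 we jump to j, which jumps again.  */
int original_seq (int a)
{
  jumps_taken = 0;
  if (a > 5)
    {
      jumps_taken++;
      goto j;
    }
  stuff ();
  stuff ();
j:
  jumps_taken++;
  goto somewhere;
somewhere:
  return jumps_taken;
}

/* After threading: the conditional jump goes straight to the final
   target; the fall-through path still ends with its own jump.  */
int threaded_seq (int a)
{
  jumps_taken = 0;
  if (a > 5)
    {
      jumps_taken++;
      goto somewhere;
    }
  stuff ();
  stuff ();
  jumps_taken++;
  goto somewhere;
somewhere:
  return jumps_taken;
}

void check_jump_counts (void)
{
  assert (original_seq (10) == 2);
  assert (threaded_seq (10) == 1);
  assert (original_seq (0) == 1);
  assert (threaded_seq (0) == 1);
}
```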

Jump threading can also thread through two compound conditions that are known to overlap. Below, whenever a && b is true, b must be true, so b || c is true as well:

void func (int a, int b, int c)
{
  if (a && b)
    foo ();
  if (b || c)
    bar ();
}

The above is transformed into:

void func (int a, int b, int c)
{
  if (a && b)
    {
      foo ();
      goto skip;
    }
  if (b || c)
    {
    skip:
      bar ();
    }
}
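One way to convince yourself that this transformation preserves behavior is to try all eight combinations of a, b and c. The sketch below replaces foo() and bar() with hypothetical stubs that append a letter to a call log (log_buf, reset_log and same_calls_for_all_inputs are illustrative names, not GCC code), then compares the logs produced by the original and threaded versions. The only fact it relies on is that a && b implies b, and therefore b || c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical call log: each stub appends one letter.  */
static char log_buf[8];
static size_t log_len;

static void reset_log (void)
{
  log_len = 0;
  log_buf[0] = '\0';
}

static void log_call (char c)
{
  assert (log_len + 1 < sizeof log_buf);   /* room for c and '\0' */
  log_buf[log_len++] = c;
  log_buf[log_len] = '\0';
}

static void foo (void) { log_call ('f'); }
static void bar (void) { log_call ('b'); }

/* Before threading: two independent tests.  */
static void original (int a, int b, int c)
{
  if (a && b)
    foo ();
  if (b || c)
    bar ();
}

/* After threading: a && b implies b || c, so that path jumps
   straight to the call to bar ().  */
static void threaded (int a, int b, int c)
{
  if (a && b)
    {
      foo ();
      goto skip;
    }
  if (b || c)
    {
    skip:
      bar ();
    }
}

/* Returns 1 if both versions make the same calls, in the same order,
   for all eight combinations of a, b and c; otherwise 0.  */
int same_calls_for_all_inputs (void)
{
  char expected[sizeof log_buf];

  for (int i = 0; i < 8; i++)
    {
      int a = i & 1, b = (i >> 1) & 1, c = (i >> 2) & 1;

      reset_log ();
      original (a, b, c);
      strcpy (expected, log_buf);

      reset_log ();
      threaded (a, b, c);
      if (strcmp (expected, log_buf) != 0)
        return 0;
    }
  return 1;
}
```

Only inputs with a and b both true take the goto; every other input falls through to the unchanged second test.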

An even more interesting case arises when jump threading duplicates blocks to avoid branching. Consider a slightly tweaked version of the above:

void func (int a, int b, int c)
{
  if (a && b)
    foo ();
  tweak ();
  if (b || c)
    bar ();
}

The compiler cannot easily thread the above unless it duplicates tweak(), making the resulting code larger:

void func (int a, int b, int c)
{
  if (a && b)
    {
      foo ();
      tweak ();
      goto skip;
    }
  tweak ();
  if (b || c)
    {
    skip:
      bar ();
    }
}

Thanks to the code duplication, the compiler is able to join the two overlapping conditionals with no change in semantics. By the way, this is the ultimate goal of jump threading: avoiding expensive conditional branches, even though it may come at the expense of more code.
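A brute-force check over all eight inputs shows what the duplication does and does not cost: tweak() now appears twice in the source, but every execution path still calls it exactly once, so only static code size grows. The stubs, the trace buffer and the helper names (record, run, duplication_preserves_calls) are hypothetical instrumentation for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical call trace: one letter per call.  */
static char trace[8];
static size_t trace_len;

static void record (char c)
{
  assert (trace_len + 1 < sizeof trace);   /* room for c and '\0' */
  trace[trace_len++] = c;
  trace[trace_len] = '\0';
}

static void foo (void)   { record ('f'); }
static void tweak (void) { record ('t'); }
static void bar (void)   { record ('b'); }

static void original (int a, int b, int c)
{
  if (a && b)
    foo ();
  tweak ();
  if (b || c)
    bar ();
}

/* tweak () is duplicated onto the a && b path so that path can
   skip the second test.  */
static void duplicated (int a, int b, int c)
{
  if (a && b)
    {
      foo ();
      tweak ();
      goto skip;
    }
  tweak ();
  if (b || c)
    {
    skip:
      bar ();
    }
}

/* Runs one version on one input and returns its call trace.  */
static const char *run (void (*fn) (int, int, int), int a, int b, int c)
{
  trace_len = 0;
  trace[0] = '\0';
  fn (a, b, c);
  return trace;
}

/* Returns 1 if both versions make the same calls for all inputs.  */
int duplication_preserves_calls (void)
{
  char expected[sizeof trace];

  for (int i = 0; i < 8; i++)
    {
      int a = i & 1, b = (i >> 1) & 1, c = (i >> 2) & 1;

      strcpy (expected, run (original, a, b, c));
      if (strcmp (expected, run (duplicated, a, b, c)) != 0)
        return 0;
    }
  return 1;
}
```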

GCC does have a limit on how many instructions or basic blocks it is willing to duplicate in its quest for faster run speeds, and several --param options raise those limits so the jump threader considers longer sequences. One such option is --param max-fsm-thread-path-insns=500, which lets the threader duplicate up to 500 instructions per threaded path (the default is 100). There is also --param max-fsm-thread-length, which similarly raises the threader's limit, but measured in basic blocks instead of instructions. As with all --param options, use them for self-amusement and clever party tricks, as they are subject to change without notice.
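The fsm in these parameter names stands for finite state automaton, which is how GCC's parameter documentation describes the paths they limit. A typical target is a loop where a state variable is assigned a constant on one iteration and switched on at the top of the next, so along a path through the loop body the outcome of the switch is already known. The word counter below is a hypothetical example of that shape; the compile command in its comment uses the spelling from GCC's documentation, max-fsm-thread-path-insns, and assumes a GCC release that still accepts it. Whether GCC threads this particular loop is not guaranteed.

```c
#include <assert.h>

/* Hypothetical two-state word counter.  The state assigned on one
   iteration decides which switch arm runs on the next.
   To experiment (assuming your GCC accepts this --param):
   gcc -O2 -fdump-tree-all-details --param max-fsm-thread-path-insns=500 -c words.c  */
enum state { OUTSIDE, INSIDE };

int count_words (const char *s)
{
  enum state st = OUTSIDE;
  int words = 0;

  for (; *s; s++)
    {
      int space = (*s == ' ' || *s == '\t' || *s == '\n');

      switch (st)
        {
        case OUTSIDE:
          if (!space)
            {
              words++;        /* a new word starts here */
              st = INSIDE;
            }
          break;
        case INSIDE:
          if (space)
            st = OUTSIDE;     /* the current word ended */
          break;
        }
    }
  return words;
}

void check_count_words (void)
{
  assert (count_words ("") == 0);
  assert (count_words ("jump threading  in GCC") == 4);
  assert (count_words ("  leading and trailing  ") == 3);
}
```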

Jump threading is enabled by default for -O2 and above, but unfortunately, it is intertwined with the various value range propagation (VRP) passes and there is no independent way of turning it off. The deceptive -fno-thread-jumps flag turns off jump threading only in the low-level RTL optimizers, which handle only a minuscule number of jump threads in a typical compilation. Making -fno-thread-jumps applicable to all jump threading throughout the compiler, as well as disentangling VRP from jump threading as a whole, are on our to-do list.

If you’d like to see the jump threader in action, compile a sufficiently complex program with -O2 -fdump-tree-all-details and look at the *.c*{ethread, thread1, thread2, thread3, thread4} dump files, as well as the VRP dumps (*.c*{vrp1, vrp2}). You should see lines like Threaded jump 3 --> 4 to 7.
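If you don't have a suitable program at hand, a small range-dependent case like the hypothetical classify() below is a reasonable starting point: on the path where x > 10 holds, the later test x > 5 is known to be true, so a range-aware threader may route that path straight to the r += 2 block. Whether, and in which pass, GCC threads it depends on the version, so treat any Threaded jump lines in the dumps as something to look for rather than a guaranteed result.

```c
#include <assert.h>

/* Hypothetical stand-in for some work between the two tests.
   Try: gcc -O2 -fdump-tree-all-details -c classify.c  */
int note_calls;

static void note (void)
{
  note_calls++;
}

int classify (int x)
{
  int r = 0;

  if (x > 10)
    r = 1;
  note ();
  if (x > 5)      /* always true on the path where x > 10 was true */
    r += 2;
  return r;
}

void check_classify (void)
{
  assert (classify (20) == 3);
  assert (classify (7) == 2);
  assert (classify (0) == 0);
}
```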

Enjoy!
