July 25, 2016

nullprogram.com/blog/2016/07/25/

Today there was a question on /r/C_Programming about the effect of C’s const on optimization. Variations of this question have been asked many times over the past two decades. Personally, I blame naming of const .

Given this program:

void foo ( const int * ); int bar ( void ) { int x = 0 ; int y = 0 ; for ( int i = 0 ; i < 10 ; i ++ ) { foo ( & x ); y += x ; // this load not optimized out } return y ; }

The function foo takes a pointer to const, which is a promise from the author of foo that it won’t modify the value of x . Given this information, it would seem the compiler may assume x is always zero, and therefore y is always zero.

However, inspecting the assembly output of several different compilers shows that x is loaded each time around the loop. Here’s gcc 4.9.2 at -O3, with annotations, for x86-64,

bar: push rbp push rbx xor ebp , ebp ; y = 0 mov ebx , 0xa ; loop variable i sub rsp , 0x18 ; allocate x mov dword [ rsp + 0xc ], 0 ; x = 0 .L0: lea rdi , [ rsp + 0xc ] ; compute &x call foo add ebp , dword [ rsp + 0xc ] ; y += x (not optmized?) sub ebx , 1 jne .L0 add rsp , 0x18 ; deallocate x mov eax , ebp ; return y pop rbx pop rbp ret

The output of clang 3.5 (with -fno-unroll-loops) is the same, except ebp and ebx are swapped, and the computation of &x is hoisted out of the loop, into r14 .

Are both compilers failing to take advantage of this useful information? Wouldn’t it be undefined behavior for foo to modify x ? Surprisingly, the answer is no. In this situation, this would be a perfectly legal definition of foo .

void foo ( const int * readonly_x ) { int * x = ( int * ) readonly_x ; // cast away const ( * x ) ++ ; }

The key thing to remember is that const doesn’t mean constant. Chalk it up as a misnomer. It’s not an optimization tool. It’s there to inform programmers — not the compiler — as a tool to catch a certain class of mistakes at compile time. I like it in APIs because it communicates how a function will use certain arguments, or how the caller is expected to handle returned pointers. It’s usually not strong enough for the compiler to change its behavior.

Despite what I just said, occasionally the compiler can take advantage of const for optimization. The C99 specification, in §6.7.3¶5, has one sentence just for this:

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.

The original x wasn’t const-qualified, so this rule didn’t apply. And there aren’t any rules against casting away const to modify an object that isn’t itself const . This means the above (mis)behavior of foo isn’t undefined behavior for this call. Notice how the undefined-ness of foo depends on how it was called.

With one tiny tweak to bar , I can make this rule apply, allowing the optimizer do some work on it.

const int x = 0 ;

The compiler may now assume that foo modifying x is undefined behavior, therefore it never happens. For better or worse, this is a major part of how a C optimizer reasons about your programs. The compiler is free to assume x never changes, allowing it to optimize out both the per-iteration load and y .

bar: push rbx mov ebx , 0xa ; loop variable i sub rsp , 0x10 ; allocate x mov dword [ rsp + 0xc ], 0 ; x = 0 .L0: lea rdi , [ rsp + 0xc ] ; compute &x call foo sub ebx , 1 jne .L0 add rsp , 0x10 ; deallocate x xor eax , eax ; return 0 pop rbx ret

The load disappears, y is gone, and the function always returns zero.

Curiously, the specification almost allows the compiler to go further. Consider would would happen if x were allocated somewhere off the stack in read-only memory. That transformation would look like this:

static const int __x = 0 ; int bar ( void ) { for ( int i = 0 ; i < 10 ; i ++ ) foo ( & __x ); return 0 ; }

We would see a few more instructions shaved off (-fPIC, small code model):

section .rodata x: dd 0 section .text bar: push rbx mov ebx , 0xa ; loop variable i .L0: lea rdi , [ rel x ] ; compute &x call foo sub ebx , 1 jne .L0 xor eax , eax ; return 0 pop rbx ret

Because the address of x is taken and “leaked,” this last transform is not permitted. If bar is called recursively such that a second address is taken for x , that second pointer would compare equally ( == ) with the first pointer depsite being semantically distinct objects, which is forbidden (§6.5.9¶6).

Even with this special const rule, stick to using const for yourself and for your fellow human programmers. Let the optimizer reason for itself about what is constant and what is not.

Travis Downs nicely summed up this article in the comments: