Do Understandable If Statements Run Slower?

Aadam, my esteemed teammate, walked down to me right after reading the last post on Fluent C++, How to Make If Statements More Understandable, with a question. In fact, that post made quite a few people think and get back to me with feedback and questions, for which I’m very grateful. If it did just that, it has already achieved one of its major goals.

Anyway, let’s get to Aadam’s question: “Jonathan,” he said, “I get the idea of rolling out an if statement so that it matches the specifications. But does this have any sort of impact on performance?”

This is a great question, and he wasn’t the only one bringing up this topic.

I had a hunch about the answer, but hunches are worth nothing when it comes to performance, right? So we did the only thing we could do: measure!

To perform all our measurements, we used Fred Tingaud’s popular tool: quick-bench.com.

Does the compiler understand understandable if statements?

We’ve selected one particular question for our measurements: we saw in the last post that sometimes, following the specifications leads us to have an if inside an if, as opposed to cramming two conditionals into a logical AND expression:

if (condition1)
{
    if (condition2)
    {
        ...

if (condition1 && condition2)
{
    ...

So does one perform better than the other? And even before that: does the compiler understand that the two snippets are equivalent, and does it generate the same code for them?
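Part of why a compiler is allowed to treat the two forms as equivalent is that && short-circuits: condition2 is only evaluated when condition1 is true, exactly as in the nested version. The sketch below makes that visible by counting calls to the second condition. The names nestedIf, andIf and checkEquivalence are hypothetical, chosen for this illustration, and the return value stands in for the "..." body.

```cpp
#include <cassert>

// Nested form: condition2 is only evaluated when condition1 is true.
template <typename Condition1, typename Condition2>
bool nestedIf(Condition1 condition1, Condition2 condition2)
{
    if (condition1())
    {
        if (condition2())
        {
            return true;  // stands in for the "..." body
        }
    }
    return false;
}

// Combined form: && short-circuits, so condition2 is also only
// evaluated when condition1 is true.
template <typename Condition1, typename Condition2>
bool andIf(Condition1 condition1, Condition2 condition2)
{
    if (condition1() && condition2())
    {
        return true;  // stands in for the "..." body
    }
    return false;
}

// Tries all four combinations of the two conditions and checks that both
// forms take the same decision and evaluate condition2 the same number of times.
void checkEquivalence()
{
    bool const values[] = {false, true};
    for (bool value1 : values)
    {
        for (bool value2 : values)
        {
            int nestedCalls = 0;
            int andCalls = 0;
            bool const nestedResult = nestedIf([&] { return value1; },
                                               [&] { ++nestedCalls; return value2; });
            bool const andResult = andIf([&] { return value1; },
                                         [&] { ++andCalls; return value2; });
            assert(nestedResult == andResult);
            assert(nestedCalls == andCalls);
            assert(nestedCalls == (value1 ? 1 : 0));  // short-circuit in both forms
        }
    }
}
```

Since the observable behaviour is the same in every case, an optimizer is free to emit identical instructions for both forms; whether it actually does is what the measurements check.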

We threw these two pieces of code into quick-bench, which also generates the assembly code for each one. The configuration is clang++ 3.8 with -O1 as the optimization flag. We used random numbers for the conditions, to make sure they were actually evaluated at runtime rather than at compile time. Here is our quick-bench if you’re curious to have a look.
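If you want to reproduce the experiment outside quick-bench, here is a minimal standalone sketch that uses only the standard library instead of Google Benchmark. Its shape is inferred from the disassembly: an inner loop of 10000 iterations (0x2710 in the -O0 listing), calls to getPositive() and getNegative() compared with 0, and 42 (0x2a) written to a global c. The bodies of getPositive and getNegative, the fixed seed and the averageNanoseconds helper are assumptions for illustration, not the code we actually ran.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Global written by the body of the if, so the work stays observable.
int c = 0;

// Hypothetical random sources: fixed seed so runs are reproducible.
std::mt19937 generator{42};
std::uniform_int_distribution<int> zeroOrOne{0, 1};

int getPositive() { return zeroOrOne(generator); }  // hypothetical body
int getNegative() { return zeroOrOne(generator); }  // hypothetical body

void if_if()
{
    int i = 10000;
    while (i--)
    {
        if (getPositive() > 0)
        {
            if (getNegative() > 0)
            {
                c = 42;
            }
        }
    }
}

void if_and()
{
    int i = 10000;
    while (i--)
    {
        if (getPositive() > 0 && getNegative() > 0)
        {
            c = 42;
        }
    }
}

// Returns the average duration of one call to f, in nanoseconds.
template <typename F>
double averageNanoseconds(F f, int repetitions)
{
    assert(repetitions > 0);
    auto const start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r)
    {
        f();
    }
    auto const end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> const elapsed = end - start;
    return elapsed.count() / repetitions;
}
```

On quick-bench itself, each function body would instead sit inside Google Benchmark’s while (state.KeepRunning()) loop and be registered with BENCHMARK(if_if), which is consistent with the benchmark::State::KeepRunning() calls visible in the disassembly. A hand-rolled timer like this one is noisier, so its numbers are only a rough indication.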

Here are the two pieces of assembly code that clang generated:

if_if (two nested ifs):

push   %r14
push   %rbx
push   %rax
mov    %rdi,%r14
callq  404ce0 <benchmark::State::KeepRunning()>
test   %al,%al
je     404ab6 <if_if(benchmark::State&)+0x56>
mov    $0x270f,%ebx
data16 nopw %cs:0x0(%rax,%rax,1)
callq  404b80 <getPositive()>
test   %eax,%eax
jle    404a9c <if_if(benchmark::State&)+0x3c>
callq  404be0 <getNegative()>
test   %eax,%eax
jle    404a9c <if_if(benchmark::State&)+0x3c>
movl   $0x2a,0x23442c(%rip)        # 638ec8 <c>
test   %ebx,%ebx
lea    -0x1(%rbx),%eax
mov    %eax,%ebx
jne    404a80 <if_if(benchmark::State&)+0x20>
mov    %r14,%rdi
callq  404ce0 <benchmark::State::KeepRunning()>
test   %al,%al
mov    $0x270f,%ebx
jne    404a80 <if_if(benchmark::State&)+0x20>
add    $0x8,%rsp
pop    %rbx
pop    %r14
retq

if_and (one if with an AND expression):

push   %r14
push   %rbx
push   %rax
mov    %rdi,%r14
callq  404ce0 <benchmark::State::KeepRunning()>
test   %al,%al
je     404b16 <if_and(benchmark::State&)+0x56>
mov    $0x270f,%ebx
data16 nopw %cs:0x0(%rax,%rax,1)
callq  404b80 <getPositive()>
test   %eax,%eax
jle    404afc <if_and(benchmark::State&)+0x3c>
callq  404be0 <getNegative()>
test   %eax,%eax
jle    404afc <if_and(benchmark::State&)+0x3c>
movl   $0x2a,0x2343cc(%rip)        # 638ec8 <c>
test   %ebx,%ebx
lea    -0x1(%rbx),%eax
mov    %eax,%ebx
jne    404ae0 <if_and(benchmark::State&)+0x20>
mov    %r14,%rdi
callq  404ce0 <benchmark::State::KeepRunning()>
test   %al,%al
mov    $0x270f,%ebx
jne    404ae0 <if_and(benchmark::State&)+0x20>
add    $0x8,%rsp
pop    %rbx
pop    %r14
retq

As you can see, apart from the memory addresses, the generated code is exactly the same. So with -O1, clang figures out that the two pieces of code are equivalent, and therefore they have the same performance.

Now let’s try with -O0 (no optimization):

if_if (two nested ifs):

push   %rbp
mov    %rsp,%rbp
sub    $0x10,%rsp
mov    %rdi,-0x8(%rbp)
mov    -0x8(%rbp),%rdi
callq  404d80 <benchmark::State::KeepRunning()>
test   $0x1,%al
jne    404962 <if_if(benchmark::State&)+0x22>
jmpq   4049b3 <if_if(benchmark::State&)+0x73>
movl   $0x2710,-0xc(%rbp)
mov    -0xc(%rbp),%eax
mov    %eax,%ecx
add    $0xffffffff,%ecx
mov    %ecx,-0xc(%rbp)
cmp    $0x0,%eax
je     4049ae <if_if(benchmark::State&)+0x6e>
callq  404ad0 <getPositive()>
cmp    $0x0,%eax
jle    4049a9 <if_if(benchmark::State&)+0x69>
callq  404b60 <getNegative()>
cmp    $0x0,%eax
jle    4049a4 <if_if(benchmark::State&)+0x64>
movl   $0x2a,0x638ecc
jmpq   4049a9 <if_if(benchmark::State&)+0x69>
jmpq   404969 <if_if(benchmark::State&)+0x29>
jmpq   40494c <if_if(benchmark::State&)+0xc>
add    $0x10,%rsp
pop    %rbp
retq

if_and (one if with an AND expression):

push   %rbp
mov    %rsp,%rbp
sub    $0x10,%rsp
mov    %rdi,-0x8(%rbp)
mov    -0x8(%rbp),%rdi
callq  404d80 <benchmark::State::KeepRunning()>
test   $0x1,%al
jne    4049e2 <if_and(benchmark::State&)+0x22>
jmpq   404a2e <if_and(benchmark::State&)+0x6e>
movl   $0x2710,-0xc(%rbp)
mov    -0xc(%rbp),%eax
mov    %eax,%ecx
add    $0xffffffff,%ecx
mov    %ecx,-0xc(%rbp)
cmp    $0x0,%eax
je     404a29 <if_and(benchmark::State&)+0x69>
callq  404ad0 <getPositive()>
cmp    $0x0,%eax
jle    404a24 <if_and(benchmark::State&)+0x64>
callq  404b60 <getNegative()>
cmp    $0x0,%eax
jle    404a24 <if_and(benchmark::State&)+0x64>
movl   $0x2a,0x638ecc
jmpq   4049e9 <if_and(benchmark::State&)+0x29>
jmpq   4049cc <if_and(benchmark::State&)+0xc>
add    $0x10,%rsp
pop    %rbp
retq

There is one more line in the code that has two ifs:

jmpq   4049a9 <if_if(benchmark::State&)+0x69>

which is an extra unconditional “jump”, the kind of instruction used to implement if statements in assembly code.
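That extra jump comes from the nested structure itself. Here is a hand-written approximation, using goto, of how an unoptimizing compiler can lower each form: in the nested version the end of the inner block jumps to the end of the outer block, while with && both failed tests jump straight to a single label. This is an illustration, not clang’s actual output, and nestedLowered, andLowered and checkLowerings are hypothetical names. Note that the extra jump only runs when the outer condition holds.

```cpp
#include <cassert>

// Nested ifs lowered by hand: each failed test jumps past its own block.
// Returns 42 if the body ran, 0 otherwise.
int nestedLowered(int positive, int negative)
{
    int result = 0;
    if (!(positive > 0)) goto end_outer;  // outer test fails: skip everything
    if (!(negative > 0)) goto end_inner;  // inner test fails: skip inner body
    result = 42;                          // the body (movl $0x2a in the listing)
end_inner:
    goto end_outer;                       // the extra jump out of the outer block
end_outer:
    return result;
}

// && lowered by hand: both failed tests share the same target.
int andLowered(int positive, int negative)
{
    int result = 0;
    if (!(positive > 0)) goto end;  // first operand fails
    if (!(negative > 0)) goto end;  // second operand fails
    result = 42;
end:
    return result;
}

// Both lowerings must agree on every combination of signs.
void checkLowerings()
{
    for (int positive = -1; positive <= 1; ++positive)
    {
        for (int negative = -1; negative <= 1; ++negative)
        {
            assert(nestedLowered(positive, negative) == andLowered(positive, negative));
        }
    }
}
```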

Can the CPU live with understandable if statements?

Since the code differs at -O0, let’s see how this impacts execution time. Let’s give only positive values to a so that the inner if is always executed:

(this image was generated with quick-bench.com)

The version that has the two conditionals on the same line is about 7% faster! So if we followed specifications that led us to roll out an if statement like the one in this example, we’ve made the application slower. Blimey!

And now let’s test it with random values for a that can be 0 or 1 with equal probability:

(this image was generated with quick-bench.com)

This time the second version is only about 2% faster, presumably because the execution doesn’t always reach the inner if.
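One way to see why the gap shrinks: when a is 0 or 1 with equal probability, the inner if is only reached about half of the time, so any extra cost tied to it is paid about half as often. The sketch below simply counts those visits. It assumes that a drives the outer condition through a > 0, and countInnerEvaluations and randomZeroOrOne are hypothetical helpers.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Counts how often the inner if of a nested if would be reached,
// assuming the outer condition is "a > 0".
long countInnerEvaluations(std::vector<int> const& values)
{
    long innerEvaluations = 0;
    for (int a : values)
    {
        if (a > 0)
        {
            ++innerEvaluations;  // the inner if would be evaluated here
        }
    }
    return innerEvaluations;
}

// Builds n values that are 0 or 1 with equal probability (fixed seed).
std::vector<int> randomZeroOrOne(int n, unsigned seed)
{
    assert(n >= 0);
    std::mt19937 generator{seed};
    std::bernoulli_distribution coin{0.5};
    std::vector<int> values;
    values.reserve(n);
    for (int i = 0; i < n; ++i)
    {
        values.push_back(coin(generator) ? 1 : 0);
    }
    return values;
}
```

With only positive values, every iteration reaches the inner if; with the random values, roughly half of them do.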

Can I afford understandable if statements?

Let’s analyse the situation calmly.

First of all, if you’re compiling at a sufficient level of optimization, you’re fine: there is no performance penalty for choosing the if that better matches your specifications. The right level of optimization depends on your compiler, but in this experiment it was -O1 for clang. I’ve also generated the code with the latest version of gcc on godbolt (quick-bench doesn’t support gcc as of this writing), both for the two ifs and for the if with an AND expression. And while the code also differs at -O0, it becomes the same at -O1.

Now if you’re not compiling with optimization, maybe the faster one corresponds to your specifications, in which case you’re also fine. There is not one version of the if that is more understandable in itself, it depends on the flow of the spec.

If your specifications are expressed with the slower if, and this piece of code is not in a performance-critical section, you’re fine again. Indeed, as Scott Meyers explains in Item 16 of More Effective C++, most of the code isn’t relevant for performance optimizations, and you need to profile your code to figure out which parts are. So 7%, or 2%, or whatever value your architecture gives for that particular line, can go completely unnoticed, and it would be a shame to sacrifice its expressiveness for it.

If a certain alignment of the planets causes that particular if to be the bottleneck of your program, then you have to change it. But when you do so, try to do it in a way that makes sense for the specifications. Consult with your domain people if necessary. This way you preserve the readability of this piece of code for the future.

And if even that isn’t possible, only then can you forgo the readability of this particular line.

But before you get into that extreme situation, you will have saved hundreds of other if statements, which will live a peaceful life and thank you for it.
