Something You May Not Know About the Switch Statement in C/C++

Introduction

Many programming languages such as C/C++, C#, Java, and Pascal provide the switch statement to let us implement selection logic. In some scenarios, it's a good alternative to if-then-else , making code clearer and more readable. When using switch in practice, you may want to know:

How the switch block is executed at runtime?

block is executed at runtime? Is it running faster than an if-then-else for a long list of conditions?

for a long list of conditions? For n conditions, what is the switch time complexity?

The C/C++ standard defines the specification of language elements, but it doesn't say anything about how to implement the switch statement. Every vendor is free to use any implementation as long as it fits the standard. This article discusses what happens when running a switch statement in Visual C++, with a few examples under different conditions. We'll analyze these examples by using the Microsoft Visual Studio IDE, since it can generate the corresponding assembly listing on compiling. Thus, a general understanding of Intel (x86) assembly language is assumed. As you can see shortly, all results here are based on reverse engineering, and so the article is never a comprehensive description of switch implementations in compilers. If you are learning Assembly Language Programming, this article might be a study material to read.

Our first example is switch1.cpp, a commonly used simple block as below:

#include " functions.h" int main() { int i = 3 ; switch (i) { case 1 : f1(); break ; case 2 : f2(); break ; case 5 : f1(); break ; case 7 : f2(); break ; case 10 : f1(); break ; case 11 : f2(); break ; case 12 : f2(); break ; case 17 : f1(); break ; case 18 : f1(); break ; default : f3(); } return 0 ; }

where three functions are defined in functions.cpp:

void f1() { cout " f1 called

" ; } void f2() { cout " f2 called

" ; } void f3() { cout " f3 called

" ; }

Could the worst case be thought of as i=3 or i=20 ? How does it execute: does it exhaust all nine cases and finally reach the default to call f3 ? Let's answer it from the assembly translations.

In order to generate an assembly listing in Visual Studio, open the switch1.cpp Property dialog and select the Output Files category under C/C++. On the right pane, choose the Assembly With Source Code (/FAs) option as shown below:

Then, when you compile switch1.cpp, an assembly file called switch1.asm will be generated. Using this option, the listing includes the C++ source code, which is commented by the semi-colon with a line number as shown in the next section.

The Two-Level Jump Table

Let's analyze the assembly listing from top down. Here is where the switch starts:

mov DWORD PTR _i$[ ebp ], 3 mov eax , DWORD PTR _i$[ ebp ] mov DWORD PTR tv64[ ebp ], eax mov ecx , DWORD PTR tv64[ ebp ] sub ecx , 1 mov DWORD PTR tv64[ ebp ], ecx cmp DWORD PTR tv64[ ebp ], 17 ja SHORT $LN1@main mov edx , DWORD PTR tv64[ ebp ] movzx eax , BYTE PTR $LN15@main[ edx ] jmp DWORD PTR $LN16@main[eax*4]

Assuming that the symbol _i$[ebp] is an alias of i , tv64[ebp] is another name of i2 , $LN15@main is renamed as table1 , $LN16@main as table2 , and $LN1@main is a label named Default_Label . The snippet merely does this in pseudocode:

i2 = i; i2 = i2-1; if i2 > 17 goto Default_Label; goto table2[4*table1[i2]];

Here, 17 indicates the last case condition value, because the integers 1 to 18 are mapped from 0 to 17. This is why i2 is decremented to make it a zero-based integer as an index. Now, if i2 is greater than 17 (e.g., n=20 ), the control goes to the default . Otherwise, it goes to where table2[4*table1[i2]] is pointed.

How about when i is zero? Then i2 becomes -1. Worry about the index out of range error? No, it will never happen. Back to the assembly listing, you can see that -1 is saved as DWORD , the double word as an unsigned 4-byte integer. Hence, it must be greater than 17 and goes to default .

Let's look at two tables and see how they work together. The table1 is pretty simple, with a starting address at $LN15@main , which you can just think of as an array name.

$LN15@main: DB 0 DB 1 DB 9 DB 9 DB 2 DB 9 DB 3 DB 9 DB 9 DB 4 DB 5 DB 6 DB 9 DB 9 DB 9 DB 9 DB 7 DB 8

With this array, table1[0] is 0, table1[1] is 1, table1[2] and table1[3] are 9, etc. These values are created for the calculation of the index of table2 , which starts at $LN16@main :

$LN16@main: DD $LN10@main DD $LN9@main DD $LN8@main DD $LN7@main DD $LN6@main DD $LN5@main DD $LN4@main DD $LN3@main DD $LN2@main DD $LN1@main

The above labels, from $LN10@main to $LN1@main , are ten calling targets in C++, for nine cases plus one default . Notice that DB represents defining byte (8 bits), while DD defines the double word type of four bytes (32 bits). This is why we need to multiply 4 in table2[4*table1[i2]] . By this formula, we calculate the calling address via table1 and table2 :

if i equals 1, i2 is 0 and table1[0] is 0, jump to $LN10@main by table2[0] , the first case.

equals 1, is 0 and is 0, jump to by , the first case. if i equals 2, i2 is 1 and table1[1] is 1, jump to $LN9@main by table2[4*1] , the second case.

equals 2, is 1 and is 1, jump to by , the second case. if i equals 3, i2 is 2 and table1[2] is 9, jump to $LN1@main by table2[4*9] , the default.

equals 3, is 2 and is 9, jump to by , the default. ... ...

Now we come to the code segment labeled by LN10@main to $LN1@main as the calling targets:

$LN10@main: call ?f1@@YAXXZ jmp SHORT $LN11@main $LN9@main: call ?f2@@YAXXZ jmp SHORT $LN11@main $LN8@main: call ?f1@@YAXXZ jmp SHORT $LN11@main $LN2@main: call ?f1@@YAXXZ jmp SHORT $LN11@main $LN1@main: call ?f3@@YAXXZ $LN11@main:

Functions f1 , f2 , and f3 are converted to the decorated assembly procedures, and prefixed with a question mark, like ?f1@@YAXXZ , ?f2@@YAXXZ , and ?f3@@YAXXZ . Notice that $LN10@main is a label of case 1 to call f1 , $LN9@main for case 2 to call f2 , and $LN1@main for default to call f3 .

An additional label is $LN11@main , pointing to the location after the last default clause. As soon as each case operation is done, the control jumps to $LN11@main . This implements the break statement. Without it, the control goes to the next case. This is why the break statement is necessary in the C/C++ switch block.

Obviously, based on such a two-level table mechanism, we have one comparison, one multiplication, and two address jumps. The time complexity of this switch pattern should be O(1). Put this all together, we have a big picture like this:

Recall that we use DB to define index values in table1 , and DD for table2 . In this example, there are only 18 case values for table1 . But DB defining byte means the range is only from zero to 255. What if there are more cases in switch , with the number of values over 255?

The One-Level Jump Table

Here comes the second example, switch2.cpp with 1000 cases. For simplicity, we start from the condition value zero:

int main2() { int i = 1000 ; switch (i) { case 0 : f1(); break ; case 1 : f1(); break ; case 2 : f2(); break ; case 3 : f1(); break ; case 995 : f1(); break ; case 996 : f1(); break ; case 997 : f1(); break ; case 998 : f2(); break ; case 999 : f1(); break ; default : f3(); } return 0 ; }

Don't think it makes things complicated. On the contrary, it becomes simpler. This is the switch2.asm that Visual Studio generates for us:

mov DWORD PTR _i$[ ebp ], 1000 mov eax , DWORD PTR _i$[ ebp ] mov DWORD PTR tv64[ ebp ], eax cmp DWORD PTR tv64[ ebp ], 999 ja $LN1@main2 mov ecx , DWORD PTR tv64[ ebp ] jmp DWORD PTR $LN1006@main2[ecx*4]

Likewise, assume that the symbol _i$[ebp] is an alias of i . Because the first case starts from zero, no mapping is needed. So i2 (from tv64[ebp] ) equals i . The array $LN1006@main2 is an only jump table, and the label $LN1@main2 is Default_Label . This snippet simply does this:

i2 = i; if i2 > 999 goto Default_Label; goto table[4*i2];

By searching $LN1006@main2 in switch2.asm, we find the code labels all defined by double words:

$LN1006@main2: DD $LN1001@main2 DD $LN1000@main2 DD $LN999@main2 DD $LN998@main2 DD $LN997@main2 DD $LN996@main2 DD $LN5@main2 DD $LN4@main2 DD $LN3@main2 DD $LN2@main2

Totally 1000 calling targets, each label is followed by a procedure call, downwards from $LN1001@main2 to $LN2@main2 :

$LN1001@main2: call ?f1@@YAXXZ jmp $LN1002@main2 $LN1000@main2: call ?f1@@YAXXZ jmp $LN1002@main2 $LN999@main2: call ?f2@@YAXXZ jmp $LN1002@main2 $LN3@main2: call ?f2@@YAXXZ jmp SHORT $LN1002@main2 $LN2@main2: call ?f1@@YAXXZ jmp SHORT $LN1002@main2 $LN1@main2: call ?f3@@YAXXZ $LN1002@main2:

Here, $LN1@main2 directs to default , and the additional label $LN1002@main implements break . Although the complexity is also O(1), this mechanism should be a little bit efficient since only one address jump is required. The following is a picture of this example:

Until now, we see the switch illustrations executed so well. Is it really better to use switch than if-then-else ? For n conditions without a match, definitely, if-then-else must check each condition until the last, which is the worst case of O(n). But for switch , do we always expect its complexity to be O(1)? Or is there some kind of code that would lead its execution beyond that?

Using Binary Search

We will give the third example showing the large gap between case condition values in switch3.cpp, where the switch execution behaves as the binary search:

int main3() { int i = 1 ; switch (i) { case 100 : f1(); break ; case 200 : f2(); break ; case 250 : f2(); break ; case 500 : f1(); break ; case 700 : f2(); break ; case 750 : f2(); break ; case 800 : f2(); break ; case 900 : f1(); break ; default : f3(); } return 0 ; }

By generating the assembly listing switch3.asm, this time, we take a look from bottom up by checking the case operations first:

$LN9@main3: call ?f1@@YAXXZ jmp SHORT $LN10@main3 $LN8@main3: call ?f2@@YAXXZ jmp SHORT $LN10@main3 $LN7@main3: call ?f2@@YAXXZ jmp SHORT $LN10@main3 $LN6@main3: call ?f1@@YAXXZ jmp SHORT $LN10@main3 $LN5@main3: call ?f2@@YAXXZ jmp SHORT $LN10@main3 $LN4@main3: call ?f2@@YAXXZ jmp SHORT $LN10@main3 $LN3@main3: call ?f2@@YAXXZ jmp SHORT $LN10@main3 $LN2@main3: call ?f1@@YAXXZ jmp SHORT $LN10@main3 $LN1@main3: call ?f3@@YAXXZ $LN10@main3:

Pretty neat, with eight cases plus one default , we can easily write this part into a list of nine rows, each corresponding to a target label, a C++ case, and a procedure name:

Besides $LN9@main3 to $LN1@main3 , we also have $LN10@main3 implement the break statement. Next, how do we jump to these calling targets? Look at the following code:

mov DWORD PTR _i$[ ebp ], 1 mov eax , DWORD PTR _i$[ ebp ] mov DWORD PTR tv64[ ebp ], eax cmp DWORD PTR tv64[ ebp ], 700 jg SHORT $LN14@main3 cmp DWORD PTR tv64[ ebp ], 700 je SHORT $LN5@main3 cmp DWORD PTR tv64[ ebp ], 250 jg SHORT $LN15@main3 cmp DWORD PTR tv64[ ebp ], 250 je SHORT $LN7@main3 cmp DWORD PTR tv64[ ebp ], 100 je SHORT $LN9@main3 cmp DWORD PTR tv64[ ebp ], 200 je SHORT $LN8@main3 jmp SHORT $LN1@main3 $LN15@main3: cmp DWORD PTR tv64[ ebp ], 500 je SHORT $LN6@main3 jmp SHORT $LN1@main3 $LN14@main3: cmp DWORD PTR tv64[ ebp ], 750 je SHORT $LN4@main3 cmp DWORD PTR tv64[ ebp ], 800 je SHORT $LN3@main3 cmp DWORD PTR tv64[ ebp ], 900 je SHORT $LN2@main3 jmp SHORT $LN1@main3

The logic is not too hard to understand. At first, the condition value i is saved in both _i$[ebp] and tv64[ebp] . To simplify, we just make use of all the labels here by removing the leading "$" and the tailing "@main3". Rewrite the snippet like this:

i2 = i; if i2 > 700 goto LN14; if i2 == 700 goto LN5; if i2 > 250 goto LN15; if i2 == 250 goto LN7; if i2 == 100 goto LN9; if i2 == 200 goto LN8; goto LN1; LN15: if i2 == 500 goto LN6; goto LN1; LN14: if i2 == 750 goto LN4; if i2 == 800 goto LN3; if i2 == 900 goto LN2; goto LN1;

Surely, the compiler optimizes the code, because it first chooses 700 to compare, instead of the beginning case value 100 in the switch block. Although this switch is implemented with ten if-then statements for nine conditions, it actually applies the binary search mechanism. The following shows the equivalent decision tree of above code, a Binary Search Tree that you may be familiar with:

For all internal nodes colored yellow in circle, we make the comparisons, while all leave nodes in oval represent the successful ending. Each failure goes to LN1 by default. The inorder traversal sequence for comparison nodes is 100, 200, 250, 500, 700, 750, 800, and 900; while the sequence for all leaves in the inorder traversal is exactly LN9 , LN8 , LN7 , LN6 , LN5 , LN4 , LN3 , and LN2 , as ordered in the previous table. Essentially, this is a binary search algorithm with the complexity of O(log n). When i =1, it will pass through the first six comparisons and then reach default at LN1 . But when i =500, just four comparisons are required.

As we know, the prerequisite of binary search is that the input data is sorted. I wonder if this is just because I use the ascending case values in switch in switch3.cpp. The curiosity brings out the following disordered conditions in switch4.cpp.

int main4() { int i = 1 ; switch (i) { case 750 : f2(); break ; case 700 : f2(); break ; case 250 : f2(); break ; case 500 : f1(); break ; case 800 : f2(); break ; case 900 : f1(); break ; case 100 : f1(); break ; case 200 : f2(); break ; default : f3(); } return 0 ; }

To my surprise, the same binary search strategy appears in code of switch4.asm, with exact the same decision tree as shown above. Only difference is that the labels are renumbered - that's quite reasonable, because we have just reordered them! You can examine the attached switch4.asm for details.

This experiment definitely brings us some hint to find out how the compiler does its magic to sort the case condition values. A sorting algorithm is over O(log n) and not worthwhile to have it at runtime. Notice that sorting is not visible from generated assemblies. This means that sorting is not contained in the assembly instructions which will execute at runtime. Also notice that all the condition values are constant and available before the compiler translates the C++ to machine code. Now it’s reasonable to think of the preprocessor; the sorting with all known case values could be simply done in compilation. This is why the translated assembly code only reflects the binary search without sorting code. The static sorting behavior (as opposed to dynamic behavior at runtime), could be implemented with a macro procedure, assembly directives and operators. Such an preprocessing example can be found in References at the end of the article.

The Hybrid

At this moment, I can show an example of combination of a jump table and the binary search as in switch5.cpp.

int main5() { int i = 1 ; switch (i) { case 3 : f1(); break ; case 5 : f2(); break ; case 6 : f2(); break ; case 7 : f2(); break ; case 100 : f1(); break ; case 200 : f2(); break ; case 250 : f2(); break ; case 700 : f2(); break ; case 750 : f2(); break ; case 900 : f2(); break ; default : f3(); } return 0 ; }

The following is how this switch block is implemented in the corresponding switch5.asm:

mov DWORD PTR _i$[ ebp ], 1 mov eax , DWORD PTR _i$[ ebp ] mov DWORD PTR tv64[ ebp ], eax cmp DWORD PTR tv64[ ebp ], 100 jg SHORT $LN16@main5 cmp DWORD PTR tv64[ ebp ], 100 je $LN7@main5 mov ecx , DWORD PTR tv64[ ebp ] sub ecx , 3 mov DWORD PTR tv64[ ebp ], ecx cmp DWORD PTR tv64[ ebp ], 4 ja $LN1@main5 mov edx , DWORD PTR tv64[ ebp ] jmp DWORD PTR $LN18@main5[edx*4] $LN16@main5: cmp DWORD PTR tv64[ ebp ], 700 jg SHORT $LN17@main5 cmp DWORD PTR tv64[ ebp ], 700 je SHORT $LN4@main5 cmp DWORD PTR tv64[ ebp ], 200 je SHORT $LN6@main5 cmp DWORD PTR tv64[ ebp ], 250 je SHORT $LN5@main5 jmp SHORT $LN1@main5 $LN17@main5: cmp DWORD PTR tv64[ ebp ], 750 je SHORT $LN3@main5 cmp DWORD PTR tv64[ ebp ], 900 je SHORT $LN2@main5 jmp SHORT $LN1@main5

It consists of two parts, a one-level jump table first and then the binary search code. Using the previous notations of i , i2 and label naming, let $LN18@main5 as a table and $LN1@main5 as default . This snippet does this:

i2 = i; if i2 > 100 goto LN16; if i2 == 100 goto LN7; // go case 100 // now i2 < 100 i2 = i2-3; // map to zero-based index if i2 > 4 goto LN1; // go default over 7 goto table[4*i2]; // go jump table LN16: // binary search if i2 > 700 goto LN17; if i2 == 700 goto LN4; // go case 700 if i2 == 200 goto LN6; // go case 200 if i2 == 250 goto LN5; // go case 250 goto LN1; // go default LN17: if i2 == 750 goto LN3; // go case 750 if i2 == 900 goto LN2; // go default goto LN1;

where the table is defined as:

$LN18@main5: DD LN11 DD LN1 DD LN10 DD LN9 DD LN8

An interesting change is to remove the case 6 in switch5.cpp. Just because of this, the hybrid becomes the combination of a two-level jump table and the binary search. For details, look at switch6.cpp and switch6.asm in downloadable samples. If you try to add extra cases in some order, you can find two individual jump tables combined with the binary search.

More Questions and Beyond

Now you have learned something about switch from some typical examples. You should understand now why the C/C++ switch only supports an integral data type for its conditional expression, such as char and enumeration type, rather than the float point or string type. Besides these, more intuitive questions might come to your head:

Is it necessary to maintain the cases in the order of condition values for a jump table?

table? How about if we use negative integers as case values?

What if the default label is missing, or appears anywhere in the block not at the last?

I believe you can answer these just by analyzing an assembly listing of the C++ code containing these questions. For convenience, I attached the following switch7.cpp with its switch7.asm in samples.

int main7() { int i = 15 ; switch (i) { case 2 : f2(); break ; case 1 : f1(); break ; case 5 : f1(); break ; case 10 : f1(); break ; case 7 : f2(); break ; default : f3(); break ; case - 3 : f2(); break ; case 12 : f1(); break ; case 17 : f2(); break ; case 18 : f1(); break ; } return 0 ; }

Although the switch performance looks better than if-than-else in these examples, we still have questions unanswered. Obviously, it's not possible to enumerate all the switch executions in practice. A comprehensive analysis of switch implementations should be written by a compiler developer, not by a blind black-box tester. So, we are not able to determine the worst case in the switch execution, whether it would reach O(n) or not. Also we don't know under which conditions, the compiler chooses one implementation instead of another or even other methods unmentioned here.

As an example from a reader in discussion, one concern is about the memory waste of above said jump tables when implementing sparsely populated switch cases. How about an extreme example with only two case entries 1 and 0x7fffffff as below? Would the compiler consume most useless entries in a jump table?

int main8() { int i = 3 ; switch (i) { case 1 : f1(); break ; case 0x7fffffff: f2(); break ; default : f3(); break ; } return 0 ; }

This is not true for our smart compiler. As mentioned, we don't know how the compiler chooses one implementation instead of the other. Please see here, this two-case example doesn't choose the jump table. It just simply converts switch to if-then without consuming any more memory:

mov DWORD PTR _i$[ ebp ], 3 mov eax , DWORD PTR _i$[ ebp ] mov DWORD PTR tv64[ ebp ], eax cmp DWORD PTR tv64[ ebp ], 1 je SHORT $LN3@main8 cmp DWORD PTR tv64[ ebp ], 2147483647 je SHORT $LN2@main8 jmp SHORT $LN1@main8 $LN3@main8: call ?f1@@YAXXZ jmp SHORT $LN4@main8 $LN2@main8: call ?f2@@YAXXZ jmp SHORT $LN4@main8 $LN1@main8: call ?f3@@YAXXZ $LN4@main8:

Again, no way to affect or predict the compiler's choices with such a black box testing. Depending on the problem, either sparsely or densely populated switch cases, the compiler adapts them obviously very well.

Summary

By scrutinizing the above examples, we expose something that you may not know about the switch at runtime. To analyze a C/C++ program in Visual Studio, we can have both static and dynamic analyses. For this article, we make use of the assembly listings generated by the compiler. On the other hand, we can also monitor binary executions at runtime with the VS Disassembly window in debug, where the labels and procedures in the listing are translated to memory addresses. This way, you can trace registers and memory allocations to understand data endianness, stack frames, etc. Here, for our purpose, the assembly listing seems sufficient and easy to indicate a jump from here to there. The downloadable zip file contains seven examples, with both the .cpp source code and the .asm listings in VS 2010. The same listings can be generated by an earlier version of VS.

This article does not involve hot technologies like .NET or C#. Assembly Language is comparatively traditional without much sensation these days. However, in applications, different types of assembly languages play an important role in new device development. Academically, Assembly Language Programming is considered as a demanding course in Computer Science. I am teaching Assembly Language Programming, CSCI 241, at Fullerton College, where the concept of high-level language interfacing with low-level execution is an essential part in teaching. In particular, one of the main topics that students are interested in is how C/C++ or Java runs on a machine, represented by low-level data structures and algorithms. Therefore, I hope this article could also serve as one of the basic problem-solving examples for students. Any comments and suggestions are welcome.

Acknowledgment

I would like to sincerely thank wtwhite, Charvak Karpe, and Harold Aptroot for their valuable discussions about the binary search here. Without them, the section "Using Binary Search" would not be like it as today.

References

History