Developers want code that runs fast and users do too. However, it is easy to fall into the trap of early optimization; this story shows that even in non-trivial cases, modern compilers can do wonders. On the other hand, the myth of the “sufficiently smart compiler” is still very relevant. The best way is to look at what is actually generated for an example and that is what I’ll do in this article. Enter the SentinelInputRange.

A range?

D introduces the concept of a range. For a C++ programmer, a range can be seen as a pair of iterators. For a Java programmer, it can be seen as an InputStream. It is defined as an entity with a property front (which returns the current item in the range), a method popFront, which pops an item from the range, and a property empty, a boolean that becomes true when all elements of the range have been popped. It is obviously invalid to call popFront or access front on an empty range.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

// Some range :

struct MyRange {

private T [ ] data ;



@property

T front ( ) {

return data [ 0 ] ;

}



void popFront ( ) {

data = data [ 1 .. $ ] ;

}



@property

bool empty ( ) {

return data. length == 0 ;

}

}

In D, foreach can handle ranges properly, but to understand our problem let’s consume it manually to print its elements:

1

2

3

4

while ( ! r. empty ) {

writeln ( r. front ) ;

r. popFront ( ) ;

}

BTW, as of now, “r” is what I’ll use for a range in D code.

A SentinelInputRange?

A very common way to consume a range is to use a switch statement on the value of front, performing different actions depending on the value encountered. As the discussion on D newsgroup started with a lexer in mind, I’ll reuse that example.

A lexer is a piece of code that consume characters and create tokens. This is the first thing a compiler does once it has loaded your file. For instance, a lexer will recognize that « if » or « is » are keywords, and create the according token; conversely « igloo » is not a keyword and will be recognised as an identifier. Using a range, it is done as follows:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

if ( r. empty ) {

// Output some error.

}



switch ( r. front ) {

case 'i' :

r. popFront ( ) ;

if ( r. empty ) return lexIdentifier ( r , "i" ) ;

switch ( r. front ) {

case 'f' :

r. popFront ( ) ;

return lexKeyword ( r , "if" ) ;



case 's' :

r. popFront ( ) ;

return lexKeyword ( r , "is" ) ;



default :

return lexIdentifier ( r , "i" ) ;

}



default :

// Do something else.

}

Now imagine that the characters come from a C string. C string are consecutive characters in memory followed by a 0 that indicates the end of the string. We call that 0 a sentinel and it allow for something much better for our code above:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

switch ( * s ) {

case ' \0 ' :

// Output some error.



case 'i' :

s ++;

switch ( * s ) {

case 'f' :

s ++;

return lexKeyword ( s , "if" ) ;



case 's' :

s ++;

return lexKeyword ( s , "is" ) ;



default :

return lexIdentifier ( s , "i" ) ;

}



default :

// Do something else.

}

See how awesome it is? As the end of the input is indicated by a special value, we can handle the stream without checking for the end all the time. When we have a non 0 value, we can directly pop the pointer as we know that something comes after. This is impossible to do with ranges, as accessing front before checking for empty is invalid. We end up doing way more checks, hence the code is slower.

But wait a minute? Can’t I create a range that wraps a C string?

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

struct CString {

private char * s ;



@property

char front ( ) {

return * s ;

}



void popFront ( ) {

s ++;

}



@property

bool empty ( ) {

return * s == ' \0 ' ;

}

}

So far so good, but our lexer doesn’t know about CString and so can’t take advantage of its nice capabilities (without breaking encapsulation). It has to check empty before accessing front. To solve this a new type of Range has been proposed: the SentinelInputRange. This new range would define a sentinel property in addition to the other features of a range to indicate to the consumer that this value is the end of the range. The consumer can now take advantage of this, isn’t it great?

In fact it isn’t, as our lexer would either have to have 2 code-paths for SentinelInputRange and other ranges (extra complexity) or forget about one of the cases (limiting either its speed or its generality).

We will discover that in fact, we can all of it for free if we use LLVM.

Let’s summon LLVM to help us!

LLVM is a compiler infrastructure. It defines an intermediate language called LLVM IR, that allow the components of LLVM to talk to each other. Typical components are front-ends that transform programming language such as D or C++ into LLVM IR, the optimizer that consume LLVM IR and produce another LLVM IR that is equivalent in functionality but faster and finally back-ends that transform LLVM IR into assembly.

Let’s compile the example into LLVM IR, using the range lexer and the CString struct. Here is a plausible output (I used SDC but then edited it for more readability) :

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

%CString = type { i8 * }



@.cstr = [ 2 x i8 ] c "i\00"

@.cstr1 = [ 3 x i8 ] c "if\00"

@.cstr2 = [ 3 x i8 ] c "is\00"



define void @lex ( %CString %arg.r ) {

body :

%r = alloca %CString

store %CString %arg.r , %CString * %r

%0 = call i1 @empty ( %CString * %r )

br i1 %0 , label %then , label %else



then: ; preds = %body

call void @lexError ( )

ret void



else: ; preds = %body

%1 = call i8 @front ( %CString * %r )

%cond = %1 , 105

br i1 %cond , label %.case , label % default



.case: ; preds = %else

call void @popFront ( %CString * %r )

%2 = call i1 @empty ( %CString * %r )

br i1 %2 , label %then1 , label %else2



then1: ; preds = %.case

%3 = load %CString * %r

call void @lexIdentifier ( %CString %3 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



else2: ; preds = %.case

%4 = call i8 @front ( %CString * %r )

switch i8 %4 , label %default3 [

i8 102 , label %.case4

i8 115 , label %.case5

]



.case4: ; preds = %else2

call void @popFront ( %CString * %r )

%5 = load %CString * %r

call void @lexKeyword ( %CString %5 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr1 , i32 0 , i32 0 ) } )

ret void



.case5: ; preds = %else2

call void @popFront ( %CString * %r )

%6 = load %CString * %r

call void @lexKeyword ( %CString %6 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr2 , i32 0 , i32 0 ) } )

ret void



default3: ; preds = %else2

%7 = load %CString * %r

call void @lexIdentifier ( %CString %7 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



default : ; preds = %else

call void @lexSomething ( %CString %arg.r )

ret void

}



define i1 @empty ( %CString * nocapture %this ) nounwind readonly {

body :

%0 = getelementptr inbounds %CString * %this , i32 0 , i32 0

%1 = load i8 ** %0

%2 = load i8 * %1

%3 = %2 , 0

ret i1 %3

}



define i8 @front ( %CString * nocapture %this ) nounwind readonly {

body :

%0 = getelementptr inbounds %CString * %this , i32 0 , i32 0

%1 = load i8 ** %0

%2 = load i8 * %1

ret i8 %2

}



define void @popFront ( %CString * nocapture %this ) nounwind {

body :

%0 = getelementptr inbounds %CString * %this , i32 0 , i32 0

%1 = load i8 ** %0

%2 = getelementptr inbounds i8 * %1 , i64 1

store i8 * %2 , i8 ** %0

ret void

}



declare void @lexError ( )

declare void @lexIdentifier ( %CString , { i64 , i8 * } )

declare void @lexKeyword ( %CString , { i64 , i8 * } )

declare void @lexSomething ( %CString ) private unnamed_addr constant private unnamed_addr constant private unnamed_addr constant body icmp eq i8 body icmp eq i8 bodybody

The first interesting step we are concerned about is inlining. LLVM will put the calls to front, popFront and empty directly into lex as they are very simple. See what lex becomes after that operation:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

define void @lex ( %CString %arg.r ) {

body :

%r = alloca %CString

store %CString %arg.r , %CString * %r

%0 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%1 = load i8 ** %0

%2 = load i8 * %1

%3 = %2 , 0

br i1 %3 , label %then , label %else



then: ; preds = %body

call void @lexError ( )

ret void



else: ; preds = %body

%4 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%5 = load i8 ** %4

%6 = load i8 * %5

%cond = %6 , 105

br i1 %cond , label %.case , label % default



.case: ; preds = %else

%7 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%8 = load i8 ** %7

%9 = getelementptr inbounds i8 * %8 , i64 1

store i8 * %9 , i8 ** %7

%10 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%11 = load i8 ** %10

%12 = load i8 * %11

%13 = %12 , 0

br i1 %13 , label %then1 , label %else2



then1: ; preds = %.case

%14 = load %CString * %r

call void @lexIdentifier ( %CString %14 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



else2: ; preds = %.case

%15 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%16 = load i8 ** %15

%17 = load i8 * %16

switch i8 %17 , label %default3 [

i8 102 , label %.case4

i8 115 , label %.case5

]



.case4: ; preds = %else2

%18 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%19 = load i8 ** %18

%20 = getelementptr inbounds i8 * %19 , i64 1

store i8 * %20 , i8 ** %18

%21 = load %CString * %r

call void @lexKeyword ( %CString %21 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr1 , i32 0 , i32 0 ) } )

ret void



.case5: ; preds = %else2

%22 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%23 = load i8 ** %22

%24 = getelementptr inbounds i8 * %23 , i64 1

store i8 * %24 , i8 ** %22

%25 = load %CString * %r

call void @lexKeyword ( %CString %25 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr2 , i32 0 , i32 0 ) } )

ret void



default3: ; preds = %else2

%26 = load %CString * %r

call void @lexIdentifier ( %CString %26 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



default : ; preds = %else

call void @lexSomething ( %CString %arg.r )

ret void

} body icmp eq i8 icmp eq i8 icmp eq i8

This is fairly simple, calls have been removed and replaced with the function instructions themselves. The second thing that LLVM will do for us is to simplify the code by removing redundant loads, propagating constants, and so on:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

define void @lex ( %CString %arg.r ) {

body :

%r = alloca %CString

store %CString %arg.r , %CString * %r

%0 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%1 = load i8 ** %0

%2 = load i8 * %1

%3 = %2 , 0

br i1 %3 , label %then , label %else



then: ; preds = %body

call void @lexError ( )

ret void



else: ; preds = %body

%cond = %2 , 105

br i1 %cond , label %.case , label % default



.case: ; preds = %else

%4 = getelementptr inbounds i8 * %1 , i64 1

store i8 * %4 , i8 ** %0

%5 = load i8 * %4

%6 = %5 , 0

br i1 %6 , label %then1 , label %else2



then1: ; preds = %.case

%7 = load %CString * %r

call void @lexIdentifier ( %CString %7 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



else2: ; preds = %.case

switch i8 %5 , label %default3 [

i8 102 , label %.case4

i8 115 , label %.case5

]



.case4: ; preds = %else2

%8 = getelementptr inbounds i8 * %4 , i64 1

store i8 * %8 , i8 ** %0

%9 = load %CString * %r

call void @lexKeyword ( %CString %9 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr1 , i32 0 , i32 0 ) } )

ret void



.case5: ; preds = %else2

%10 = getelementptr inbounds i8 * %4 , i64 1

store i8 * %10 , i8 ** %0

%11 = load %CString * %r

call void @lexKeyword ( %CString %11 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr2 , i32 0 , i32 0 ) } )

ret void



default3: ; preds = %else2

%12 = load %CString * %r

call void @lexIdentifier ( %CString %12 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



default : ; preds = %else

call void @lexSomething ( %CString %arg.r )

ret void

} body icmp eq i8 icmp eq i8 icmp eq i8

Now comes the interesting part: LLVM will analyse and simplify the control flow graph. This pass will effectively do the sentinel optimization for us. Remember, calls to CString methods have been inlined, so all the required information is in the lex function now. The compiler can see that we branch on empty and then switch on front, but now it does know that both are in fact branching based on the same value and merge everything!

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

define void @lex ( %CString %arg.r ) {

body :

%r = alloca %CString

store %CString %arg.r , %CString * %r

%0 = getelementptr inbounds %CString * %r , i32 0 , i32 0

%1 = load i8 ** %0

%2 = load i8 * %1

switch i8 %2 , label % default [

i8 0 , label %then

i8 105 , label %.case

]



then: ; preds = %body

call void @lexError ( )

ret void



.case: ; preds = %body

%3 = getelementptr inbounds i8 * %1 , i64 1

store i8 * %3 , i8 ** %0

%4 = load i8 * %3

switch i8 %4 , label %default3 [

i8 0 , label %then1

i8 102 , label %.case4

i8 115 , label %.case5

]



then1: ; preds = %.case

%5 = load %CString * %r

call void @lexIdentifier ( %CString %5 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



.case4: ; preds = %.case

%6 = getelementptr inbounds i8 * %3 , i64 1

store i8 * %6 , i8 ** %0

%7 = load %CString * %r

call void @lexKeyword ( %CString %7 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr1 , i32 0 , i32 0 ) } )

ret void



.case5: ; preds = %.case

%8 = getelementptr inbounds i8 * %3 , i64 1

store i8 * %8 , i8 ** %0

%9 = load %CString * %r

call void @lexKeyword ( %CString %9 , { i64 , i8 * } { i64 2 , i8 * getelementptr inbounds ( [ 3 x i8 ] * @.cstr2 , i32 0 , i32 0 ) } )

ret void



default3: ; preds = %.case

%10 = load %CString * %r

call void @lexIdentifier ( %CString %10 , { i64 , i8 * } { i64 1 , i8 * getelementptr inbounds ( [ 2 x i8 ] * @.cstr , i32 0 , i32 0 ) } )

ret void



default : ; preds = %body

call void @lexSomething ( %CString %arg.r )

ret void

} body

At this point, LLVM continues its optimizations, but I will stop dumping them; the point is made.

Conclusion

LLVM is able to provide the benefits of the SentinelInputRange to a function that is written for a generic range, rendering the whole concept of SentinelInputRange obsolete. While many compilers are unable to perform such optimizations, it is certainly not worth increasing the standard library complexity to work around their deficiencies.

Update: I want to thank John Colvin for his contribution to the article and the English language in general.