Overview

The C++11 standard has added a new operator that can be overloaded: the literal operator, operator "". Literals in programming languages are constants hard-coded in a program. For example, 1.2L, "Hello World!", and 4096 are all literals: the first is a long double value, the second is a const char[13] value, and the third is an int. The C++11 standard allows one to define user-defined literals, i.e., literals with custom suffixes that are transformed at compile-time or run-time into appropriate values. This post explores them using g++ v4.7 snapshot 20111029.

Update (Dec. 16, 2011): A small update was made to the general case definition of binary_literal_impl to provide more user-friendly compiler error messages if incorrect digits are used.

The Goal

I don't know about you, but I've always wanted to be able to write binary numbers in my program code and have the compiler store them optimally. Using C++11, we can do exactly this with code like this:

#include <iostream>

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;

  cout << "The number is: " << hex << bits << endl;
}

provided we write a literal operator function whose suffix is _binary.

I believe I read somewhere that literal suffixes not starting with an underscore are reserved by the C++ standard. If you know definitively or otherwise, kindly let me know. 🙂 Update: In a comment on a reddit post, zootsuit notes, "From the N3291 draft/17.6.4.3.5: Literal suffix identifiers that do not start with an underscore are reserved for future standardization." That is likely where I read it: in one of the draft standards.

I am going to introduce an added twist: the conversion must be done at compile-time. Why? Efficiency! Any binary number hard-coded into the program should be converted at compile time into a single integer value stored in the executable (to ensure minimum space usage and maximum efficiency), and an improper number should produce a compile-time error. Certainly, no programmer wants the binary number to be stored in the executable as a string that is converted into an integer at run-time! Ack! That is a waste of both space and time.

Let's Do A Simple Example First

Before messing around with template metaprogramming (which is probably bewildering until you know how to read it), let's write a literal with the suffix _square that computes the square of the long double number it is attached to and returns the result:

#include <iostream>

// Insert literal operator _square definition here. (See below.)

int main()
{
  using namespace std;

  const long double num = 25.5_square;

  cout << num << endl;
}

which would output 650.25. To do this, the following function needs to be written:

constexpr long double operator "" _square(long double num)
{
  return num * num;
}

The literal operator's name is operator "" and its suffix is given after it (i.e., _square). The return type can be anything, but it is long double here because that is the type of the computed value. The constexpr keyword declares that the function can be evaluated by the compiler as a compile-time constant (in C++11 its body must essentially be a single return statement). Compile-time evaluation is guaranteed when the result is used where a constant expression is required, e.g., to initialize a constexpr variable; such a use fails to compile if the value cannot be computed at compile time. In general, if a literal operator is not constexpr, it is an ordinary function call made at run-time (although an optimizing compiler may still fold it).

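That's it! Compile the above code and the compiler can store 650.25 in the executable as a hard-coded long double value. (Since num is only declared const, this is left to the compiler; declaring num as constexpr makes the compile-time evaluation mandatory.)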

Literal Operator Function Parameters

Be aware that a literal operator only allows a fixed set of parameter lists:

const char*
unsigned long long int
long double
char
wchar_t
char16_t
char32_t
const char*, std::size_t
const wchar_t*, std::size_t
const char16_t*, std::size_t
const char32_t*, std::size_t

or, if there are no parameters at all, the literal operator must be a function template whose template parameter list is a char template parameter pack, i.e.,

template <char... CS>
some_return_type operator "" _some_suffix_name();

Also notice that, apart from the character types, the parameter types are the largest-range types of their kind (unsigned long long is the largest-range integer type and long double is the largest-range floating-point type), since a value of the largest-range type can easily be converted to a smaller type when needed. The lone const char* form is different: it receives the literal's characters as a null-terminated string rather than its value (a "raw" literal operator). Since there are numerous character and string literal types (including the new Unicode and raw string literals in C++11), the remaining parameter lists handle those kinds of literals.

Implementing The _binary Literal Operator

Recall the earlier code that lets one write a binary number in C++ code:

#include <iostream>

// Insert definition of _binary literal and associated code here.

int main()
{
  using namespace std;

  const unsigned long long bits =
    11011110101011011011111011101111_binary;
  cout << "The number is: " << hex << bits << endl;
}

i.e., the number is 0xDEADBEEF, which is what the program outputs (as deadbeef, since hex output is lowercase by default).

To ensure that the conversion occurs at compile-time, and to keep this non-trivial function easy to implement, the _binary literal will be implemented using a class template with partial template specialization (function templates cannot be partially specialized, so the work goes into a static member function of a class template). To understand this better, let's start by defining the _binary literal itself:

// Requires binary_literal_impl (below) to be declared before this definition.
template <char... Digits>
constexpr unsigned long long operator "" _binary()
{
  return binary_literal_impl<Digits...>::to_ulonglong();
}

Notice that the _binary literal operator has no parameters. This is because the characters of the literal before the _binary suffix are passed as a char template parameter pack.

A template parameter pack represents a sequence of template arguments. A pack is not itself a type; to use its elements, it must be expanded with the pack expansion operator, written as ... after the pack's name (e.g., Digits...).

Within the _binary literal operator definition, the char template parameter pack needs to be expanded and processed into an unsigned long long. To accomplish this, the work is delegated to a static function inside the class template binary_literal_impl, which allows writing clean, recursively defined code that processes the char template parameter pack, which should be composed only of '0' and '1' characters!

Implementing The binary_literal_impl Class Template

The binary_literal_impl class template allows code to be written that recognizes the following properties about its template arguments:

when a '0' appears first, possibly followed by more characters,
when a '1' appears first, possibly followed by more characters, and,
when there are no characters.

Additionally, if any other (invalid) char values occur, a compile-time error should be generated (since no definition of binary_literal_impl exists for such arguments). To accomplish this with partial template specialization, the general case must first be declared, but not defined (as we want errors if nothing else matches!):

template <char... Digits>
struct binary_literal_impl;

This declaration is needed first so the compiler knows that the template arguments of the binary_literal_impl class template must be a parameter pack of char values. It is very important that no braces are used here: this leaves binary_literal_impl undefined for any arguments that do not match one of the partial specializations written below. (If the compiler cannot find a matching definition, a compile-time error occurs.)

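If you are used to functional programming in Miranda or Haskell, note that C++ partial specialization uses the reverse order of what would be done in those languages: the general case is written first, and the specialized cases follow. (Unlike pattern-matching clauses, the order of the specializations does not decide which one is used; the compiler picks the most specialized match.)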

Even better, one can write the above general case of binary_literal_impl so that it uses static_assert to trigger a very nice compiler error message when an incorrect digit is used. (If you are new to this style of programming, I encourage you to also try out the above definition to see the differences in compiler output.)

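// Alternative user-friendly general case
// (i.e., any digits other than '0' or '1')...

// Helper (name chosen for illustration) that is always false but depends on
// Digits, so the static_assert below only fires when this general case is
// actually instantiated. A plain static_assert(false, ...) may be rejected
// by the compiler even when only valid digits are used.
template <char... Digits>
struct always_false
{
  static constexpr bool value = false;
};

template <char... Digits>
struct binary_literal_impl
{
  static constexpr unsigned long long to_ulonglong()
  {
    static_assert(always_false<Digits...>::value,
                  "Digit characters must either be '0' or '1'.");
    return 0;
  }
};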

If the first char (template argument) value is '0', then there is no 1 bit to set, so the result is simply the integer value computed from the rest of the characters in the char parameter pack:

// If the next digit is zero, then compute the rest...
template <char... Digits>
struct binary_literal_impl<'0', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Notice, however, that the template argument list in the recursive call is one character shorter than before. If this is not obvious, know that with partial template specialization, what appears inside < and > after the class template name is what is being matched. Since '0', Digits... appears, that is what is matched: a '0' character followed by a parameter pack called Digits, where Digits is declared as char... in template <char... Digits>. Thus, Digits... represents the expansion of all char template argument values after the first one! 🙂

If the first char (template argument) value is '1', then a 1 bit must be shifted left and bitwise-OR'd with the result computed on the remaining arguments. Since the binary digits are processed from left to right, the 1 bit must be shifted left by the number of digits that remain to be processed. The C++11 sizeof... operator gives the size of a parameter pack at compile time, so the definition of this case becomes:

// If the next digit is one, then shift 1 and compute the rest...
template <char... Digits>
struct binary_literal_impl<'1', Digits...>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return (1ULL << sizeof...(Digits))
           | binary_literal_impl<Digits...>::to_ulonglong();
  }
};

Again, notice that the number of characters remaining to process becomes one shorter when recursively calling to_ulonglong().

Finally, at some point there will be no digits left to process in the recursively defined code above. When this occurs, the computed answer should be zero:

// Base case: No digits, so return 0...
template <>
struct binary_literal_impl<>
{
  static constexpr unsigned long long to_ulonglong()
  {
    return 0;
  }
};

i.e., notice that binary_literal_impl<> has an empty template argument list.

That's it!

Importantly, since we did not write any code to handle characters other than '0' or '1', using any other values (e.g., try putting a 2 or a in the number) will cause compilation to fail. This is a good thing because a binary number should only contain '0's and '1's!

Closing Comments

As with many template metaprogramming techniques in C++, no matter how complicated the metaprogramming code is, using it is often very straightforward. With C++11 supporting user-defined literals, code will be easier to read and write, since more meaningful values can appear as literals instead of equivalent hard-to-understand, machine-specific, hard-coded character or integer arrays. Nicely, literal definitions and prototypes can be hidden away in header files, so the end user does not need to know the details. Why? The end user only needs to know how to use the literal operator, i.e., what its documentation says! In this instance, understanding how to use the _binary literal is easy: it must be preceded by a valid binary number. One doesn't need to see its definition at all to use it, only its documentation. 🙂

For your convenience, this is the entire program presented above: