Let’s start with the problem. I want to check whether a program received a text message that consists of four consecutive zeroes. Not '0' , but the numeric zero. I will create a constant std::string representing the special sequence and compare the messages (also stored as std::string s) I receive. I cannot just type:

const std::string Special = "\0\0\0\0";

As this would be interpreted, according to C rules, as a zero-length string. But there is another constructor exactly for our case, where I can say “character C repeated N times”. So, I type:

const std::string Special ('\0', size_t(4));

I could have used braces for the initialization, but since I am aware of the std::initializer_list gotchas (like the one described here), I am following my rule of thumb, “if you suspect the type might expose an initializer-list constructor, and you want to initialize through a different constructor, do not use braces”.

Now I can test my program. It still doesn’t work correctly for some reasons. It does not recognize the special messages as special, even though I would expect it to. I test and debug it for quite some time, until finally do I realize that my string Special has a different value than I think.

The constructor that I intended, “character C repeated N times,” has signature:

basic_string( size_type count, CharT ch, const Allocator& alloc = Allocator() );

I gave the arguments in the wrong order, and most likely I have inadvertently selected a different constructor. Type std::basic_string has quite a lot of constructors that differ in the combinations of types. Just see here.

Now, we have two problems here. The first is that the program has a bug, and I should provide the arguments in the inverse order. This is fixed quickly. The other — and this is the one we will focus on in this post — is that there is something in the interface of the library, or in my way of using the library or language that makes a bug like this one easy to occur. I have lost some time, which could have been avoided, and there was a risk that the program with this bug would be shipped to users. The fix here would be to identify the source of the problem and to provide some guidelines which when followed are likely to prevent similar bugs in the future.

First, let’s figure out which constructor actually got called. To make the things more interesting, it is a different constructor in different versions of C++. In C++11 it is:

basic_string( const CharT* s, size_type count, const Allocator& alloc = Allocator() );

That is, '\0' binds to type const CharT* , and size_t(4) — unsurprisingly — binds to type size_type . For this constructor, passing a null pointer where a pointer to an array is expected, is an undefined behavior.

That '\0' should be a good initializer for an object of type const CharT* is quite surprising. This is a defect in C++11. It is a consequence of the following definition (§ 4.10 ¶ 1):

A null pointer constant is an integral constant expression prvalue of integer type that evaluates to zero or a prvalue of type std::nullptr_t .

This means that any literal or temporary of integral type ( char is one) that can be evaluated at compile-time to value zero can be used to initialize a pointer to a null pointer value. For instance:

constexpr int zero() { return 0; } int * p = zero(); // ok in C++11 int * q = (2 + 2) - (2 * 2); // ok in C++11

Now, I am only telling you what the Standard says. What compilers actually do is a different story. Clang never implemented this requirement, and GCC implemented it with varying accuracy in different versions. I do not know about other compilers.

Anyway, C++14 fixes this bug. The offending definition is changed to:

A null pointer constant is an integer literal with value zero or a prvalue of type std::nullptr_t .

'\0' is not an integral literal. This narrows the creative ways of initializing a null pointer down to:

int * p1 = 0X0UL; // hex + unsigned long int * p2 = 0B00L; // binary literal int * p3 = 0'0'0; // oct + digit separators

With this change, the constructor of basic_string selected previously is no longer a match, so a different one is chosen. In fact, the compiler chooses the constructor we intended:

basic_string( size_type count, CharT ch, const Allocator& alloc = Allocator() );

But with the arguments bound inversely: '\0' binds to type count (type promotion), and size_t(4) binds to ch (type conversion). So, instead of “char '\0' repeated 4 times”, I get “char '\4' repeated 0 times”, which makes an empty string.

Now, we can see this works this way because expression size_t(4) is converted to a smaller type: char . And I told you I am not using brace initialization. Brace initialization prevents narrowing, so maybe using braces for initialization would have prevented this constructor from being selected, and I would get a compiler error. No. If I initialize my constant like this:

const std::string Special {'\0', size_t(4)};

a yet another, initializer-list constructor is selected and interprets the parameters as two characters: '\0' and '\4' : I get a two-character string. Narrowing is not prevented, because there is no narrowing here (as explained in this post).

What can we do?

The fact that I was able to put my arguments in inverse order and still get the same constructor to be selected means that the design for this constructor ignores an important guideline (I.24): “do not have two adjacent arguments to the function of the same type”. Here the types are actually different, but because of the implicit conversions, the constructor still falls into this category. This implies that the problem is better fixed in the interface of the library. We cannot change the standard library, but we can speculate on how a robust interface would look like.

When we use factories, rather than constructors directly, we can use the function name to indicate how the arguments will be interpreted:

const String Special = String::make_repeated_char(size_t(4), '\0');

This would protect us against inadvertently selecting a different constructor, but the arguments can still be passed in the inverse order. As mentioned in the other post, we can emulate named function parameters using tags:

constexpr struct with_size_t{} with_size{}; constexpr struct with_value_t{} with_value{};

These are empty classes, each with different type, and this is the distinct type that is the only relevant information in them. With tags in the library, we could initialize our string as follows:

const String Special {with_size, 4, with_value, '\0');

A number of people suggested to me that another solution would be to pass only the values but wrapped into distinct types:

const String Special {with_size{4}, with_value{'\0'});

Indeed, this solves the problem most of the time. The problem with the latter solution is that with_value looks like a normal value-semantic type with a meaningful value, and someone might at some point want to use it as a regular type:

vector<with_size> v {with_size{10}}; // value 10? // size 10?

In the case of initializing a string, this problem does not occur, but if you consider a general solution for any type, the trick with with_size{10} does not come without its own problems. If I ware to go with it, I would make type with_size non-assignable and non-comparable, so that it is unattractive for any use other than passing arguments to functions.

Another thing we can do is to use clang-tidy . It is a tool for automated code analysis. In version 7, it has a group of checks bugprone- . If we run it against our example, it will immediately detect the problem (even though it is valid C++ code), and give us a message:

warning: string constructor parameters are probably swapped; expecting string(count, character) [bugprone-string-constructor] const std::string Special ('\0', size_t(4)); ^ ~~~~~ ~~~~~~~~~~ size_t(4) '\0'

So the advice of the day is, if possible, use clang-tidy .