Consider the following piece of program code.

void process_names() { const char * fileName = "contents.txt"; const char * foundName = 0; // will be assigned to later Names names = parse_file(foundName); foundName = find(names); // ... }

This program is expected to open file contents.txt , read names from it; then find a desired name, and do something with it.

But the program will not do it, because I have a bug in this code. Accidentally, I have passed the wrong pointer to function parse_file : a null pointer. I confused them because names were so similar (I didn’t notice that my IDE suggested the wrong name in the hint); the types happened to match…

Now, would you like that this sort of bugs were detected at compile time? The correct answer is “yes”, but is it at all possible?

In general, compiler (or any static analyzer) cannot just warn you that you might have wanted to pass a different variable to a function than you did:

void fill (int h, int w) { r.height(w); // did you want to pass h? r.width(h); // did you want to pass w? // ... }

In 99.99% of all function calls you pass the correct variables to functions. If compiler started sending warnings in all these cases, there would not be sufficient memory in the buffers to display them; and the usefulness of such information would be zero.

Therefore, to be sure that it gives you no (or, almost no) false positives, compiler has to see something really fishy in the program. Something that almost everyone would agree that it must be a bug.

One example of such clear situation is where we define an automatic variable in function scope but never read its value. There must be something wrong with it because why define a variable if you do not need it? You probably meant to use it but accidentally used something else instead, or forgot to use it at all.

Indeed, going back to our function process_names , if we intended to read from variable fileName only once, and forgot, now it is never read from, and a decent compiler like GCC or Clang (with option -Wall ) will report a warning. If you treat compiler warnings seriously, you will be alerted by the compiler that something is wrong with the usage (or lack thereof) of variable fileName , and will have a chance to identify and fix the bug.

Note that in some sense in this scenario, having found our bug is a side effect of looking for declared but unread variables. That is, the core of our bug is not that we never read from fileName : the real cause is that we misspelled the variable name. To understand this better, let’s look at a slightly modified example, where we make a second use of variable fileName :

void process_names() { const char * fileName = "contents.txt"; const char * foundName = 0; // will be assigned to later Names names = parse_file(foundName); // bug foundName = find(names); // ... std::clog << "done processing " << fileName << std::endl; }

Now, the check for unread variables cannot detect anything wrong with our function; it will not be able to help us. But the bug is still there.

However, unread variables is only one of many checks that a compiler can do. If we are lucky, there may be a different warning that checks for something else, which can similarly — as a side effect — detect our bug.

A good candidate for this would be a check for invoking memory access violation. For sure, it is (almost) never the programmer’s intention to invoke it. So, if there is one in the program, we can be quite sure that the programmer intended to write something else, and we can safely warn him. It so happens that a null pointer is being passed to a function that apparently expects a file name. Thus, invalid memory access might be in the air here, but we (the readers) do not know that, because we cannot see the declaration of function parse_file . In particular, we cannot see the type of the function’s argument.

Invoking memory access violation

For now, let’s use my own implementation of a string:

struct MyString { char str [64]; MyString(const char * s) : str() // zero-initialize the array { for (int i = 0 ; *s != 0 ; ++s) str[i++] = *s; } };

For the record: I do not recommend creating strings classes like that. It has a lot of problems. In particular, it makes a silent assumptions that the passed null-terminated character sequences will be really short. My only goal is to illustrate my point: we have a string implementation that assumes it will never be given a null pointer, and based on this assumption, it just dereferences the pointer without any defensive checks.

At least in theory, we could expect that a smart compiler could detect that a program can potentially invoke an illegal memory access. So, here is a full program. We can test it with GCC ( -Wall -Wextra ). For the results see here. There is no warning, and when the program is run, we get a “Segmentation fault”. The same with Clang (see here): no warning.

One may now wonder, should it not be quite easy these days to detect a memory access violation confined to a single file, by means of only inspecting the code? Well, conceptually these checks are doable, but somewhat outside the scope of the compiler. A compiler has to generate an efficient code, but it also has to generate code efficiently. Compilers do some extra work to emit some warnings, but the checks they do are limited to things that fit into the normal compiler’s flow. So, let’s try a dedicated static analyzer: clang-tidy . (I am using llvm version 3.4.) These are the results:

$ clang-tidy test.cpp -- test.cpp:26:3: warning: Single-argument constructors must be explicit MyString(const char * s) ^ test.cpp:29:22: warning: Dereference of null pointer (loaded from variable 's') for (int i = 0 ; *s != 0 ; ++s) ^ test.cpp:63:3: note: Calling 'process_names' process_names(); ^ test.cpp:53:3: note: 'foundName' initialized to a null pointer value const char * foundName = 0; ^ test.cpp:56:28: note: Passing null pointer value via 1st parameter 's' Names names = parse_file(foundName) ; ^ test.cpp:56:28: note: Calling constructor for 'MyString' Names names = parse_file(foundName); ^ test.cpp:29:22: note: Dereference of null pointer (loaded from variable 's') for (int i = 0 ; *s != 0 ; ++s)

It finds more than I wanted (the implicit conversion is there on purpose), but it finds our bug. Or, to be more precise, it finds a memory access violation, but while following the message chain, we will reach to the place where we misspelled the variable name.

A static analyzer can find our bug, owing to the fact that our implementation of a string-like type — MyString — can potentially dereference a null pointer. This is not a bug in the implementation of MyString ’s constructor. But invoking the constructor with a null-pointer value is a bug.

Defensive checks

Now, instead of using a hand crafted MyString let’s try our buggy function with std::string . In the test program we only need to change one alias (and include the corresponding header):

typedef std::string String;

When I run clang-tidy on the test program using std::string , I get a number of warnings about the contents of header <string>, but the interesting warning about memory access violation is not there!

What if I compile the program with GCC? No warning or error. What if I run it? I get the following results:

terminate called after throwing an instance of 'std::logic_error' what(): basic_string::_M_construct null not valid Aborted

This means that std::string ’s constructor threw an exception. Indeed, if we look at the implementation of the constructor in cstdlib++ (see here), it calls an auxiliary function, _S_construct (or _M_construct in the trunk), and this function performs a defensive check for null pointer, and throws an exception.

Is the Standard Library implementation allowed to do that?

According to the C++ Standard, initializing an std::string with a null pointer is an undefined behavior. It is not specified directly, but is implied by the specification:



basic_string(const charT* s, const Allocator& a = Allocator()); Requires: s points to an array of at least traits::length(s) + 1 elements of charT .



“Undefined behavior” means that the implementation is allowed to do anything. In particular, it is allowed to throw an exception. So, the authors of std::basic_string in libstdc++ decided to throw an exception. This is for a good reason. This addresses (in some sense) the situation when the user forgets about the constructor’s requirement, and accidentally or ignorantly passes a null pointer nonetheless. In this case an exception is thrown and the memory access violation prevented.

But this defensive check has a negative side effect. It prevents the static analyzer from finding the bug. Now, there is no potential memory access violation that the analyzer can see. Instead, the bug detection is postponed till run-time.

Interestingly, if there were no defensive checks (as in the original test example), and we get a segmentation fault, it is also due to a run-time check performed by the operating system. So the only added value of an exception throw is that a program can intercept the exception and respond at run-time.

In case the bug is found during the static analysis, or at compile-time, the natural response is for the programmer to go in and fix the bug. But in case the bug is found in run-time there is no right response. Remember that memory access violation is only a symptom of the bug: not the bug itself. The bug is somewhere else, perhaps in a remote place of the program. Owing to the static analyzer output, the programmer can trace back from the symptom to the root cause. At run-time you only detect a symptom and nothing more. Even if you catch the exception at some point, the bug may be still outside the try-catch block, and the program may still be suffering from other symptoms, not detectable by defensive if-statements.

In fact, a program halt and a core dump might turn out to be a better solution, because the dump might contain sufficient information for the programmer to pin down the bug.

A comment inside the std::string ’s constructor in cstdlib++, just above the defensive check, says:

// NB: Not required, but considered best practice.

I have doubts about such “best practice.” In a particular case, this is the thing you might need to resort to, but it comes with consequences. It is controversial.

One other interesting thing to be observed here, is that there are different “kinds” of undefined behavior. An UB at the C++ Standard level (like the one with the std::string ’s interface) is not necessarily a UB at compiler level. The compiler or the Standard Library implementer can legally chose to make a Standard-level UB a perfectly well defined behavior on a given platform. In such case, even an UB-based sanitizer cannot detect it. For instance, if I test the std::string version of my test program with GCC’s UB sanitizer (option -fsanitize=undefined ), (or Clang’s UB sanitizer, provided that I use libstdc++), I get no UB notifications. (See here for GCC test.)

Concealing bugs

But a defensive check and an exception throw, while not an ideal, are still a way of signalling that something has gone wrong. There exists an another way of approaching the problem of potential memory access violation, which people sometimes chose, but which makes bug detection even harder. It can be summarized as: “if a given input would be dangerous, behave as though we were given another, safer, input.”

At some point this has been considered as an addition to the to-be-standard string_view . The idea is the following. Because this works:

int n = 0; std::string_view sv = {"file.txt", n}; // only first n chars assert (sv.size() == 0);

And this works:

std::string_view sv = {nullptr, 0}; // only first n chars assert (sv.size() == 0);

Make also the following work:

std::string_view sv = nullptr;

With the semantics identical to the former example.

The idea is noble: rather than crashing, or disturb the callers with annoying exceptions, try to do anything. Quietly. The effect is, however, that if I change my example to use std::string_view with this “improvement”, I get neither a static analyzer warning, nor any notification at run-time. (See here.)

The function behaves as though it were correct, it is doing something. But it cannot be doing the right thing, because we know it has a bug. But the function now keeps it secret to itself and never notifies anyone.

Assisting the analyzer

In order to assist a static analyzer, or an UB sanitizer, you might want to explicitly put some indications about illegal state of your program inside the code. One of tools for that in GCC and Clang is __builtin_unreachable() , which says more-less, “I claim that the control flow will never reach this statement.” Now, the tools can make use of your claim:

They can check whether you are correct (this is a no-trust mode), and/or They can use your claim for optimizations (this is a trust mode).

In order to give the programmer even more tools for giving hints to machines, the C++ Standard Committee is working on adding a new feature to the language known as contracts. This allows you to give a clear hint to static analyzers, compilers and other programmers, that certain values of objects are considered erroneous at certain points in program execution, e.g., at the point of passing arguments to functions. Class MyString could use a contract in the following way:

struct MyString { char str [64]; MyString(const char * s) [[expects: s != nullptr]]; };

This expects inside the attribute means that we are defining a precondition. This means that this is a bug to call this constructor with a null pointer. A static analyzer can make use of this information. The precondition is tied to function declaration. This means that static analyzer does not even have to peek inside the function body (it might be contained in a different file) in order to detect its invalid usage.

Contracts allow you to take the best of both worlds: you can at the same time instruct the static analyzer about potential invalid usages of the function, and request the compiler to insert a run-time defensive check. This latter possibility is totally optional, and if you do not want even the slightest run-time overhead, you can tell the compiler not to inject any defensive checks. But contracts belong to the future.