Today, let’s take a short test. Find what is likely to be a bug in the following code and suggest how to fix it.

void Catalogue::populate(vector<string> const& names)
{
  vec_.clear();
  vec_.resize(names.size());
  for (size_t i = 0; i < names.size(); ++i)
    vec_[i] = make_unique<Entry>(names[i]);
}

You may object to the task as stated. We only see a small portion of the code; we do not know what it is supposed to do in the first place, so how could we possibly figure out what is wrong with it? True, but this unfortunate situation is similar to what we face in real production code (at least I do): no documentation, no clue what the code is supposed to be doing. Yet people must deal with such code: fix it, improve it. And sometimes there is something characteristic about a code pattern that makes you suspicious of a potential bug. I claim this is the case here (although the task isn’t easy).

On new and delete

BTW, do you know what this make_unique is? In C++14 this is how you are supposed to create a unique_ptr. It is more or less equivalent to:

vec_[i] = unique_ptr<Entry>{new Entry(names[i])};

But the two things, creating the object on the free store and committing to destroy it automatically by wrapping it in a unique_ptr, are performed in one go, with no possibility of other operations being interleaved between them. This avoids the possibility of a memory leak, as in the case described by Herb Sutter in GotW #102.

It offers one other useful thing: you no longer need to have a naked new in your code. This is very important. In my work, I have a quick memory-leak detection criterion: if I see a delete used in the code, then we have a memory leak. Although it is theoretically possible to write a program that has a delete and does not have a leak (this is how smart pointers are defined, after all), every time I have seen a developer (however skilled) type a delete at work, there was a memory leak: even if it was wrapped in a try-catch or placed inside a destructor. You will not get it right! Use values instead of pointers, or type-erased values, or smart pointers.

This could not be said about new, though: even if you used std::unique_ptr, you still had to call a naked new. This has changed in C++14. Now you can use std::make_unique and apply the ‘best practice’ “never use a naked new” (unless there is a very good, exceptional reason to do otherwise).

The solution

Now, back to the solution. By using make_unique we ruled out the possibility of a memory leak caused by indeterminately sequenced evaluations of sub-expressions. So, what can we tell about this short code snippet? What can we tell about the variable vec_? From its member functions resize and operator[] we can gather that it is a sequence container. Judging from the name, most probably a vector. We cannot see its declaration inside the function; this means its lifetime started outside the function: it is either a global or a class member. Because its name ends with an underscore, I would guess it is a class member. Because we assign objects of type unique_ptr<Entry> to its elements, we can assume, with a certain degree of probability, that vec_ is a class member of type std::vector<std::unique_ptr<Entry>>, or (and this is even more likely) std::vector<std::unique_ptr<Base>> with Entry derived from Base (because otherwise we would probably be storing a vector of Entry values instead of pointers).

Now, what is this function’s purpose? It looks like it completely destroys the previous state of vec_. It calls resize to avoid too many allocations and content moves. By resizing, it may destroy some elements. Even if some are retained, they will be overwritten by the per-element assignments. We can say that the function resets the value of vec_.

Is it possible that we are accessing a non-existent vector element? No: the resize guarantees that every index in the range [0, names.size()) is valid, and the index in the loop does not go outside this range.

Is it possible that the function is exception-unsafe? This is worth examining. In the context of C++ we often talk about three kinds of exception safety guarantees:

basic — no resource is leaked on failure, and objects can be destroyed without causing a crash,

strong — on failure the function leaves everything untouched,

no-fail — the function never fails.

By default, all functions are expected to satisfy the basic guarantee. Does our function meet this expectation? That is, if our function ends in an exception, can the Catalogue object we were modifying be safely destroyed? If one composes one’s class out of components that manage their own resources (like vector and unique_ptr), one does not have to explicitly define a destructor: according to what R. Martinho Fernandes calls the rule of zero, the implicitly generated destructor will just do the right thing. If we start with an empty vec_ and resize it, we get a range of null unique_ptrs. Then, if somewhere in the middle of the range an exception is thrown by Entry’s constructor, some pointers are left null. If vec_ is destroyed, a null unique_ptr’s destructor behaves safely: it does nothing. Basic exception safety is satisfied.

Does our function satisfy the strong exception safety guarantee? Definitely not. Is this a bug? It depends. A strong (commit-or-roll-back) guarantee is not something that you specify in the language: it is something you have to announce informally, in the form of documentation, to your users. We cannot see such information in our example, so for the remainder of this article we will assume that the author did not declare that the function offers commit-or-roll-back semantics. If you somehow feel wrong about this assumption, your intuition is right; we will address this at the end. But for now, please proceed with my assumption.

Then, perhaps there is no bug in the above code snippet after all? This is a very reasonable hypothesis. You do not suspect every single piece of code of having a bug; there are lots of pieces of code out there that are bug-free. True, I somehow implied in this article that some bug exists, but in real life, in your projects, there is noöne to imply such things. And besides, I could have just tricked you.

I do see a bug, though. We have to imagine how other pieces of code will be using the state that our function populate sets. One of the most common tasks you perform on an array (a vector is sort of an array with some bonus features) is to read or modify each element in the sequence. So, I am pretty sure that somewhere in the program we will have a function that does this:

for (auto && entry : vec_) entry->clean();

Even if you are cautious and write defensive ifs everywhere (which I do not recommend), you can be sure that someone will at some point try to access the object without checking for a null address. This is not a problem in itself; but it may happen that two other things occur: the function Catalogue::populate throws an exception, and the exception is caught sooner than the Catalogue object goes out of scope:

{
  Catalogue catalogue;
  try {
    catalogue.populate(names);   // throws
  }
  catch (exception & ex) {
    log(ex.what());
  }
  catalogue.cleanAllEntries();   // dereferences all pointers
}

Of course, real-life code will be more complicated, and what I have shown in a couple of lines could be scattered among a number of files. When this code is executed, the exception thrown and then caught, our catalogue contains null pointers; when they are dereferenced, we enter what the Standard calls undefined behaviour, which will most likely end in a program crash.

Note that this problem would not occur if we weren’t catching the exception:

{
  Catalogue catalogue;
  catalogue.populate(names);     // throws
  catalogue.cleanAllEntries();   // this is skipped
}                                // destructor can deal with null pointers

If an exception is not caught too early, the object gets destroyed and noöne can access it and cause UB. There is one function that still has to be called for our object: the destructor. However, it will most likely be safe; as noted above, the destructor is most likely implicitly defined, and destroying a null unique_ptr does nothing. This short digression illustrates how a ‘defensive’, premature try–catch can cause a program crash. Yet, there are people who would apply this philosophy: “if it throws, I need to catch it as soon as possible; otherwise it will terminate the program.”

So, an unchecked dereference of a pointer can, in certain situations, cause a crash. Should I put defensive ifs everywhere? Well, defensive ifs come with a number of serious disadvantages. First, they make the program slower. If you choose C++, run-time performance is probably dear to you. Second, and this is probably more important, defensive checks make the program more obfuscated and difficult to understand, cause confusion, and indirectly cause bugs. This deserves a separate post, but for now let’s focus on the problem at hand.

If I were to write an if in my code that checks for a null address, I would first check whether these pointers are ever intended to be null, and whether such a null pointer conveys any meaning. Such a check would not be defensive: it would be part of the program logic. So, I would look at the function Catalogue::populate and find that it tries to assign a proper (non-null) address to every vector element. Thus, the content of Catalogue::populate implies a postcondition (or perhaps even an invariant) that the pointers are not null. With this implied postcondition, other functions can rely on an implicit precondition that the pointers are not null.

Given that we imply the postcondition, the proper way of handling the bug is to make the function Catalogue::populate satisfy it. But apart from logical correctness, the function also guarantees that the vector’s buffer is allocated only once. Our fix shouldn’t compromise that, and it will not: we can use reserve:

void Catalogue::populate(vector<string> const& names)
{
  vec_.clear();
  vec_.reserve(names.size());
  for (size_t i = 0; i < names.size(); ++i)
    vec_.push_back(make_unique<Entry>(names[i]));
}

Now we push into the vector only those elements whose creation did not throw. Owing to the call to reserve, we guarantee that the calls to push_back will not throw. (If we were sure that vec_ were of type vector<Entry>, we could use emplace_back instead, for performance.)

We should feel obliged to rewrite Catalogue::populate this way, because the fix avoids a subtle bug and at the same time doesn’t make the function slower or bigger. But I would go even further and rewrite the function to:

void Catalogue::populate(vector<string> const& names)
{
  decltype(vec_) tmp;
  tmp.reserve(names.size());
  for (size_t i = 0; i < names.size(); ++i)
    tmp.push_back(make_unique<Entry>(names[i]));
  swap(tmp, vec_);
}

This implements the “do work on the side and swap” idiom, and thereby guarantees commit-or-roll-back semantics; and still, it requires only one buffer allocation. It appears reasonable to upgrade the exception safety guarantee (from basic to strong) when it doesn’t cost us anything in terms of run-time performance.

Note also that I used good old swap even though in C++11 we have move assignment, which would probably make the intent clearer. This is for a reason. In the case of vector (and most other STL containers), the move constructor and move assignment (and the default constructor) can throw exceptions! You might find this strange: the STL encourages you to give container elements noexcept move constructors/assignments, yet the containers themselves do not offer this feature. This is so in order to allow certain optimized implementations of containers that start out (default-constructed) with some preallocated memory (e.g., a sentinel node). This also affects the move operations, because semantically they are equivalent to a combination of the default constructor and a swap.

In contrast, swap on vector is guaranteed not to throw exceptions, even though it is noexcept(false). This is worth noting: it is not noexcept that gives us the no-fail guarantee, but the fact that somebody commits to not throwing exceptions.

And in fact, we can improve the example even further, as suggested by Herb Sutter:

void Catalogue::populate(vector<string> const& names)
{
  decltype(vec_) tmp;
  tmp.reserve(names.size());
  for (auto const& name : names)
    tmp.push_back(make_unique<Entry>(name));
  swap(tmp, vec_);
}

This is not just a cosmetic change, and not merely that it is shorter: it is higher level. It makes the intention clearer, and avoids a whole class of potential bugs resulting from over-indexing the array or confusing operator< with operator<=. Not that we had such a bug before, but this is how you fight bugs: you avoid constructs that are correct but bug-prone.