Language designers and compiler and library vendors put a great deal of effort into making your programs run faster and faster. This post is a tour of some common performance optimizations in C++.

Consider the following code that deals with std::string:

    #include <iostream>
    #include <string>
    using namespace std;

    string makeText()
    {
        string s{"Hello World!"}; // C++11 init syntax
        return s;
    }

    int main()
    {
        string t = makeText();
        cout << t << endl;
    }

std::string is a specialization of the template std::basic_string, which takes an allocator as one of its template arguments. This allocator is used, unsurprisingly, for allocating memory. Memory allocation is a fairly slow operation that is best avoided. The question is: how many times will the allocator's allocate function be invoked during this program’s execution? 3, 2, 1, 0?

When I started learning C++ my answer would have been “3”: one to allocate memory for the string s initially, and two more in order to return by value: first to copy-initialize a temporary from s, then to copy-initialize t from the temporary.

Then I learned about named return value optimization and copy elision, and realized that when you return by value, a clever compiler can avoid creating three objects and instead use one: it is assigned its initial value in makeText and read in main. So, from then on my answer would be “between 1 and 3”, depending on how clever the compiler is; most of them are clever enough already.

C++11 introduced move semantics, and since then compilers are required not to copy upon returning by value if our type provides a move constructor. std::string does come with a move constructor, which does not allocate memory but takes over the memory already allocated by the moved-from object. So, from then on my answer became “exactly 1”.

But note one thing: move construction is not a no-op. For std::string it requires putting the moved-from object into a state where it knows it need not, and in fact must not, release its resources. This is a certain (albeit small) cost compared with copy elision, which is a no-op. Luckily, the requirement to use move construction instead of copy construction upon returning by value does not prohibit the compiler from still applying copy elision. Thus, in our example, there will most probably be no move constructions at all.

But that’s not all. std::string is not required to allocate heap memory (or any other kind of memory its allocator is aware of). It is only required to be able to store, one way or another, a string of characters of arbitrary length. True, in the general case this implies allocating memory at some point; but ours is not the general case. Our text "Hello World!" is fairly short (compared, say, to the contents of a 4 GB XML file). It is 12 characters; 13 if we count the terminating null character. If on a 64-bit machine the size of a pointer is that of 8 characters, our text is smaller than two pointers. A typical efficient implementation of std::string needs room for three pointers in the object itself. The text is small enough to fit into that stack-allocated part. This makes it a perfect candidate for small buffer optimization.

Small buffer optimization

Below is a short sketch of what an SBO could look like. Please do not treat it as a production-strength implementation.

A possible implementation of an STL-like string holds a pointer to the beginning of the sequence, the size of the string and the size of the allocated buffer (the capacity):

    class string
    {
        char*  _begin;
        size_t _size;
        size_t _capacity;
        // ...
    };

The pointer _begin refers, in the general case, to heap-allocated memory. The three members form what we call a handle. This is the part of the object’s state that lives on the stack (for a local variable). The expression sizeof(string) evaluates to 24 (or something different, depending on the platform, alignment, etc.), which is the size of the handle. When an object is moved, its handle is copied.

Now we want to say that the part of the handle occupied by address _begin should be reused as a buffer for holding a small string. We can implement it as a union:

    class string
    {
        union Buffer
        {
            char* _begin;
            char  _local[16];
        };

        Buffer _buffer;
        size_t _size;
        size_t _capacity;
        // ...
    };

I have ignored the issues with alignment in order to keep the example simple. Now the size of our class string has grown a bit (on my machine, to the equivalent of 32 characters). This does not visibly affect the run-time performance of copy operations. The crucial factor in copying performance is not the sizeof of the type, but the time it takes to allocate all the required resources a second time.

But how do we know if our union is currently used as a pointer or as a small buffer? We can use the member _capacity. We can pick some special value, likely 0, that can never be the proper size of an allocated buffer, and have it indicate that the small buffer is in use. Now, accessing the value is slower, because we have to evaluate an if-statement checking where the contents are stored, but in exchange we avoid memory allocation altogether in a number of cases.

Such an optimization is implemented in the standard library that ships with my version of Visual C++. It is also available in Boost.Container’s implementation of string, without any sizeof overhead.

Going back to the original question in this post, in my environment the number of allocations is 0. (It may be 1 on yours, though.) Is that not incredible?