What I have to say today is fairly obvious, and you probably know about it. But because I have observed the following pattern in my project’s code fairly frequently, I feel it needs to be recorded. Consider this code:

bool fun (const string& code)
{
  assert (code.length() >= 2);
  if (code.substr(0, 2) == string("XX")) {
    // ...
  }
  // ...
}

Can you see what is wrong with it? Please do not focus on the use of assert. I only put it there so that you can take for granted that the string has two or more characters.

This condition evidently checks whether the string code starts with the sequence "XX". Given the assumption that code is at least 2 characters long, it appears to do the right thing, provided that the only thing we are concerned with is the correct result of the expression.

Quite often, however, we choose C++ in the hope of achieving maximum performance. If that is our goal, the above code looks wrong. In order to check whether code starts with "XX", we create two temporary strings (the result of substr and the explicitly constructed string("XX")), each of which can potentially allocate heap memory. One could argue that std::string can implement a short string optimization (SSO) for 2-letter sequences, but (1) even then there is some cost involved that cannot easily be optimized away, and (2) not every implementation uses SSO. For instance, I am using GCC 4.4.7, and its std::string does not implement SSO.

The interface of class template std::basic_string is very complicated, as nicely explained in GotW #84. It offers so many member functions that it feels ungrateful not to use any of them. At the same time, because there are so many of them, one feels reluctant to re-parse them time and again.

Because the programmer vaguely remembers that operator== applied to naked null-terminated byte strings (NTBS), which decay to const char*, compares pointers rather than characters, he avoids this by making sure that it is two std::strings that are compared. He might also think that no harm is done, because the literal "XX" would be implicitly converted to std::string anyway before operator== is invoked. But this is wrong. The Standard also provides mixed versions of operator==:

bool operator==(const std::string& lhs, const char* rhs);
bool operator==(const char* lhs, const std::string& rhs);

Of course, in reality they are function templates with more parameters, but you get the idea: std::string can be compared to an NTBS without creating any temporary std::string. Our example can easily be optimized by removing the explicit creation of a temporary:

if (code.substr(0, 2) == "XX") // ...
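The difference between the two kinds of comparison is easy to check directly. The sketch below (the helper names same_address, same_text and starts_with_xx_substr are hypothetical, chosen for illustration) contrasts == on two const char* values, which compares addresses, with std::strcmp and with the mixed std::string/const char* overload, both of which compare characters.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// operator== on two const char* compares addresses, not characters.
bool same_address(const char* a, const char* b) { return a == b; }

// std::strcmp compares the characters of two NTBS.
bool same_text(const char* a, const char* b) { return std::strcmp(a, b) == 0; }

// Uses the mixed overload operator==(const std::string&, const char*):
// no std::string is built from the literal, although substr() still
// creates one temporary.
bool starts_with_xx_substr(const std::string& code) {
    return code.substr(0, 2) == "XX";
}
```

Two distinct char arrays holding "XX" have different addresses, so same_address reports false for them while same_text reports true.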

Next, while using operator== in the comparison admittedly looks elegant, it is wrong to create a brand new string that manages its own resource only to inspect a portion of the original. The programmer's primary goal is not to make the program elegant. Indeed, if we dig into the documentation, we will find that std::basic_string offers a way to compare its substring against an NTBS:

if (code.compare(0, 2, "XX") == 0) // ...

The comparison is three-way, with 0 indicating equality. It is performed in place, without creating any temporary string.
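A small sketch of the in-place check, wrapped in a hypothetical helper starts_with_xx. It also illustrates a property of this overload of compare: it looks at at most min(count, size() - pos) characters and throws std::out_of_range only when pos > size(), so with pos == 0 it is well defined even for strings shorter than 2 characters (a shorter string simply compares as "less").

```cpp
#include <cassert>
#include <string>

// True if code begins with "XX"; no temporary string is created.
// compare(0, 2, "XX") compares code's first min(2, size()) characters
// with "XX", so short or empty strings yield a nonzero result, not UB.
bool starts_with_xx(const std::string& code) {
    return code.compare(0, 2, "XX") == 0;
}
```

The sign of the result follows the usual lexicographical order, which is why only the == 0 case matters for a prefix test.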

To be honest, I did not know about this member function overload before writing this post. I am not really fond of learning the more than 100 members of basic_string. Also, while the above is definitely a performance improvement, I am not satisfied with it. Although it does the correct thing, it may be difficult, on first encounter, to immediately grasp what it is trying to do. My ideal solution, if you can afford to use Boost, is one of the algorithms from the Boost String Algorithms Library:

#include <boost/algorithm/string/predicate.hpp>

bool fun (const string& code)
{
  // ...
  if (boost::algorithm::starts_with(code, "XX")) {
    // ...
  }
  // ...
}

This says exactly what I mean, without adding any unnecessary overhead.
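Where Boost is not available, a similar effect can be had with a small hand-written helper. This is a minimal sketch, not Boost's implementation: the name starts_with is our own, and it only accepts a null-terminated prefix. (C++20 later added a starts_with member function to std::basic_string itself.)

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper, not part of Boost or the standard library:
// true if text begins with the NTBS prefix, compared in place.
bool starts_with(const std::string& text, const char* prefix) {
    const std::string::size_type n = std::strlen(prefix);
    // The size check is a cheap early exit; compare() alone would also
    // report a mismatch when text is shorter than prefix.
    return text.size() >= n && text.compare(0, n, prefix) == 0;
}
```

An empty prefix is treated as matching every string, which is the usual convention for prefix tests.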

And that's it for today. As a side note, in order to measure the memory allocations of the std::basic_string implementation I use, I used a custom allocator that, apart from allocating, collects the necessary statistics. As I am not good at implementing custom allocators, I relied on Howard Hinnant's nice allocator 'cheat sheet'.
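A counting allocator of this kind might look roughly like the sketch below. It is our own minimal version, not Hinnant's code; CountingAllocator, counted_string, kPrefix and the helper functions are hypothetical names. Because a 2-character prefix would fit in the SSO buffer of most current implementations and never allocate, the sketch deliberately uses a 40-character prefix to make the cost of the temporaries visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Number of allocate() calls made through CountingAllocator so far.
static std::size_t g_allocations = 0;

// Minimal stateless allocator: forwards to ::operator new and counts calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    template <class U> struct rebind { using other = CountingAllocator<U>; };

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// Stateless allocators: any two instances are interchangeable.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using counted_string =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// 40 characters: longer than the SSO buffer of common implementations,
// so building a string from it has to allocate.
const char* const kPrefix = "0123456789012345678901234567890123456789";

// Mirrors the original check: two temporaries are created.
bool naive_check(const counted_string& code) {
    const std::size_t n = std::strlen(kPrefix);
    return code.substr(0, n) == counted_string(kPrefix);
}

// Mirrors the compare() version: no temporary string is created.
bool inplace_check(const counted_string& code) {
    return code.compare(0, std::strlen(kPrefix), kPrefix) == 0;
}

// Runs f and reports how many allocate() calls it made.
template <class F>
std::size_t allocations_during(F f) {
    const std::size_t before = g_allocations;
    f();
    return g_allocations - before;
}
```

With a string that starts with kPrefix, naive_check should register at least two allocations (one for the substring, one for the string built from the literal), while inplace_check registers none.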