C++03 Expression Templates

Efficient string concatenation using expression templates

Craig Henderson

Concatenating text strings is a very common task, but is generally an expensive operation because it involves dynamic (heap) memory allocation, copying, and de-allocation.

Consider an example where a series of STL string objects hold fragments of text that need to be joined together and returned from a function.

    std::string get_message(void)
    {
        std::string hello("Hello");
        std::string space(" ");
        std::string world("World");
        std::string exclamation("!");
        return (hello + space + world + ' ' + exclamation + space + exclamation);
    }

This perfectly legal implementation will generate the result that is required, but what of its efficiency? To understand the performance of this, let's look at how the return expression is evaluated. The compiler will evaluate the expression as a series of sub-expressions from left-to-right, looking at each addition in turn and matching an appropriate operator, so the expression is equivalent to

return ((((((hello + space) + world) + ' ') + exclamation) + space) + exclamation);
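The left-to-right grouping can be observed directly with a small tracing type. This is an illustrative sketch only: `traced` and its `operator+` are hypothetical names, not part of the code developed in this article. Each addition wraps its operands in parentheses, so the resulting text shows how the compiler grouped the chain.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration: carries a textual record of how
// its operands were combined.
struct traced {
    std::string text;
    explicit traced(std::string const &t) : text(t) { }
};

// Each addition wraps its two operands in parentheses, so the final text
// spells out the grouping the compiler chose.
inline traced operator+(traced const &lhs, traced const &rhs) {
    return traced("(" + lhs.text + " + " + rhs.text + ")");
}

// Builds a + b + c + d with no explicit parentheses and reports the grouping.
inline std::string grouping_of_four() {
    traced const a("a"), b("b"), c("c"), d("d");
    return (a + b + c + d).text;
}

// Usage check: the chain groups strictly to the left.
inline void check_grouping() {
    assert(grouping_of_four() == "(((a + b) + c) + d)");
}
```

Every pair of parentheses in the output corresponds to one call to `operator+`, and therefore to one intermediate object.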

The STL basic_string class defines an operator+() free function that will be invoked by each of these additions.

    21.3.7.1 operator+

    template<class charT, class traits, class Allocator>
    basic_string<charT,traits,Allocator>
    operator+(const basic_string<charT,traits,Allocator>& lhs,
              const basic_string<charT,traits,Allocator>& rhs);

    Returns: basic_string<charT,traits,Allocator>(lhs).append(rhs)

The implementation of the function is dependent on which library you're using, but my library implements the function exactly as above, that is:

    template<class charT, class traits, class Allocator>
    inline basic_string<charT, traits, Allocator>
    operator+(const basic_string<charT, traits, Allocator> &lhs,
              const basic_string<charT, traits, Allocator> &rhs)
    {
        return (basic_string<charT, traits, Allocator>(lhs) += rhs);
    }

While this is functional, it's not very efficient. First, the lhs parameter is copied to another basic_string, which involves a memory allocation and a memory copy. Then operator+=() is invoked on the new object, which ultimately calculates the length of the resulting string, allocates a buffer for it, copies the string into the new buffer and then copies the parameter (the second string) onto the end of the first. When operator+=() returns, the temporary object that now holds the result is returned to the caller by value, invoking another object copy. That's a lot of copying, and with a non-optimised build of the above test, no fewer than 14 memory allocations are performed.
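One way to observe the allocation requests without relying on a particular heap manager is to give basic_string an allocator that counts them. The following is a minimal sketch under stated assumptions: `counting_allocator`, `allocation_count`, `counted_string` and `count_naive_concatenation` are hypothetical names introduced here, and this is not the measurement method used for the figures in this article. The operands are 40 characters long on the assumption that this exceeds any small-string buffer; the count obtained depends on the library and compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Running total of allocation requests made through counting_allocator.
inline std::size_t &allocation_count() {
    static std::size_t count = 0;
    return count;
}

// Hypothetical allocator for illustration: forwards to operator new and
// counts every request. It carries the full set of C++03 member typedefs
// so that older libraries accept it as well as newer ones.
template<typename T>
struct counting_allocator {
    typedef T               value_type;
    typedef T              *pointer;
    typedef T const        *const_pointer;
    typedef T              &reference;
    typedef T const        &const_reference;
    typedef std::size_t     size_type;
    typedef std::ptrdiff_t  difference_type;

    template<typename U> struct rebind { typedef counting_allocator<U> other; };

    counting_allocator() { }
    template<typename U> counting_allocator(counting_allocator<U> const &) { }

    pointer address(reference r) const { return &r; }
    const_pointer address(const_reference r) const { return &r; }

    pointer allocate(size_type n, void const * = 0) {
        ++allocation_count();
        return static_cast<pointer>(::operator new(n * sizeof(T)));
    }
    void deallocate(pointer p, size_type) { ::operator delete(p); }
    size_type max_size() const { return size_type(-1) / sizeof(T); }
    void construct(pointer p, T const &value) { new (p) T(value); }
    void destroy(pointer p) { p->~T(); }
};

// All instances are interchangeable because the allocator holds no state.
template<typename T, typename U>
bool operator==(counting_allocator<T> const &, counting_allocator<U> const &) { return true; }
template<typename T, typename U>
bool operator!=(counting_allocator<T> const &, counting_allocator<U> const &) { return false; }

typedef std::basic_string<char, std::char_traits<char>,
                          counting_allocator<char> > counted_string;

// Evaluates a chained concatenation with the library's own operator+ and
// returns how many allocation requests were made while doing so.
inline std::size_t count_naive_concatenation(counted_string &out) {
    counted_string const a(40, 'a'), b(40, 'b'), c(40, 'c');
    std::size_t const before = allocation_count();
    out = a + b + c;
    assert(out.size() == 120);   // sanity check on the joined text
    return allocation_count() - before;
}
```

Calling `count_naive_concatenation()` on different libraries, or with more operands in the chain, gives a feel for how the cost grows with each additional `+`.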

It's not necessarily as bad as that in the real world though, as some optimisations can be, and are, made in different implementations. Over-allocating the size of the array to a block size beyond what is actually required, in anticipation of another concatenation, is a common technique used by library writers. Other authors use the copy-on-write (COW) technique to reference-count the actual string and only perform a memory allocation and copy if the buffer is going to be written to [1]. But regardless of what optimisations can be made, the fundamental problem still exists, and the optimisations just reduce the impact to some degree. The real problem is the creation of lots of temporary objects to hold the result of each operator+() and to be fed into the next operator+(). Let's look again at the original expression, in a construction of a new string.

std::string result (hello + space + world + ' ' + exclamation + space + exclamation);

This will produce five temporary objects that cannot be omitted through optimisations.

    hello + space                           -> temporary basic_string 1
    temporary basic_string 1 + world        -> temporary basic_string 2
    temporary basic_string 2 + ' '          -> temporary basic_string 3
    temporary basic_string 3 + exclamation  -> temporary basic_string 4
    temporary basic_string 4 + space        -> temporary basic_string 5
    temporary basic_string 5 + exclamation  -> std::basic_string result

Optimisation techniques should, whenever possible, be transparent to the user. Techniques such as over-allocation and COW can be encapsulated within the class implementation and therefore have no usability issues for the programmers using the library. There are times, though, when this goal is not achievable within the language constraints, but a small requirement on the programmer can yield great performance results, and this is just one such occasion.

Expression Templates

I was introduced to the intricacies of Expression Templates by the excellent book C++ Templates by Vandevoorde and Josuttis [2], who used the technique to implement delayed evaluation of matrix arithmetic. The basic idea is that instead of operator+() performing the concatenation and returning the result, it records the operation and its parameters and returns an abstraction of the operation. The entire expression comes to be represented by a series of abstractions, and it is not until the assignment to the result type is evaluated that the entire expression abstraction is evaluated in full. At this time, the abstraction can provide more information about what the result of the entire expression is going to be, specifically the length of the resulting string, and one memory allocation can be made that will host the result.
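Before building the full machinery, the idea can be sketched in a deliberately reduced form. The names `leaf`, `pending_concat` and `join_three` below are hypothetical and are not the classes developed in the rest of this article; the sketch handles only std::string operands and assumes that every operand outlives the expression, because it is held by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Leaf of the expression: refers to an existing string, copies nothing.
struct leaf {
    std::string const &value;
    explicit leaf(std::string const &v) : value(v) { }
    std::size_t size() const { return value.size(); }
    void append_to(std::string &out) const { out += value; }
};

// Interior node: records "left + right" without performing it. The left
// part is itself a small node and is copied by value; the right operand
// is held by reference.
template<typename L>
struct pending_concat {
    L left;
    std::string const &right;

    pending_concat(L const &l, std::string const &r) : left(l), right(r) { }

    std::size_t size() const { return left.size() + right.size(); }
    void append_to(std::string &out) const { left.append_to(out); out += right; }

    // The only place where characters are copied: size the buffer once,
    // then append every recorded operand in order.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        assert(out.size() == size());
        return out;
    }
};

// Adding a string to a leaf or to an existing node just records the step.
inline pending_concat<leaf> operator+(leaf const &l, std::string const &r) {
    return pending_concat<leaf>(l, r);
}
template<typename L>
pending_concat<pending_concat<L> >
operator+(pending_concat<L> const &l, std::string const &r) {
    return pending_concat<pending_concat<L> >(l, r);
}

// The conversion operator runs once, when the result is returned.
inline std::string join_three(std::string const &a, std::string const &b,
                              std::string const &c) {
    return leaf(a) + b + c;
}
```

The type of `leaf(a) + b + c` is `pending_concat<pending_concat<leaf> >`, so the shape of the whole expression is encoded in the type and the total length is known before any buffer is requested.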

The requirement on the programmer is a small change to the expression. The expression hello + world , where the two parameters are std::string types, will invoke the STL operator+() from 21.3.7.1 (see above). We need to force our own operator+() to be called so that we can create the abstraction and delay the evaluation, and to do that we introduce a simple wrapper class, concat_string , and re-write the expression

std::string str(hello + world);

to read

std::string str(concat_string<std::string>(hello) + world);

The compiler will then create a call to operator+(concat_string<std::string> const &, std::string const &), which we'll define later.

First we need an abstraction class for the addition of two character based data types so that we can add string classes, character pointers and single characters in any combination that the language rules allow. Let's call it ... addition

    template<typename Res, typename T1, typename T2>
    class addition
    {
      private:
        typedef typename addition_traits<T1>::type type1;
        typedef typename addition_traits<T2>::type type2;

        type1 first_;
        type2 second_;

      public:
        typedef Res value_type;

        addition(type1 first, type2 second)
          : first_(first), second_(second)
        {
        }

        type1 first(void)  const { return first_; }
        type2 second(void) const { return second_; }
    };

Nothing too surprising there, just a simple constructor to store the two values and a couple of access functions. I use a traits class to determine how to store the values. Literal constants and pointers to literal constants need to be stored by value as the parameters' lifetimes may be shorter than the addition object. All other data are stored as reference-to-const to avoid copying objects; after all, that's the whole point of the exercise. So the traits class is straightforward too.

    template<typename T>
    struct addition_traits { typedef T const &type; };

    // characters and character pointers are stored by value
    template<> struct addition_traits<char>         { typedef char type; };
    template<> struct addition_traits<char const>   { typedef char const type; };
    template<> struct addition_traits<char *>       { typedef char * type; };
    template<> struct addition_traits<char const *> { typedef char const * type; };
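The effect of the traits can be checked at compile time. The sketch below repeats a reduced version of the traits so that it stands alone, and adds a hypothetical `same_type` comparison written in C++03 style; `same_type` and `check_storage_choices` are introduced purely for illustration.

```cpp
#include <cassert>
#include <string>

// Reduced copy of the traits: the primary template stores by
// reference-to-const, the specialisations store by value.
template<typename T> struct addition_traits { typedef T const &type; };
template<> struct addition_traits<char>         { typedef char type; };
template<> struct addition_traits<char const *> { typedef char const *type; };

// Hypothetical C++03-friendly type comparison, used only for the checks.
template<typename A, typename B> struct same_type       { static bool const value = false; };
template<typename A>             struct same_type<A, A> { static bool const value = true; };

// A std::string operand is held as a reference, so no copy is made.
inline bool string_held_by_reference() {
    return same_type<addition_traits<std::string>::type, std::string const &>::value;
}

// A single character is held by value.
inline bool char_held_by_value() {
    return same_type<addition_traits<char>::type, char>::value;
}

// A pointer to characters is held by value (the pointer, not the text).
inline bool pointer_held_by_value() {
    return same_type<addition_traits<char const *>::type, char const *>::value;
}

// Usage check covering the three storage choices.
inline void check_storage_choices() {
    assert(string_held_by_reference());
    assert(char_held_by_value());
    assert(pointer_held_by_value());
}
```

Any type without a specialisation, including a nested addition object, falls through to the primary template and is held by reference-to-const.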

There are seven operators, to handle combinations of char , char * , std::string and other concat_string<> parameters. Each operator constructs a concat_string object to store an addition object that in turn will reference or store the operator parameter.

    // concat_string + char
    template<typename R1>
    concat_string< addition<std::string, R1, char const> > const
    operator+(concat_string<R1> const &first, char const second)
    {
        typedef concat_string< addition<std::string, R1, char const> > Ret;
        return Ret(typename Ret::value_type(first.data(), second));
    }

    // char + concat_string
    template<typename R2>
    concat_string< addition<std::string, char const, R2> > const
    operator+(char const first, concat_string<R2> const &second)
    {
        typedef concat_string< addition<std::string, char const, R2> > Ret;
        return Ret(typename Ret::value_type(first, second.data()));
    }

    // concat_string + char const *
    template<typename R1>
    concat_string< addition<std::string, R1, char const *> > const
    operator+(concat_string<R1> const &first, char const *second)
    {
        typedef concat_string< addition<std::string, R1, char const *> > Ret;
        return Ret(typename Ret::value_type(first.data(), second));
    }

    // char const * + concat_string
    template<typename R2>
    concat_string< addition<std::string, char const *, R2> > const
    operator+(char const *first, concat_string<R2> const &second)
    {
        typedef concat_string< addition<std::string, char const *, R2> > Ret;
        return Ret(typename Ret::value_type(first, second.data()));
    }

    // concat_string + std::string
    template<typename R1>
    concat_string< addition<std::string, R1, std::string> > const
    operator+(concat_string<R1> const &first, std::string const &second)
    {
        typedef concat_string< addition<std::string, R1, std::string> > Ret;
        return Ret(typename Ret::value_type(first.data(), second));
    }

    // std::string + concat_string
    template<typename R2>
    concat_string< addition<std::string, std::string, R2> > const
    operator+(std::string const &first, concat_string<R2> const &second)
    {
        typedef concat_string< addition<std::string, std::string, R2> > Ret;
        return Ret(typename Ret::value_type(first, second.data()));
    }

    // concat_string + concat_string
    template<typename R1, typename R2>
    concat_string< addition<std::string, R1, R2> > const
    operator+(concat_string<R1> const &first, concat_string<R2> const &second)
    {
        typedef concat_string< addition<std::string, R1, R2> > Ret;
        return Ret(typename Ret::value_type(first.data(), second.data()));
    }

Each of these operators returns a concat_string object, so each subsequent sub-expression will also use these operators and not the STL string operators.

    concat_string<std::string>(hello) + space   -> concat_string< addition<> > 1
    concat_string< addition<> > 1 + world        -> concat_string< addition<> > 2
    concat_string< addition<> > 2 + ' '          -> concat_string< addition<> > 3
    concat_string< addition<> > 3 + exclamation  -> concat_string< addition<> > 4
    concat_string< addition<> > 4 + space        -> concat_string< addition<> > 5
    concat_string< addition<> > 5 + exclamation  -> concat_string< addition<> > 6
    concat_string< addition<> > 6                -> std::basic_string [invoking evaluation]

The concat_string class needs to store a copy of the abstraction object because the lifetime of the concat_string object will extend beyond the lifetime of the temporary abstraction object returned by the operators. Aside from the constructor and an accessor function to return a reference-to-const of the abstraction object stored within, there is only one other member function in the class, and this is where the full evaluation of the expression is performed. The member function is a template conversion operator to a basic_string type [3], and this is called when a concat_string object is assigned to a basic_string type.

    template<typename Rep>
    class concat_string
    {
      private:
        Rep data_;

        // declared but not defined: no default construction or assignment
        concat_string();
        concat_string &operator=(concat_string const &);

      public:
        typedef Rep value_type;

        concat_string(Rep const &data) : data_(data) { }

        Rep const &data() const { return data_; }

        // evaluates the whole expression into a single pre-sized string
        template<typename C, typename T, typename A>
        operator std::basic_string<C, T, A>() const
        {
            std::basic_string<C, T, A> data;
            data.reserve(length(data_));
            appender<std::basic_string<C, T, A> > append(data);
            append(data_);
            return data;
        }
    };

In the conversion operator, a basic_string object is created locally and reserve() is called to pre-allocate enough memory to hold the entire result string (see later for calculating the length of the new string). The text from the abstraction objects is then appended into this string using a helper class, appender, which can handle copying single characters, string types and abstraction objects into a basic_string class. The local string object is then returned from the conversion operator, by value. If the compiler supports the named return value optimisation (NRVO), then no more temporaries will be created here.
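The reserve-then-append pattern used by the conversion operator can be tried on its own. In this minimal sketch, `join_with_reserve` is a hypothetical helper and not part of concat_string; it reports whether the capacity changed while appending, which would indicate that the buffer had been reallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Joins three fragments into a string whose buffer is sized once, up front.
// `reallocated` is set to true if the capacity changed while appending.
inline std::string join_with_reserve(std::string const &a, std::string const &b,
                                     std::string const &c, bool &reallocated) {
    std::size_t const total = a.size() + b.size() + c.size();

    std::string out;
    out.reserve(total);                 // the only buffer-sizing step
    assert(out.capacity() >= total);    // reserve() guarantees at least this much
    std::size_t const capacity_after_reserve = out.capacity();

    out += a;
    out += b;
    out += c;

    reallocated = (out.capacity() != capacity_after_reserve);
    return out;                         // a named local, so NRVO may elide the copy
}
```

Because the final length is known before the first append, none of the three appends needs to grow the buffer.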

Bits and pieces

The appender class provides a generic means of copying data into a basic_string object. This is needed because the STL basic_string class knows nothing of the addition class. appender provides three function call operators; one for a single character, one for a basic_string object, and the third for an addition object.

    template<typename S>
    struct appender
    {
      private:
        S &buffer_;

      public:
        explicit appender(S &buffer) : buffer_(buffer) { }

        // an addition object: append each half in turn
        template<typename Res, typename T1, typename T2>
        void operator()(addition<Res, T1, T2> const &app) const
        {
            (*this)(app.first());
            (*this)(app.second());
        }

        // a basic_string object
        void operator()(S const &data) const { buffer_ += data; }

        // a single character
        void operator()(typename S::value_type const &data) const { buffer_ += data; }
    };

The final piece of scaffolding is the mechanism for calculating the length of the result string, which recursively totals the lengths of the component strings. This is done through a series of free functions that each know the length of a specific type: one for a single character, one for a pointer to characters, one for a basic_string object, and one for an addition object.

    template<typename CharT, typename CharTraits, typename Allocator>
    std::size_t const length(std::basic_string<CharT, CharTraits, Allocator> const &str)
    {
        return str.length();
    }

    std::size_t const length(char)
    {
        return 1;
    }

    template<typename CharT>
    std::size_t const length(CharT const *str)
    {
        return std::char_traits<CharT>::length(str);
    }

    template<typename Res, typename T1, typename T2>
    std::size_t const length(addition<Res, T1, T2> const a)
    {
        return length(a.first()) + length(a.second());
    }

Results

The success of this technique is difficult to measure accurately as a runtime performance saving, so instead I measured the number of requests to the heap manager for memory allocation. Using the sample code below, I ran the tests with T1 and T2 both as std::string types, and again with T1 as concat_string<std::string> and T2 as std::string .

    std::string hello("Hello");
    T1 space(" ");
    std::string world("World");
    std::string exclamation("!");
    // heap allocations were counted from here
    std::string str(hello + space + world + ' ' + exclamation + space + exclamation + '\n');

Using Microsoft Visual C++ v7.0, the first test, compiled without optimisations and using native std::string types, resulted in 14 memory allocation requests; the second test, using the new technique, resulted in 3 allocation requests.

Conclusion

Expression templates are a powerful technique for delayed evaluation of expressions, and we can take advantage of the delay to allocate just one buffer that is large enough to contain the string that results from the entire expression. The expression is evaluated only when it is assigned to a basic_string object, so if the expression is never assigned, then the expression is never evaluated. This is contrary to the normal behaviour, which would create a host of temporary strings and then just throw them away.
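The "never assigned, never evaluated" property can be demonstrated with a counter. This is a hedged sketch with hypothetical names (`deferred_pair`, `defer`, `evaluation_count`), reduced to two std::string operands that must outlive the deferred object; it is not the concat_string class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many times a deferred expression has actually been evaluated.
inline std::size_t &evaluation_count() {
    static std::size_t count = 0;
    return count;
}

// Hypothetical two-operand node: remembers its operands and does no work
// until it is converted to a std::string.
struct deferred_pair {
    std::string const &first;
    std::string const &second;

    deferred_pair(std::string const &f, std::string const &s)
      : first(f), second(s) { }

    // Evaluation point: one reserve, two appends, one counter increment.
    operator std::string() const {
        ++evaluation_count();
        std::string out;
        out.reserve(first.size() + second.size());
        out += first;
        out += second;
        assert(out.size() == first.size() + second.size());
        return out;
    }
};

// Records the operation; copies no characters.
inline deferred_pair defer(std::string const &a, std::string const &b) {
    return deferred_pair(a, b);
}
```

Creating a `deferred_pair` and letting it go out of scope leaves `evaluation_count()` unchanged; only initialising or assigning a `std::string` from it triggers the conversion.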

The support scaffolding is fairly extensive, but the end result is well worth it. I have shown a saving of 11 memory allocations in the running example, and this is just the start of the savings. By eliminating the temporary objects that are created during normal evaluation of a string concatenation expression, I have prevented not only the heap allocations that I've already demonstrated, but the associated memory copying and de-allocation too. Perhaps more importantly, the new allocation count is constant: regardless of the number of additions in the expression, the same number of memory allocations will be performed.

[1] The STL shipped with this compiler employs an optimisation for small strings that uses an internal buffer until the string exceeds the buffer length and then moves to heap allocation. For the testing, I added a length of text to the "Hello" string to exceed the internal buffer limit immediately.

[2] Vandevoorde and Josuttis, C++ Templates, Addison-Wesley, 2003, ISBN 0201734842

[3] Microsoft Visual C++ v7.0 fails to compile with this template conversion operator, so a std::string operator is provided for this compiler.