How much is std::string_view faster than standard std::string operations?

Have a look at a few examples where I compare std::string_view against std::string .

Intro

I was looking for some examples of string_view , and after a while, I got curious about the performance gain we might get.

string_view is conceptually only a view of the string: usually implemented as [ptr, length] . When a string_view is created there’s no need to copy the data (as opposite when you create a copy of a string). What’s more string_view is smaller than std::string - regarding the size on the stack/heap.

For example when we look at a possible (pseudo) implementation:

string_view { size_t _len ; const CharT * _str ; }

Depending on the architecture the total size is 8 or 16 bytes.

Due to Small String Optimizations std::string is usually 24 or 32 bytes so doubles or triple the size of string_view . In that form, such a string can hold between 15 (GCC, MSVC) and 22 characters (Clang) without having to allocate memory on the heap. Of course larger string will use more memory, but 24/32 bytes is the minimal size of the std::string .

You can read more details about SSO in this excellent post Exploring std::string. Or here: SSO-23 (as suggested in a comment)

Obviously returning string views, creating string views, using substr is definitely much faster than deep copies of std::string . However, the initial performance tests showed that std::string is usually highly optimized and sometimes string_view doesn’t win that much.

The Series

This article is part of my series about C++17 Library Utilities. Here’s the list of the other topics that I’ll cover:

Resources about C++17 STL:

string_view Operations

string_view is modelled to be very similar to std::string . Yet, the view is non-owning, so any operation that modifies the data cannot go into the API. Here’s a brief list of methods that you can use with this new type:

operator[]

at

front

back

data

size / length

/ max_size

empty

remove_prefix

remove_suffix

swap

copy (not constexpr )

(not ) substr - complexity O(1) and not O(n) as in std::string

- complexity and not as in compare

find

rfind

find_first_of

find_last_of

find_first_not_of

find_last_not_of

operators for lexicography compare: ==, !=, <=, >=, <, >

operator <<

One important note is that all of the above methods (except for copy and operator << ) are also constexpr ! With this capability, you might now work with strings in constant expressions.

What’s more for C++20 we’ll get at least two new methods:

starts_with

ends_with

That are implemented both for std::string_view and std::string . As of now (July 2018) Clang 6.0 supports those functions. So you can experiment with them.

A Basic Test - substr

substr gives probably the best advantage over the standard string substr . It has the complexity of O(1) and not O(n) as with regular strings.

I’ve created a basic test using Quick C++ Benchmark and got the following results:

Using Clang 6.0.0, -O3, libc++

The code:

static void StringSubStr ( benchmark :: State & state ) { std :: string s = "Hello Super Extra Programming World" ; for ( auto _ : state ) { auto oneStr = s . substr ( 0 , 5 ); auto twoStr = s . substr ( 6 , 5 ); auto threeStr = s . substr ( 12 , 5 ); auto fourStr = s . substr ( 18 , 11 ); auto fiveStr = s . substr ( 30 , 5 ); // Make sure the variable is not optimized away by compiler benchmark :: DoNotOptimize ( oneStr ); benchmark :: DoNotOptimize ( twoStr ); benchmark :: DoNotOptimize ( threeStr ); benchmark :: DoNotOptimize ( fourStr ); benchmark :: DoNotOptimize ( fiveStr ); } }

And for string_view :

static void StringViewSubStr ( benchmark :: State & state ) { // Code before the loop is not measured std :: string s = "Hello Super Extra Programming World" ; for ( auto _ : state ) { std :: string_view sv = s ; auto oneSv = sv . substr ( 0 , 5 ); auto twoSv = sv . substr ( 6 , 5 ); auto threeSv = sv . substr ( 12 , 5 ); auto fourSv = sv . substr ( 18 , 11 ); auto fiveSv = sv . substr ( 30 , 5 ); benchmark :: DoNotOptimize ( oneSv ); benchmark :: DoNotOptimize ( twoSv ); benchmark :: DoNotOptimize ( threeSv ); benchmark :: DoNotOptimize ( fourSv ); benchmark :: DoNotOptimize ( fiveSv ); } }

Here’s the full experiment: @Quick C++ Bench

For this test, we have 10x speed-up!

Can we achieve similar results in other cases?

String Split

After the basic tests we can do one more step and try to compose a more complicated algorithm: let’s take string splitting.

For this experiment I’ve gathered code from this resources:

Here are the two versions, one for std::string and the second one for std::string_view :

std :: vector < std :: string > split ( const std :: string & str , const std :: string & delims = " " ) { std :: vector < std :: string > output ; auto first = std :: cbegin ( str ); while ( first != std :: cend ( str )) { const auto second = std :: find_first_of ( first , std :: cend ( str ), std :: cbegin ( delims ), std :: cend ( delims )); if ( first != second ) output . emplace_back ( first , second ); if ( second == std :: cend ( str )) break ; first = std :: next ( second ); } return output ; }

No,w the string_view version:

std :: vector < std :: string_view > splitSV ( std :: string_view strv , std :: string_view delims = " " ) { std :: vector < std :: string_view > output ; size_t first = 0 ; while ( first < strv . size ()) { const auto second = strv . find_first_of ( delims , first ); if ( first != second ) output . emplace_back ( strv . substr ( first , second - first )); if ( second == std :: string_view :: npos ) break ; first = second + 1 ; } return output ; }

And here’s the benchmark:

const std :: string_view LoremIpsumStrv { /*one paragraph of lorem ipsum */ }; static void StringSplit ( benchmark :: State & state ) { std :: string str { LoremIpsumStrv }; for ( auto _ : state ) { auto v = split ( str ); benchmark :: DoNotOptimize ( v ); } } // Register the function as a benchmark BENCHMARK ( StringSplit ); static void StringViewSplit ( benchmark :: State & state ) { for ( auto _ : state ) { auto v = splitSV ( LoremIpsumStrv ); benchmark :: DoNotOptimize ( v ); } } BENCHMARK ( StringViewSplit );

Will we get the same 10X perf speed as in the previous benchmark… hmmm:

This is GCC 8.1, -O3

A bit better with Clang 6.0.0, -O3:

A slightly better result when I run it locally in MSVC 2017:

string length : 486 test iterations : 10000 string split : 36.7115 ms string_view split : 30.2734 ms

Here’s the benchmark @Quick C++ Bench

Do you have any ideas why we do not see 10X speed up as with the initial experiment?

Of course, we cannot assume 10X is realistic in this case.

First of all, we have a container - std::vector - that the algorithm uses to output the results. The memory allocations inside std::vector will affect the overall speed.

If we run the iteration once, and when I override operator new I can see the following numbers (MSVC):

string length : 486 test iterations : 1 string split : 0.011448 ms , Allocation count : 15 , size 6912 string_view split : 0.006316 ms , Allocation count : 12 , size 2272

We have 69 words in that string, the string version generated 15 memory allocations (both for strings and to increase the vector space), and in total it allocated 6912 bytes.

The strng_view version used 12 memory allocations (only for vector as there’s no need to allocate memory for string_view ) and in total it used 2272 bytes (3x less than the std::string version)

Some ideas to improve

See the comment by JFT where he reimplemented the split algorithms using raw pointers rather than iterators, and he got much more performance improvements.

Another possibility is to reserve some space up-front in the vector (and later we can use shrink_to_fit - that way we save a lot of memory allocations.

Comparing with boost::split :

For completeness, I also run the benchmark against boost::split (1.67), and both of our versions are much faster:

Running on WandBox, GCC 8.1

string length : 489 test iterations : 10000 string split : 42.8627 ms , Allocation count : 110000 , size 82330000 string_view split : 45.6841 ms , Allocation count : 80000 , size 40800000 boost split : 117.521 ms , Allocation count : 160000 , size 83930000

So the hand-crafted version is almost 3x faster than the boost.split algorithm!

Play with the code @WandBox

String Split and Loading From a File

You might notice that my test string is just one paragraph of “lorem ipsum”. Such a simple test case might cause some additional optimisations in the compiler and produce unrealistic results.

I’ve found a nice post from Rainer Grimm: C++17 - Avoid Copying with std::string_view - ModernesCpp.com

In the article, he used TXT files to process strings. It’s a much better idea to work on some real and large text files, rather than simple strings.

Instead of my lorem ipsum paragraph, I’m just loading a file, for example, ~540kb of text (Gutenberg project)

Here’s a result from a test run over that file:

string length : 547412 test iterations : 100 string split : 564.215 ms , Allocation count : 191800 , size 669900000 string_view split : 363.506 ms , Allocation count : 2900 , size 221262300

The test is run 100 times, so for one iteration we have 191800/100 = 1918 memory allocations (in total we use 669900000/100 = 6699000 bytes per iteration) for std::string .

For string_view we have only 2900/100 = 29 memory allocations and 221262300/100 = 2212623 bytes used per iteration.

While it’s still not 10x gain, we have 3x less memory used and around 1.5x perf boost.

Sorry for a little interruption in the flow :)

I've prepared a little bonus if you're interested in C++17, check it out here: Download a free copy of C++17 Language Ref Card!

Risks With Using string_view

I think that every article about string_view should also mention the potential risks involved with this new type:

Taking care of the (non)null-terminated strings - string_view may not contain NULL at the end of the string. So you have to be prepared for such a case.

Problematic when calling functions like atoi , printf that accepts null-terminated strings Conversion into strings

may not contain NULL at the end of the string. So you have to be prepared for such a case. References and Temporary objects - string_view doesn’t own the memory, so you have to be very careful when working with temporary objects.

When returning string_view from a function Storing string_view in objects or container.

doesn’t own the memory, so you have to be very careful when working with temporary objects.

Wrap Up

By leveraging string_view , you can achieve a lot of performance boost in many use cases. However, it’s important to know that there are caveats and sometimes the perf might be even slower as compare to std::string !

The first thing is that string_view doesn’t own the data - thus you need to be careful, so you don’t end up with references to deleted memory!

The second thing is that compilers are very smart when handling strings, especially when strings are short (so they work nicely with SSO - Small String Optimization), and in that case, the perf boost might not be that visible.

A Few Questions to You