read

A few weeks ago, I posted the first part of this article, where I explained the advantages of Boost.MultiIndex over the standard containers when you need to have multiple views on a set of data.

In this second part, I would like to talk about the benefits you can get from using Boost.MultiIndex as a single-index hash table, as a replacement of std::unordered_map.

One interesting and powerful aspect of Boost.MultiIndex is that it allows you to add an index of type T, where T is different from the stored type. And it is more frequent and useful that you could think.

std::string as a key

Who never used a std::unordered_map<std::string, X>? This is the most straightforward way to lookup for an object with a string: it is simple, relatively fast, and convenient as std::hash<std::string> is provided by the standard library.

However, what if you get another string type as input for your lookup? I already found myself in this situation multiple times:

when reading from shared memory, a socket, or a device, you get a const char*

when using APIs that provide their own string implementation, like Qt: even if your entire codebase is using std::string, you will have some boundary zones where both types are used

when using third party libraries: you get most of the time a const char * or std::string

struct stock { std::string ref; double price; int volume; }; struct message_handler { void receive(const char* buffer, size_t size) { // no choice here, lookup has to be on std::string // a std::string is constructed: memory allocation + bytes copy auto it = _stocks.find({buffer, size}); // ... } private: std::unordered_map<std::string, stock> _stocks; };



One would say that you could simply convert your string to std::string before the lookup. However this might not be acceptable from a performance point of view, as it will perform a memory allocation. Nowadays, many STL implementations use “small string optimization* (SSO) — up to 15 bytes with g++ 6.2 and 22 chars with clang 3.9 — in order to avoid the memory allocation. Bytes are still copied though, and the performance of your system heavily relies on implementation details.

The solution here is to use C++17’s std::string_view. This is not possible with standard containers, though: with std::unordered_map and std::unordered_set, you cannot query on another type than Key — and using a std::unordered_map<std::string_view, X> is a bad choice, for a number of reasons I explain in the appendix.

Easy task with Boost.MultiIndex: simlpy add a method that returns a string view and use it as index!

struct stock { std::experimental::string_view ref_view() const { return ref; } // same attributes as before }; struct message_handler { void receive(const char* buffer, size_t size) { // no memory allocation, no copy! std::experimental::string_view ref_view(buffer, size); auto it = _stocks.find(ref_view); // ... } private: boost::multi_index_container< stock, indexed_by< hashed_unique< const_mem_fun<stock, std::experimental::string_view, &stock::ref_view>, std::hash<std::experimental::string_view> > > > _stocks; };



The magic is done on line 23, which defined the hashed unique index as const_mem_fun<stock, std::experimental::string_view, &stock::ref_view> : a const member function of stock, that returns a string view, and can be found at &stock::ref_view.

unordered_set<unique_ptr<T>>

Keeping a set of unique_ptr<T> is less common, but sometimes convenient. Such pattern can be used if a class owns a collection of polymorphic T, where this owner class does not know anything about the concrete type.

struct connection { virtual ~connection() =default; virtual void close(const std::string& reason) =0; }; struct node { void new_connection(const parameters& params) { ... } private: void on_error(connection& conn) { conn.close("bye bye"); _connections.erase(&conn); // won't compile: Key is a unique_ptr<connection>, not a connection* } std::unordered_set<std::unique_ptr<connection>> _connections; };



Note: on_error is private. If it was public, this would be a poor API, as the client code would have a dangling reference the line right after the call to node::on_error. node owns and does not share connection ; otherwise it could simply use a std::unordered_set<std::shared_ptr<T>>.

There are a few ways to solve this with unordered STL containers, but…

std::unordered_set<connection*> : you lose the advantages of RAII std::unordered_map<connection*, std::unique_ptr<connection>> : ugly and under-performant solution

Boost.MultiIndex allows you here to store unique_ptr while being able to access it from raw pointers:

struct node { // ... private: void on_error(connection& conn) { conn.close("bye bye"); _connections.erase(&conn); // this line now builds! } using connection_ptr = std::unique_ptr<connection>; using connections = multi_index_container< connection_ptr, indexed_by< hashed_unique< const_mem_fun<connection_ptr, connection*, &connection_ptr::get> > > >; connections _connections; };





composite keys

Composite keys is another quite common pattern when storing objects into associative containers. As the name implies, it allows to add an index composed by multiple fields. Let’s take a quick example.

struct session { session(const std::string& username, const std::string& hostname) : _username(username), _hostname(hostname) { // ... } const std::string& username() const { return _username; } const std::string& hostname() const { return _hostname; } private: std::string _username; std::string _hostname; // more attributes ... std::chrono::time_point<std::chrono::system_clock> _started_time; std::vector<process> _processes; };



This class represents a session of a user on a particular machine. A user can only log once on a host, which allows us to uniquely identify a session by the association of the username and the hostname.

What are our options to do that with standard containers?

The most straightforward solution is to use a std::unordered_map<std::pair<std::string, std::string>, session> , as you don’t have to provide any custom std::hash nor operator== implementations. I see a lot of problems with this solution though: it is bad from a performance point of view, because it uses four std::string instances while we only need two and it allocates two std::string on each lookup. Also, the code is not very elegant and is error prone as you don’t know if username is the first element of the pair or the second. A better way to do it in my opinion is to have a std::unordered_map<session_id, session> , with session_id a struct with two std::string. This gets rid of a few drawbacks that we have in the solution 1, but requires a std::hash specialization and operator== definition. Not dramatic, but this raises more questions: how to implement a good hash function that takes two std::string as input? I usually use Boost.Functional/Hash for that, but not every application can support Boost as a dependency. Apparently, a good start would be to use a xor with some bit shifting on the two std::string hashes, but who knows if this is really good? A last solution I could think of is to use std::unordered_set<session> , but it is still far from ideal: you have all the problems cited in the solution 2. The only advantage is that you’re using less memory as you don’t have the two additional std::string instances as key… but now you have to const_cast all the time your stored objects, which can be annoying or simply not desirable depending on your API.

As you can see, there is no simple and straightforward solution to that (simple!) problem. Here is what is looks like with Boost.MultiIndex:

boost::multi_index_container< session, indexed_by< hashed_unique< composite_key< session, const_mem_fun<session, std::string, &session::username>, const_mem_fun<session, std::string, &session::hostname> > > > > sessions; sessions.insert({"foobar", "localhost"}); auto it = sessions.find(boost::make_tuple("foobar", "localhost")); std::cout << it->user_name << std::endl;



The composite_key takes in this case two const_mem_fun — which is the argument used to refer to const member functions. For the lookup, you will have to build a Boost.Tuple as a key.

As I wrote in the first paragraph, building std::string to do a lookup take a significant time. If you care enough about performance, here is the solution that is going to squeeze the last bits and nanoseconds:

boost::multi_index_container< session, indexed_by< hashed_unique< composite_key< session, const_mem_fun<session, std::experimental::string_view, &session::username>, const_mem_fun<session, std::experimental::string_view, &session::hostname> >, composite_key_hash< std::hash<std::experimental::string_view>, std::hash<std::experimental::string_view> > > > > sessions;



Same approach as before: the composite key is still composed by the two methods of your class, that we modify of course in order to return a std::string_view and not a std::string anymore. The only annoying bit is that we have to explicitly define the composite_key_hash as Boost.MultiIndex doesn’t pick the std::string_view specialization!

Appendix





Why using std::unordered_map<std::string_view, X> is bad design choice?

It is a bad design because it is fragile and error-prone. The key is const std::string_view, but as it is a view, it can actually change anytime. The only acceptable scenario in my mind would be if X owns the viewed string as const std::string.



How do you insert an element in such container?

m.emplace(x.str, x) std::experimental::string_view sv{x.str};

m.emplace(sv, x.str); m.emplace(x.str, std::move(x)); std::experimental::string_view sv{x.str};

m.emplace(sv, std::move(x));



… and the answer is: