Introduction

I was trying to revive some old C++ code at the weekend when I noticed a strange problem. I was using a std::set of objects of a class I’d created, I added objects to the set, but when I came to iterate through them a bit later, half of the objects seemed to have disappeared! What the heck could be going on? It took me quite while to realise what my problem was, because it was not at all obvious. This article describes the problem you may encounter when using ordered associative containers like std::set and std::map in your own code, but is not intended to be a complete guide to using these containers, or to sorting in general. You will need a little (but not much) C++11 knowledge in order to follow it.

std::set 101

The std::set container class is part of the C++ Standard Library and implements a collection of unique key values which are maintained in sorted order. The std::map container is very similar, but associates a value with each key. You can use std::set like this:

#include <set> #include <iostream> using namespace std; int main() { std::set <int> s; s.insert( 7 ); s.insert( 2 ); s.insert( 5 ); for ( auto i : s ) { cout << i << endl; } }

The for/auto stuff is C++11 syntax for iterating over the set, which prints out:

2 5 7

If I added another insert statement:

s.insert( 2 );

then the insert would not take place, as the set already contains the value 2.

Sets are implemented internally as balanced binary search trees. If you have ever played with BSTs, you will know that you need to compare the value at each node with the thing you want to add to the tree in order to find the correct insertion place. For C++ sets, this comparison is performed by the Standard Library functor std::less, which itself uses the < (less-than) operator for the types in the set. This is fine and dandy if the things in the set actually have a less-than operator (as integers do in the case above), but what if they don’t?

A Point Struct

For the remainder of the article, I’ll look at the problems of creating a Point structure that you can store in a std::set. The Point structure implements a par of integer (x,y) coordinates, and a first attempt looks like this:

#include <iostream> #include <set> using namespace std; struct Point { int x, y; }; ostream & operator << ( ostream & os, const Point & p ) { return os << p.x << "," << p.y; } int main() { Point p1{1,2}; Point p2{4,1}; cout << p1 << " " << p2 << endl; }

I’ve added a streaming output operator<<() for convenience. The braces around the values used to initialise the Point objects are part of C++11 uniform initialisation.

This code produces the output:

1,2 4,1

So far so good. But what happens if we try to create a std:;set of Point objects and to add the point p1 to the set:

int main() { Point p1{1,2}; Point p2{4,1}; set <Point> s; s.insert( p1 ); }

Well, if you try to compile that, your compiler will complain – in the case of my GCC compiler, it complains bitterly, producing one of those wall-of-text sequence of error messages we all know and love. The relevant message here is this one, which I’ve edited slightly:

required from here stl_function.h:235:20: error: no match for 'operator<' (operand types are 'const Point' and 'const Point') { return __x < __y; }

You can see that some code in the Standard Library implementation header stl_function.h wants to use the less-than operator on a couple of Point objects, but the compiler cannot find such an operator because we have not written one, and one is not provided by C++ by default. If we want to put Point objects in a set, we have to provide a less-than operator.

A less-than Operator

How to go about writing a less-than operator for our Point struct? There are actually several ways of going about doing this, but I am going to suggest that it should be implemented as a free, non-member function that takes two parameters. I don’t intend to go into the reasons for this here except to say that binary operators should generally be implemented as free functions, and that if you can avoid making something a member function, you probably should do so. Our comparison operator in outline then should look like this:

bool operator < ( const Point & a, const Point & b ) { // what to put here? }

What should the operator return? Put simply, it should return true if the parameter a is less than the parameter b. So your first thought might be to write something like this:

bool operator < ( const Point & a, const Point & b ) { return a.x < b.x && a.y < b.y; }

and to test it like this:

int main() { set <Point> s = {{1,1},{1,2},{2,1},{2,2}}; for ( auto p : s ) { cout << p << endl; } }

This code compiles, as the compiler is now happy it has a less-than operator it can use, but unfortunately, when you run the program, you will get the following output:

1,1 2,2

where two of the Points seem to have disappeared! This was the problem I was having with my own code (though to be fair on myself, my code was considerably more complex than this Point structure). So, what is going on?

Strict Weak Ordering

The C++ Standard says that in order for a less-than operator to work correctly with sets and maps (and with some of the Standard Library algorithms), it must enforce what is known as "strict weak ordering". I don’t intend to go into any detail about exactly what this means here – if you want more details I suggest taking a look at this Wikipedia article. Suffice it to say that the comparison operator we have provided does not enforce it, and is in fact rather obviously not a great way of ordering 2D points.

But if what we have written does not enforce strict weak ordering, how to go about writing one that does? Unfortunately, doing so is both somewhat unintuitive (at least for me – you might like to test your own intuition by correcting the above comparison function) and very, very easy to get wrong if you are not methodical about your implementation. I’m going to suggest two approaches – a mechanical, algorithmic one, and a possibly better one using other C++ facilities.

The Mechanical Method

The mechanical method is simple. For each pair of values in the two parameters referred to as a and b in the less-than operator above, write code like this:

if ( a.x < b.x ) return true; if ( b.x < a.x ) return false;

and finish off by returning false. So, using this method, our comparison operator becomes:

bool operator < ( const Point & a, const Point & b ) { if ( a.x < b.x ) return true; if ( b.x < a.x ) return false; if ( a.y < b.y ) return true; if ( b.y < a.y ) return false; return false; }

If we decided to make Point a 3D point, containing a z coordinate, we would just add the statements:

if ( a.z < b.z ) return true; if ( b.z < a.z ) return false;

With the above changes in place, the test program now produces the correct output:

1,1 1,2 2,1 2,2

This method may seem long-winded, but it has two big advantages. First, it is hard to get it it wrong – you can write the comparison function almost without thinking. And second, it only uses the less-than operator on the types it is comparing – this is important, as many types will not implement the full gamut of comparisons.

Using C++ Features

If you want to avoid the semi-boilerplate code above, C++ offers several other possibilities for providing things that you can store in a set.

One possibility is not to create your own Point struct at all. C++ provides several data structures that work much like a point, and which implement a suitable less-than operator out of the box. The most obvious of these is std::pair:

#include <iostream> #include <set> #include <utility> using namespace std; int main() { set <pair<int,int>> s = {{1,1},{1,2},{2,1},{2,2}}; for ( auto p : s ) { cout << p.first << "," << p.second << endl; } }

If you wanted to retain the Point struct, you could use std::pair to implement your custom less-than operator:

bool operator < ( const Point & a, const Point & b ) { pair <int,int> pa{a.x,a.y}; pair <int,int> pb{b.x,b.y}; return pa < pb; }

You can do similar things with std::vector, or in C++11 with std::tuple and the very handy std::tie function:

bool operator < ( const Point & a, const Point & b ) { return tie( a.x, a.y ) < tie( b.x, b.y ); }

The latter would be my personal preference, and its what I ended up using to solve my own problem with providing a less-than operator.

Summary

If you want to conveniently put objects in a set or a map, they must provide a less-than operator.

The less-than operator must enforce strict weak ordering.

Providing strict weak orderingis easy for human programmers to get wrong.

Use C++ features like pairs and tuples to provide it.