If you are using a modern C++ (C++11 or better), you have access to set data structures ( unordered_set ) which have the characteristics of a hash set. The standard does not provide us with built-in functions to compute the union and the intersection of such sets, but we can make our own. For example, the union of two sets A and B can be computed as follow:

out . insert ( A . begin ( ) , A . end ( ) ) ; out . insert ( B . begin ( ) , B . end ( ) ) ;

where out is an initially empty set. Because insertion in a set has expected constant-time performance, the computational complexity of this operation is O(size(A) + size(B)) which is optimal.

If you are a computer scientist who does not care about real-world performance, your work is done and you are happy.

But what if you want to build fast software for the real world? How fast are these C++ sets?

I decided to populate two sets with one million integers each, and compute how how many cycles it takes to compute the intersection and the union, and then I divide by 2,000,000 to get the time spent per input element.

intersection ( unordered_set ) 100 cycles/element union ( unordered_set ) 250 cycles/element

How good or bad is this? Well, we can also take these integers and put them in sorted arrays. Then we can invoke the set_intersection and set_union methods that STL offers.

set_intersection ( std::vector ) 3 cycles/element set_union ( std::vector ) 5 cycles/element

That’s an order of magnitude better!

So while convenient, C++’s unordered_set can also suffer from a significant performance overhead.

What about std::set which has the performance characteristics of a tree? Let us use code as follows where out is an initially empty set.

std :: set_union ( A . begin ( ) , A . end ( ) , B . begin ( ) , B . end ( ) , std :: inserter ( out , out . begin ( ) ) ) ;

set_intersection ( std::set ) 150 cycles/element set_union ( std::set ) 750 cycles/element

As we can see, results are considerably worse.

The lesson is that a simple data structure like that of std::vector can serve us well.

My source code is available.