Table of Contents

I’ve spent a long time developing my robin_hood::unordered_map, and after claiming that it is now the fastest hashmap I understandably got quite a few skeptic comments. Some of the comments were quite right, and my benchmarks were not as unbiased as they could be, I did not test as many unordered maps as I should have, my compiler options were not choosen well, and so on.

That’s why I have now spent considerable time to create a highly improved benchmarks, where I have tried to remedy all most of the critique that I got. The results are not as flattering to my robin_hood::unordered_map, but I am still very pleased with the results.

What is actually Benchmarked?

This benchmark has evalued 20 different unordered_map implementations, each with 5 different hashing implementations. So there are a total of 20*5 = 100 hashmap variants to benchmark. Each of this 100 hashmaps was evaluated in 10 different benchmarks, so in total 1000 benchmark evaluations. I ran each benchmark 9 times and show the median, to filter out any outliers. So in total I ran 9000 benchmarks, which took about 6 days on my Intel i7-8700 at 3200 MHz. To get highly accurate results, I’ve isolated a core to only benchmarking, and disabled all frequency scaling.

Hashmaps

Hashes

Some hashmap implementations come with their own hashing methods, each with different properties. In my benchmarks I have used either integral types like int or uint64_t , and std::string as the keys.

Abseil’s Hash absl:Hash : An extremely fast hash, that works very well in all situations I have tested.

: An extremely fast hash, that works very well in all situations I have tested. FNV1a A very simple hash that is used by Microsoft in Visual Studio 2017. Interestingly, they even use this byte-wise hash for integral types. My benchmark has its own implementation, but in my experiments it has produced the same assembler code as the original Microsoft variant.

Folly’s Hash folly::hasher : Unfortunately I could not find any documentation. It seems to be well optimized and uses native crc instruction if available. Unfortunately the result is only a 32bit hash which can work badly for some hashmap variants.

: Unfortunately I could not find any documentation. It seems to be well optimized and uses native crc instruction if available. Unfortunately the result is only a 32bit hash which can work badly for some hashmap variants. libstdc++-v3 simply casts integral types to size_t and uses this as a hash function. It is obviously the fastest hash, but many hashmap implementations rely on a somewhat good avalanching hash quality so this seems to be a rather bad choice.

and uses this as a hash function. It is obviously the fastest hash, but many hashmap implementations rely on a somewhat good avalanching hash quality so this seems to be a rather bad choice. martinus’s robin-hood-hashing robin_hood::hash is based on abseil’s hash for integral types, with minor modifications.

How is benchmarked?

Build

I’ve used g++ 8.2.0 with -O3 -march=native : g++-8 ( Ubuntu 8.2.0-1ubuntu2~18.04 ) 8.2.0

: CMake build is done with Release mode

mode and I’ve set FOLLY_CXX_FLAGS to -march=native .

to . For the ktprime map benchmarks I had to add -fno-strict-aliasing .

System Configuration

All benchmarks were run on an Linux. uname -a output: Linux dualit 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

output: Processor Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, locked to 3200 MHz.

Isolated a core with it’s hyperthreading companion by editing /etc/default/grub and changing GRUB_CMDLINE_LINUX_DEFAULT so it looks like this: GRUB_CMDLINE_LINUX_DEFAULT = "quiet splash isolcpus=5,11 rcu_nocbs=5,11"

and changing so it looks like this: Turbo boost and frequency scaling were disabled with the python tool perf with the command sudo python3 -m perf system tune This sets cores 5 and 11 to 3200 MHz, sets scaling governor to performance, disables Turbo Boost, sets irqbalances service to inactive, IRQ affinity to all CPUs except 5 and 11.

Each benchmarks is run in a separately started process.

Isolated cores are used with taskset -c 5,11

To get rid of any potential outliers and to average effects of ASLR, all benchmarks were run 9 times and I show only the median result.

Benchmarks

Enough talk, onwards to the benchmarks!