As a Perl programmer, I enjoy using hash tables a lot, and I keep this habit when programming in C/C++. So what C/C++ hash libraries are available, and how do they compare to each other? In this post, I give a brief review of hash libraries and present a small benchmark showing their practical performance.

Hash table libraries

In C++, the most widely used hash table implementation is hash_map/set in SGI STL, which ships with the GCC compiler. Note that hash_map/set is SGI’s extension to STL and is not part of the C++ standard. TR1 (Technical Report 1) tries to standardize hash tables: it provides unordered_map/set with an API similar to hash_map/set. Most TR1 routines are available since gcc-4.0. Google sparsehash is another C++ hash table template library with an API similar to hash_map/set. It provides two implementations (dense_hash_map and sparse_hash_map), one optimized for speed and the other for memory.

In contrast, there are few good C libraries around. I have tried SunriseDD, uthash, the glibc hash table, hashit, Christopher Clark’s hashtable, the glib hash table and ghthash. SunriseDD sounds like a great library that implements a lock-free hash table. However, I am not sure how to install or use it, although the code itself is well documented. Uthash is a single header file. It is quite complex to use and incompatible with C++. It also lacks basic APIs such as counting the number of elements in the hash table. The glibc hash and hashit seem to implement only static hash tables, and the glibc hash does not even have a deletion operation. Only glib hash, CC’s hashtable and ghthash implement most of the common operations, and they still have weaknesses compared with the C++ implementations (see below).

Design of the benchmark

The benchmark consists of two experiments. In the first, a random integer array of 5 million elements is generated with about 1.25 million distinct keys. Each element is then tested for presence in the hash: if it is present, it is removed; otherwise, it is inserted. After this process, 625,792 distinct keys remain in the hash. To test performance on string input, I convert the integers to strings with sprintf().

The second experiment was designed by Craig Silverstein, the author of sparsehash, and I am using his source code. It tests the performance of insertion into an initially empty hash (grow), insertion into a preallocated hash (pred/grow), replacement (replace), query (fetch), query of an empty hash (fetchnull), and removal (remove).

Results

The following table gives the results in the first experiment:

Library | Mac int CPU (sec) | Mac str CPU (sec) | Mac peak mem (MB) | Linux int CPU (sec) | Linux str CPU (sec) | Linux peak mem (MB)
glib | 1.904 | 2.436 | 11.192 | 3.490 | 4.720 | 24.968
ghthash | 2.593 | 2.869 | 29.0/39.0 | 3.260 | 3.460 | 61.232
CC’s hashtable | 2.740 | 3.424 | 59.756 | 3.040 | 4.050 | 129.020
TR1 | 1.371 | 2.571 | 16.140 | 1.750 | 3.300 | 28.648
STL hash_set | 1.631 | 2.698 | 14.592 | 2.070 | 3.430 | 25.764
google-sparse | 2.957 | 6.098 | 4.800 | 2.560 | 6.930 | 5.42/8.54
google-dense | 0.700 | 2.833 | 24.616 | 0.550 | 2.820 | 24.7/49.3
khash (C++) | 1.089 | 2.372 | 6.772 | 1.100 | 2.900 | 6.88/13.1
khash (C) | 0.987 | 2.294 | 6.780 | 1.140 | 2.940 | 6.91/13.1
STL set (RB) | 5.898 | 12.978 | 19.868 | 7.840 | 18.620 | 29.388
kbtree (C) | 3.080 | 13.413 | 3.268 | 4.260 | 17.620 | 4.86/9.59
NP’s splaytree | 8.455 | 23.369 | 8.936 | 11.180 | 27.610 | 19.024

Notes:

Please be aware that changing the size of the input data may change the rankings in speed and memory. The speed of a library may vary by up to 10% between two runs.

CPU time is measured in seconds. Memory denotes the peak memory, measured in MB.

For string hash, only the pointer to a string is inserted. Memory in the table does not count the space used by strings.

If two numbers are given for memory, the first is for integer keys and the second for string keys.

For all C++ libraries and khash.h, one operation is needed to achieve “insert if absent; delete otherwise”. Glib and ghthash require two operations, which does not favour these two libraries.

The speed may also be influenced by the efficiency of the hash functions. Khash and glib use the same hash function, while TR1/SGI-STL/google-hash use another. Fortunately, in my experiments the two string hash functions have quite similar performance, so the benchmark reflects the performance of the hash libraries as a whole rather than just their hash functions.

For glib and ghthash, what is inserted is the pointer to the integer instead of the integer itself.

Ghthash supports dynamic hash tables. However, the results do not seem correct when this feature is switched on, so I am using a fixed-size hash table. This favours ghthash.

CC’s hashtable always frees the key, which none of the other libraries do. This behaviour adds overhead in both speed and memory in my benchmark (but probably not in other applications). Its memory is measured for integer keys.

This simple benchmark does not exercise the particular strengths and weaknesses of the splay tree.

And here are the results of the second experiment:

Library | grow | pred/grow | replace | fetch | fetchnull | remove | Memory
TR1 | 194.2 | 183.9 | 30.7 | 15.6 | 15.2 | 83.4 | 224.6
STL hash_map | 149.0 | 110.5 | 35.6 | 11.5 | 14.0 | 87.2 | 204.2
STL map | 289.9 | 289.9 | 141.3 | 134.3 | 7.0 | 288.6 | 236.8
google-sparse | 417.2 | 237.6 | 89.5 | 84.0 | 12.1 | 100.4 | 85.4
google-dense | 108.4 | 39.4 | 17.8 | 8.3 | 2.8 | 18.0 | 256.0
khash (C++) | 111.2 | 99.2 | 26.1 | 11.5 | 3.0 | 17.4 | 198.0

Notes:

CPU time is measured in nanoseconds per operation. Memory is measured with TCmalloc as the difference in memory before and after allocating the hash table, rather than as peak memory.

In this experiment, integers are inserted in order and there are no collisions in the hash table.

All these libraries provide a similar API.

Discussions

Speed and memory. The larger the hash table, the fewer collisions occur and the faster the operations. For the same hash library, giving the table more memory (that is, lowering its load factor) generally increases speed. When we compare two libraries, both speed and memory should therefore be considered.

C vs. C++. All C++ implementations have similar APIs and are easy to use with any key type. Both C libraries, ghthash and glib, can only keep pointers to the keys, which complicates the API and increases memory usage, especially on 64-bit systems where a pointer takes 8 bytes. In general, C++ libraries are preferred over C ones. Surprisingly, on 32-bit Mac OS X, glib outperforms TR1 and STL on string input. This might indicate that the glib implementation itself is very efficient and that it is the lack of language features in C that hurts its performance.

Generic programming in C. Except for my khash.h, all the other C hash libraries use void* to achieve generic typing. Using void* is okay for strings but causes overhead for integers. This is why all C libraries except khash.h are slower than C++ libraries on integer keys but close to them on string keys.

Open addressing vs. chaining hash. Khash and google hash implement open addressing, while the remaining libraries implement chaining. In open addressing as done in khash, each bucket takes the size of a key plus 0.25 byte of flags. Google sparsehash further compresses each unused bucket to 1 bit, achieving high memory efficiency. In chaining, the memory overhead of each bucket is at least 4 bytes on 32-bit machines, or 8 bytes on 64-bit machines. However, chaining is less affected when the hash table is nearly full. In practice, at similar speed, open addressing and chaining occupy similar amounts of memory. Khash takes less peak memory mainly due to its rehashing technique, which reduces memory usage. As far as speed is concerned, chaining may perform fewer comparisons between keys. We can see this from the fact that chaining approaches the speed of open addressing on string keys but is much slower on integer keys.

Memory usage of search trees. B-tree is the winner here. Each element in a B-tree needs only one additional pointer. When there are enough elements, a B-tree is at least half full; on average it should be around 75% full. So on 64-bit systems, a B-tree with N elements needs about N*8/0.75 ≈ 10.7N bytes of additional memory. A splay tree needs N*8*2 = 16N bytes of extra space. The red-black tree is the worst, since each node typically also carries a parent pointer and a color in addition to its two child pointers.

Other issues. a) Google hash becomes unbearably slow when I put a lot of strings into the hash table; none of the other libraries has this problem. b) Google hash performs more key comparisons than khash. This can be inferred from the fact that google-dense is clearly faster than khash on integer keys but only comparable on string keys, where each comparison is more expensive.

Concluding remarks

C++ hash libraries are much easier to use than C libraries. This is definitely where C++ is preferred over C.

The TR1 hash implementation is no faster than the SGI STL implementation. Either may outperform the other depending on the input or settings.

SGI hash_map is faster and takes less memory than STL map. Unless ordering is important, hash_map is a better container than map.

Google hash is a worthy choice once we understand why it becomes slow with many string keys.

My khash library, which is a single-file C++ template header, achieves a good balance between speed and memory. All my source code is available at the Programs page.

Update

C interface can be elegant, too, if we implement it cleverly. See this post.

I realize that a single lookup is enough to achieve “insert if absent; delete otherwise”. This further improves the speed of all the C++ libraries.

I have analyzed the google dense hash table in this post, which explains why it is faster than khash on integer keys but close to or slower than khash on string keys.

This thread directed me to the gcc hashtable and the cocom hashtable. They are more or less independent of other source code, but it would still take time to separate them out, so I have not benchmarked them. I note them here just for the record.

The Python dictionary is in fact a hash table. The dictnotes.txt in that directory gives some quite interesting discussion about how to implement hash tables efficiently.

hashlib library. It is a bit hard to use, and I could not get it running correctly. Possibly I have not provided a proper second hash function for rehashing.

Added results for STL set (based on a red-black tree) and John-Mark Gurney’s B-tree implementation (JG’s btree). Both libraries are considerably slower than hash tables. Of course, search trees provide more functionality than hash tables, and every nice thing comes with a price.

I have also tried Jason Evans’s and Niels Provos’ red-black tree implementations. On integer keys, JE’s takes 6.110 seconds on Mac-Intel using 18.884 MB memory, and NP’s takes 6.611 seconds using the same amount of memory. This performance is close to that of STL set. They appear to be slower mainly due to the additional malloc/free calls I had to make under their APIs. Unlike hash tables, which can be implemented in a variety of ways, a red-black tree is usually implemented in one way (well, there can be more; see also Jason’s blog). So I only show the performance of STL set as a representative.

Replaced JG’s B-tree with a modified version. The new version is both faster and more lightweight.