(Start at the beginning of the series – and all the source can be found in my github repo)

Now that I was used to writing C++11-style constexpr , I decided to try some compile-time string hashing. It turns out that FNV1 is very easy to express in a move-down-the-string recursive style:

namespace detail { constexpr uint64_t fnv1 ( uint64_t h, const char * s ) { return ( * s == 0 ) ? h : fnv1 ( ( h * 1099511628211ull ) ^ static_cast < uint64_t > ( * s ) , s + 1 ) ; } } constexpr uint64_t fnv1 ( const char * s ) { return true ? detail :: fnv1 ( 14695981039346656037ull, s ) : throw err :: fnv1_runtime_error ; } namespace detail { constexpr uint64_t fnv1(uint64_t h, const char* s) { return (*s == 0) ? h : fnv1((h * 1099511628211ull) ^ static_cast<uint64_t>(*s), s+1); } } constexpr uint64_t fnv1(const char* s) { return true ? detail::fnv1(14695981039346656037ull, s) : throw err::fnv1_runtime_error; }

That wasn’t difficult enough

So far so good. How about some other hash functions? On a previous project I had used MurmurHash, so I decided to give that a go.

(Coincidentally, a blog post has recently surfaced implementing 32-bit Murmur3 using C++14 constexpr . But I was sticking to C++11 constexpr .)

Implementing such a big function from cold in C++11 constexpr was bound to be difficult, so I scaffolded. I used the freely-available runtime implementation for testing, and initially I converted it to C++14 constexpr , which was fairly easy. After doing that, and with Wikipedia as a pseudocode guide I was ready to start breaking down (or building up) the functionality, C++11-style. At the base of 32-bit Murmur3 is the hash round function which is done for each 4-byte chunk of the key:

constexpr uint32_t murmur3_32_k ( uint32_t k ) { return ( ( ( k * 0xcc9e2d51 ) << 15 ) | ( ( k * 0xcc9e2d51 ) >> 17 ) ) * 0x1b873593 ; } constexpr uint32_t murmur3_32_hashround ( uint32_t k, uint32_t hash ) { return ( ( ( hash ^ k ) << 13 ) | ( ( hash ^ k ) >> 19 ) ) * 5 + 0xe6546b64 ; } constexpr uint32_t murmur3_32_k(uint32_t k) { return (((k * 0xcc9e2d51) << 15) | ((k * 0xcc9e2d51) >> 17)) * 0x1b873593; } constexpr uint32_t murmur3_32_hashround( uint32_t k, uint32_t hash) { return (((hash^k) << 13) | ((hash^k) >> 19)) * 5 + 0xe6546b64; }

And this can easily be put into a loop, with a helper function to constitute 4-byte chunks (the somewhat strange formulation of word32le here is because it is used differently in some other code):

constexpr uint32_t word32le ( const char * s, int len ) { return ( len > 0 ? static_cast < uint32_t > ( s [ 0 ] ) : 0 ) + ( len > 1 ? ( static_cast < uint32_t > ( s [ 1 ] ) << 8 ) : 0 ) + ( len > 2 ? ( static_cast < uint32_t > ( s [ 2 ] ) << 16 ) : 0 ) + ( len > 3 ? ( static_cast < uint32_t > ( s [ 3 ] ) << 24 ) : 0 ) ; } constexpr uint32_t word32le ( const char * s ) { return word32le ( s, 4 ) ; } constexpr uint32_t murmur3_32_loop ( const char * key, int len, uint32_t hash ) { return len == 0 ? hash : murmur3_32_loop ( key + 4 , len - 1 , murmur3_32_hashround ( murmur3_32_k ( word32le ( key ) ) , hash ) ) ; } constexpr uint32_t word32le(const char* s, int len) { return (len > 0 ? static_cast<uint32_t>(s[0]) : 0) + (len > 1 ? (static_cast<uint32_t>(s[1]) << 8) : 0) + (len > 2 ? (static_cast<uint32_t>(s[2]) << 16) : 0) + (len > 3 ? (static_cast<uint32_t>(s[3]) << 24) : 0); } constexpr uint32_t word32le(const char* s) { return word32le(s, 4); } constexpr uint32_t murmur3_32_loop( const char* key, int len, uint32_t hash) { return len == 0 ? hash : murmur3_32_loop( key + 4, len - 1, murmur3_32_hashround( murmur3_32_k(word32le(key)), hash)); }

So this is the first part of Murmur3 that will process all the 4-byte chunks of key. What remains is to deal with the trailing bytes (up to 3 of them) and then finalize the hash. To deal with the trailing bytes, I chain the end functions ( _end0 through _end3 ) and “jump in” at the right place. There is probably a more elegant way to do this, but…

constexpr uint32_t murmur3_32_end0 ( uint32_t k ) { return ( ( ( k * 0xcc9e2d51 ) << 15 ) | ( ( k * 0xcc9e2d51 ) >> 17 ) ) * 0x1b873593 ; } constexpr uint32_t murmur3_32_end1 ( uint32_t k, const char * key ) { return murmur3_32_end0 ( k ^ static_cast < uint32_t > ( key [ 0 ] ) ) ; } constexpr uint32_t murmur3_32_end2 ( uint32_t k, const char * key ) { return murmur3_32_end1 ( k ^ ( static_cast < uint32_t > ( key [ 1 ] ) << 8 ) , key ) ; } constexpr uint32_t murmur3_32_end3 ( uint32_t k, const char * key ) { return murmur3_32_end2 ( k ^ ( static_cast < uint32_t > ( key [ 2 ] ) << 16 ) , key ) ; } constexpr uint32_t murmur3_32_end ( uint32_t hash, const char * key, int rem ) { return rem == 0 ? hash : hash ^ ( rem == 3 ? murmur3_32_end3 ( 0 , key ) : rem == 2 ? murmur3_32_end2 ( 0 , key ) : murmur3_32_end1 ( 0 , key ) ) ; } constexpr uint32_t murmur3_32_end0(uint32_t k) { return (((k*0xcc9e2d51) << 15) | ((k*0xcc9e2d51) >> 17)) * 0x1b873593; } constexpr uint32_t murmur3_32_end1(uint32_t k, const char* key) { return murmur3_32_end0( k ^ static_cast<uint32_t>(key[0])); } constexpr uint32_t murmur3_32_end2(uint32_t k, const char* key) { return murmur3_32_end1( k ^ (static_cast<uint32_t>(key[1]) << 8), key); } constexpr uint32_t murmur3_32_end3(uint32_t k, const char* key) { return murmur3_32_end2( k ^ (static_cast<uint32_t>(key[2]) << 16), key); } constexpr uint32_t murmur3_32_end(uint32_t hash, const char* key, int rem) { return rem == 0 ? hash : hash ^ (rem == 3 ? murmur3_32_end3(0, key) : rem == 2 ? murmur3_32_end2(0, key) : murmur3_32_end1(0, key)); }

Finalizing the hash is a very similar affair, with 3 stages:

constexpr uint32_t murmur3_32_final1 ( uint32_t hash ) { return ( hash ^ ( hash >> 16 ) ) * 0x85ebca6b ; } constexpr uint32_t murmur3_32_final2 ( uint32_t hash ) { return ( hash ^ ( hash >> 13 ) ) * 0xc2b2ae35 ; } constexpr uint32_t murmur3_32_final3 ( uint32_t hash ) { return ( hash ^ ( hash >> 16 ) ) ; } constexpr uint32_t murmur3_32_final ( uint32_t hash, int len ) { return murmur3_32_final3 ( murmur3_32_final2 ( murmur3_32_final1 ( hash ^ static_cast < uint32_t > ( len ) ) ) ) ; } constexpr uint32_t murmur3_32_final1(uint32_t hash) { return (hash ^ (hash >> 16)) * 0x85ebca6b; } constexpr uint32_t murmur3_32_final2(uint32_t hash) { return (hash ^ (hash >> 13)) * 0xc2b2ae35; } constexpr uint32_t murmur3_32_final3(uint32_t hash) { return (hash ^ (hash >> 16)); } constexpr uint32_t murmur3_32_final(uint32_t hash, int len) { return murmur3_32_final3( murmur3_32_final2( murmur3_32_final1(hash ^ static_cast<uint32_t>(len)))); }

And that’s all there is to it: all that remains is to stitch the pieces together and provide the driver function:

constexpr uint32_t murmur3_32_value ( const char * key, int len, uint32_t seed ) { return murmur3_32_final ( murmur3_32_end ( murmur3_32_loop ( key, len / 4 , seed ) , key + ( len / 4 ) * 4 , len & 3 ) , len ) ; } constexpr uint32_t murmur3_32 ( const char * key, uint32_t seed ) { return true ? murmur3_32_value ( key, strlen ( key ) , seed ) : throw err :: murmur3_32_runtime_error ; } constexpr uint32_t murmur3_32_value(const char* key, int len, uint32_t seed) { return murmur3_32_final( murmur3_32_end( murmur3_32_loop(key, len/4, seed), key+(len/4)*4, len&3), len); } constexpr uint32_t murmur3_32(const char *key, uint32_t seed) { return true ? murmur3_32_value(key, strlen(key), seed) : throw err::murmur3_32_runtime_error; }

Doing strlen non-naively

One more thing: strlen crept in there. This is of course a constexpr version of strlen . The naive way to do this is to step down the string linearly, as seen in the FNV1 algorithm. With a max recursive depth of 512, I thought this fairly limiting… it might not be uncommon to have 1k string literals that I want to hash?

So the way I implemented strlen was to measure the string by chunks, constraining a max recursion depth of say 256. Instead of a plain linear recursion, it’s basically recursing for every 256 bytes of string length, and then recursing on the chunk inside that. So the max depth is 256 + 256 = 512, and we can deal with strings close to 64k in size (depending on how deep we are already when calling strlen ). There’s a chance that strlen might be made constexpr in the future – I think it’s already implemented as an intrinsic in some compilers. So maybe I can throw away that code someday.

Anyway, now we have a complete implementation of 32-bit Murmur3 in C++11 constexpr style, and the tests go something like this (and can be checked online):

static_assert ( murmur3_32 ( "hello, world" , 0 ) == 345750399 , "murmur3 test 1" ) ; static_assert ( murmur3_32 ( "hello, world1" , 0 ) == 3714214180 , "murmur3 test 2" ) ; static_assert ( murmur3_32 ( "hello, world12" , 0 ) == 83041023 , "murmur3 test 3" ) ; static_assert ( murmur3_32 ( "hello, world123" , 0 ) == 209220029 , "murmur3 test 4" ) ; static_assert ( murmur3_32 ( "hello, world1234" , 0 ) == 4241062699 , "murmur3 test 5" ) ; static_assert ( murmur3_32 ( "hello, world" , 1 ) == 1868346089 , "murmur3 test 6" ) ; static_assert(murmur3_32("hello, world", 0) == 345750399, "murmur3 test 1"); static_assert(murmur3_32("hello, world1", 0) == 3714214180, "murmur3 test 2"); static_assert(murmur3_32("hello, world12", 0) == 83041023, "murmur3 test 3"); static_assert(murmur3_32("hello, world123", 0) == 209220029, "murmur3 test 4"); static_assert(murmur3_32("hello, world1234", 0) == 4241062699, "murmur3 test 5"); static_assert(murmur3_32("hello, world", 1) == 1868346089, "murmur3 test 6");

OK. That was a lot of code for a blog post. But after the trivial FNV1 and the only slightly harder Murmur3, I wanted more of a challenge. What about MD5? Or SHA-256? That’s for the next post.

Share this: Facebook

Twitter

Reddit

Email

