Let’s talk about CityHash.

CityHash is a family of non-cryptographic hash functions, designed for fast hashing of strings. It has 32-, 64-, 128-, and 256-bit variants. © Wikipedia

And when Wikipedia says fast it means really fast. Let’s compare CityHash 128-bit variant to MD5 hash function. The whole idea to compare these functions is about my curiosity. As you know Rails’s Asset Pipeline uses MD5 hash function against the content of assets to generate a name for css, js files. And I was thinking what if we would use CityHash for this task?

And I wrote a benchmark. I used cityhash gem - it’s a Ruby wrapper around C++ lib. And my test file was compiled css file from Github. Here is my benchmark:

1 2 3 4 5 6 7 8 9 10 require 'cityhash' require 'digest' require 'benchmark/ips' test_string = File . read ( './test.txt' ) Benchmark . ips do | x | x . report ( "CityHash.hash128" ) { CityHash . hash128 ( test_string ) } x . report ( "Digest::MD5.hexdigest" ) { Digest :: MD5 . hexdigest ( test_string ) } end

and there are the results:

1 2 CityHash.hash128 27490.3 ( ±3.4% ) i/s - 139968 in 5.097615s Digest::MD5.hexdigest 1705.6 ( ±0.6% ) i/s - 8619 in 5.053583s

Yeah, CityHash.hash128 is 16x faster that Digest::MD5.hexdigest . Pretty impressive, right? And yes, I know that calculationg asset names is not a Rails’s bottleneck. It was just the first real life scenario of using MD5 hash function on big files that came up in my head.

JFYI, I’ve tested CityHash, MD5, MurmurHash3, and xxHash against files of different size. Here is the results:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 CityHash.hash128 file_1k 176517.1 ( ±6.6% ) i/s - 885807 in 5.042911s Digest::MD5.hexdigest file_1k 206991.4 ( ±5.2% ) i/s - 1039926 in 5.038555s MurmurHash3::V128 file_1k 1057240.9 ( ±8.2% ) i/s - 5246514 in 4.999302s XXhash.xxh32 file_1k 1582107.2 ( ±12.6% ) i/s - 7788214 in 4.998577s CityHash.hash128 file_100k 51033.2 ( ±5.9% ) i/s - 257895 in 5.072438s Digest::MD5.hexdigest file_100k 3720.0 ( ±1.1% ) i/s - 18600 in 5.000583s MurmurHash3::V128 file_100k 39508.5 ( ±5.2% ) i/s - 196962 in 5.001046s XXhash.xxh32 file_100k 50274.8 ( ±1.4% ) i/s - 251962 in 5.012696s CityHash.hash128 file_1mb 6980.9 ( ±6.1% ) i/s - 35152 in 5.057140s Digest::MD5.hexdigest file_1mb 366.8 ( ±2.7% ) i/s - 1850 in 5.047233s MurmurHash3::V128 file_1mb 3947.5 ( ±6.3% ) i/s - 19845 in 5.047911s XXhash.xxh32 file_1mb 4547.8 ( ±11.9% ) i/s - 22795 in 5.091122s CityHash.hash128 file_10mb 572.1 ( ±12.8% ) i/s - 2856 in 5.085573s Digest::MD5.hexdigest file_10mb 36.1 ( ±2.8% ) i/s - 183 in 5.069818s MurmurHash3::V128 file_10mb 369.0 ( ±7.3% ) i/s - 1836 in 5.003449s XXhash.xxh32 file_10mb 436.7 ( ±11.0% ) i/s - 2160 in 5.013092s CityHash.hash128 file_100mb 59.0 ( ±13.6% ) i/s - 294 in 5.086478s Digest::MD5.hexdigest file_100mb 3.6 ( ±0.0% ) i/s - 19 in 5.242694s MurmurHash3::V128 file_100mb 37.9 ( ±7.9% ) i/s - 189 in 5.037752s XXhash.xxh32 file_100mb 48.0 ( ±4.2% ) i/s - 240 in 5.005541s

So if you need a fast non-cryptographic hash function you definitely should try CityHash.