And when Wikipedia says fast it means really fast. Let’s compare CityHash 128-bit variant to MD5 hash function. The whole idea to compare these functions is about my curiosity. As you know Rails’s Asset Pipeline uses MD5 hash function against the content of assets to generate a name for css, js files. And I was thinking what if we would use CityHash for this task?
And I wrote a benchmark. I used cityhash gem - it’s a Ruby wrapper around C++ lib. And my test file was compiled css file from Github. Here is my benchmark:
CityHash.hash128 27490.3 (±3.4%) i/s - 139968 in 5.097615s
Digest::MD5.hexdigest 1705.6 (±0.6%) i/s - 8619 in 5.053583s
Yeah, CityHash.hash128 is 16x faster that Digest::MD5.hexdigest. Pretty impressive, right? And yes, I know that calculationg asset names is not a Rails’s bottleneck. It was just the first real life scenario of using MD5 hash function on big files that came up in my head.
JFYI, I’ve tested CityHash, MD5, MurmurHash3, and xxHash against files of different size. Here is the results:
123456789101112131415161718192021222324
CityHash.hash128 file_1k 176517.1 (±6.6%) i/s - 885807 in 5.042911s
Digest::MD5.hexdigest file_1k 206991.4 (±5.2%) i/s - 1039926 in 5.038555s
MurmurHash3::V128 file_1k 1057240.9 (±8.2%) i/s - 5246514 in 4.999302s
XXhash.xxh32 file_1k 1582107.2 (±12.6%) i/s - 7788214 in 4.998577s
CityHash.hash128 file_100k 51033.2 (±5.9%) i/s - 257895 in 5.072438s
Digest::MD5.hexdigest file_100k 3720.0 (±1.1%) i/s - 18600 in 5.000583s
MurmurHash3::V128 file_100k 39508.5 (±5.2%) i/s - 196962 in 5.001046s
XXhash.xxh32 file_100k 50274.8 (±1.4%) i/s - 251962 in 5.012696s
CityHash.hash128 file_1mb 6980.9 (±6.1%) i/s - 35152 in 5.057140s
Digest::MD5.hexdigest file_1mb 366.8 (±2.7%) i/s - 1850 in 5.047233s
MurmurHash3::V128 file_1mb 3947.5 (±6.3%) i/s - 19845 in 5.047911s
XXhash.xxh32 file_1mb 4547.8 (±11.9%) i/s - 22795 in 5.091122s
CityHash.hash128 file_10mb 572.1 (±12.8%) i/s - 2856 in 5.085573s
Digest::MD5.hexdigest file_10mb 36.1 (±2.8%) i/s - 183 in 5.069818s
MurmurHash3::V128 file_10mb 369.0 (±7.3%) i/s - 1836 in 5.003449s
XXhash.xxh32 file_10mb 436.7 (±11.0%) i/s - 2160 in 5.013092s
CityHash.hash128 file_100mb 59.0 (±13.6%) i/s - 294 in 5.086478s
Digest::MD5.hexdigest file_100mb 3.6 (±0.0%) i/s - 19 in 5.242694s
MurmurHash3::V128 file_100mb 37.9 (±7.9%) i/s - 189 in 5.037752s
XXhash.xxh32 file_100mb 48.0 (±4.2%) i/s - 240 in 5.005541s
So if you need a fast non-cryptographic hash function you definitely should try CityHash.