Consulting. Training. Development.

Fast hashing with CityHash

Let’s talk about CityHash.

CityHash is a family of non-cryptographic hash functions, designed for fast hashing of strings. It has 32-, 64-, 128-, and 256-bit variants. © Wikipedia

And when Wikipedia says fast it means really fast. Let’s compare CityHash 128-bit variant to MD5 hash function. The whole idea to compare these functions is about my curiosity. As you know Rails’s Asset Pipeline uses MD5 hash function against the content of assets to generate a name for css, js files. And I was thinking what if we would use CityHash for this task?

And I wrote a benchmark. I used cityhash gem - it’s a Ruby wrapper around C++ lib. And my test file was compiled css file from Github. Here is my benchmark:

1
2
3
4
5
6
7
8
9
10
require 'cityhash'
require 'digest'
require 'benchmark/ips'

test_string = File.read('./test.txt')

Benchmark.ips do |x|
  x.report("CityHash.hash128")      { CityHash.hash128(test_string) }
  x.report("Digest::MD5.hexdigest") { Digest::MD5.hexdigest(test_string) }
end

and there are the results:

1
2
CityHash.hash128        27490.3 (±3.4%) i/s  - 139968 in  5.097615s
Digest::MD5.hexdigest   1705.6 (±0.6%) i/s   - 8619   in  5.053583s

Yeah, CityHash.hash128 is 16x faster that Digest::MD5.hexdigest. Pretty impressive, right? And yes, I know that calculationg asset names is not a Rails’s bottleneck. It was just the first real life scenario of using MD5 hash function on big files that came up in my head.

JFYI, I’ve tested CityHash, MD5, MurmurHash3, and xxHash against files of different size. Here is the results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
CityHash.hash128 file_1k         176517.1 (±6.6%) i/s -   885807  in  5.042911s
Digest::MD5.hexdigest file_1k    206991.4 (±5.2%) i/s -   1039926 in  5.038555s
MurmurHash3::V128 file_1k        1057240.9 (±8.2%) i/s -  5246514 in  4.999302s
XXhash.xxh32 file_1k             1582107.2 (±12.6%) i/s - 7788214 in  4.998577s

CityHash.hash128 file_100k       51033.2 (±5.9%) i/s -   257895 in    5.072438s
Digest::MD5.hexdigest file_100k  3720.0 (±1.1%) i/s -    18600 in     5.000583s
MurmurHash3::V128 file_100k      39508.5 (±5.2%) i/s -   196962 in    5.001046s
XXhash.xxh32 file_100k           50274.8 (±1.4%) i/s -   251962 in    5.012696s

CityHash.hash128 file_1mb        6980.9 (±6.1%) i/s -    35152 in     5.057140s
Digest::MD5.hexdigest file_1mb   366.8 (±2.7%) i/s -     1850 in      5.047233s
MurmurHash3::V128 file_1mb       3947.5 (±6.3%) i/s -    19845 in     5.047911s
XXhash.xxh32 file_1mb            4547.8 (±11.9%) i/s -   22795 in     5.091122s

CityHash.hash128 file_10mb       572.1 (±12.8%) i/s -    2856 in      5.085573s
Digest::MD5.hexdigest file_10mb  36.1 (±2.8%) i/s -      183 in       5.069818s
MurmurHash3::V128 file_10mb      369.0 (±7.3%) i/s -     1836 in      5.003449s
XXhash.xxh32 file_10mb           436.7 (±11.0%) i/s -    2160 in      5.013092s

CityHash.hash128 file_100mb      59.0 (±13.6%) i/s -     294 in       5.086478s
Digest::MD5.hexdigest file_100mb 3.6 (±0.0%) i/s -       19 in        5.242694s
MurmurHash3::V128 file_100mb     37.9 (±7.9%) i/s -      189 in       5.037752s
XXhash.xxh32 file_100mb          48.0 (±4.2%) i/s -      240 in       5.005541s

So if you need a fast non-cryptographic hash function you definitely should try CityHash.

Comments