Sean edited this page Aug 1, 2016 · 2 revisions

MurmurHash3 information and brief performance results

Introduction

(Note - this information is a bit stale, and refers to an earlier version of Murmur3. I'll rewrite it when I have time.)

MurmurHash3 is the successor to MurmurHash2. It comes in three variants: a 32-bit version that targets low latency for hash table use, and two 128-bit versions for generating unique identifiers for large blocks of data, one each for x86 and x64 platforms.

Details

MurmurHash3's mix functions are based on this snippet -

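k *= c1;          // pre-mix the key block: multiply,
k = rotl(k,r1);   // rotate,
k *= c2;          // multiply again

h ^= k;           // xor the key block into the hash block

h = rotl(h,r1);   // mix the hash block: rotate,
h = h*m1+n1;      // then multiply-add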

k is a block of the key, h is a block of the hash state, and rN/mN/cN are constants.

For each block of the key, we pre-mix it using two constants and a rotate, xor it into the hash block, and then mix the hash block using a rotate and a multiply-add.
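As a concrete illustration, here is a minimal C sketch of one mix step on 32-bit blocks. The constant values are not given on this page; they are assumed from the published MurmurHash3_x86_32 reference code, which also uses a different rotation count for h (13) than for k (15). The names rotl32 and mix_block32 are made up for this example.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (valid for 0 < r < 32). */
static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* One mix step: fold key block k into hash block h and return the new h.
   Constants are assumed from the published x86_32 variant. */
static uint32_t mix_block32(uint32_t h, uint32_t k) {
    k *= 0xcc9e2d51u;            /* c1 */
    k = rotl32(k, 15);           /* key rotation */
    k *= 0x1b873593u;            /* c2 */

    h ^= k;                      /* xor the pre-mixed key into the hash */

    h = rotl32(h, 13);           /* hash rotation */
    h = h * 5u + 0xe6546b64u;    /* multiply-add: m1 = 5, n1 = 0xe6546b64 */
    return h;
}
```

Note that an all-zero key block does not leave the state untouched: mix_block32(0, 0) returns 0xe6546b64, because the additive constant is applied on every step.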

MurmurHash3's 32-bit finalizer is

h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;

and its 64-bit finalizer is

h ^= h >> 33;
h *= 0xff51afd7ed558ccd;
h ^= h >> 33;
h *= 0xc4ceb9fe1a85ec53;
h ^= h >> 33;

The constants for the finalizers were generated by a simple simulated-annealing algorithm, and both avalanche all bits of h to within 0.25% bias.
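The two finalizers can be wrapped as standalone functions, sketched below. The function names fmix32 and fmix64 are chosen for this example; the shifts and constants are the ones listed above. Every step is either an xor with a right-shifted copy or a multiply by an odd constant, both of which are invertible, so each finalizer is a one-to-one mapping and zero maps to zero.

```c
#include <stdint.h>

/* 32-bit finalizer: xor-shift, multiply, xor-shift, multiply, xor-shift. */
static uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* 64-bit finalizer: same structure with 33-bit shifts and 64-bit constants. */
static uint64_t fmix64(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}
```

For example, fmix32(1) evaluates to 0x514e28b7, so a single set input bit ends up spread across the whole output word.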

Block inter-mixing

The 128-bit variants mix multiple blocks of key data in parallel. To ensure that all the intermediate hash blocks affect each other, MurmurHash3 does a few simple operations interleaved with the block mix -

The 64-bit, 2-block inter-mix is

h1 += h2;
h2 = _rotl64(h2,41);
h2 += h1;

The 32-bit, 4-block inter-mix is

h1 += h2;
h1 += h3;
h1 += h4;
h1 = _rotl(h1,17);
h2 += h1;
h3 += h1;
h4 += h1;

where hN is one block of the hash value.
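The snippets above use _rotl and _rotl64, which are compiler intrinsics rather than standard C. A portable sketch of both inter-mixes follows, with the hash blocks held in arrays; the helper and function names (rotl32, rotl64, intermix2x64, intermix4x32) are made up for this example.

```c
#include <stdint.h>

/* Portable left rotations (valid for 0 < r < width). */
static uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

static uint64_t rotl64(uint64_t x, int r) {
    return (x << r) | (x >> (64 - r));
}

/* 64-bit, 2-block inter-mix: h[0] is h1, h[1] is h2. */
static void intermix2x64(uint64_t h[2]) {
    h[0] += h[1];
    h[1] = rotl64(h[1], 41);
    h[1] += h[0];
}

/* 32-bit, 4-block inter-mix: h[0]..h[3] are h1..h4. */
static void intermix4x32(uint32_t h[4]) {
    h[0] += h[1];             /* gather every block into h1 */
    h[0] += h[2];
    h[0] += h[3];
    h[0] = rotl32(h[0], 17);
    h[1] += h[0];             /* spread the combined value back out */
    h[2] += h[0];
    h[3] += h[0];
}
```

In the 4-block version, h1 first collects the other three blocks and is then added back into each of them, so after one pass every block depends on all four inputs.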

Bulk speed test, hashing an 8-byte-aligned 256k block

Results are from a single core of an Intel Core 2 Quad Q9650 running at 3.0 GHz.

Hash                    Speed
FNV_x86_32               554 MB/sec
FNV_x64_32               715 MB/sec
SuperFastHash_x86_32    1224 MB/sec (1)
SuperFastHash_x64_32    1311 MB/sec
Lookup3_x86_32          1234 MB/sec
Lookup3_x64_32          1265 MB/sec
MurmurHash2_x86_32      2577 MB/sec
MurmurHash2_x86_64      3352 MB/sec (2)
MurmurHash2_x64_64      2857 MB/sec
MurmurHash3_x86_32      3105 MB/sec
MurmurHash3_x86_128     2684 MB/sec
MurmurHash3_x64_128     5058 MB/sec (3)

(1) - SuperFastHash has very poor collision properties, which have been documented elsewhere.

(2) - MurmurHash2_x86_64 computes two 32-bit results in parallel and mixes them at the end, which is fast but means that collision resistance is only as good as a 32-bit hash. I suggest avoiding this variant.

(3) - That's about 1.68 bytes per cycle, or about 9.5 cycles per 16-byte chunk. The inner loop is 20 instructions long, so we're sustaining over 2 instructions per cycle. Hooray for modern platforms with fast 64-bit multipliers and superscalar architectures. :)
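The arithmetic in note (3) can be reproduced with a few helper functions. This is only a sketch: it assumes one megabyte is 10^6 bytes, which is the interpretation that matches the quoted 1.68 bytes per cycle, and the function names are made up for this example.

```c
/* Bytes hashed per clock cycle, from throughput (bytes/sec) and clock (Hz). */
static double bytes_per_cycle(double bytes_per_sec, double clock_hz) {
    return bytes_per_sec / clock_hz;
}

/* Clock cycles spent on one chunk of chunk_bytes bytes. */
static double cycles_per_chunk(double chunk_bytes, double bytes_per_cyc) {
    return chunk_bytes / bytes_per_cyc;
}

/* Sustained instructions per cycle for a loop of loop_instructions
   instructions that processes one chunk per iteration. */
static double instructions_per_cycle(double loop_instructions, double cycles) {
    return loop_instructions / cycles;
}
```

With 5058e6 bytes/sec at 3.0e9 Hz this gives roughly 1.69 bytes per cycle, roughly 9.5 cycles per 16-byte chunk, and roughly 2.1 instructions per cycle for the 20-instruction loop.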
