Fast Murmur3 hash implementation for Java, mostly a fork of yonik work with String-optimized 128-bit implementation and Guava-style sink added. See blog post about the original verion.
There are 3 128-bit versions provided:
murmurhash3_x64_128which takes a byte buffer to hashmurmurhash3_x64_128which takes aCharSequenceand treats it as a UTF-8 encoded string. It doesn't make a quite expensive call toString.getBytesand decodes small chunks on the go instead. It runs in constant memory and uses a 19 element byte buffer which can be provided by the caller to avoid any allocations. See memory charts below for a difference in memory consumptionmurmurhash3_x64_128_asciiwhich takes aCharSequenceand treats it as UTF-8 encoded string with ASCII characters only. Has zero allocations and provides a significant performance boost.
There are no checks made around string decoding operations, so wrong hashes will be generated if a non-ASCII characters
are passed to murmurhash3_x64_128_ascii or if murmurhash3_x64_128 receives an illegal UTF-8 string, no errors will be
raised.
Here are results for 128-bit hash on my laptop, run ./gradlew jmh to reproduce. One point is given for
hashing 100 random strings, so a score of 38820 for example means 3.8M hashes/sec.
guavausescom.google.common.hash.Hashing.murmur3_128(0).hashString()murmur*Bytesusesmurmurhash3_x64_128passings.getBytes(StandardCharsets.UTF_8)murmur*Stringcallsmurmurhash3_x64_128passing a string directly, it also reuses the 19-byte buffermurmurAsciiOptimizedStringcallsmurmurhash3_x64_128_ascii
ACSCII (1-128 characters strings)
Benchmark Mode Cnt Score Error Units
BenchString128.guavaAscii thrpt 100 38820.061 ± 767.128 ops/s
BenchString128.murmurAsciiBytes thrpt 100 79370.216 ± 1014.949 ops/s
BenchString128.murmurAsciiString thrpt 100 83249.135 ± 521.336 ops/s
BenchString128.murmurAsciiOptimizedString thrpt 100 155105.622 ± 1984.475 ops/s
Unicode (1-64 character strings)
Benchmark Mode Cnt Score Error Units
BenchString128.guavaUnicode thrpt 100 22044.362 ± 228.743 ops/s
BenchString128.murmurUnicodeBytes thrpt 100 32272.569 ± 635.764 ops/s
BenchString128.murmurUnicodeString thrpt 100 36704.326 ± 237.645 ops/s
Here are memory usage charts by different versions.
Guavaiscom.google.common.hash.Hashing.murmur3_128(0).hashString()Bytesismurmurhash3_x64_128receivings.getBytes(StandardCharsets.UTF_8)Stringismurmurhash3_x64_128getting a string directly and having to allocatebuf19on every callString, providede bufferismurmurhash3_x64_128getting a string directly, reusingbuf19
Simply copy to your project or get it from maven central.
