This project provides an API (JavaDocs) for hashing any sequence of bytes in Java, including all kinds of primitive arrays, buffers, CharSequences and more. Java 6+. Apache 2.0 licence.
The key design goal, which distinguishes this project from, for example, Guava hashing: this API eases implementing hashing algorithms which don't make a single allocation during hash computation for any input, and which don't use ThreadLocal.
Also, the API attempts to be agile in its treatment of byte order, favoring native access while allowing a hash function implementation to be platform-endianness-agnostic. On the other hand, it allows you to "fool" an existing implementation, even one sealed for a single byte order, by feeding it data in a different byte order and still obtain consistent results, at only a moderate cost in performance.
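To make the byte order point concrete, here is a minimal sketch using only the JDK (it does not call this project's API). It reads the same eight bytes as a long in both byte orders and shows that one value is the byte-reversed form of the other, which is the kind of adjustment an endianness-agnostic hash implementation has to account for. The class and method names are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of this project's API.
final class ByteOrderDemo {

    /** Reads 8 bytes starting at offset as a long in the requested byte order. */
    static long readLong(byte[] data, int offset, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        long le = readLong(data, 0, ByteOrder.LITTLE_ENDIAN);
        long be = readLong(data, 0, ByteOrder.BIG_ENDIAN);

        System.out.println(Long.toHexString(le)); // 807060504030201
        System.out.println(Long.toHexString(be)); // 102030405060708

        // The two interpretations differ only by a byte swap.
        System.out.println(le == Long.reverseBytes(be)); // true

        // The order that "native access" would use on this machine.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

A hash function that always interprets its input in one fixed order gives the same result on every platform. On a platform whose native order differs, that costs a byte swap per read.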
Currently, a long-valued hash function interface is defined, with plenty of shipped implementations:
- xxHash, r39 (latest; r40-r42 are maintenance releases without algorithm changes).
- Two algorithms from FarmHash: farmhashna (introduced in FarmHash 1.0) and farmhashuo (introduced in FarmHash 1.1).
- CityHash, version 1.1 (latest; 1.1.1 is a C++ language-specific maintenance release).
These implementations are thought to be independent of the native byte order. They are thoroughly tested with JDK 6, 7 and 8, but only on a little-endian platform.
Tested on Intel Core i7-4870HQ CPU @ 2.50GHz
Algorithm | Speed, GB/s | Bootstrap, ns |
---|---|---|
xxHash | 9.5 | 6 |
FarmHash na | 9.0 | 6 |
FarmHash uo | 7.2 | 7 |
CityHash | 7.0 | 7 |
MurmurHash | 5.3 | 12 |
To sum up, this project is a good fit when:
- You need to hash plain byte sequences, memory blocks or "flat" objects.
- You like zero-allocation and pretty good performance (at Java scale).
- You need hashing to be agile in questions related to byte ordering.
This project is not a good fit when:
- You need to hash POJOs whose actual data is scattered in memory between managed objects. There is no simple way to hash, for example, instances of a class such as
  class Person { String givenName, surName; int salary; }
  using the API provided by this project.
- You need to hash byte sequences whose length is not known beforehand, the simplest example being Iterator<Byte>.
- You need to transform the byte sequence (e.g. encode or decode it with a specific coding), and hash the resulting byte sequence on the fly without dumping it to memory.
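To illustrate the first limitation, here is a sketch of the obvious workaround for a scattered object such as Person: copy its fields into one flat byte array, which any byte-sequence hash can then consume. It uses only the JDK. The constructor, the PersonFlattener class and the length-prefixed layout are assumptions made for this example, not part of this project. The copy allocates a buffer on every call, so it gives up the zero-allocation property the project is built around.

```java
import java.nio.ByteBuffer;

// The class from the text, with a constructor added for the example.
final class Person {
    String givenName, surName;
    int salary;

    Person(String givenName, String surName, int salary) {
        this.givenName = givenName;
        this.surName = surName;
        this.salary = salary;
    }
}

// Hypothetical helper, not part of this project's API.
final class PersonFlattener {

    /** Copies all fields into one contiguous, big-endian byte array. */
    static byte[] flatten(Person p) {
        int size = 4 + 2 * p.givenName.length()
                 + 4 + 2 * p.surName.length()
                 + 4;
        ByteBuffer buf = ByteBuffer.allocate(size); // allocation on every call
        putString(buf, p.givenName);
        putString(buf, p.surName);
        buf.putInt(p.salary);
        return buf.array();
    }

    /** Writes a length prefix, then each char as 2 bytes. */
    private static void putString(ByteBuffer buf, String s) {
        buf.putInt(s.length());
        for (int i = 0; i < s.length(); i++) {
            buf.putChar(s.charAt(i));
        }
    }

    public static void main(String[] args) {
        byte[] flat = flatten(new Person("Ada", "Lovelace", 100));
        System.out.println(flat.length); // 34
        // The flat array can now be passed to a byte-array hash function.
    }
}
```

The length prefixes keep distinct field combinations distinct: without them, ("ab", "c") and ("a", "bc") would flatten to the same bytes and therefore hash identically.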
Gradle:
dependencies {
    compile 'net.openhft:zero-allocation-hashing:0.6'
}
Or Maven:
<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>zero-allocation-hashing</artifactId>
    <version>0.6</version>
</dependency>
In Java:
long hash = LongHashFunction.xx_r39().hashChars("hello");
See JavaDocs for more information.
See the list of open issues.