Skip to content

gbessonov/jhash

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

jHash

High-performance Murmur3 hashing for the JVM. jHash delivers blazing-fast Murmur3 (128-bit) hashing with minimal allocations, offering both one-shot and streaming APIs. Optimized for speed, especially on small to medium-sized data and streaming operations.


Table of Contents

Features

  • Lightweight, zero/minimal-allocation implementation
  • One-shot and streaming hash APIs
  • Seedable functionality for repeatable results
  • Faster than Guava Murmur3 across chunk sizes

Getting Started

Installation

via JitPack:

<repository>
  <id>jitpack.io</id>
  <url>https://jitpack.io</url>
</repository>
<dependency>
  <groupId>com.github.gbessonov</groupId>
  <artifactId>jHash</artifactId>
  <version>main-SNAPSHOT</version>
</dependency>

Usage Examples

One-shot hashing

⚠️ Murmur3 is not cryptographic. Do not use it for security-sensitive purposes (signatures, password hashing, etc).

import io.github.gbessonov.jhash.HashCode;
import io.github.gbessonov.jhash.JHash;

byte[] data = "Hello, World!".getBytes(StandardCharsets.UTF_8);
HashCode hash = JHash.murmur3_128(data);
System.out.println(hash); // Prints the 128-bit hash

Output:

jHash.HashCode.Murmur3F: c0a1b86f7365bc93bfae71678c2843aa

One-shot hashing with seed

import io.github.gbessonov.jhash.HashCode;
import io.github.gbessonov.jhash.JHash;

int seed = 1337;
byte[] data = "Hello, World!".getBytes(StandardCharsets.UTF_8);
HashCode hash = JHash.murmur3_128(data, seed);
System.out.println(hash); // Prints the 128-bit hash

Output:

jHash.HashCode.Murmur3F: 4afbedb4dfd83011c7e0a1f0baeadad7

Streaming hashing

import io.github.gbessonov.jhash.HashCode;
import io.github.gbessonov.jhash.JHash;

// Prepare data chunks
Random random = new Random(42);
int chunkCount = 1000;
byte[][] chunks = new byte[chunkCount][];

for (int i = 0; i < chunkCount; i++) {
    chunks[i] = new byte[64]; // 64-byte chunks
    random.nextBytes(chunks[i]);
}
// Create a hasher with a specific seed
HashFunction hasher = JHash.newMurmur3_128(42);
// Feed data chunks into the hasher
for (byte[] chunk : chunks) {
    hasher.include(chunk);
}
// Compute the final hash
var hash = hasher.hash();
System.out.println(hash);

Output:

jHash.HashCode.Murmur3F: f103c566293376326c638cbc4db64b0e

Performance

jHash is designed for high performance, especially on small to medium-sized data and streaming operations.

Benchmark Environment

  • JVM: OpenJDK 64-Bit Server VM 21.0.8+9-LTS
  • Framework: JMH 1.37 (Java Microbenchmark Harness)
  • Configuration: 3 warmup iterations, 5 measurement iterations, 2 forks
  • Methodology: Throughput measurement (operations per second)

Hash Function Performance

Performance comparison for Murmur3 128-bit hash function across different input sizes:

Input Size jHash (ops/s) Guava (ops/s) GreenRobot (ops/s) jHash Advantage
8 bytes 111,634,650 53,540,645 47,849,136 2.1x vs Guava
64 bytes 55,033,573 23,791,891 60,687,830 2.3x vs Guava
1KB 5,941,919 3,851,453 6,197,071 1.5x vs Guava
8KB 792,667 1,063,469 806,817 Competitive

Key Performance Insights

  • Small Data Excellence: jHash demonstrates superior performance for small inputs (8-64 bytes), achieving over 2x throughput compared to Guava
  • Consistent Performance: Maintains competitive performance across all input sizes
  • Memory Efficiency: Optimized for minimal allocation overhead

Streaming Hash Performance

For applications requiring incremental hashing of multiple data chunks:

Chunk Count jHash (ops/s) Guava (ops/s) GreenRobot (ops/s) Performance Lead
10 chunks 6,596,477 3,386,620 6,587,794 1.9x vs Guava
100 chunks 672,561 296,831 661,867 2.3x vs Guava
1K chunks 65,539 58,748 67,527 1.1x vs Guava
10K chunks 7,567 4,818 7,570 1.6x vs Guava
100K chunks 755 478 755 1.6x vs Guava

Streaming Performance Highlights

  • Incremental Hashing: Optimized streaming API for processing data in chunks
  • Scalable Architecture: Maintains performance advantage across various workload sizes
  • API Consistency: Single fluent interface for both one-shot and streaming operations

Real-World Performance Impact

Based on these benchmarks, applications can expect:

  • Cache Systems: 2-3x faster hash key generation for small cache keys
  • Data Structures: Significantly improved HashMap/HashSet performance
  • Checksums: Faster data integrity verification for network protocols
  • Distributed Systems: Reduced CPU overhead in consistent hashing scenarios

Benchmark Reproduction

To reproduce these results:

mvn clean compile
mvn exec:java -Dexec.mainClass="org.openjdk.jmh.Main" \
  -Dexec.args="io.github.gbessonov.jhash.benchmarks"

Results were obtained on a modern development machine and may vary depending on hardware configuration and JVM optimizations.

Contributing

Contributions welcome! To help us improve:

  1. Report bugs or performance regressions via Issues (label bug, perf).
  2. Propose enhancements—e.g., SIMD (Vector API), Kotlin wrappers, additional algorithms.
  3. Send pull requests with sensible descriptions and tests.

All contributions must include unit tests (and benchmarks where appropriate). Please open an issue before working on large features.

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details

About

high-performance hashing for JVM with zero/minimal allocations and streaming API

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages