QuicHash

High-performance cryptographic hash utility with SIMD optimization.

Features

  • Algorithms: MD5, SHA-1, SHA-2/3, BLAKE2/3, xxHash3/128
  • Defaults: BLAKE3 algorithm, parallel processing
  • HDD Mode: Sequential processing with --hdd flag for old mechanical drives
  • SIMD: Automatic hardware acceleration (SSE, AVX, AVX2, AVX-512, NEON)
  • Optional Fast Mode: Quick hashing for large files (samples 300MB); intended only for edge cases
  • Flexible Input: Files, stdin, or text strings
  • Wildcard Patterns: Support for *, ?, and [...] patterns in file/directory arguments
  • Directory Scanning: Recursive hashing with parallel processing by default
  • Verification: Compare hashes against stored database
  • Database Comparison: Compare two databases to identify changes, moves, and differences
  • Database Analysis: Analyze database statistics, duplicates, and potential space savings
  • Deduplication: Find and report duplicate files based on hash comparison
  • .hashignore: Exclude files using gitignore patterns
  • Formats: Standard, hashdeep, JSON
  • Compression: LZMA compression for databases
  • Cross-Platform: Linux, macOS, Windows, FreeBSD

Installation

cargo install quichash

Or build from source:

cargo build --release

Quick Start

# Hash a file (uses blake3 by default)
hash myfile.txt

# Hash text
hash --text "hello world"

# Hash from stdin
cat myfile.txt | hash

# Scan directory (parallel by default)
hash scan -d ./my_dir -b hashes.db

# Scan on old HDD (sequential)
hash scan -d ./my_dir -b hashes.db --hdd

# Verify
hash verify -b hashes.db -d ./my_dir

# Analyze database
hash analyze -d hashes.db

# List algorithms
hash list

Usage

Hash Files

hash myfile.txt                              # Uses blake3 by default
hash myfile.txt -a sha256                    # Specify algorithm
hash myfile.txt -a sha256 -a blake3          # Multiple algorithms
hash largefile.iso -f                        # Fast mode
hash myfile.txt -b output.txt                # Save to file
hash myfile.txt --json                       # JSON output

Wildcard Patterns

Hash multiple files using wildcard patterns:

hash "*.txt" -a sha256                       # All .txt files
hash "file?.bin" -a sha256                   # file1.bin, fileA.bin, etc.
hash "[abc]*.jpg" -a sha256                  # Files starting with a, b, or c
hash "img202405*.jpg" -a sha256              # All images from May 2024

Patterns work with all commands:

hash scan -d "data/*/hashes" -a sha256 -b output.db    # Multiple directories
hash verify -b "*.db" -d "data/*" --json               # Multiple databases/dirs

Hash Text or Stdin

hash --text "hello world" -a sha256          # Hash text
cat myfile.txt | hash -a sha256              # Hash from stdin

Scan Directory

hash scan -d /path/to/dir -b hashes.db                        # Basic (blake3, parallel)
hash scan -d /path/to/dir -b hashes.db --hdd                  # Sequential for old HDDs
hash scan -d /path/to/dir -a sha256 -b hashes.db              # Custom algorithm
hash scan -d /path/to/dir -b hashes.db -f                     # Fast mode
hash scan -d /path/to/dir -b hashes.db -f --hdd               # Fast mode, sequential
hash scan -d /path/to/dir -b hashes.db --compress             # Compressed
hash scan -d /path/to/dir -b hashes.db --format hashdeep      # Hashdeep format

Verify Directory

hash verify -b hashes.db -d /path/to/dir                      # Parallel (default)
hash verify -b hashes.db -d /path/to/dir --hdd                # Sequential for old HDDs
hash verify -b hashes.db.xz -d /path/to/dir                   # Compressed database
hash verify -b hashes.db -d /path/to/dir --json               # JSON output

Output shows: Matches, Mismatches, Missing files, New files

Performance Optimizations

Parallel Verification (Default)

The verification engine uses parallel processing by default for significantly faster verification:

# Parallel verification (default, 2-4x faster)
hash verify -b hashes.db -d /path/to/dir

# Sequential verification (for old HDDs)
hash verify -b hashes.db -d /path/to/dir --hdd

Performance improvements:

  • Parallel by default: Uses all CPU cores via rayon, like scan (see the sketch after this list)
  • Path canonicalization caching: Reduces redundant filesystem calls
  • Optimized file collection: Efficient recursive directory traversal
  • Reduced overhead: Minimizes lock contention in parallel mode
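
A minimal sketch of that parallel pattern, assuming the rayon and blake3 crates (an illustration of the approach, not QuicHash's actual implementation):

use rayon::prelude::*;
use std::path::PathBuf;

// Hash a list of files across all CPU cores; unreadable files are skipped.
fn hash_all(paths: &[PathBuf]) -> Vec<(PathBuf, String)> {
    paths
        .par_iter() // rayon spreads the iterations over its thread pool
        .filter_map(|p| {
            let bytes = std::fs::read(p).ok()?;
            Some((p.clone(), blake3::hash(&bytes).to_hex().to_string()))
        })
        .collect()
}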

Parallel mode (default) is best for:

  • SSDs or NVMe drives (no seek penalty)
  • Large numbers of files (>1000)
  • Fast network storage
  • Modern systems with multiple cores

Sequential mode (--hdd flag) is best for:

  • Old mechanical HDDs (avoid thrashing)
  • Network drives with high latency
  • Systems with limited CPU cores
  • When minimizing system load

Compare Databases

Compare two hash databases to identify changes, moves, and differences:

hash compare db1.txt db2.txt                          # Compare two databases
hash compare db1.txt db2.txt -b report.txt            # Save report to file
hash compare db1.txt db2.txt --format json            # JSON output
hash compare db1.txt db2.txt --format hashdeep        # Hashdeep audit format
hash compare db1.txt.xz db2.txt.xz                    # Compare compressed databases
hash compare db1.txt db2.txt.xz                       # Mix compressed and plain

Output shows:

  • Unchanged: Files with same hash in both databases
  • Changed: Files with different hashes
  • Moved: Files renamed or moved (same hash, different path)
  • Removed: Files in DB1 but not DB2
  • Added: Files in DB2 but not DB1

Analyze Database

Analyze a hash database to view statistics, duplicates, and potential space savings:

hash analyze -d hashes.db                             # Analyze database
hash analyze -d hashes.db --json                      # JSON output
hash analyze -d hashes.db -b report.txt               # Save report to file
hash analyze -d hashes.db.xz                          # Analyze compressed database

Output shows:

  • Database info: Path, format (standard/hashdeep), size
  • Summary: Total files, unique hashes, algorithms used
  • File sizes: Total size of all files (hashdeep format only)
  • Duplicates: Number of duplicate groups, duplicate files, potential space savings

Deduplicate Files

Find and report duplicate files based on hash comparison:

hash dedup -d /path/to/dir                # Find duplicates (dry-run)
hash dedup -d /path/to/dir -b report.txt  # Save report to file
hash dedup -d /path/to/dir -f             # Fast mode
hash dedup -d /path/to/dir --json         # JSON output

Output shows duplicate groups with file paths and sizes.

Benchmark & List

hash benchmark                    # Benchmark all algorithms
hash benchmark -s 500             # Custom data size
hash list                         # List algorithms
hash list --json                  # JSON output

Command-Line Options

Command     Option                   Description
(default)   FILE                     File or wildcard pattern to hash (omit for stdin)
            -t, --text <TEXT>        Hash text string
            -a, --algorithm <ALG>    Algorithm (default: blake3)
            -b, --output <FILE>      Write to file
            -f, --fast               Fast mode (samples 300MB)
            --json                   JSON output
scan        -d, --directory <DIR>    Directory or wildcard pattern to scan
            -a, --algorithm <ALG>    Algorithm (default: blake3)
            -b, --database <FILE>    Output database
            --hdd                    Sequential mode for old HDDs (default: parallel)
            -f, --fast               Fast mode
            --format <FMT>           standard or hashdeep
            --compress               LZMA compression
            --json                   JSON output
verify      -b, --database <FILE>    Database file or wildcard pattern
            -d, --directory <DIR>    Directory or wildcard pattern to verify
            --json                   JSON output
compare     DATABASE1                First database file (supports .xz)
            DATABASE2                Second database file (supports .xz)
            -b, --output <FILE>      Write report to file
            --format <FMT>           plain-text, json, or hashdeep
analyze     -d, --database <FILE>    Database file to analyze (supports .xz)
            -b, --output <FILE>      Write report to file
            --json                   JSON output
dedup       -d, --directory <DIR>    Directory to scan for duplicates
            -f, --fast               Fast mode
            -b, --output <FILE>      Write report to file
            --json                   JSON output
benchmark   -s, --size <MB>          Data size (default: 100)
            --json                   JSON output

.hashignore

Exclude files using gitignore-style patterns:

cat > /path/to/dir/.hashignore << 'EOF'
*.log
*.tmp
build/
node_modules/
!important.log
EOF

hash scan -d /path/to/dir -a sha256 -b hashes.db

Patterns: *.ext, dir/, !pattern, #comments, **/*.ext

Output Formats

Standard (default):

<hash>  <algorithm>  <mode>  <filepath>
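
For example, a standard-format line might look like the following (the hash value, mode token, and file path here are hypothetical placeholders, not real output):

3f8a1c...90e1  blake3  full  ./docs/readme.md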

Hashdeep: CSV format with file size, compatible with hashdeep tool
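
Because the format is compatible with the hashdeep tool, a database written with --format hashdeep can plausibly be audited by hashdeep itself; a sketch assuming hashdeep's standard flags (-r recursive, -a audit mode, -k known-hashes file):

hash scan -d /data -a sha256 -b known.txt --format hashdeep
hashdeep -r -a -k known.txt /data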

JSON: Structured output for automation

Performance

Algorithm   Throughput     Use Case
xxHash3     10-30 GB/s     Non-crypto, max speed
BLAKE3      1-3 GB/s       Crypto, fastest
SHA-512     600-900 MB/s   Crypto, 64-bit
SHA-256     500-800 MB/s   Crypto, common
SHA3-256    200-400 MB/s   Post-quantum

Tips:

  • Parallel processing is enabled by default (2-4x faster on multi-core)
  • Use --hdd for old mechanical drives (sequential processing)
  • Use -f for large files (10-100x faster)
  • BLAKE3 is the default algorithm (fastest cryptographic hash)
  • Compile with RUSTFLAGS="-C target-cpu=native" for best performance
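
For example, a native-CPU build via cargo (a standard RUSTFLAGS invocation, nothing QuicHash-specific):

RUSTFLAGS="-C target-cpu=native" cargo install quichash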

Fast Mode Speedup:

  • 1 GB: ~7x faster
  • 10 GB: ~67x faster
  • 100 GB: ~667x faster

Fast Mode

Samples 300MB of a file (the first, middle, and last 100MB) instead of hashing it in full.

Good for: quick checks, large files, backups
Not for: full verification, forensics, small files
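
A minimal sketch of the sampling idea in Rust, assuming the blake3 crate; the exact offsets and small-file handling QuicHash uses may differ:

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

const CHUNK: u64 = 100 * 1024 * 1024; // 100 MB per sampled region

// Hash the first, middle, and last 100 MB of a file (illustrative only).
fn fast_hash(path: &str) -> std::io::Result<blake3::Hash> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    let mut hasher = blake3::Hasher::new();

    // Files of 300 MB or less are hashed in full; larger files contribute
    // three 100 MB samples.
    let offsets: Vec<u64> = if len <= 3 * CHUNK {
        vec![0]
    } else {
        vec![0, (len - CHUNK) / 2, len - CHUNK]
    };
    for off in offsets {
        file.seek(SeekFrom::Start(off))?;
        let mut buf = Vec::new();
        (&mut file).take(CHUNK).read_to_end(&mut buf)?;
        hasher.update(&buf);
    }
    Ok(hasher.finalize())
}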

Common Use Cases

# Verify downloaded file
hash downloaded-file.iso -a sha256

# Backup verification (parallel by default)
hash scan -d /data -b backup.db
hash verify -b backup.db -d /data

# Backup on old HDD (sequential processing)
hash scan -d /data -b backup.db --hdd
hash verify -b backup.db -d /data

# Monitor changes
hash scan -d /etc/config -b baseline.db
hash verify -b baseline.db -d /etc/config

# Compare two snapshots
hash scan -d /data -b snapshot1.db
# ... time passes ...
hash scan -d /data -b snapshot2.db
hash compare snapshot1.db snapshot2.db -b changes.txt

# Analyze database for duplicates and stats
hash analyze -d media.db                          # View stats and duplicates
hash analyze -d media.db --json                   # JSON output for automation

# Find duplicates in directory
hash dedup -d /media                              # Quick duplicate scan

# Forensic analysis
hash scan -d /evidence -a sha3-256 -b evidence.db
hash scan -d /evidence -a sha256 -b evidence.txt --format hashdeep

# Quick checksums (blake3 is default)
hash large-backup.tar.gz -f
hash scan -d /backups -b checksums.db -f

# Automation
hash verify -b hashes.db -d /data --json | jq '.report.mismatches'
hash compare db1.db db2.db --format json | jq '.summary'

Algorithm Selection

Recommended:

  • SHA-256: Widely supported, good security
  • BLAKE3: Fastest cryptographic hash
  • SHA3-256: Post-quantum resistant

Deprecated:

  • MD5, SHA-1: Use only for compatibility

Non-crypto (trusted environments):

  • xxHash3/128: Maximum speed

SIMD Optimization

Automatic support for SSE, AVX, AVX2, AVX-512 (x86_64) and NEON (ARM).

Verify: cargo test --release --test simd_verification -- --nocapture
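
As an illustration of how runtime dispatch typically works on x86_64, Rust's standard library exposes a feature-detection macro; this is a generic sketch, not QuicHash's actual dispatch code:

// Report the best available x86_64 SIMD tier at runtime (illustrative only).
#[cfg(target_arch = "x86_64")]
fn simd_tier() -> &'static str {
    if is_x86_feature_detected!("avx512f") {
        "AVX-512"
    } else if is_x86_feature_detected!("avx2") {
        "AVX2"
    } else if is_x86_feature_detected!("avx") {
        "AVX"
    } else if is_x86_feature_detected!("sse2") {
        "SSE2"
    } else {
        "scalar"
    }
}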

Wildcard Patterns

Supported patterns:

  • * - Matches any number of characters (e.g., *.txt, file*)
  • ? - Matches exactly one character (e.g., file?.bin)
  • [...] - Matches any character in brackets (e.g., [abc]*.jpg)

Examples:

hash "*.txt" -a sha256                       # All .txt files in current dir
hash "data/*.bin" -a sha256                  # All .bin files in data/
hash "file?.txt" -a sha256                   # file1.txt, fileA.txt, etc.
hash "[abc]*.jpg" -a sha256                  # Files starting with a, b, or c
hash scan -d "backup/*/data" -a sha256 -b db.txt  # Multiple directories
hash verify -b "*.db" -d "data/*"            # All .db files against all data dirs

Notes:

  • Patterns are expanded by the shell or the application
  • If no files match, an error is displayed
  • Multiple matches are processed in sorted order
  • For scan/verify with multiple directories, results are aggregated

Troubleshooting

Issue                           Solution
Unsupported algorithm           Run hash list to see available algorithms
Permission errors               Use sudo hash scan -d /protected/dir ...
Slow performance                Parallel is already the default; try -f for fast mode or the default BLAKE3 algorithm
Fast mode not working           Fast mode only works with files (not stdin/text)
.hashignore not working         Check the file location: /path/to/dir/.hashignore
Wildcard pattern not matching   Ensure the pattern is quoted (e.g., "*.txt", not *.txt)
No files match pattern          Check the pattern syntax and file locations

Contributing

We welcome contributions to QuicHash! To contribute, you must certify that you have the right to submit your contribution and agree to license it under the project's dual MIT/Apache-2.0 license.

Developer Certificate of Origin (DCO)

QuicHash uses the Developer Certificate of Origin (DCO) process. This is a lightweight way for contributors to certify that they wrote or otherwise have the right to submit code or documentation to an open source project.

Inbound = Outbound License

All contributions to QuicHash are made under the same dual MIT/Apache-2.0 license as the project itself. By signing off on your commits, you agree that your contributions will be licensed under these same terms, with no additional restrictions.

How to Sign Off Commits

Contributors sign off on these requirements by adding a Signed-off-by line to their commit messages.

This is my commit message

Signed-off-by: Random J Developer <random@developer.example.org>

Git even has a -s command line option to append this automatically to your commit message:

$ git commit -s -m 'This is my commit message'

License

QuicHash is dual-licensed under either:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT License (LICENSE-MIT)

at your option.