Rust toolkit for hashing folders, comparing snapshots, and automating clean-up tasks.
This crate rebuilds the feature set as an idiomatic, multi-command CLI with strong typing, streaming IO, and cross-platform support. The long-term design and extra context live in spec.md.
- Fast hashing pipeline backed by Rayon workers, reusable buffer pools, and selectable memory modes (`stream`, `balanced`, `booster`).
- Multiple commands in one binary: build hash maps, diff two states, copy only what changed, delete empty folders, batch-rename files, generate reports, and benchmark algorithms.
- Built-in map formats (JSON + CSV) with deterministic ordering and metadata headers (root, timestamp, algorithm, etc.).
- Config layering: system -> user -> project -> env vars -> CLI flags, with support for TOML/YAML/JSON configuration files and `HASH_FOLDEROO_*` overrides.
- Batteries-included developer experience: `cargo test` runs unit + integration smoke tests (see `tests/cli_smoke.rs`).
| Command | Purpose | Handy flags |
|---|---|---|
| `hashmap` | Walk a directory, hash every file, and write a map in JSON or CSV. | `--path`, `--output`, `--format`, `--algorithm`, `--xof-length`, `--force-expand`, `--strip-prefix`, `--exclude`, `--threads`, `--mem-mode`, `--max-ram`, `--progress`, `--dry-run` |
| `compare` | Compare two maps or directories and classify files as identical, changed, moved, missing, or new. | `--source`, `--target`, `--format {json,csv}`, `--algorithm` |
| `copydiff` | Generate and optionally execute copy ops derived from a comparison. | `--source`, `--target`, `--plan <json>`, `--execute`, `--dry-run`, `--conflict {overwrite,skip,rename}`, `--preserve-times`, `--algorithm` |
| `removempty` | Delete now-empty directories (post-order) with glob exclusions. | `--path`, `--dry-run`, `--min-empty-depth`, `--exclude` |
| `renamer` | Apply a simple `old->new` replacement across filenames. | `--path`, `--pattern`, `--dry-run` |
| `report` | Summarize a hash map (stats, duplicates, largest files, etc.). | `--input`, `--format {json,text}`, `--include`, `--top-n` |
| `benchmark` | Benchmark supported algorithms over an in-memory buffer. | `--algorithm {blake3,shake256,all}`, `--size <bytes>` |
If you installed a prebuilt binary, invoke the binary directly for help and to run commands. If you're developing locally, `cargo run` remains supported.
```sh
# show global help
hash-folderoo --help

# show per-command help (example)
hash-folderoo hashmap --help
```

Use `--alg-list` to print the currently compiled hashing algorithms (BLAKE3, BLAKE2b, BLAKE2bp, SHAKE256, TurboSHAKE256, ParallelHash256, XXH3-1024, WyHash-1024, KangarooTwelve).
Note about forcing expansion: algorithms that do not natively support XOF output (e.g., BLAKE2b, BLAKE2bp) will reject requests for arbitrarily long output unless you explicitly opt in with `--force-expand`. When used, the tool performs a deterministic, non-standard expansion (chained hashing) to produce the requested number of bytes. This is intended for benchmarking and interoperability testing and is not a cryptographic XOF replacement.
`xxh3-1024` and `wyhash-1024` are non-cryptographic options that expand fast hashes into 1024-bit digests via deterministic counters; they suit high-speed comparisons and benchmarks, not integrity or security guarantees.
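Conceptually, both expansion styles derive a long digest by repeatedly hashing the input together with a counter and concatenating the blocks. The following Python sketch shows the counter-based idea, using stdlib BLAKE2b as a stand-in base hash; the actual Rust implementation, base hashes, and framing may differ.

```python
import hashlib

def expand_digest(data: bytes, out_len: int, block: int = 64) -> bytes:
    """Deterministic counter-based expansion of a fixed-output hash.

    Illustrative only: BLAKE2b stands in for the real base hash, and the
    counter framing here is an assumption, not hash-folderoo's exact scheme.
    """
    out = bytearray()
    counter = 0
    while len(out) < out_len:
        h = hashlib.blake2b(digest_size=block)
        h.update(counter.to_bytes(8, "little"))  # domain-separate each block
        h.update(data)
        out.extend(h.digest())
        counter += 1
    return bytes(out[:out_len])
```

The key properties the tool relies on hold here too: the output is deterministic, length-extensible, and a prefix of a longer expansion equals a shorter one over the same input.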
- Rust 1.75+ (edition 2021) and Cargo.
- Windows, macOS, or Linux. The code avoids platform-specific assumptions.
We publish cross-platform release artifacts on GitHub Releases. Each automatic release uses the commit short SHA as its tag (for example `v1a2b3c4`), and artifacts are produced per target with names like `hash-folderoo-<target>-<release>.tar.gz` or `.zip`.
Examples (replace `vSHORT` with the release tag you want):

Linux x86_64:

```sh
curl -Lo hash-folderoo-vSHORT.tar.gz \
  https://github.com/supermarsx/hash-folderoo/releases/download/vSHORT/hash-folderoo-x86_64-unknown-linux-gnu-vSHORT.tar.gz
tar -xzf hash-folderoo-vSHORT.tar.gz
sudo install -m 755 hash-folderoo /usr/local/bin/
```

macOS (Intel / Apple Silicon):

```sh
curl -Lo hash-folderoo-vSHORT.tar.gz \
  https://github.com/supermarsx/hash-folderoo/releases/download/vSHORT/hash-folderoo-x86_64-apple-darwin-vSHORT.tar.gz
tar -xzf hash-folderoo-vSHORT.tar.gz
sudo install -m 755 hash-folderoo /usr/local/bin/
# or to install into the Homebrew prefix on Apple Silicon:
sudo install -m 755 hash-folderoo /opt/homebrew/bin/
```

Windows (PowerShell):

```powershell
$tag = 'vSHORT'
Invoke-WebRequest -Uri "https://github.com/supermarsx/hash-folderoo/releases/download/$tag/hash-folderoo-x86_64-pc-windows-msvc-$tag.zip" -OutFile zipfile.zip
Expand-Archive zipfile.zip -DestinationPath .\dist
# Move the executable somewhere in PATH, e.g. $HOME\bin
Move-Item -Path .\dist\hash-folderoo.exe -Destination $HOME\bin\hash-folderoo.exe
```

We recommend verifying checksums/signatures before installing. CI can be extended to generate checksums for each artifact and attach them to the release.
Homebrew / Brew (macOS & Linuxbrew)

If the project provides a Homebrew tap (see `packaging/homebrew`) you can install directly with:

```sh
# tap repo and install (replace tap name if different)
brew tap supermarsx/homebrew-tap
brew install supermarsx/tap/hash-folderoo
```

Scoop (Windows)

If the project provides a Scoop bucket (see `packaging/scoop`) you can install with:

```sh
# add a bucket and install (example bucket URL)
scoop bucket add hash-folderoo https://github.com/supermarsx/scoop-bucket
scoop install hash-folderoo
```

If you prefer to build locally (development or debugging), you can still use Cargo:

```sh
# Build release binary at target/release/hash-folderoo
cargo build --release

# Install into your cargo bin directory
cargo install --path .
```

```sh
hash-folderoo hashmap \
  --path ./sample-data \
  --output snapshots/sample.json \
  --format json \
  --algorithm blake3 \
  --xof-length 64 \
  --exclude "**/.git/**" \
  --progress \
  --mem-mode balanced
```

- Paths recorded in the map are relative unless `--strip-prefix` is used.
- Streams files through a bounded buffer pool so even booster mode stays bounded by `--max-ram`.
```sh
hash-folderoo compare \
  --source snapshots/sample.json \
  --target ./backup-drive \
  --format json
```

- Both `--source` and `--target` accept JSON/CSV maps or directories. When directories are given, the tool hashes them on the fly with the chosen algorithm.
- Output as JSON (a structured `ComparisonReport`) or CSV (flattened rows with status columns).
```sh
# Preview (default)
hash-folderoo copydiff --source ./src --target ./dst --algorithm blake3

# Execute plan with conflict handling
hash-folderoo copydiff \
  --source ./src \
  --target ./dst \
  --execute \
  --conflict rename \
  --preserve-times
```

- Without `--execute` the plan is printed (dry-run). Add `--execute` to copy files.
- `--plan <file>` lets you feed an existing JSON plan (matching the `CopyPlan` schema) instead of computing a diff.
```sh
hash-folderoo removempty --path ./tmp --min-empty-depth 2 --dry-run
```

Glob exclusions are relative to the provided root (e.g., `--exclude "**/node_modules/**"`).
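Post-order deletion means children are visited before parents, so a directory that only held empty subdirectories becomes empty and is removed in the same pass. A rough Python sketch of that traversal, with globs and `--min-empty-depth` omitted and all names hypothetical:

```python
import os

def remove_empty_dirs(root: str, dry_run: bool = True) -> list:
    """Collect (and optionally delete) empty directories, deepest first.

    Sketch only. Note: in dry-run mode parents of would-be-removed children
    still look non-empty, so the real tool must simulate removals to report
    the full set.
    """
    removed = []
    # topdown=False yields leaf directories before their parents (post-order)
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # never delete the root itself
        if not os.listdir(dirpath):  # re-check: children may be gone already
            removed.append(dirpath)
            if not dry_run:
                os.rmdir(dirpath)
    return removed
```

With `dry_run=False`, a chain like `a/b` where both are empty collapses entirely: `b` is removed first, which makes `a` empty in time for its own visit.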
```sh
hash-folderoo renamer --path ./photos --pattern "-draft->" --dry-run
```

Patterns follow `old->new`; omitting the part after `->` means "replace with nothing".
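The pattern syntax can be modeled as a split on the first `->`. A tiny Python sketch of that rule (the helper name is hypothetical, not part of the tool):

```python
def apply_pattern(name: str, pattern: str) -> str:
    """Split an 'old->new' pattern on the first '->' and substitute.

    An empty right-hand side, or a pattern with no '->' at all, deletes
    the matched text ("replace with nothing").
    """
    old, _sep, new = pattern.partition("->")
    return name.replace(old, new)
```

For example, the `--pattern "-draft->"` from the command above strips the `-draft` suffix from every matching filename.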
```sh
hash-folderoo report \
  --input snapshots/sample.json \
  --format json \
  --include stats,duplicates,largest \
  --top-n 10
```

Reports compute totals, duplicate groups, wasted bytes, top extensions, and largest files. Text format prints a human summary; JSON is structured for automation.
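Duplicate detection falls out of the hash map directly: group entries by digest, and every copy beyond the first in a group is wasted space. An illustrative Python sketch of that computation (not the actual Rust code):

```python
from collections import defaultdict

def duplicate_report(entries: list) -> dict:
    """Summarize duplicate groups from hash-map entries.

    Each entry is assumed to be a dict with 'path', 'hash', and 'size'
    keys, mirroring the map format; the summary shape is hypothetical.
    """
    groups = defaultdict(list)
    for e in entries:
        groups[e["hash"]].append(e)
    dupes = {h: g for h, g in groups.items() if len(g) > 1}
    # every copy beyond the first in a group is "wasted" bytes
    wasted = sum((len(g) - 1) * g[0]["size"] for g in dupes.values())
    return {"duplicate_groups": len(dupes), "wasted_bytes": wasted}
```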
```sh
hash-folderoo benchmark --algorithm all --size 134217728   # 128 MiB buffer
```

Use this to gauge algorithm speed on your hardware.
```sh
hash-folderoo --alg-list
```

Outputs default digest lengths, whether each algorithm is cryptographic, and XOF capabilities.
hash-folderoo merges configuration from several locations (lowest to highest precedence):

- `/etc/hash-folderoo/{config.toml,config.yaml,config.json}`
- `${XDG_CONFIG_HOME:-~/.config}/hash-folderoo/*`
- `./config.{toml,yaml,json}` in the current working directory
- `HASH_FOLDEROO_CONFIG=/path/to/config.{toml,yaml,json}`
- Environment variables (`HASH_FOLDEROO_*`)
- CLI flags

Each file can be TOML, YAML, or JSON. Unsupported keys are ignored.
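The precedence rules amount to a left-to-right merge where later (higher-precedence) layers win per key. A toy Python sketch with hypothetical layer contents, just to make the "lowest to highest" ordering concrete:

```python
def merge_layers(*layers: dict) -> dict:
    """Merge section->settings dicts; later layers override per key.

    Simplified model of config layering: real validation (formats,
    positive counts, known algorithms) is not shown here.
    """
    merged: dict = {}
    for layer in layers:
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged
```

Usage: `merge_layers(system_cfg, user_cfg, project_cfg, env_cfg, cli_cfg)` yields one effective config in which, say, a CLI `--mem-mode` beats a user-level `[memory] mode`.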
```toml
[general]
path = "D:/datasets"
output = "snapshots/datasets.json"
format = "json"
threads = 8
exclude = ["**/.git/**", "target/**"]
follow_symlinks = false
progress = true

[algorithm]
name = "shake256"
xof_length = 64

[memory]
mode = "stream"        # stream | balanced | booster
max_ram = 2147483648   # 2 GiB
```

| Section | Keys | Notes |
|---|---|---|
| `[general]` | `path` (string), `output` (string), `format` (`json` or `csv`), `threads` (u32 > 0), `strip_prefix` (string), `depth` (u32 > 0), `exclude` (array of globs), `follow_symlinks` (bool), `progress` (bool), `dry_run` (bool) | Matches CLI flags for `hashmap`; invalid formats or zero-valued counts are rejected during config validation. |
| `[algorithm]` | `name` (string), `xof_length` (bytes > 0) | `name` must map to a supported algorithm (`blake3`, `blake2b`, `blake2bp`, `shake256`, `turboshake256`, `k12`, …). |
| `[memory]` | `mode` (`stream`, `balanced`, or `booster`), `max_ram` (bytes > 0) | Controls the buffer-plan recommender; invalid modes result in a startup error. |
Configs loaded from `/etc`, `$XDG_CONFIG_HOME`, the project directory, env overrides, and `--config` all go through the same validator so mistakes are caught early.
| Variable | Meaning |
|---|---|
| `HASH_FOLDEROO_CONFIG` | Absolute path to a config file to merge last. |
| `HASH_FOLDEROO_PATH`, `HASH_FOLDEROO_OUTPUT`, `HASH_FOLDEROO_FORMAT` | Override the corresponding `[general]` settings. |
| `HASH_FOLDEROO_THREADS`, `HASH_FOLDEROO_DEPTH`, `HASH_FOLDEROO_STRIP_PREFIX` | Override the corresponding CLI-style options. |
| `HASH_FOLDEROO_EXCLUDE` | Comma-separated glob list (e.g., `target/**,**/.git/**`). |
| `HASH_FOLDEROO_FOLLOW_SYMLINKS`, `HASH_FOLDEROO_PROGRESS`, `HASH_FOLDEROO_DRY_RUN` | Boolean toggles (`true`/`false`, `1`/`0`, `on`/`off`). |
| `HASH_FOLDEROO_ALG`, `HASH_FOLDEROO_XOF_LENGTH` | Select the hashing backend and output length (bytes). |
| `HASH_FOLDEROO_MEMORY_MODE`, `HASH_FOLDEROO_MAX_RAM` | Tune memory mode and total buffer budget (bytes). |
Hash maps written by the `hashmap` command follow a versioned JSON shape; a full example appears at the end of this document.
CSV output contains the same fields (`path`, `hash`, `size`, `mtime`) and is always sorted by path for deterministic diffs.
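Deterministic ordering just means sorting rows by path before writing, so the same tree always yields byte-identical CSV. A small Python sketch of that guarantee, using the field names from this section (the writer itself is illustrative, not the tool's code):

```python
import csv
import io

def write_map_csv(entries: list) -> str:
    """Render map entries as CSV, sorted by path for stable diffs."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["path", "hash", "size", "mtime"], lineterminator="\n"
    )
    writer.writeheader()
    for entry in sorted(entries, key=lambda e: e["path"]):
        writer.writerow(entry)
    return buf.getvalue()
```

Because the sort key is the path alone, two snapshots of the same tree can be compared line by line with any text diff tool.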
`compare` JSON output matches `compare::ComparisonReport` with arrays `identical`, `changed`, `moved`, `missing`, and `new`. CSV output flattens each row with a `status` column so it can be consumed by spreadsheets.
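The five statuses can be derived from two `{path: hash}` maps; "moved" is the interesting case, matching a hash that vanished from one path and reappeared at another. A simplified Python sketch (the real implementation is richer, and tie-breaking when one hash maps to several paths is glossed over here):

```python
def classify(source: dict, target: dict) -> dict:
    """Classify paths given source and target {path: hash} maps."""
    report = {"identical": [], "changed": [], "moved": [], "missing": [], "new": []}
    target_by_hash: dict = {}
    for path, digest in target.items():
        target_by_hash.setdefault(digest, []).append(path)
    claimed = set()  # target paths already explained as move destinations
    for path, digest in source.items():
        if path in target:
            status = "identical" if target[path] == digest else "changed"
            report[status].append(path)
        elif digest in target_by_hash:
            new_path = target_by_hash[digest][0]  # naive tie-break: first match
            report["moved"].append((path, new_path))
            claimed.add(new_path)
        else:
            report["missing"].append(path)
    report["new"] = [p for p in target if p not in source and p not in claimed]
    return report
```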
`copydiff` plans are serialized as:

```json
{ "ops": [ { "src": "/src/file.txt", "dst": "/dst/file.txt", "op": "copy" } ] }
```

`report` JSON output bundles the requested sections (stats, duplicates, largest) and is safe to post-process.
`memory.rs` encapsulates the heuristics used by the hashing pipeline:

- `stream` - smallest buffers (~64 KiB), half of the logical CPUs, minimal RAM footprint for slow disks or low-memory machines.
- `balanced` (default) - moderates between throughput and memory: all logical CPUs, ~256 KiB buffers, glob prefetch disabled when RAM is tight.
- `booster` - aggressive parallelism (up to 2x logical CPUs) with 1 MiB buffers and directory prefetching; ideal for SSDs and generous RAM. Specify `--max-ram` to keep it in check.

Use `--threads` and `--max-ram` to override the auto plan. The buffer pool enforces the byte budget so multiple commands can run concurrently without starving the system.
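The interplay of mode, CPU count, and `--max-ram` can be pictured as a small recommender. The buffer sizes and worker counts below mirror the bullets above, while the pool-cap formula is a hypothetical stand-in for the actual heuristics in `memory.rs`:

```python
import os

def plan_buffers(mode: str, max_ram: int, cpus: int = 0) -> dict:
    """Sketch of a memory-mode recommender.

    stream = 64 KiB buffers / half the CPUs, balanced = 256 KiB / all CPUs,
    booster = 1 MiB / up to 2x CPUs; the pool is capped by the byte budget.
    """
    cpus = cpus or os.cpu_count() or 1
    presets = {
        "stream": (64 * 1024, max(1, cpus // 2)),
        "balanced": (256 * 1024, cpus),
        "booster": (1024 * 1024, cpus * 2),
    }
    if mode not in presets:
        raise ValueError(f"unknown memory mode: {mode}")
    buf, workers = presets[mode]
    # enforce the budget: never hand out more buffers than max_ram allows
    pool = min(workers * 4, max(1, max_ram // buf))
    return {"buffer_size": buf, "workers": workers, "pool_buffers": pool}
```

With a tight `max_ram`, even booster mode degrades gracefully: the worker count stays high but the pool shrinks, so threads simply wait for buffers instead of blowing the budget.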
- Format/lint: `cargo fmt` and `cargo clippy --all-targets`.
- Tests: `cargo test` covers units plus `tests/cli_smoke.rs`, which exercises the CLI end-to-end (hashing, comparing, copydiff, removempty, renamer, report, benchmark).
- Logging is powered by `env_logger`; set `RUST_LOG=debug` for verbose traces while hacking.
- See `spec.md` for the long-term blueprint (extra algorithms, GUI front-ends, richer copy planners, etc.).
This repository includes GitHub Actions workflows to ensure code quality and to publish cross-platform binaries:

- Continuous Integration (CI) - `.github/workflows/ci.yml` runs on PRs and pushes to `main` and contains separate jobs for:
  - format checks (`cargo fmt -- --check`)
  - linting (`cargo clippy --all-targets`)
  - unit & integration tests (`cargo test`)
  - matrixed release-style builds that compile the binary for common targets (Linux, macOS, Windows) and both x86_64 and aarch64 where possible.
- Automated release workflow - `.github/workflows/release.yml` triggers on pushes to `main` and automatically:
  - uses the commit short SHA as the release version (e.g. `v1a2b3c4`)
  - builds platform-specific binaries for Linux (x86_64/aarch64), Windows (x86_64/aarch64), and macOS (x86_64/aarch64)
  - uploads the compressed artifacts to a GitHub Release created for that commit

Notes:

- GitHub runner capabilities determine whether every target can be built on the runners provided; cross-compilation is attempted via Rust target tooling. If a specific target cannot be produced on a runner, the release workflow fails for that matrix cell, which ensures each published artifact was actually built by the runner that produced it.
- If you want to publish stable semver releases instead of short-SHA releases, adapt `.github/workflows/release.yml` to trigger on tag pushes and set a semantic version.
Packaging helpers

This repo includes packaging templates to simplify adding distribution channels and CI automation for them:

- `packaging/scoop` - a template Scoop manifest and a config/README explaining how to wire an entry into a Scoop bucket (Windows user installer flow).
- `packaging/homebrew` - a sample Homebrew formula and config/notes for publishing a tap (macOS & Linuxbrew).
The release workflow generates SHA256 checksums (`*.sha256`) for each uploaded artifact, plus an aggregated `SHA256SUMS` file for the release, so packaging manifests can be populated automatically with the correct checksum values. The workflow also creates a branch in this repository containing updated packaging manifests (Homebrew & Scoop) with concrete versions and checksums, and opens a pull request so you can review and merge them (or adapt the workflow to push directly to your taps/buckets).
The current binary ships with BLAKE3, BLAKE2b, BLAKE2bp, SHAKE256, TurboSHAKE256, ParallelHash256, XXH3-1024, WyHash-1024, and KangarooTwelve hashing backends plus the core CLI workflow. The design document (`spec.md`) covers upcoming work such as additional algorithms (MeowHash, etc.), richer booster-mode controls, persisted copy plans, and a GUI front-end. Contributions aligning with that plan are welcome - open an issue to discuss larger changes.
This project declares a dual license in `Cargo.toml`: `MIT OR Apache-2.0`. Choose the license that best fits your needs and consult the repository owner for any redistribution policies.
Example hash map produced by the `hashmap` command:

```json
{
  "version": 1,
  "generated_by": "hash-folderoo",
  "timestamp": "2025-11-23T19:19:42Z",
  "root": "/absolute/or/relative/root",
  "algorithm": { "name": "blake3", "params": { "xof_length": 64 } },
  "entries": [
    { "path": "foo/bar.txt", "hash": "<hex>", "size": 12345, "mtime": 1700000000 },
    { "path": "baz.bin", "hash": "<hex>", "size": 42 }
  ]
}
```