Benchmark suite comparing approaches to storing revision histories. Each storage approach packs a sequence of text revisions into a single file and is measured on packed size and random read time for old revisions.
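To make the two measurements concrete, here is a minimal sketch of what a "naive/zlib"-style approach could look like: every revision is compressed independently and an offset table allows reading any revision without touching the others. The file layout and the names `pack` and `read_revision` are assumptions made for illustration, not the suite's actual format or API.

```python
import struct
import zlib


def pack(revisions: list[str]) -> bytes:
    """Pack revisions into one blob: count, offset table, then compressed bodies.

    Hypothetical layout, for illustration only.
    """
    bodies = [zlib.compress(r.encode("utf-8")) for r in revisions]
    # offsets[i] .. offsets[i + 1] is the byte range of revision i in the body area.
    offsets = [0]
    for body in bodies:
        offsets.append(offsets[-1] + len(body))
    header = struct.pack("<I", len(bodies))
    table = struct.pack(f"<{len(offsets)}Q", *offsets)
    return header + table + b"".join(bodies)


def read_revision(blob: bytes, index: int) -> str:
    """Random read: locate one revision's byte range and decompress only that range."""
    (count,) = struct.unpack_from("<I", blob, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    table_start = 4  # size of the "<I" count header
    start, end = struct.unpack_from("<2Q", blob, table_start + 8 * index)
    data_start = table_start + 8 * (count + 1)
    return zlib.decompress(blob[data_start + start : data_start + end]).decode("utf-8")


revisions = ["hello", "hello world", "hello, world!"]
blob = pack(revisions)
assert read_revision(blob, 1) == "hello world"
```

In this sketch, packed size is simply `len(blob)`, and random read time is the cost of one `read_revision` call. Delta-based approaches trade a smaller blob for extra work when reconstructing an old revision.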
# 1. Download all three benchmark datasets
uv run download_all.py --quick
# 2. Run tests, benchmarks, and generate charts
uv run benchmark_all.py

The benchmark suite uses three datasets with different revision characteristics:
| Dataset | Source | Revisions | Characteristics |
|---|---|---|---|
| George_W._Bush | Wikipedia API | ~15,000 | Prose, many small edits |
| yahoo.com | Wayback Machine | ~13,500 | HTML, large structural changes |
| btrfs_inode.c | Linux kernel git | ~2,700 | C source code, steady growth |
Download all three with a single command:
uv run download_all.py

This will take a long time (potentially hours) due to API rate limits. The archive.org API in particular has a conservative wait time between calls and is slow. All downloads support --resume, so you can interrupt and restart safely:
uv run download_all.py --resume

To do a relatively quick smoke test (full btrfs, Wikipedia and Wayback capped at 1,000 revisions):
uv run download_all.py --quick

Run the full pipeline (tests, benchmarks, charts, and tables) with:
uv run benchmark_all.py

Use --parallel to pack all backends in parallel and then benchmark reads separately. The trade-off is that you lose the packfile generation time and memory use measurements:
uv run benchmark_all.py --parallel

Filter to specific approaches with --approaches (case-insensitive substring match):
uv run benchmark_all.py --approaches "revlog/lz4/fossil,naive/zlib"

An interactive web UI lets you browse revision histories and compare how each storage backend reconstructs any revision. It requires pre-built web cache data for each dataset.
# Prepare cached data for a single dataset (run once, or after dataset changes)
cd rust && cargo run --release --bin prepare_web_cache -- ../revisions/George_W._Bush
# Start the server (defaults to port 8080)
cd rust && cargo run --release --bin web_server -- ../revisions

For production deployment, deploy.sh builds the binaries, prepares web cache data for all datasets, uploads everything, and restarts the server.
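As a side note on the --approaches filter used with benchmark_all.py, the case-insensitive substring match can be pictured roughly as follows. This is a sketch under assumptions: the function name `select_approaches`, the comma splitting (inferred from the example command), the behavior for an empty filter, and the approach name "naive/zstd" are all hypothetical and may differ from the real script.

```python
def select_approaches(available: list[str], spec: str) -> list[str]:
    """Keep approaches whose name contains any comma-separated pattern, ignoring case.

    Hypothetical helper; an empty spec is assumed to select everything.
    """
    patterns = [p.strip().lower() for p in spec.split(",") if p.strip()]
    if not patterns:
        return list(available)
    return [
        name
        for name in available
        if any(pattern in name.lower() for pattern in patterns)
    ]


# "naive/zstd" is a made-up name used only to show a non-matching entry.
available = ["revlog/lz4/fossil", "naive/zlib", "naive/zstd"]
selected = select_approaches(available, "REVLOG/lz4/fossil,naive/zlib")
assert selected == ["revlog/lz4/fossil", "naive/zlib"]
```

Because the match is by substring, a shorter pattern such as "naive" would select every approach whose name contains it.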