bench: use BatchSize::PerIteration for cold-cache variants by max-sixty · Pull Request #2728 · max-sixty/worktrunk

max-sixty · 2026-05-12T00:37:19Z

Summary

Cold benchmark variants in list.rs, remove.rs, and time_to_first_output.rs were passing BatchSize::SmallInput to iter_batched. Under SmallInput, criterion calls the setup closure once per batch up front and then runs the routines back-to-back inside a single timing window — so when setup is invalidate_caches_auto, only iter 1 per batch is actually cold and iters 2-N read from the cache the previous routine just populated. The reported "cold" medians were warm-biased averages.

BatchSize::PerIteration switches to setup → time(routine) per iter, so every measured iter is genuinely cold. The setup is far cheaper than a wt subprocess, so per-iter Instant::now overhead doesn't dominate. Codex flagged this on the sibling picker_preview bench (#2721); that PR landed PerIteration for its cold path, and this aligns the rest of the bench suite.

Measured corrections

Variant	Before median	After median	Spread (before → after)	Median Δ
`list full/cold/8`	113.5 ms	107.1 ms	16.0 → 3.9 ms	−5.7% (n.s., p=0.14)
`remove_e2e/first_output`	48.2 ms	86.4 ms	2.6 → 17.9 ms	+44.4% (p<0.001)
`first_output/remove`	50.5 ms	69.6 ms	2.4 → 0.65 ms	+23.0% (p<0.001)

remove_e2e/first_output is the starkest correction — compute_integration_lazy writes is-ancestor / has-added-changes / merge-add-probe entries, so warm-bias was substantial. first_output/remove is the cleanest demonstration of variance tightening (3.7× tighter spread). list full/cold/8 doesn't move the median significantly but the upper-tail outliers that warm-bias was inflating disappear.

Scope

Only the four iter_batched(… invalidate_caches_auto, …, BatchSize::SmallInput) call sites are switched:

benches/list.rs run_benchmark (covers skeleton, full, worktree_scaling, many_branches, divergent_branches cold variants)
benches/list.rs bench_real_repo cold path
benches/remove.rs first_output
benches/time_to_first_output.rs remove

Other BatchSize::SmallInput sites in remove.rs (no_hooks, with_hooks) use recreate_worktree as setup — that doesn't invalidate a cache the routine repopulates, so they stay as-is.

benches/CLAUDE.md "Cache Handling" gets a paragraph so future bench authors reach for PerIteration when the setup invalidates a cache the routine repopulates.

Test plan

cargo run -- hook pre-merge --yes — 3667 tests pass, 0 failed
Each cold variant run before+after on this branch with --save-baseline / --baseline

Cold benchmark variants in `list.rs`, `remove.rs`, and `time_to_first_output.rs` were passing `BatchSize::SmallInput` to `iter_batched`. Under `SmallInput`, criterion (0.8) calls the setup closure once per batch up front, then runs the routines back-to-back inside a single timing window — so when the setup is `invalidate_caches_auto`, only iter 1 per batch is actually cold and iters 2-N read from the cache the previous routine just populated. The reported "cold" medians were warm-biased averages. `BatchSize::PerIteration` switches to `setup → time(routine)` per iter, so every measurement is genuinely cold. Setup is far cheaper than a `wt` subprocess, so per-iter `Instant::now` overhead doesn't dominate. Measured corrections on this branch: - `list full/cold/8`: spread 16.0 ms → 3.9 ms (4× tighter); median −5.7% (n.s., p=0.14) — warm-bias previously inflated upper outliers. - `remove_e2e/first_output`: median 48 → 86 ms (+44%, p<0.001). The routine writes is-ancestor/has-added-changes/merge-add-probe entries via `compute_integration_lazy`, so warm-bias was substantial. - `first_output/remove`: median 50 → 70 ms (+23%, p<0.001) with spread tightening 2.4 ms → 0.65 ms (3.7×). `skeleton/*` and `worktree_scaling/*` cold variants share the `run_benchmark` helper, so they pick up the fix too. The `real_repo` group's cold branch is also switched (separate call site). Other `BatchSize::SmallInput` sites in `remove.rs` (`no_hooks`, `with_hooks`) use `recreate_worktree` as setup — that doesn't invalidate a cache the routine repopulates, so they stay as-is. Likewise `BatchSize::PerIteration` was already in use on the `picker_preview` bench (#2721); this aligns the rest of the suite. `benches/CLAUDE.md` "Cache Handling" now points future bench authors at `PerIteration` so warm-bias doesn't silently recur on new cold variants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

) ## Summary Follow-up to #2728. `benches/alias.rs` was the last `iter_batched(invalidate_caches_auto, …, BatchSize::SmallInput)` site in the bench suite. Under `SmallInput`, criterion calls the setup closure once per batch and times the routines back-to-back inside one timing window — so only iter 1 per batch is actually cold, and iters 2-N hit the `worktrunk.default-branch` git-config entry that `build_hook_context` wrote on iter 1 (default-branch resolution is ~17ms on a synthetic repo per `benches/CLAUDE.md`). `BatchSize::PerIteration` runs `setup → time(routine)` per iter, so every measurement is genuinely cold. ## Measurement `dispatch/cold/1` (stub alias, 1 worktree): | | Before | After | |---|---|---| | Median | 23.14 ms | 23.94 ms | | Range | [22.34, 24.01] | [22.91, 25.03] | Median +6.9% (p=0.05, "Change within noise threshold"). The correction is small — alias dispatch is cheap (~23ms) and only the default-branch cache is being repopulated, so each warm iter saved ~1ms — but the direction is right and it brings the bench in line with the rest of the suite (#2728) and the `benches/CLAUDE.md` "Cache Handling" guidance. ## Test plan - [x] `cargo run -- hook pre-merge --yes` — 3684 tests pass, 0 failed - [x] `dispatch/cold/1` run before+after on this branch with `--save-baseline` / `--baseline` Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

worktrunk-bot approved these changes May 12, 2026

View reviewed changes

max-sixty merged commit d7c8428 into main May 12, 2026
34 checks passed

max-sixty deleted the bench-cold-per-iteration branch May 12, 2026 00:50

max-sixty mentioned this pull request May 12, 2026

bench(alias): use BatchSize::PerIteration for cold-cache variants #2731

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bench: use BatchSize::PerIteration for cold-cache variants#2728

bench: use BatchSize::PerIteration for cold-cache variants#2728
max-sixty merged 1 commit into
mainfrom
bench-cold-per-iteration

max-sixty commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

max-sixty commented May 12, 2026

Summary

Measured corrections

Scope

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants