bench: measure `wt switch` picker preview pre-compute workload by max-sixty · Pull Request #2721 · max-sixty/worktrunk

max-sixty · 2026-05-11T20:16:06Z

Summary

Adds picker_preview benchmark group measuring "process spawn → all preview tasks drained" for wt switch's interactive picker.
Introduces WORKTRUNK_PREVIEW_BENCH=1, an early-exit gate inside handle_picker that runs the full prelude (collect, speculative spawn, skeleton, initial + deferred precompute, orchestrator.wait_for_idle()) and returns before skim launches or any JSON / stderr I/O. Shares the dry-run path; behavior with the env var unset is unchanged.
Closes the coverage gap behind perf(picker): stage preview pre-compute and unify rayon pool #2662 / fix(bench): stabilize list-bench fixture and record thread-count findings #2683 / perf(bench): cheaper cargo bench --bench list (lazy fixtures + lighter real-repo sampling) #2685 / perf(list): cache ahead/behind counts SHA-keyed; skip the for-each-ref walk on warm runs #2704, which were tuned against wt list as a proxy because no direct picker measurement existed.

Why this measurement

Picker submits one preview-compute task per row to the global rayon pool. The user-visible quantity to optimize is the responsiveness window between picker launch and "all previews ready" (j/k navigation hits cached content). Option 1 from the task — headless wall clock to drain — is the cleanest measurable proxy and avoids the PTY route, which hits the documented nextest/SIGTTOU pain on shell-integration-tests. PTY-driven first-interactive-ready can be a follow-up.

Variants

picker_preview/warm/typical-8
picker_preview/cold/typical-8

Cold uses BatchSize::PerIteration (not SmallInput): SmallInput calls setup for an entire batch up front and then runs timed routines back-to-back, so only the first iter in each batch is genuinely cold — the rest hit a freshly populated .git/wt/cache/. PerIteration invalidates immediately before every measured iteration; setup is far cheaper than wt switch, so per-iter Instant::now doesn't dominate.

sample_size(10) + measurement_time(35s) per #2685's lead — slow benches don't benefit from the default 30 samples.

cfg(unix)-gated with a no-op main on Windows; the picker is Unix-only and wt switch (no args) hits the unavailable path before the env var is consulted.

Sample run

picker_preview/warm/typical-8   time:   [185.62 ms 191.72 ms 200.77 ms]
picker_preview/cold/typical-8   time:   [209.34 ms 226.23 ms 239.29 ms]

Test plan

cargo bench --bench picker_preview runs cleanly on both variants
cargo run -- hook pre-merge --yes — 3667 tests pass
New test_picker_preview_bench_produces_no_output asserts WORKTRUNK_PREVIEW_BENCH=1 keeps stdout/stderr empty (covers the env-gated branch, locks the no-I/O contract)
Smoke test: wt switch with WORKTRUNK_PREVIEW_BENCH unset still hits the TTY error path (user-visible behavior unchanged)
Smoke test: WORKTRUNK_PICKER_DRY_RUN=1 still emits the cache JSON dump (regression check)
/review-codex pass clean after iterating on three findings (packed-refs fix already on main via fix(bench): stop deleting packed-refs in invalidate_caches_auto #2697 once branch was rebased; BatchSize::PerIteration for true per-iter invalidation; cfg(unix) gate for Windows)

This was written by Claude Code on behalf of Maximilian Roos

🤖 Generated with Claude Code

Add `picker_preview` benchmark group plus a `WORKTRUNK_PREVIEW_BENCH=1` env var that runs the picker prelude (collect, speculative spawn, skeleton, initial + deferred precompute) and exits after `orchestrator.wait_for_idle()` — before skim launches and before any JSON serialization or stderr drain. Headless measurement of "spawn → all preview tasks drained" without a PTY. The recent unified rayon pool / SHA cache / bench-harness work (#2662 / #2683 / #2685 / #2704) was tuned against `wt list` as a proxy because no direct picker measurement existed; this is that. Variants: `warm/typical-8` and `cold/typical-8`. Cold uses `BatchSize::PerIteration` (not `SmallInput`) so every iteration genuinely invalidates first — `SmallInput` batches setup up-front and runs timed routines back-to-back, biasing the cold measurement warm after the first iter in each batch repopulates the cache. `cfg(unix)`-gated with a no-op `main` on Windows; the picker is Unix-only and `wt switch` with no args hits the unavailable path before the env var is consulted. > _This was written by Claude Code on behalf of Maximilian Roos_ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary Cold benchmark variants in `list.rs`, `remove.rs`, and `time_to_first_output.rs` were passing `BatchSize::SmallInput` to `iter_batched`. Under `SmallInput`, criterion calls the setup closure once per batch up front and then runs the routines back-to-back inside a single timing window — so when setup is `invalidate_caches_auto`, only iter 1 per batch is actually cold and iters 2-N read from the cache the previous routine just populated. The reported "cold" medians were warm-biased averages. `BatchSize::PerIteration` switches to `setup → time(routine)` per iter, so every measured iter is genuinely cold. The setup is far cheaper than a `wt` subprocess, so per-iter `Instant::now` overhead doesn't dominate. Codex flagged this on the sibling `picker_preview` bench (#2721); that PR landed `PerIteration` for its cold path, and this aligns the rest of the bench suite. ## Measured corrections | Variant | Before median | After median | Spread (before → after) | Median Δ | |---|---|---|---|---| | `list full/cold/8` | 113.5 ms | 107.1 ms | 16.0 → 3.9 ms | −5.7% (n.s., p=0.14) | | `remove_e2e/first_output` | 48.2 ms | 86.4 ms | 2.6 → 17.9 ms | **+44.4%** (p<0.001) | | `first_output/remove` | 50.5 ms | 69.6 ms | 2.4 → 0.65 ms | **+23.0%** (p<0.001) | `remove_e2e/first_output` is the starkest correction — `compute_integration_lazy` writes `is-ancestor` / `has-added-changes` / `merge-add-probe` entries, so warm-bias was substantial. `first_output/remove` is the cleanest demonstration of variance tightening (3.7× tighter spread). `list full/cold/8` doesn't move the median significantly but the upper-tail outliers that warm-bias was inflating disappear. ## Scope Only the four `iter_batched(… invalidate_caches_auto, …, BatchSize::SmallInput)` call sites are switched: - `benches/list.rs` `run_benchmark` (covers `skeleton`, `full`, `worktree_scaling`, `many_branches`, `divergent_branches` cold variants) - `benches/list.rs` `bench_real_repo` cold path - `benches/remove.rs` `first_output` - `benches/time_to_first_output.rs` `remove` Other `BatchSize::SmallInput` sites in `remove.rs` (`no_hooks`, `with_hooks`) use `recreate_worktree` as setup — that doesn't invalidate a cache the routine repopulates, so they stay as-is. `benches/CLAUDE.md` "Cache Handling" gets a paragraph so future bench authors reach for `PerIteration` when the setup invalidates a cache the routine repopulates. ## Test plan - [x] `cargo run -- hook pre-merge --yes` — 3667 tests pass, 0 failed - [x] Each cold variant run before+after on this branch with `--save-baseline` / `--baseline` Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

worktrunk-bot approved these changes May 11, 2026

View reviewed changes

max-sixty merged commit 80ee2cb into main May 11, 2026
35 checks passed

max-sixty deleted the picker-preview-bench branch May 11, 2026 21:55

max-sixty mentioned this pull request May 12, 2026

bench: use BatchSize::PerIteration for cold-cache variants #2728

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bench: measure `wt switch` picker preview pre-compute workload#2721

bench: measure `wt switch` picker preview pre-compute workload#2721
max-sixty merged 1 commit into
mainfrom
picker-preview-bench

max-sixty commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

max-sixty commented May 11, 2026

Summary

Why this measurement

Variants

Sample run

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants