Skip to content

bench: measure wt switch picker preview pre-compute workload#2721

Merged
max-sixty merged 1 commit into
mainfrom
picker-preview-bench
May 11, 2026
Merged

bench: measure wt switch picker preview pre-compute workload#2721
max-sixty merged 1 commit into
mainfrom
picker-preview-bench

Conversation

@max-sixty

Copy link
Copy Markdown
Owner

Summary

Why this measurement

Picker submits one preview-compute task per row to the global rayon pool. The user-visible quantity to optimize is the responsiveness window between picker launch and "all previews ready" (j/k navigation hits cached content). Option 1 from the task — headless wall clock to drain — is the cleanest measurable proxy and avoids the PTY route, which hits the documented nextest/SIGTTOU pain on shell-integration-tests. PTY-driven first-interactive-ready can be a follow-up.

Variants

  • picker_preview/warm/typical-8
  • picker_preview/cold/typical-8

Cold uses BatchSize::PerIteration (not SmallInput): SmallInput calls setup for an entire batch up front and then runs timed routines back-to-back, so only the first iter in each batch is genuinely cold — the rest hit a freshly populated .git/wt/cache/. PerIteration invalidates immediately before every measured iteration; setup is far cheaper than wt switch, so per-iter Instant::now doesn't dominate.

sample_size(10) + measurement_time(35s) per #2685's lead — slow benches don't benefit from the default 30 samples.

cfg(unix)-gated with a no-op main on Windows; the picker is Unix-only and wt switch (no args) hits the unavailable path before the env var is consulted.

Sample run

picker_preview/warm/typical-8   time:   [185.62 ms 191.72 ms 200.77 ms]
picker_preview/cold/typical-8   time:   [209.34 ms 226.23 ms 239.29 ms]

Test plan

  • cargo bench --bench picker_preview runs cleanly on both variants
  • cargo run -- hook pre-merge --yes — 3667 tests pass
  • New test_picker_preview_bench_produces_no_output asserts WORKTRUNK_PREVIEW_BENCH=1 keeps stdout/stderr empty (covers the env-gated branch, locks the no-I/O contract)
  • Smoke test: wt switch with WORKTRUNK_PREVIEW_BENCH unset still hits the TTY error path (user-visible behavior unchanged)
  • Smoke test: WORKTRUNK_PICKER_DRY_RUN=1 still emits the cache JSON dump (regression check)
  • /review-codex pass clean after iterating on three findings (packed-refs fix already on main via fix(bench): stop deleting packed-refs in invalidate_caches_auto #2697 once branch was rebased; BatchSize::PerIteration for true per-iter invalidation; cfg(unix) gate for Windows)

This was written by Claude Code on behalf of Maximilian Roos

🤖 Generated with Claude Code

Add `picker_preview` benchmark group plus a `WORKTRUNK_PREVIEW_BENCH=1`
env var that runs the picker prelude (collect, speculative spawn,
skeleton, initial + deferred precompute) and exits after
`orchestrator.wait_for_idle()` — before skim launches and before any
JSON serialization or stderr drain. Headless measurement of
"spawn → all preview tasks drained" without a PTY.

The recent unified rayon pool / SHA cache / bench-harness work
(#2662 / #2683 / #2685 / #2704) was tuned against `wt list` as a
proxy because no direct picker measurement existed; this is that.

Variants: `warm/typical-8` and `cold/typical-8`. Cold uses
`BatchSize::PerIteration` (not `SmallInput`) so every iteration
genuinely invalidates first — `SmallInput` batches setup up-front and
runs timed routines back-to-back, biasing the cold measurement warm
after the first iter in each batch repopulates the cache.

`cfg(unix)`-gated with a no-op `main` on Windows; the picker is
Unix-only and `wt switch` with no args hits the unavailable path
before the env var is consulted.

> _This was written by Claude Code on behalf of Maximilian Roos_

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@max-sixty max-sixty merged commit 80ee2cb into main May 11, 2026
35 checks passed
@max-sixty max-sixty deleted the picker-preview-bench branch May 11, 2026 21:55
max-sixty added a commit that referenced this pull request May 12, 2026
## Summary

Cold benchmark variants in `list.rs`, `remove.rs`, and
`time_to_first_output.rs` were passing `BatchSize::SmallInput` to
`iter_batched`. Under `SmallInput`, criterion calls the setup closure
once per batch up front and then runs the routines back-to-back inside a
single timing window — so when setup is `invalidate_caches_auto`, only
iter 1 per batch is actually cold and iters 2-N read from the cache the
previous routine just populated. The reported "cold" medians were
warm-biased averages.

`BatchSize::PerIteration` switches to `setup → time(routine)` per iter,
so every measured iter is genuinely cold. The setup is far cheaper than
a `wt` subprocess, so per-iter `Instant::now` overhead doesn't dominate.
Codex flagged this on the sibling `picker_preview` bench (#2721); that
PR landed `PerIteration` for its cold path, and this aligns the rest of
the bench suite.

## Measured corrections

| Variant | Before median | After median | Spread (before → after) |
Median Δ |
|---|---|---|---|---|
| `list full/cold/8` | 113.5 ms | 107.1 ms | 16.0 → 3.9 ms | −5.7%
(n.s., p=0.14) |
| `remove_e2e/first_output` | 48.2 ms | 86.4 ms | 2.6 → 17.9 ms |
**+44.4%** (p<0.001) |
| `first_output/remove` | 50.5 ms | 69.6 ms | 2.4 → 0.65 ms | **+23.0%**
(p<0.001) |

`remove_e2e/first_output` is the starkest correction —
`compute_integration_lazy` writes `is-ancestor` / `has-added-changes` /
`merge-add-probe` entries, so warm-bias was substantial.
`first_output/remove` is the cleanest demonstration of variance
tightening (3.7× tighter spread). `list full/cold/8` doesn't move the
median significantly but the upper-tail outliers that warm-bias was
inflating disappear.

## Scope

Only the four `iter_batched(… invalidate_caches_auto, …,
BatchSize::SmallInput)` call sites are switched:

- `benches/list.rs` `run_benchmark` (covers `skeleton`, `full`,
`worktree_scaling`, `many_branches`, `divergent_branches` cold variants)
- `benches/list.rs` `bench_real_repo` cold path
- `benches/remove.rs` `first_output`
- `benches/time_to_first_output.rs` `remove`

Other `BatchSize::SmallInput` sites in `remove.rs` (`no_hooks`,
`with_hooks`) use `recreate_worktree` as setup — that doesn't invalidate
a cache the routine repopulates, so they stay as-is.

`benches/CLAUDE.md` "Cache Handling" gets a paragraph so future bench
authors reach for `PerIteration` when the setup invalidates a cache the
routine repopulates.

## Test plan

- [x] `cargo run -- hook pre-merge --yes` — 3667 tests pass, 0 failed
- [x] Each cold variant run before+after on this branch with
`--save-baseline` / `--baseline`

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants