test(perf): gate the durable per-parent outbox drain (WARM walltime + Tier 2 fsync count) by JMBattista · Pull Request #1235 · asheshgoplani/agent-deck

JMBattista · 2026-05-31T17:05:22Z

Summary

Lands the first hard-gated TestPerf_* for the durable per-parent outbox DRAIN — DrainInboxForParent, which runs on every conductor Stop hook and every heartbeat (DrainForStopHook and agent-deck inbox drain self), introduced in #1226 and activated in #1227. It was the highest-ROI item on the deferred list in #790/#1047: an unconditionally-hot lifecycle path with zero walltime gating.

Authority for everything below is docs/perf-budget-suite.md (the relocated #790/#1047 mandate): cold/warm classification, the WarmBudget formula, the n=11 trimmed mean, the 1ms floor, and the Tier 1 / Tier 2 split for I/O-touching tests.

What the drain does (path under test)

DrainInboxForParent is purely in-process and tmux-free. Two-phase, crash-safe:

Stage — recover any in-flight WAL, read the inbox, durably stage the union to the WAL (writeFileDurable), then truncate the inbox.
Finalize — collapse last-wins per child, dedup against the consumed-turns ledger, persist the ledger (writeFileDurable), drop the WAL.

The durable backing is *os.File (writeFileDurable: fsync data file → atomic rename → best-effort fsync parent dir), so both perf-budget tiers apply.

The two gates (one per tier)

Tier 1 — `TestPerf_OutboxDrain_Drain25` (WARM walltime)

TrimmedMeanWithSetup: setup() rebuilds a fresh inbox of N=25 distinct-child records and wipes the WAL + consumed ledger each iteration (so every timed drain hits the identical steady-state path); op() times DrainInboxForParent only.
N=25 is not invented — it's the fan-out the live drain stress test TestIssue1225_DrainInbox_ConcurrentDrainNoDoubleNoLoss fixes at const n = 25; this reuses childIDForN, the same id generator.
Classified WARM because fsync is stubbed to a no-op via the new seam, so the timed window is pure CPU + Go-runtime against the page cache — the WARM "disk speed is irrelevant" contract holds on any filesystem (more robust than relying on t.TempDir being tmpfs, which it isn't on e.g. WSL2/ext4). WARM GC discipline is composed onto TrimmedMeanWithSetup (auto-GC off + forced collection inside setup()), exactly as perfbudget.go instructs.
No real tmux, no real fsync, no child process. State is scoped to t.TempDir via inboxTestHome (HOME override).

Tier 2 — `TestPerf_OutboxDrain_FsyncCount` (count assertion)

Disk fsync latency is un-normalizable (~100× across SSD/HDD/cloud), so the I/O-pattern regression is gated on counts, per the doc. Derived by reading the drain code, not recording observed values: a non-empty drain of distinct-child, not-yet-consumed records issues exactly 2 file fsyncs + 2 directory fsyncs (WAL stage + ledger persist, each batched into one writeFileDurable), independent of N. The test asserts the count is identical for N=1 and N=25 — a fsync-per-record regression would scale the file count with N. (The inbox truncate and WAL drop are os.Remove, no fsync.)

Supporting infrastructure

internal/session/inbox.go — adds the fsyncFile seam (default (*os.File).Sync, zero production overhead) routing writeFileDurable + fsyncDir, plus SetFsyncHookForTest. This is the *os.File seam the doc explicitly anticipates for Tier 2 gates. No production behavior change.
internal/testutil/fsynccounter.go — reusable FsyncCounter wrapper bucketing file vs directory syncs. Landed as shared testutil (not a one-off) because the doc frames the wrapper as infrastructure for every future persistence-touching Tier 2 gate.
Makefile + docs/perf-budget-suite.md — align make test-perf scope to ./... (matching .github/workflows/perf-smoke.yml, which already uses ./...) so a TestPerf_* in any package — like this internal/session gate — is exercised, not just cmd/agent-deck.

Observed local median & budgets

Measured under -race, PERF_BUDGET_MULTIPLIER=1.0, Linux WSL2 container, N=25, fsync no-op'd (disk-independent):

	value
Observed trimmed-mean median	~2.1ms (six runs spanned 1.99–2.21ms)
`base` constant	`2.1ms`
WarmBudget @ mult 1.0 (local)	`base × 3` = 6.3ms
WarmBudget @ mult 2.0 (CI)	`base × 3 × 2` = 12.6ms

The base cites the observed median directly (per the mandate), with the WARM 3× factor for residual warm-host jitter.

Verification (Linux WSL2; template per #1047)

Check	Result
`make test-perf` (mult 1.0) — outbox gates	PASS — drain trimmed mean 1.95–2.71ms (budget 6.3ms); fsync 2 file + 2 dir at N=1 and N=25
`PERF_BUDGET_MULTIPLIER=2.0 make test-perf` — outbox gates	PASS — drain 2.00ms (budget 12.6ms); fsync 2+2
`go test -race -count=1 ./internal/session/` (inbox/drain/outbox/stophook + perf)	PASS
`go test -race -count=1 ./internal/testutil/`	PASS
`go build ./...`	clean
`gofmt -l` (changed files)	clean

Known caveats (both pre-existing, both proven on the clean tree with this PR stashed — not introduced here):

TestPerf_ColdStart_{Help,Version} fail locally on this WSL2 box (cold start ~300–330ms vs the 40ms budget) — a slow-process-spawn environment artifact; the cold-start budget holds on CI's ubuntu-latest. Untouched per the non-goals.
The internal/session status/lifecycle sleep-timing tests (TestStatusCycle_ShellSessionWithCommand, TestLifecycle_StoppedRestartedRunningError) flake under the loaded full-package -race run; they pass in isolation and fail identically on the clean tree. Unrelated to the fsync seam.

Non-goals honored

No cold-start changes; no existing budget widened; no benchstat/trending infra; no real tmux in any TestPerf_*; the 1ms floor and n=11 convention are untouched.

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Introduced comprehensive performance regression testing infrastructure for message handling operations.
- Added dedicated performance tests for outbox drain functionality with standardized measurement patterns.
- Created testing utilities to instrument and measure file synchronization operations during performance validation.
Chores
- Expanded the test suite scope to run performance regression tests across the entire codebase rather than specific packages.

… Tier 2 fsync count) The outbox drain (DrainInboxForParent, run on every conductor Stop hook and heartbeat — asheshgoplani#1226/asheshgoplani#1227) was the highest-ROI ungated hot path from the deferred list in asheshgoplani#790/asheshgoplani#1047. This adds the first TestPerf_* gates for it, per the mandate in docs/perf-budget-suite.md. - internal/session/outbox_drain_perf_test.go: - TestPerf_OutboxDrain_Drain25 (Tier 1, WARM): TrimmedMeanWithSetup over a representative 25-child inbox (N from the live drain test's `const n = 25`). fsync is stubbed to a no-op via the new seam so the timed window is pure in-process Go cost — the WARM/tmpfs contract holds on any filesystem, not just CI tmpfs. base = 2.1ms observed local median (-race, mult 1.0); budget = 6.3ms local / 12.6ms CI. - TestPerf_OutboxDrain_FsyncCount (Tier 2, count assertion): the backing is *os.File (writeFileDurable), so the doc's count-assertion tier applies. The drain issues exactly 2 file + 2 dir fsyncs for ANY non-empty drain (WAL stage + consumed-ledger persist, each batched), proven identical for N=1 and N=25 — a fsync-per-record regression would scale the file count with N. - internal/session/inbox.go: add the `fsyncFile` seam (default (*os.File).Sync, zero prod overhead) routing writeFileDurable + fsyncDir, plus SetFsyncHookForTest. This is the seam docs/perf-budget-suite.md anticipates for Tier 2 count gates. - internal/testutil/fsynccounter.go: reusable FsyncCounter wrapper (buckets file vs directory syncs) — the shared Tier 2 instrumentation the doc frames as infrastructure for every future persistence-touching gate. - Makefile + docs/perf-budget-suite.md: align `make test-perf` scope to ./... (matching .github/workflows/perf-smoke.yml) so a TestPerf_* in any package — e.g. this internal/session gate — is exercised, not just cmd/agent-deck. No production behavior change: the default fsync seam is (*os.File).Sync. Verification: - make test-perf @1.0 and @2.0: both new gates green (drain ~2ms; fsync 2+2). - go test -race -count=1 ./internal/session/ ./internal/testutil/: green (inbox/drain/outbox/stophook + perf). Pre-existing sleep-timing flakes in the status/lifecycle tests reproduce identically on the clean tree and are unrelated to this change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-05-31T17:05:34Z

📝 Walkthrough

Walkthrough

This PR adds performance and I/O regression tests for the outbox drain path by introducing fsync instrumentation, a test hook seam in the inbox durability layer, and two perf tests that verify both walltime bounds and fsync count invariants. The test-perf Makefile target is expanded to discover these tests repository-wide.

Changes

Outbox drain perf instrumentation

Layer / File(s)	Summary
Fsync instrumentation utility `internal/testutil/fsynccounter.go`	`FsyncCounter` provides concurrent-safe counting of file vs. directory fsync calls, with `Wrap()` wrapping an fsync function, `FileSyncs()`/`DirSyncs()` accessors, and `Reset()` for test setup.
Inbox fsync hook seam `internal/session/inbox.go`	Package-level `fsyncFile` variable and exported `SetFsyncHookForTest` function allow tests to intercept fsync; the seam is wired into `writeFileDurable` and `fsyncDir` durability calls.
Outbox drain perf and I/O regression tests `internal/session/outbox_drain_perf_test.go`	`TestPerf_OutboxDrain_Drain25` measures trim-mean walltime of draining 25 children under stubbed fsync; `TestPerf_OutboxDrain_FsyncCount` verifies constant file/directory sync counts regardless of fanout; `populateInboxForDrain` prepares clean inbox state.
Makefile test-perf scope expansion `Makefile`	`test-perf` target scope changed from `./cmd/agent-deck/...` to `./...` with clarifying comments on `TestPerf_*` discovery and filtering.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 7

✅ Passed checks (7 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title follows Conventional Commits format with 'test(perf):' prefix and clearly describes the main change: adding performance gates for the durable per-parent outbox drain with walltime and fsync count measurements.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 70.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Remote_parity	✅ Passed	PR does not modify internal/ui/ or cmd/agent-deck/ directories; changes are limited to internal/session/, internal/testutil/, and Makefile, so remote_parity check is not applicable.
Test_coverage_per_surface	✅ Passed	PR adds only test infrastructure (performance tests, test utilities, test hooks) with no new user-facing features requiring multi-surface testing.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/session/inbox.go`:
- Around line 57-61: The current SetFsyncHookForTest swaps the global fsyncFile
seam unsafely which races with concurrent durable-write code like
writeFileDurable; update the comment to state the real constraint (durable-write
code must not run concurrently with the seam swap) and make the swap
concurrency-safe by protecting fsyncFile access: either change
SetFsyncHookForTest to swap via a package-level sync.Mutex (or use
atomic.Value/unsafe.Pointer) and ensure writeFileDurable reads the seam under
the same synchronization mechanism, or document that tests must synchronize with
that mutex before invoking durable-write paths; reference SetFsyncHookForTest,
fsyncFile and writeFileDurable when applying the change.

In `@internal/testutil/fsynccounter.go`:
- Around line 60-67: The current check conflates Stat errors with regular files;
update the fsync counting in the method that calls f.Stat() to treat a Stat
error separately by adding a new counter (e.g., c.unknownSyncs) to the
fsynccounter struct and incrementing it when f.Stat() returns an error, while
keeping the existing increments of c.dirSyncs and c.fileSyncs only when err ==
nil (dir vs non-dir); ensure you guard unknownSyncs with c.mu like the others
and add a short comment documenting that unknownSyncs represents cases where
Stat failed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: bfaf94cf-8382-43f6-b6fa-9ca7f789850e

📥 Commits

Reviewing files that changed from the base of the PR and between f6bc0f5 and e2c74c2.

⛔ Files ignored due to path filters (1)

docs/perf-budget-suite.md is excluded by !**/*.md

📒 Files selected for processing (4)

Makefile
internal/session/inbox.go
internal/session/outbox_drain_perf_test.go
internal/testutil/fsynccounter.go

coderabbitai · 2026-05-31T17:11:22Z

+func SetFsyncHookForTest(fn func(*os.File) error) func() {
+	prev := fsyncFile
+	fsyncFile = fn
+	return func() { fsyncFile = prev }
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Verify no tests in internal/session call t.Parallel, which would # race with the global fsyncFile swap. rg -nP --type=go -C2 '\bt\.Parallel\s*\(' internal/session/

Repository: asheshgoplani/agent-deck

Length of output: 2890

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Find fsyncFile declaration/uses ==" rg -n --type=go "var\s+fsyncFile|fsyncFile\s*=" internal/session/ echo echo "== Find SetFsyncHookForTest call sites in internal/session tests ==" rg -n --type=go "SetFsyncHookForTest\s*\(" internal/session/ echo echo "== Find where fsyncFile is passed/called ==" rg -n --type=go "\bfsyncFile\b" internal/session/

Repository: asheshgoplani/agent-deck

Length of output: 1117

🏁 Script executed:

#!/bin/bash set -euo pipefail # Narrow to likely relevant files to avoid noise echo "== List go files under internal/session ==" fd --type f --extension go internal/session/ | sort echo echo "== Inspect internal/session/inbox.go around fsyncFile + SetFsyncHookForTest ==" # show around line ~40-90; adjust if file shorter sed -n '1,140p' internal/session/inbox.go | cat -n

Repository: asheshgoplani/agent-deck

Length of output: 503

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find test files under internal/session and show t.Parallel presence echo "== Go test files under internal/session ==" fd --type f --extension go internal/session/ -a | rg "_test\.go$" | sort echo echo "== Search for SetFsyncHookForTest in *_test.go ==" rg -n --type=go "SetFsyncHookForTest\s*\(" internal/session/ -g "*_test.go" echo echo "== Search for durable-write / fsync hooks usage hints ==" rg -n --type=go "fsync|Fsync|durable|durability" internal/session/

Repository: asheshgoplani/agent-deck

Length of output: 503

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect tests that have t.Parallel and see if they touch inbox or durable-write paths. # We'll just output surrounding blocks for t.Parallel occurrences in instance_test.go and other tests. for f in $(fd --type f -g "*_test.go" internal/session/ | sort); do echo "---- $f ----" rg -n --type=go "\bt\.Parallel\s*\(" "$f" -n || true echo done

Repository: asheshgoplani/agent-deck

Length of output: 10525

🏁 Script executed:

#!/bin/bash set -euo pipefail # Specifically inspect internal/session/instance_test.go around the t.Parallel sections near the earlier lines. sed -n '2920,3100p' internal/session/instance_test.go | cat -n

Repository: asheshgoplani/agent-deck

Length of output: 6774

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== List go files under internal/session ==" fd --type f --extension go internal/session/ | sort echo echo "== Find fsyncFile declaration/uses ==" rg -n --type=go "var\s+fsyncFile|fsyncFile\s*=" internal/session/ echo echo "== Find SetFsyncHookForTest call sites in internal/session (all go files) ==" rg -n --type=go "SetFsyncHookForTest\s*\(" internal/session/ echo echo "== Find where fsyncFile is referenced (all go files) ==" rg -n --type=go "\bfsyncFile\b" internal/session/ echo echo "== Read internal/session/inbox.go (first 160 lines) ==" sed -n '1,160p' internal/session/inbox.go | cat -n echo echo "== Search for fsync/durable hints across internal/session ==" rg -n --type=go "fsync|Fsync|durable|durability" internal/session/ echo echo "== t.Parallel occurrences in *_test.go under internal/session ==" for f in $(fd --type f -g "*_test.go" internal/session/ | sort); do echo "---- $f ----" rg -n --type=go "\bt\.Parallel\s*\(" "$f" || true done echo echo "== Inspect internal/session/instance_test.go around earlier t.Parallel lines ==" sed -n '2930,3105p' internal/session/instance_test.go | cat -n

Repository: asheshgoplani/agent-deck

Length of output: 503

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show additional context around SetFsyncHookForTest definition in inbox.go echo "== Context around SetFsyncHookForTest in inbox.go ==" rg -n --type=go "func\s+SetFsyncHookForTest" internal/session/inbox.go line=$(rg -n --type=go "func\s+SetFsyncHookForTest" internal/session/inbox.go | head -n1 | cut -d: -f1) start=$((line-20)) end=$((line+40)) sed -n "${start},${end}p" internal/session/inbox.go | cat -n

Repository: asheshgoplani/agent-deck

Length of output: 3568

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect where fsyncFile is called in codebase for durability path. # Show around each match of fsyncFile usage with context lines. rg -n --type=go "\bfsyncFile\b" internal/session/ | awk -F: '{print $1}' | sort -u | while read -r f; do echo "---- file: $f ----" done rg -n --type=go "\bfsyncFile\b" internal/session/ -C2

Repository: asheshgoplani/agent-deck

Length of output: 1821

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find any tests that call SetFsyncHookForTest and show surrounding code. # We'll print the enclosing function for each call by using rg with context. rg -n --type=go "SetFsyncHookForTest\s*\(" internal/session -C3 -g "*_test.go"

Repository: asheshgoplani/agent-deck

Length of output: 1227

🏁 Script executed:

#!/bin/bash set -euo pipefail # Also check if any test files are in package session vs session_test, which affects access to SetFsyncHookForTest. for f in $(fd --type f -g "*_test.go" internal/session/ | sort); do pkg=$(head -n 5 "$f" | rg -n "package " | head -n1 || true) echo "$f: ${pkg:-package <unknown>}" done

Repository: asheshgoplani/agent-deck

Length of output: 11744

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Find fsyncFile declaration/uses ==" rg -n --type=go "var\s+fsyncFile|fsyncFile\s*=" internal/session/ || true echo echo "== Find SetFsyncHookForTest call sites (all go files) ==" rg -n --type=go "SetFsyncHookForTest\s*\(" internal/session/ || true echo echo "== Find where fsyncFile is referenced (all go files) ==" rg -n --type=go "\bfsyncFile\b" internal/session/ -C2 || true echo echo "== Read internal/session/inbox.go (first 200 lines) ==" sed -n '1,200p' internal/session/inbox.go | cat -n echo echo "== Search for fsync/durable hints across internal/session ==" rg -n --type=go "fsync|Fsync|durable|durability" internal/session/ || true echo echo "== t.Parallel occurrences in internal/session tests ==" rg -n --type=go "\bt\.Parallel\s*\(" internal/session/ -g "*_test.go" -C1 || true echo echo "== Inspect internal/session/instance_test.go around t.Parallel lines ==" sed -n '2920,3105p' internal/session/instance_test.go | cat -n

Repository: asheshgoplani/agent-deck

Length of output: 36696

Fix the “package tests run serially” assumption in SetFsyncHookForTest

internal/session/inbox.go claims tests are serial because perf tests don’t use t.Parallel, but internal/session/instance_test.go has multiple t.Parallel() tests/subtests (e.g., TestWrapIgnoreSuspend, TestNewSandboxConfig), so the package isn’t fully serial.

Since SetFsyncHookForTest swaps the global fsyncFile seam and writeFileDurable uses it, any future t.Parallel test that exercises durable-write paths would introduce a race with the global swap.

Update the comment to reflect the actual constraint (durable-write code must not run concurrently with the perf gate’s seam swap), and/or make the seam swap safe under concurrency (e.g., atomic/synchronized seam access).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/session/inbox.go` around lines 57 - 61, The current SetFsyncHookForTest swaps the global fsyncFile seam unsafely which races with concurrent durable-write code like writeFileDurable; update the comment to state the real constraint (durable-write code must not run concurrently with the seam swap) and make the swap concurrency-safe by protecting fsyncFile access: either change SetFsyncHookForTest to swap via a package-level sync.Mutex (or use atomic.Value/unsafe.Pointer) and ensure writeFileDurable reads the seam under the same synchronization mechanism, or document that tests must synchronize with that mutex before invoking durable-write paths; reference SetFsyncHookForTest, fsyncFile and writeFileDurable when applying the change.

coderabbitai · 2026-05-31T17:11:23Z

+		if info, err := f.Stat(); err == nil && info.IsDir() {
+			c.mu.Lock()
+			c.dirSyncs++
+			c.mu.Unlock()
+		} else {
+			c.mu.Lock()
+			c.fileSyncs++
+			c.mu.Unlock()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify or handle Stat error separately from file-type bucketing.

The current logic defaults to fileSyncs when f.Stat() fails. A Stat error (e.g., EBADF) doesn't indicate the descriptor's file-vs-directory type, so counting it as a file sync may misattribute the operation. Consider either:

Skipping the count when Stat fails (document that as "unknown type"), or

Adding a third counter for "unknownSyncs", or

Documenting explicitly that Stat failures default to file syncs as a benign simplification.

The current implementation works in practice (Stat on valid open descriptors won't fail in normal test scenarios), but the conflation of "not a directory" and "Stat error" is subtle.

🛡️ Proposed fix to handle Stat errors explicitly

- if info, err := f.Stat(); err == nil && info.IsDir() { + info, err := f.Stat() + if err != nil { + // Stat failed; can't determine file vs dir. Skip counting, or + // default to fileSyncs as a simplification (document the choice). + // For now, skip to avoid misattribution: + return real(f) + } + if info.IsDir() { c.mu.Lock() c.dirSyncs++ c.mu.Unlock()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/testutil/fsynccounter.go` around lines 60 - 67, The current check conflates Stat errors with regular files; update the fsync counting in the method that calls f.Stat() to treat a Stat error separately by adding a new counter (e.g., c.unknownSyncs) to the fsynccounter struct and incrementing it when f.Stat() returns an error, while keeping the existing increments of c.dirSyncs and c.fileSyncs only when err == nil (dir vs non-dir); ensure you guard unknownSyncs with c.mu like the others and add a short comment documenting that unknownSyncs represents cases where Stat failed.

JMBattista · 2026-05-31T17:34:47Z

I'm ignoring the playwright test breakage, as an investigation shows they were broken for a while.

Copilot AI review requested due to automatic review settings May 31, 2026 17:05

JMBattista requested a review from asheshgoplani as a code owner May 31, 2026 17:05

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

JMBattista mentioned this pull request May 31, 2026

test(e2e): fix broken Playwright tests on main (#1235 example) #1236

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(perf): gate the durable per-parent outbox drain (WARM walltime + Tier 2 fsync count)#1235

test(perf): gate the durable per-parent outbox drain (WARM walltime + Tier 2 fsync count)#1235
JMBattista wants to merge 1 commit into
asheshgoplani:mainfrom
JMBattista:feature/outbox-drain-perf-gate

JMBattista commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 31, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 31, 2026

Uh oh!

coderabbitai Bot May 31, 2026

Uh oh!

JMBattista commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

JMBattista commented May 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What the drain does (path under test)

The two gates (one per tier)

Tier 1 — TestPerf_OutboxDrain_Drain25 (WARM walltime)

Tier 2 — TestPerf_OutboxDrain_FsyncCount (count assertion)

Supporting infrastructure

Observed local median & budgets

Verification (Linux WSL2; template per #1047)

Non-goals honored

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

JMBattista commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JMBattista commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Tier 1 — `TestPerf_OutboxDrain_Drain25` (WARM walltime)

Tier 2 — `TestPerf_OutboxDrain_FsyncCount` (count assertion)

coderabbitai Bot commented May 31, 2026 •

edited

Loading