test(perf): Tier 1 WARM suite — statedb CRUD + storage-mediated group lifecycle#1234
test(perf): Tier 1 WARM suite — statedb CRUD + storage-mediated group lifecycle#1234JMBattista wants to merge 2 commits into
Conversation
…ycle Adds the deferred Tier 1 (tmpfs walltime) perf gates from PR asheshgoplani#790/asheshgoplani#1047 that suit CPU/Go-runtime cost measurement, per docs/perf-budget-suite.md ("Tier 1 vs Tier 2"). All run against t.TempDir() (tmpfs on CI Linux), classify as WARM, and use WarmBudget + TrimmedMeanWarm / TrimmedMeanWithSetup. New tests (local medians under -race @ multiplier=1.0): - internal/statedb (StateDB CRUD scaling, N=50 cited deck size): - TestPerf_StateDB_InsertN ~206ms - TestPerf_StateDB_ReadByID ~9.3ms - TestPerf_StateDB_List ~6.9ms - TestPerf_StateDB_DeleteN ~166ms - TestPerf_StateDB_WatcherEventIngest ~1.42s (insert+prune at 500-event steady state) - internal/session (storage-mediated group lifecycle, asheshgoplani#790's carve-out): - TestPerf_Group_CreateDelete ~8.5ms (SaveGroupsOnly round trip) Each cites its measured median next to the base constant and a file-level WARM-not-COLD justification. No real tmux, no network, pure-Go in-process SQLite on tmpfs. Also broadens `make test-perf` from ./cmd/agent-deck/... to ./... so these internal-package gates run locally, matching the perf-smoke.yml CI gate which already runs the whole module. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds Tier-1 warm performance tests: five StateDB perf gates (insert, read-by-ID, list, delete, watcher-event ingest), one Session group create-delete perf test, and expands Makefile ChangesPerformance Test Suite Expansion
Sequence Diagram(s)sequenceDiagram
participant Test as TestPerf_StateDB_*
participant StateDB as StateDB
participant Testutil as testutil.TrimmedMean*
Test->>StateDB: newPerfDB (migrate, open temp DB)
Test->>StateDB: perfRows / SaveInstance / LoadInstances / DeleteInstance (setup)
Test->>Testutil: TrimmedMean* (measure)
Testutil->>StateDB: invoke operation (timed call)
Testutil-->>Test: trimmed mean result
Test->>Test: compare against WarmBudget
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 7✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds Tier-1 WARM performance gate tests for SQLite-backed persistence paths (StateDB CRUD, watcher event ingestion, and storage-mediated group lifecycle), and updates the local perf test make target to execute perf gates across all packages.
Changes:
- Added StateDB Tier-1 perf tests covering insert/read/list/delete and watcher event ingest+prune paths.
- Added Session/Storage Tier-1 perf test covering group create→persist→delete→persist lifecycle.
- Updated
make test-perfto runTestPerf_*across./...so new perf tests in any package run locally like CI.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/statedb/statedb_perf_test.go | Introduces Tier-1 WARM perf gates for StateDB CRUD and watcher event ingest+prune. |
| internal/session/group_perf_test.go | Adds Tier-1 WARM perf gate for storage-mediated group create/delete persistence round trip. |
| Makefile | Expands test-perf scope to run perf tests across the whole module. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| tx, err := db.DB().Begin() | ||
| if err != nil { | ||
| t.Fatalf("begin prefill: %v", err) | ||
| } | ||
| stmt, err := tx.Prepare(`INSERT INTO watcher_events (watcher_id, dedup_key, created_at) VALUES (?, ?, ?)`) | ||
| if err != nil { | ||
| t.Fatalf("prepare prefill: %v", err) | ||
| } | ||
| now := time.Now().Unix() | ||
| for i := 0; i < perfWatcherSteadyState; i++ { | ||
| if _, err := stmt.Exec(watcherID, fmt.Sprintf("base-%05d", i), now+int64(i)); err != nil { | ||
| t.Fatalf("prefill exec: %v", err) | ||
| } | ||
| } | ||
| _ = stmt.Close() | ||
| if err := tx.Commit(); err != nil { | ||
| t.Fatalf("commit prefill: %v", err) | ||
| } |
…aphy The first version called Storage.SaveGroupsOnly — but that is the lightweight expand/collapse visual-state path (skips the instances round-trip and Touch()). The actual user-facing handlers (cmd/agent-deck/group_cmd.go handleGroupCreate / handleGroupDelete) instead do: LoadWithGroups → NewGroupTreeWithGroups → CreateGroup/DeleteGroup → SaveWithGroups where the dominant cost is the full instances-table read + rewrite (+ Touch + dedup). The old test gated a cheaper sibling op and would miss regressions in the real path. Rewrites TestPerf_Group_CreateDelete to reconstruct that exact choreography against a populated 50-instance deck (the cited upper-end size). Seeded rows carry TmuxSession="" so LoadWithGroups' lazy tmux.ReconnectSessionLazy branch never runs — still zero tmux code. Net-zero per iteration (create then delete the same ephemeral group), so TrimmedMeanWarm still applies. Local median under -race jumps ~8.5ms → ~67ms, confirming the old test measured a fraction of the real cost. Base 70ms → WarmBudget 210ms local / 420ms CI. The CLI shell itself (flag parse, NewStorageWithProfile, out.Success/os.Exit) needs the deferred injection seam and is out of scope; the seam-free remainder carries no meaningful, tmux-touching cost. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tier 1 WARM perf suite — deferred items from #790 / #1047
Builds out a basic suite of Tier 1 (tmpfs walltime)
TestPerf_*gates for the deferred persistence-touching lifecycle paths called out in PR #790 and #1047, per the mandate indocs/perf-budget-suite.md— specifically the "Tier 1 vs Tier 2" section.All new tests:
t.TempDir()(tmpfs on CI Linux) — fsync is effectively free, so the measured cost is CPU + Go-runtime cost only. Nomkdirunder repo paths.testutil.WarmBudget+testutil.TrimmedMeanWarm(orTrimmedMeanWithSetupwhere a per-iteration fixture rebuild is excluded from timing).testutil.SkipIfShort(t).-raceatPERF_BUDGET_MULTIPLIER=1.0next to the base constant.No real tmux is touched. No network. Pure-Go in-process SQLite only. (
internal/sessionStorage is constructed directly against a tmpfs DB; seeded rows carryTmuxSession=""so the lazy-reconnect branch inLoadWithGroupsnever executes.internal/statedbopens the DB directly.)New
TestPerf_*testsLocal medians measured under
-raceon a WSL2 dev host (the CI Xeon container will differ; budgets carry the×3WARM /×6CI headroom that absorbs this).-race, ×1.0)TestPerf_StateDB_InsertNSaveInstanceinto empty table (per-session insert)TestPerf_StateDB_ReadByIDInstanceExists(rm-verify point query, #909)TestPerf_StateDB_ListLoadInstancesover 50 rows (reload scan)TestPerf_StateDB_DeleteNDeleteInstance(agent-deck rmxargs path, #909)TestPerf_StateDB_WatcherEventIngestSaveWatcherEventinsert + pruneDELETE … NOT IN)TestPerf_Group_CreateDeletehandleGroupCreate/handleGroupDeletechoreography:LoadWithGroups→NewGroupTreeWithGroups→ create/delete →SaveWithGroups(×2), over a 50-instance deckWarmBudget = max(base × 3, 1 ms) × PERF_BUDGET_MULTIPLIER. None of the bases hit the 1 ms floor.Coverage rationale
StateDB CRUD scaling (
internal/statedb/statedb_perf_test.go) — insert / read-by-id / list / delete, each its own test, at N = 50. These deliberately gate the storage primitives (the task framed this as "StateDB CRUD scaling"), not an assembled flow. 50 is the representative upper-end deck size cited in code (internal/web/handlers_costs.go:529: the sidebar "grows past ~50 sessions";llms-full.txtcites 30-session workspaces) — not an invented number. DB path ist.TempDir().Group lifecycle, storage-mediated (
internal/session/group_perf_test.go) — test(perf): add walltime regression suite for cold start and group lifecycle #790 explicitly carved group create/delete out because doing it in-memory "exercises the wrong layer" (map ops are nanoseconds). This test gates it by reconstructing the exact storage choreography the real handlers perform (cmd/agent-deck/group_cmd.gohandleGroupCreate~L396-442 /handleGroupDelete~L689-705):Deliberately not
Storage.SaveGroupsOnly— that is the lightweight expand/collapse visual-state path (its own doc comment says so) and skips both the instances round-trip andTouch(). The dominant cost in the real path is the instances-table read+rewrite, which is why this runs against a populated 50-instance deck rather than an empty DB.Out of scope (follow-up): the literal CLI shell — flag parsing,
NewStorageWithProfile,out.Success/os.Exit. Driving that in-process needs the same injection seam deferred for session create/delete (below). The seam-free remainder carries no meaningful, tmux-touching cost, so the storage choreography above is the faithful Tier-1 target.Watcher-event ingestion (
internal/statedb) — a distinct persistence lifecycle:SaveWatcherEventdoesINSERT OR IGNOREthen a pruneDELETE … WHERE id NOT IN (SELECT … LIMIT N). The prune subquery cost scales with steady-state table size, so the table is pre-filled to the default 500-event bound (max_events_per_watcherdefault 500,internal/session/userconfig.go:3419) and a 50-event batch is timed on top.Tooling
make test-perffrom./cmd/agent-deck/...to./...so these internal-package gates run locally. This matches.github/workflows/perf-smoke.yml, whoseperf-testsjob already runs./..., so no CI workflow change is needed — the new tests are picked up automatically.Verification
make test-perfatPERF_BUDGET_MULTIPLIER=1.0and2.0: all six new tests green (verified at both multipliers).go test -race -count=1 ./internal/statedb/: green.-raceat both multipliers with comfortable margins.Non-goals (per task scope)
// TODO(tier2)with a one-line rationale is left in place.🤖 Generated with Claude Code
Summary by CodeRabbit