test(perf): Tier 1 WARM suite — statedb CRUD + storage-mediated group lifecycle by JMBattista · Pull Request #1234 · asheshgoplani/agent-deck

JMBattista · 2026-05-31T16:59:45Z

Tier 1 WARM perf suite — deferred items from #790 / #1047

Builds out a basic suite of Tier 1 (tmpfs walltime) TestPerf_* gates for the deferred persistence-touching lifecycle paths called out in PR #790 and #1047, per the mandate in docs/perf-budget-suite.md — specifically the "Tier 1 vs Tier 2" section.

All new tests:

Run against t.TempDir() (tmpfs on CI Linux) — fsync is effectively free, so the measured cost is CPU + Go-runtime cost only. No mkdir under repo paths.
Classify as WARM (pure in-process Go against an SQLite handle on tmpfs — no process boundary, no real-disk fsync, no child spawn, no network), with a one-line "why WARM not COLD" justification in each file header.
Use testutil.WarmBudget + testutil.TrimmedMeanWarm (or TrimmedMeanWithSetup where a per-iteration fixture rebuild is excluded from timing).
Call testutil.SkipIfShort(t).
Cite the local median measured under -race at PERF_BUDGET_MULTIPLIER=1.0 next to the base constant.

No real tmux is touched. No network. Pure-Go in-process SQLite only. (internal/session Storage is constructed directly against a tmpfs DB; seeded rows carry TmuxSession="" so the lazy-reconnect branch in LoadWithGroups never executes. internal/statedb opens the DB directly.)

New `TestPerf_*` tests

Local medians measured under -race on a WSL2 dev host (the CI Xeon container will differ; budgets carry the ×3 WARM / ×6 CI headroom that absorbs this).

Test	Path exercised	Local median (`-race`, ×1.0)	Base	Budget @1.0	Budget @2.0 (CI)
`TestPerf_StateDB_InsertN`	50× `SaveInstance` into empty table (per-session insert)	~206 ms	210 ms	630 ms	1.26 s
`TestPerf_StateDB_ReadByID`	50× `InstanceExists` (rm-verify point query, #909)	~9.3 ms	10 ms	30 ms	60 ms
`TestPerf_StateDB_List`	one `LoadInstances` over 50 rows (reload scan)	~6.9 ms	7 ms	21 ms	42 ms
`TestPerf_StateDB_DeleteN`	50× `DeleteInstance` (`agent-deck rm` xargs path, #909)	~166 ms	170 ms	510 ms	1.02 s
`TestPerf_StateDB_WatcherEventIngest`	ingest 50 events at 500-event steady state (`SaveWatcherEvent` insert + prune `DELETE … NOT IN`)	~1.42 s	1.45 s	4.35 s	8.7 s
`TestPerf_Group_CreateDelete`	real `handleGroupCreate`/`handleGroupDelete` choreography: `LoadWithGroups` → `NewGroupTreeWithGroups` → create/delete → `SaveWithGroups` (×2), over a 50-instance deck	~67 ms	70 ms	210 ms	420 ms

WarmBudget = max(base × 3, 1 ms) × PERF_BUDGET_MULTIPLIER. None of the bases hit the 1 ms floor.

Coverage rationale

StateDB CRUD scaling (internal/statedb/statedb_perf_test.go) — insert / read-by-id / list / delete, each its own test, at N = 50. These deliberately gate the storage primitives (the task framed this as "StateDB CRUD scaling"), not an assembled flow. 50 is the representative upper-end deck size cited in code (internal/web/handlers_costs.go:529: the sidebar "grows past ~50 sessions"; llms-full.txt cites 30-session workspaces) — not an invented number. DB path is t.TempDir().
Group lifecycle, storage-mediated (internal/session/group_perf_test.go) — test(perf): add walltime regression suite for cold start and group lifecycle #790 explicitly carved group create/delete out because doing it in-memory "exercises the wrong layer" (map ops are nanoseconds). This test gates it by reconstructing the exact storage choreography the real handlers perform (cmd/agent-deck/group_cmd.go handleGroupCreate ~L396-442 / handleGroupDelete ~L689-705):
```
storage.LoadWithGroups()                     // full read of all instances + groups
session.NewGroupTreeWithGroups(insts, grps)  // rebuild the tree
groupTree.CreateGroup / DeleteGroup          // the mutation
storage.SaveWithGroups(insts, groupTree)     // rewrite the WHOLE instances table + groups + Touch() + dedup
```
Deliberately not Storage.SaveGroupsOnly — that is the lightweight expand/collapse visual-state path (its own doc comment says so) and skips both the instances round-trip and Touch(). The dominant cost in the real path is the instances-table read+rewrite, which is why this runs against a populated 50-instance deck rather than an empty DB.

Out of scope (follow-up): the literal CLI shell — flag parsing, NewStorageWithProfile, out.Success/os.Exit. Driving that in-process needs the same injection seam deferred for session create/delete (below). The seam-free remainder carries no meaningful, tmux-touching cost, so the storage choreography above is the faithful Tier-1 target.
Watcher-event ingestion (internal/statedb) — a distinct persistence lifecycle: SaveWatcherEvent does INSERT OR IGNORE then a prune DELETE … WHERE id NOT IN (SELECT … LIMIT N). The prune subquery cost scales with steady-state table size, so the table is pre-filled to the default 500-event bound (max_events_per_watcher default 500, internal/session/userconfig.go:3419) and a 50-event batch is timed on top.

Tooling

Broadens make test-perf from ./cmd/agent-deck/... to ./... so these internal-package gates run locally. This matches .github/workflows/perf-smoke.yml, whose perf-tests job already runs ./..., so no CI workflow change is needed — the new tests are picked up automatically.

Verification

make test-perf at PERF_BUDGET_MULTIPLIER=1.0 and 2.0: all six new tests green (verified at both multipliers).
go test -race -count=1 ./internal/statedb/: green.
The six new tests pass under -race at both multipliers with comfortable margins.

Note: this dev host (WSL2) cannot pass the pre-existing cold-start gates (TestPerf_ColdStart_*, ~300 ms process spawn vs a 40 ms budget calibrated on the CI Xeon) or three pre-existing internal/session shell/lifecycle timing tests (TestStatusCycle_ShellSessionWithCommand, TestLifecycle_StoppedRestartedRunningError, TestBuildCodexCommand_CustomWrapperPreservesToolIdentity). All of these fail identically with this PR's files removed — they are host-specific and untouched here. No cold-start test or existing budget was modified.

Non-goals (per task scope)

No Tier 2 count assertions. A sibling session owns the outbox/drain Tier 2 work. Where a count gate clearly belongs (the watcher-event insert+prune path), a // TODO(tier2) with a one-line rationale is left in place.
Session create/delete Tier 1 gating is intentionally not included: it requires an injection seam to avoid real tmux/process spawn, which is a separate, larger piece of work. (The group test's CLI-shell wrapper shares this seam dependency.) Left as a follow-up.
No cold-start changes, no budget widening, no benchstat/trending infra.

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Performance testing expanded to run locally across the entire module, matching CI behavior
- Added performance benchmarks for group lifecycle operations (create, persist, delete cycles)
- Added performance benchmarks for state database operations (insert, read by ID, list, delete, and event ingestion)
Chores
- Adjusted local perf test target to cover all packages (broader scope)

…ycle Adds the deferred Tier 1 (tmpfs walltime) perf gates from PR asheshgoplani#790/asheshgoplani#1047 that suit CPU/Go-runtime cost measurement, per docs/perf-budget-suite.md ("Tier 1 vs Tier 2"). All run against t.TempDir() (tmpfs on CI Linux), classify as WARM, and use WarmBudget + TrimmedMeanWarm / TrimmedMeanWithSetup. New tests (local medians under -race @ multiplier=1.0): - internal/statedb (StateDB CRUD scaling, N=50 cited deck size): - TestPerf_StateDB_InsertN ~206ms - TestPerf_StateDB_ReadByID ~9.3ms - TestPerf_StateDB_List ~6.9ms - TestPerf_StateDB_DeleteN ~166ms - TestPerf_StateDB_WatcherEventIngest ~1.42s (insert+prune at 500-event steady state) - internal/session (storage-mediated group lifecycle, asheshgoplani#790's carve-out): - TestPerf_Group_CreateDelete ~8.5ms (SaveGroupsOnly round trip) Each cites its measured median next to the base constant and a file-level WARM-not-COLD justification. No real tmux, no network, pure-Go in-process SQLite on tmpfs. Also broadens `make test-perf` from ./cmd/agent-deck/... to ./... so these internal-package gates run locally, matching the perf-smoke.yml CI gate which already runs the whole module. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-05-31T16:59:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1b7f80ce-a58a-4288-8f7e-834d436fb8e7

📥 Commits

Reviewing files that changed from the base of the PR and between c00eb8b and c1d0f15.

📒 Files selected for processing (1)

internal/session/group_perf_test.go

📝 Walkthrough

Walkthrough

Adds Tier-1 warm performance tests: five StateDB perf gates (insert, read-by-ID, list, delete, watcher-event ingest), one Session group create-delete perf test, and expands Makefile test-perf to run perf tests across the whole module (./...).

Changes

Performance Test Suite Expansion

Layer / File(s)	Summary
Makefile test-perf target expansion `Makefile`	Makefile `test-perf` target scope expanded from `./cmd/agent-deck/...` to `./...` with comments clarifying Tier-1 test coverage alignment with CI perf-smoke gate.
StateDB perf test infrastructure and CRUD tests `internal/statedb/statedb_perf_test.go`	Defines perf-gate constants, implements `newPerfDB`, `perfRows`, and `clearInstances` helpers for isolated SQLite-backed fixtures, then adds CRUD perf gates: `TestPerf_StateDB_InsertN`, `TestPerf_StateDB_ReadByID`, `TestPerf_StateDB_List`, `TestPerf_StateDB_DeleteN` measured via `testutil` trimmed-mean warm budgets.
StateDB watcher event steady-state performance test `internal/statedb/statedb_perf_test.go`	Defines watcher steady-state and batch constants and implements `TestPerf_StateDB_WatcherEventIngest`, which pre-fills `watcher_events` then measures batched `SaveWatcherEvent` insert+prune behavior against a warm budget.
Session group create-delete performance test `internal/session/group_perf_test.go`	Adds `TestPerf_Group_CreateDelete` that seeds a deck, then times repeated LoadWithGroups → CreateGroup/DeleteGroup → SaveWithGroups cycles on an isolated migrated SQLite `Storage`, enforcing a trimmed-mean warm budget.

Sequence Diagram(s)

sequenceDiagram
  participant Test as TestPerf_StateDB_*
  participant StateDB as StateDB
  participant Testutil as testutil.TrimmedMean*
  Test->>StateDB: newPerfDB (migrate, open temp DB)
  Test->>StateDB: perfRows / SaveInstance / LoadInstances / DeleteInstance (setup)
  Test->>Testutil: TrimmedMean* (measure)
  Testutil->>StateDB: invoke operation (timed call)
  Testutil-->>Test: trimmed mean result
  Test->>Test: compare against WarmBudget

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 7

✅ Passed checks (7 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title follows the Conventional Commits format with the 'test:' prefix and clearly summarizes the changes: adding Tier 1 WARM performance tests for statedb CRUD operations and storage-mediated group lifecycle.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 70.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Remote_parity	✅ Passed	PR does not modify TUI code in internal/ui/ or cmd/agent-deck/; it only adds test files (group_perf_test.go, statedb_perf_test.go) and updates Makefile test scope. Check not applicable.
Test_coverage_per_surface	✅ Passed	PR adds only performance test infrastructure for existing internal components; no new user-facing features introduced, so surface coverage requirement is not applicable.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds Tier-1 WARM performance gate tests for SQLite-backed persistence paths (StateDB CRUD, watcher event ingestion, and storage-mediated group lifecycle), and updates the local perf test make target to execute perf gates across all packages.

Changes:

Added StateDB Tier-1 perf tests covering insert/read/list/delete and watcher event ingest+prune paths.
Added Session/Storage Tier-1 perf test covering group create→persist→delete→persist lifecycle.
Updated make test-perf to run TestPerf_* across ./... so new perf tests in any package run locally like CI.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
internal/statedb/statedb_perf_test.go	Introduces Tier-1 WARM perf gates for StateDB CRUD and watcher event ingest+prune.
internal/session/group_perf_test.go	Adds Tier-1 WARM perf gate for storage-mediated group create/delete persistence round trip.
Makefile	Expands `test-perf` scope to run perf tests across the whole module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+		tx, err := db.DB().Begin()
+		if err != nil {
+			t.Fatalf("begin prefill: %v", err)
+		}
+		stmt, err := tx.Prepare(`INSERT INTO watcher_events (watcher_id, dedup_key, created_at) VALUES (?, ?, ?)`)
+		if err != nil {
+			t.Fatalf("prepare prefill: %v", err)
+		}
+		now := time.Now().Unix()
+		for i := 0; i < perfWatcherSteadyState; i++ {
+			if _, err := stmt.Exec(watcherID, fmt.Sprintf("base-%05d", i), now+int64(i)); err != nil {
+				t.Fatalf("prefill exec: %v", err)
+			}
+		}
+		_ = stmt.Close()
+		if err := tx.Commit(); err != nil {
+			t.Fatalf("commit prefill: %v", err)
+		}


…aphy The first version called Storage.SaveGroupsOnly — but that is the lightweight expand/collapse visual-state path (skips the instances round-trip and Touch()). The actual user-facing handlers (cmd/agent-deck/group_cmd.go handleGroupCreate / handleGroupDelete) instead do: LoadWithGroups → NewGroupTreeWithGroups → CreateGroup/DeleteGroup → SaveWithGroups where the dominant cost is the full instances-table read + rewrite (+ Touch + dedup). The old test gated a cheaper sibling op and would miss regressions in the real path. Rewrites TestPerf_Group_CreateDelete to reconstruct that exact choreography against a populated 50-instance deck (the cited upper-end size). Seeded rows carry TmuxSession="" so LoadWithGroups' lazy tmux.ReconnectSessionLazy branch never runs — still zero tmux code. Net-zero per iteration (create then delete the same ephemeral group), so TrimmedMeanWarm still applies. Local median under -race jumps ~8.5ms → ~67ms, confirming the old test measured a fraction of the real cost. Base 70ms → WarmBudget 210ms local / 420ms CI. The CLI shell itself (flag parse, NewStorageWithProfile, out.Success/os.Exit) needs the deferred injection seam and is out of scope; the seam-free remainder carries no meaningful, tmux-touching cost. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 31, 2026 16:59

JMBattista requested a review from asheshgoplani as a code owner May 31, 2026 16:59

Copilot AI reviewed May 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(perf): Tier 1 WARM suite — statedb CRUD + storage-mediated group lifecycle#1234

test(perf): Tier 1 WARM suite — statedb CRUD + storage-mediated group lifecycle#1234
JMBattista wants to merge 2 commits into
asheshgoplani:mainfrom
JMBattista:feature/tier1-warm-suite

JMBattista commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 31, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JMBattista commented May 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Tier 1 WARM perf suite — deferred items from #790 / #1047

New TestPerf_* tests

Coverage rationale

Tooling

Verification

Non-goals (per task scope)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JMBattista commented May 31, 2026 •

edited by coderabbitai Bot

Loading

New `TestPerf_*` tests

coderabbitai Bot commented May 31, 2026 •

edited

Loading