test(insession): e2e model-backed child extraction via stub seam (#135) #186

Merged
edheltzel merged 2 commits into main from worktree-issue-135-insession-model-stub-e2e on Jun 25, 2026
@edheltzel

Owner

Summary

Closes #135 — the model-backed full child path is now covered end-to-end: parent past cadence → detached spawn → real runInSessionExtraction → a row in extraction_sessions — deterministically, in CI, with no live model backend.

Scope note (for the gate)

This issue is labeled type:test, but a non-vacuous model-backed e2e required a small production change: the spawned child (hooks/RecallInSession.ts runChild) hardcoded extract: runExtractionCascade with no injection point, so a subprocess test could not substitute a deterministic model. Per Themis's authorization (Option A), I added a minimal stub-seam:

  • resolveExtract() — when RECALL_INSESSION_EXTRACT_STUB names a readable file, the child uses its contents as a fake extractor; otherwise it returns the real runExtractionCascade. One env var, one branch, no other config (KarpathyGuidelines).
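A minimal sketch of what such a seam could look like, not the code from hooks/RecallInSession.ts. The `Extract` signature, the placeholder `runExtractionCascade`, and the choice to return the file contents verbatim as the extractor's output are all assumptions made for illustration; only the env var name and the one-branch fallback come from this PR.

```typescript
import { accessSync, constants, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical extractor signature; the real one may differ.
type Extract = (transcript: string) => Promise<string | null>;

// Placeholder for the real model-backed cascade (assumed here to yield
// null when no backend is reachable).
const runExtractionCascade: Extract = async () => null;

// One env var, one branch: use the stub file if it is readable,
// otherwise fall back to the real extractor.
function resolveExtract(
  env: Record<string, string | undefined> = process.env,
): Extract {
  const stubPath = env.RECALL_INSESSION_EXTRACT_STUB;
  if (!stubPath) return runExtractionCascade;
  try {
    accessSync(stubPath, constants.R_OK);
  } catch {
    // Unreadable or missing file: behave exactly like production.
    return runExtractionCascade;
  }
  const canned = readFileSync(stubPath, "utf8");
  // Assumption: the stub ignores its input and returns the file contents.
  return async () => canned;
}

// Usage: a test writes a canned output and points the env var at it.
const dir = mkdtempSync(join(tmpdir(), "extract-stub-"));
const stubFile = join(dir, "extract.txt");
writeFileSync(stubFile, "The user fixed a flaky test.");
const stubbed = resolveExtract({ RECALL_INSESSION_EXTRACT_STUB: stubFile });
const fallback = resolveExtract({});
```

Falling back silently on an unreadable path keeps the production default intact even if the variable leaks into a real environment.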

What changed

  • hooks/RecallInSession.ts (+20): resolveExtract() helper + swap the child's hardcoded extract: for resolveExtract(). Production default behavior is unchanged when the env var is unset.
  • tests/hooks/RecallInSession.test.ts (+98): one new e2e test (reuses the existing runHook subprocess harness — DRY) that drives the parent past the turn cadence, lets it spawn the detached child, and polls (bounded, mirrors RecallClearExtract.test.ts) for the extraction_sessions row. Header note updated.
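The bounded poll described above can be sketched roughly as follows. `pollUntil` and its parameters are hypothetical names, not the helper used in the test suite, and the counter-based probe stands in for a query against extraction_sessions.

```typescript
// Poll `probe` until it returns a value or the time budget runs out.
// Returns undefined on timeout so the caller can fail the test explicitly.
async function pollUntil<T>(
  probe: () => T | undefined,
  budgetMs: number,
  intervalMs = 50,
): Promise<T | undefined> {
  const deadline = Date.now() + budgetMs;
  while (true) {
    const hit = probe();
    if (hit !== undefined) return hit;
    if (Date.now() >= deadline) return undefined;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: simulate a detached child whose row becomes visible on the third probe.
let probes = 0;
const rowAppears = (): { summary: string } | undefined =>
  ++probes >= 3 ? { summary: "stub summary" } : undefined;

// A probe that never succeeds, mirroring the "no backend" case.
const rowNeverAppears = (): { summary: string } | undefined => undefined;
```

Keeping the budget below the runner's per-test timeout means a missing row surfaces as a failed assertion rather than a timeout.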

Why it is not vacuous

  • The row is asserted to exist and its summary to equal the stub's exact one-sentence summary, so the test would also fail if a real model ran in place of the stub.
  • Mutation-verified: forcing the child's extractor to yield nothing (the CI "no backend" condition) makes the row never appear → the test goes RED. Restored after.

Verification

  • bun run lint (tsc --noEmit): clean
  • bun test: 1175 pass / 0 fail (baseline 1174 + this test)
  • New test deterministic across 3 consecutive runs (~0.5s; 3s poll budget stays under bun's 5s per-test timeout)

Add a minimal, deterministic stub-seam to the RecallInSession child: when
RECALL_INSESSION_EXTRACT_STUB names a readable file, the spawned child uses its
contents as a fake extractor instead of runExtractionCascade (one env var, one
branch). This lets the full child path — parent past cadence -> detached spawn
-> real runInSessionExtraction -> extraction_sessions row — be driven end-to-end
in CI with no live model backend.

The new e2e test drives the parent past the turn cadence, lets it spawn the
detached child, and asserts a row lands in extraction_sessions whose summary
equals the stub's exact sentence. Non-vacuous: without the stub (or a real
backend) the child returns extraction_failed and writes no row (verified by
mutation). Reuses the existing runHook subprocess harness.

Closes #135
@edheltzel edheltzel merged commit 086a0e8 into main Jun 25, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

Test: end-to-end model-backed RecallInSession child extraction (spawn → child → model → row)
