
[codex] hard cutover full replay telemetry for future sessions #132

Merged
human-bee merged 12 commits into main from codex/hard-cutover-full-replay
Feb 26, 2026


@human-bee
Owner

@human-bee commented Feb 26, 2026

Summary

Implements a hard-cutover replay telemetry pipeline for future sessions so each model/tool turn is durably replayable by correlation keys and rendered in a session-scoped HTML report with collapsible inputs/outputs.

Issue / User Impact

The existing observability path stored only lifecycle snapshots, not full turn-level model/tool payloads, so the requested per-model/per-agent forensic replay could not be generated for existing sessions.

Root Cause

  • Turn-level model and tool I/O were transport-only at runtime and not durably persisted.
  • Correlation/model provenance fields were optional/incomplete across handoffs.
  • The report generator only had partial tables and could not render full replay slices.

Why This Fix Solves Root Cause

  • Adds first-class replay ledgers (agent_model_io, agent_tool_io) plus a raw blob store (agent_io_blobs), with 90-day retention columns.
  • Emits durable model/tool turn records from voice, conductor/router, canvas runner, fairy router, and fast stewards.
  • Preserves correlation keys (trace_id, request_id, intent_id, tool_call_id) and provider identity across pipeline boundaries.
  • Upgrades report generation to stitch transcript + replay + lifecycle tables per session window and render requested slices.
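Preserving correlation keys "across pipeline boundaries" can be read as a merge rule applied at every handoff. The sketch below illustrates that reading only; the `Correlation` interface and the `carryCorrelation` helper are hypothetical and are not code from this PR.

```typescript
// Hypothetical shape: the four correlation keys named above plus provider identity.
interface Correlation {
  trace_id?: string;
  request_id?: string;
  intent_id?: string;
  tool_call_id?: string;
  provider?: string;
  model?: string;
}

const CORRELATION_KEYS: (keyof Correlation)[] = [
  "trace_id",
  "request_id",
  "intent_id",
  "tool_call_id",
  "provider",
  "model",
];

// Treat undefined and blank strings as "not provided".
function present(value: string | undefined): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Carry upstream context across a handoff: a downstream value wins only when it
// is actually present, so a stage that omits a key cannot erase it.
function carryCorrelation(upstream: Correlation, downstream: Correlation): Correlation {
  const merged: Correlation = {};
  for (const key of CORRELATION_KEYS) {
    const next = downstream[key];
    const prev = upstream[key];
    if (present(next)) merged[key] = next;
    else if (present(prev)) merged[key] = prev;
  }
  return merged;
}
```

The key design choice is that blank values never overwrite populated ones, which is one way to avoid the "optional/incomplete across handoffs" failure listed under Root Cause.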

Changes By Surface

Schema / Contracts

  • Added migration: docs/migrations/012_agent_replay_full_telemetry.sql
    • New tables: agent_model_io, agent_tool_io, agent_io_blobs
    • Added parity columns for drifted environments:
      • agent_trace_events: provider/model/provider_source/provider_path/provider_request_id
      • agent_tasks: trace_id
    • Added indexes + RLS service-role policies.
    • task_id stored as text for compatibility with mixed correlation IDs.

Runtime Ingestion / Emission

  • New async buffered replay writer:
    • src/lib/agents/shared/replay-telemetry.ts
    • Bounded queue, priority drops, inline/blob caps, best-effort shutdown flush.
    • Idempotent flush with upsert(..., ignoreDuplicates) to avoid partial-batch retry wedges.
  • Upgraded telemetry ingestion route:
    • src/app/api/agent/telemetry/route.ts
    • Durable sink via replay writer, auth via ingest token or authenticated user.
    • Correlation fields accepted from both top-level and nested payload shape.
    • Returns 202 instead of a misleading 200 success when the enqueue is dropped.
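The replay writer's bounded queue, priority drops, and inline/blob caps could fit together roughly as follows. This is an illustrative sketch only: `BoundedReplayQueue`, the three-level priority scale, and the character-based inline cap are assumptions, not the actual contents of replay-telemetry.ts.

```typescript
// Hypothetical priority scale: lower number = more important.
type Priority = 0 | 1 | 2;

interface QueuedRow {
  table: string;
  priority: Priority;
  inline: string | null; // payload kept on the ledger row when it fits the cap
  blob: string | null; // oversized payload destined for a blob row instead
}

class BoundedReplayQueue {
  private rows: QueuedRow[] = [];
  private readonly maxRows: number;
  private readonly inlineCapChars: number;
  // Counts both evicted rows and rejected newcomers.
  dropped = 0;

  constructor(maxRows: number, inlineCapChars: number) {
    this.maxRows = maxRows;
    this.inlineCapChars = inlineCapChars;
  }

  // Returns false when the new row itself was dropped, so the caller can
  // report 202 instead of a misleading 200.
  enqueue(table: string, priority: Priority, payload: string): boolean {
    const fits = payload.length <= this.inlineCapChars;
    const row: QueuedRow = {
      table,
      priority,
      inline: fits ? payload : null,
      blob: fits ? null : payload,
    };
    if (this.rows.length < this.maxRows) {
      this.rows.push(row);
      return true;
    }
    // Queue full: locate the least important queued row.
    let victim = -1;
    for (let i = 0; i < this.rows.length; i++) {
      if (victim === -1 || this.rows[i].priority > this.rows[victim].priority) victim = i;
    }
    this.dropped++;
    // Evict it only if it is strictly less important than the newcomer.
    if (victim !== -1 && this.rows[victim].priority > priority) {
      this.rows.splice(victim, 1);
      this.rows.push(row);
      return true;
    }
    return false;
  }

  // Remove and return up to n rows in arrival order for one flush batch.
  drain(n: number): QueuedRow[] {
    return this.rows.splice(0, n);
  }

  get size(): number {
    return this.rows.length;
  }
}
```

Returning a boolean from `enqueue` is what lets an ingest route tell a dropped enqueue (202) apart from an accepted one.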

Agent Instrumentation

  • Voice runtime + tool publishing:
    • src/lib/agents/realtime/voice-agent.ts
    • src/lib/agents/realtime/voice-agent/tool-publishing.ts
    • Ensures rewritten tool identity and full correlation/provider metadata are replayed.
  • Conductor / worker / fairy / canvas:
    • src/lib/agents/conductor/router.ts
    • src/lib/agents/conductor/worker.ts
    • src/lib/fairy-intent/router.ts
    • src/lib/agents/canvas-agent/server/runner.ts
  • Fast stewards:
    • src/lib/agents/subagents/crowd-pulse-steward-fast.ts
    • src/lib/agents/subagents/summary-steward-fast.ts
    • src/lib/agents/subagents/debate-steward-fast.ts

Reporting / Ops

  • New report generator:
    • scripts/observability/generate-session-chat-report.ts
    • Renders:
      • transcript
      • voice/transcription/orchestration/steward/fairy/fast sections
      • system prompt/context priming
      • collapsible tool/model inline + raw blob payloads
    • Graceful historical fallback when replay tables are absent/empty.
  • Added retention purge utility:
    • scripts/admin/purge-replay-telemetry.ts
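The 90-day retention window reduces to a cutoff timestamp. A minimal sketch, assuming rows carry an ISO `created_at` column; that column name and the helper names are hypothetical, since the migration's actual retention columns are not spelled out here.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical row shape: only the timestamp matters for retention.
interface ReplayRow {
  id: string;
  created_at: string; // ISO-8601 timestamp (assumed column name)
}

// Anything strictly older than now - retentionDays is purgeable.
function retentionCutoff(now: Date, retentionDays = 90): Date {
  if (!Number.isFinite(retentionDays) || retentionDays <= 0) {
    throw new Error(`invalid retention window: ${retentionDays}`);
  }
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// Return the ids of rows that fall outside the retention window.
function selectExpired(rows: ReplayRow[], now: Date, retentionDays = 90): string[] {
  const cutoff = retentionCutoff(now, retentionDays).getTime();
  return rows.filter((row) => Date.parse(row.created_at) < cutoff).map((row) => row.id);
}
```

Against Supabase, the same cutoff could drive a filtered delete such as `.from("agent_io_blobs").delete().lt("created_at", cutoff.toISOString())`; whether purge-replay-telemetry.ts works this way is not stated in this PR.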

Additional hardening updates in latest commits

  • Persist raw fairy action arrays (actions) + actionCount in fairy quick task outputs:
    • src/lib/agents/subagents/canvas-steward.ts
    • src/lib/agents/conductor/worker.ts
  • Replay queue resiliency:
    • reschedules flush after failed DB batch write (prevents idle stall)
    • drops orphaned queued blob rows if parent model/tool row enqueue fails
    • deterministic event-id composition for lifecycle joins
    • file: src/lib/agents/shared/replay-telemetry.ts
  • Report generator robustness for smoke artifacts:
    • skips transcript table lookup when synthetic/non-UUID session id is present
    • validates --session-id as UUID
    • file: scripts/observability/generate-session-chat-report.ts
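Deterministic event ids matter because a retried flush must produce the same key for the same turn; otherwise lifecycle joins fan out or miss. A hypothetical composition scheme (the field choice and order are assumptions, not the PR's actual rule):

```typescript
// Hypothetical inputs: fields a lifecycle join might key on.
interface EventIdParts {
  kind: "model" | "tool";
  trace_id?: string;
  request_id?: string;
  tool_call_id?: string;
  turn: number;
}

// Compose the id from a fixed field order, so the same turn always yields the
// same id across retries and processes (unlike a random UUID).
function composeEventId(parts: EventIdParts): string {
  const fields = [
    parts.kind,
    parts.trace_id ?? "",
    parts.request_id ?? "",
    parts.tool_call_id ?? "",
    String(parts.turn),
  ];
  // Percent-encode each field so a ':' inside a value cannot collide with the separator.
  return fields.map((field) => encodeURIComponent(field)).join(":");
}
```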

Backward Compatibility

  • Scope is intentionally forward-looking; historical sessions may still show no replay rows.
  • Route auth remains compatible with browser-origin telemetry via authenticated user fallback.
  • Protocol parsing remains tolerant for legacy messages while new emitters provide full correlation payloads.

Validation

Automated checks

  • npm test -- src/lib/agents/subagents/canvas-steward.test.ts
  • npm test -- src/lib/agents/conductor/__tests__/router-execute-task.test.ts
  • npm run typecheck:agent

Smoke/report evidence (latest)

  • API smoke result JSON:
    • /Users/bsteinher/PRESENT/reports/showcase/api-smoke-1772084062725/result.json
    • proof: actionCount=5, actionArrayLength=5, terminal succeeded
  • Generated report:
    • HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-b3ea94cc-15cb-4de0-a443-be10d55920d3.html
    • JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-b3ea94cc-15cb-4de0-a443-be10d55920d3.json
  • Report walkthrough capture:
    • WebM: /Users/bsteinher/PRESENT/reports/agent-chat/webm/48b64abf966234713a7bf1897d7badad.webm
    • Screenshot: /Users/bsteinher/PRESENT/reports/agent-chat/webm/agent-chat-report-b3ea94cc-15cb-4de0-a443-be10d55920d3.png

Historical report artifact (expected pre-cutover gap)

  • HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-6267d849-1696-4c12-8a01-1c0bc6dcf2f3.html
  • JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-6267d849-1696-4c12-8a01-1c0bc6dcf2f3.json
  • Note: the historical session contains transcript/tasks/traces but, by design, has no replay rows because it predates the cutover.

Independent Reviewer Lanes

  • Reviewer lane B (independent) completed with concrete findings; the medium-severity replay queue risks it raised were addressed in the latest commits.
  • Reviewer lane A's agent process repeatedly timed out in this environment; a fallback manual pass over the changed files (replay-telemetry.ts, generate-session-chat-report.ts, canvas-steward.ts, worker.ts) found no additional blocking issues.

Remaining Risk

  • Full model token accounting and system/context replay still depend on applying the new replay DB migration to the PRESENT project used by the runtime environment. Current logs still show replay insert failures wherever those tables are absent.

@vercel

vercel bot commented Feb 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

  • Project: present
  • Deployment: Ready
  • Actions: Preview, Comment
  • Updated (UTC): Feb 26, 2026 9:17am


@human-bee
Owner Author

Follow-up update (reducibility pass) is now pushed:

  • 13de783 reduce replay telemetry/report duplication and probe optional tables

Implemented:

  1. Unified replay row builders in replay-telemetry.ts via a generic enqueue path + shared row/context builders.
  2. Switched replay flush to table-config-driven behavior (REPLAY_TABLE_CONFIG) for conflict keys/upsert options.
  3. Converted report section rendering to a declarative section registry in generate-session-chat-report.ts.
  4. Added single optional-table probe for replay tables (agent_model_io, agent_tool_io, agent_io_blobs) and routed fetches from availability map.
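Items 2 and 4 can be pictured as two small lookups. In this sketch the conflict columns (`event_id`, `blob_id`) and the `fetchAvailable` helper are hypothetical; only the table names and the `REPLAY_TABLE_CONFIG` name come from the update above. The options object matches the shape supabase-js `upsert(values, { onConflict, ignoreDuplicates })` accepts.

```typescript
type ReplayTable = "agent_model_io" | "agent_tool_io" | "agent_io_blobs";

interface TableConfig {
  conflictColumns: string[];
}

// Hypothetical conflict keys; the real lists live in replay-telemetry.ts.
const REPLAY_TABLE_CONFIG: Record<ReplayTable, TableConfig> = {
  agent_model_io: { conflictColumns: ["event_id"] },
  agent_tool_io: { conflictColumns: ["event_id"] },
  agent_io_blobs: { conflictColumns: ["blob_id"] },
};

interface UpsertOptions {
  onConflict: string; // comma-separated column list
  ignoreDuplicates: boolean; // retried rows that already landed become no-ops
}

function upsertOptionsFor(table: ReplayTable): UpsertOptions {
  return {
    onConflict: REPLAY_TABLE_CONFIG[table].conflictColumns.join(","),
    ignoreDuplicates: true,
  };
}

type Availability = Record<ReplayTable, boolean>;

// One probe result per table decides which fetches run at all; missing tables
// yield empty arrays instead of query errors.
async function fetchAvailable<T>(
  availability: Availability,
  fetchers: Record<ReplayTable, () => Promise<T[]>>,
): Promise<Record<ReplayTable, T[]>> {
  const tables = Object.keys(fetchers) as ReplayTable[];
  const results = await Promise.all(
    tables.map((table) => (availability[table] ? fetchers[table]() : Promise.resolve<T[]>([]))),
  );
  const out = {} as Record<ReplayTable, T[]>;
  tables.forEach((table, i) => {
    out[table] = results[i];
  });
  return out;
}
```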

Parity fixes from reviewer findings:

  • Restored graceful fallback when canvas_session_transcripts relation is missing.
  • Kept steward priming output parity (canvas_runner only in steward section priming block).

Validation rerun:

  • npm test -- src/lib/livekit/protocol.test.ts src/lib/agents/realtime/voice-agent/__tests__/tool-publishing.test.ts
  • npm test -- src/lib/agents/conductor/__tests__/router-execute-task.test.ts src/lib/agents/conductor/__tests__/mutation-arbiter.test.ts
  • npm run typecheck:agent
  • npm run build

Independent reviewers (2 lanes) final pass: no findings.

@human-bee
Owner Author

Latest incremental update pushed in 81b382c.

What changed:

  • Added richer mission-control replay metadata emission:
    • Fairy router now persists routing skills, route/tooling snapshot (route kinds, context profiles, view events/targets, fast-lane detail schema, route_intent schema), and provider request metadata.
    • Canvas runner now persists prompt/tool snapshot context priming (style instructions, few-shots, tool catalog/schema, prompt budget, viewport/screenshot summary).
  • Added model token usage capture for canvas model turns:
    • Extended structured stream contract (usage, totalUsage, provider/request/response metadata).
    • Propagated usage telemetry from AI SDK model calls into replay payloads.
  • Added fairy action-option provenance in steward outputs:
    • Quick text/quick shapes now include actionOptions snapshots.
    • Graph fallback results now preserve raw actions + actionOptions.
  • Upgraded report rendering:
    • New “Mission Control Prompt + Tool Snapshots” section.
    • Expanded prompt artifact extraction/rendering for Tool Schema, Skills/Capability Profile, and Action/Option Schema.
    • Fairy action ledger now correlates downstream canvas task/tool rows by correlation IDs.
    • Action-array detector now recognizes { name, params } action objects (not only TL shape object forms).
    • If raw arrays are absent, report now explicitly states actionCount and that only summary fields were persisted.
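A hedged sketch of the widened action-array detector and the summary-only fallback text. The `{ name, params }` check follows the description above; the shape-object check (string `id` and `type`) is a placeholder assumption for the "TL shape object" form, and all helper names are hypothetical.

```typescript
type UnknownRecord = Record<string, unknown>;

function isRecord(value: unknown): value is UnknownRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// { name, params } action objects: string name plus object params.
function isNamedAction(value: unknown): boolean {
  return isRecord(value) && typeof value["name"] === "string" && isRecord(value["params"]);
}

// Placeholder for the older shape-object form (assumed: string id and type).
function isShapeLike(value: unknown): boolean {
  return isRecord(value) && typeof value["id"] === "string" && typeof value["type"] === "string";
}

// Non-empty, and every element matches one of the recognized forms.
function isActionArray(value: unknown): value is unknown[] {
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every((item) => isNamedAction(item) || isShapeLike(item))
  );
}

// Report line: prefer the raw array, otherwise state that only summaries exist.
function describeActions(output: UnknownRecord): string {
  const actions = output["actions"];
  if (isActionArray(actions)) return `raw actions persisted: ${actions.length}`;
  const count = typeof output["actionCount"] === "number" ? output["actionCount"] : 0;
  return `actionCount=${count}; raw action array not persisted (summary fields only)`;
}
```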

Validation rerun:

  • npm run typecheck:agent
  • npm test -- src/lib/agents/subagents/canvas-steward.test.ts
  • npm test -- src/lib/agents/conductor/__tests__/router-execute-task.test.ts

Fresh artifacts:

  • Smoke result JSON:
    • /Users/bsteinher/PRESENT/test-results/observability-smoke/smoke-1772088185349/result.json
  • Updated smoke report:
    • HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-run-smoke-1772088185349.html
    • JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-run-smoke-1772088185349.json
    • WebM: /Users/bsteinher/PRESENT/reports/agent-chat/webm/bfe5426f85dbfe3eb4de197753dbe90e.webm
    • Screenshot: /Users/bsteinher/PRESENT/reports/agent-chat/webm/agent-chat-report-run-smoke-1772088185349.png
  • Re-rendered historical target report:
    • HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-d781f621-ec3a-4db2-a30d-201b169f05f1.html
    • JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-d781f621-ec3a-4db2-a30d-201b169f05f1.json
    • WebM: /Users/bsteinher/PRESENT/reports/agent-chat/webm/d630ba9134857bf0cf4683a40236cbb2.webm
    • Screenshot: /Users/bsteinher/PRESENT/reports/agent-chat/webm/agent-chat-report-d781f621-ec3a-4db2-a30d-201b169f05f1.png

Note:

  • This local runtime still reports replay tables unavailable (agent_model_io, agent_tool_io, agent_io_blobs), so mission-control snapshot counts remain zero here until migration docs/migrations/012_agent_replay_full_telemetry.sql is applied to the active PRESENT Supabase project.

@human-bee
Owner Author

Follow-up fix wave pushed (9a66aee) after independent review lanes.

What changed:

  • src/app/api/agent/telemetry/route.ts

    • Auth path now:
      • accepts exact ingest bearer token when configured
      • rejects mismatched bearer immediately
      • allows signed-in session fallback only when no bearer is present (and, in token mode, only when cookie header exists)
    • Added bounded flush wait (AGENT_TELEMETRY_FLUSH_TIMEOUT_MS, NaN-safe default 250ms) with timer cleanup.
    • Response semantics:
      • 200 {status:"ok"} when flush confirms
      • 202 {status:"queued"} when queueing/timeout
      • 202 {status:"accepted_with_loss_warning"} when flush detects irrecoverable drops in the batch
  • src/lib/agents/shared/replay-telemetry.ts

    • flushReplayTelemetryNow() now returns explicit status: flushed | queued | dropped.
    • Retains poison-row isolation, and surfaces dropped when isolation recovery encounters irrecoverable rows.
  • src/lib/agents/realtime/voice-agent.ts

    • Correlation fallback now resolves by requestId/traceId/intentId only when the match in pendingToolReplayByCallId is unique (avoids first-match misassociation).
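The bounded flush wait and the response mapping in route.ts could be structured as follows. This is a sketch that assumes `flushReplayTelemetryNow()` resolves to one of the three statuses and is passed in as `flush`; `parseFlushTimeoutMs`, `flushWithTimeout`, and `responseFor` are hypothetical names.

```typescript
type FlushStatus = "flushed" | "queued" | "dropped";

// NaN-safe parse: missing, non-numeric, or non-positive values fall back to 250 ms.
function parseFlushTimeoutMs(raw: string | undefined, fallback = 250): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Wait for the flush, but never longer than timeoutMs; a timeout reports
// "queued". A rejected flush propagates to the caller.
async function flushWithTimeout(
  flush: () => Promise<FlushStatus>,
  timeoutMs: number,
): Promise<FlushStatus> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<FlushStatus>((resolve) => {
    timer = setTimeout(() => resolve("queued"), timeoutMs);
  });
  try {
    return await Promise.race([flush(), timeout]);
  } finally {
    // Clear the timer so a fast flush does not leave a pending handle behind.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Map the outcome onto the response semantics listed above.
function responseFor(status: FlushStatus): { code: number; body: { status: string } } {
  switch (status) {
    case "flushed":
      return { code: 200, body: { status: "ok" } };
    case "queued":
      return { code: 202, body: { status: "queued" } };
    case "dropped":
      return { code: 202, body: { status: "accepted_with_loss_warning" } };
  }
}
```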

Validation rerun:

  • npm run typecheck:agent
  • npm run typecheck:app
  • npx jest src/lib/agents/realtime/voice-agent/__tests__/tool-publishing.test.ts src/lib/livekit/protocol.test.ts

All passed locally.

@human-bee marked this pull request as ready for review February 26, 2026 09:19
@human-bee merged commit 66aacae into main Feb 26, 2026
4 checks passed
@human-bee deleted the codex/hard-cutover-full-replay branch February 26, 2026 09:20

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a66aee89e


const token = normalizeOptional(process.env.AGENT_TELEMETRY_INGEST_TOKEN);
const bearer = readBearer(req);
if (token && bearer === token) return true;
if (token && bearer && bearer !== token) return false;


P1: Accept Supabase bearer auth when ingest token is configured

The authorization gate returns false as soon as a bearer token is present but does not equal AGENT_TELEMETRY_INGEST_TOKEN, which prevents resolveRequestUser from running. Since resolveRequestUser explicitly supports authenticating normal Supabase user bearer tokens, authenticated clients that send their user JWT in Authorization (instead of cookie auth) will now get 401 whenever an ingest token is set, breaking the intended "ingest token OR authenticated user" behavior.
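One possible shape for the fix the review asks for: keep the exact-token fast path, but let a non-matching bearer fall through to user resolution so a Supabase user JWT can still authenticate. In this sketch `resolveUser` is an injected stand-in for the route's `resolveRequestUser(req)`, and `AuthInput`/`isAuthorized` are hypothetical names.

```typescript
interface AuthInput {
  ingestToken: string | undefined; // normalized AGENT_TELEMETRY_INGEST_TOKEN
  bearer: string | undefined; // value after "Authorization: Bearer "
  hasCookie: boolean; // whether a Cookie header was sent
}

async function isAuthorized(
  input: AuthInput,
  resolveUser: () => Promise<{ id: string } | null>,
): Promise<boolean> {
  const { ingestToken, bearer, hasCookie } = input;
  // Exact ingest-token match: accept without a user lookup.
  if (ingestToken && bearer === ingestToken) return true;
  // Token mode with no credentials at all: reject without a lookup.
  if (ingestToken && !bearer && !hasCookie) return false;
  // A non-matching bearer may still be a user JWT, so resolve the user
  // instead of returning false early.
  const user = await resolveUser();
  return user !== null;
}
```

The trade-off is that every non-matching bearer now costs a user lookup; that is the price of keeping the intended "ingest token OR authenticated user" behavior.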

