[codex] hard cutover full replay telemetry for future sessions by human-bee · Pull Request #132 · human-bee/PRESENT

human-bee · 2026-02-26T02:58:12Z

Summary

Implements a hard-cutover replay telemetry pipeline for future sessions so each model/tool turn is durably replayable by correlation keys and rendered in a session-scoped HTML report with collapsible inputs/outputs.

Issue / User Impact

The current observability path stored lifecycle snapshots but not full turn-level model/tool payloads, so the requested per-model/per-agent forensic replay could not be generated for existing sessions.

Root Cause

Turn-level model and tool I/O were transport-only at runtime and not durably persisted.
Correlation/model provenance fields were optional/incomplete across handoffs.
The report generator only had partial tables and could not render full replay slices.

Why This Fix Solves Root Cause

Adds first-class replay ledgers (agent_model_io, agent_tool_io) plus raw blob store (agent_io_blobs) with 90-day retention columns.
Emits durable model/tool turn records from voice, conductor/router, canvas runner, fairy router, and fast stewards.
Preserves correlation keys (trace_id, request_id, intent_id, tool_call_id) and provider identity across pipeline boundaries.
Upgrades report generation to stitch transcript + replay + lifecycle tables per session window and render requested slices.

Changes By Surface

Schema / Contracts

Added migration: docs/migrations/012_agent_replay_full_telemetry.sql
- New tables: agent_model_io, agent_tool_io, agent_io_blobs
- Added parity columns for drifted environments:
  - agent_trace_events: provider/model/provider_source/provider_path/provider_request_id
  - agent_tasks: trace_id
- Added indexes + RLS service-role policies.
- task_id stored as text for compatibility with mixed correlation IDs.

Runtime Ingestion / Emission

New async buffered replay writer:
- src/lib/agents/shared/replay-telemetry.ts
- Bounded queue, priority drops, inline/blob caps, best-effort shutdown flush.
- Idempotent flush with upsert(..., ignoreDuplicates) to avoid partial-batch retry wedges.
Upgraded telemetry ingestion route:
- src/app/api/agent/telemetry/route.ts
- Durable sink via replay writer, auth via ingest token or authenticated user.
- Correlation fields accepted from both top-level and nested payload shape.
- Returns 202 when enqueue is dropped instead of false 200 success.
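
For readers unfamiliar with the pattern, the "bounded queue, priority drops" behavior listed above for the replay writer can be sketched roughly as below. This is a minimal illustration, not the code in replay-telemetry.ts: the class name, the numeric priority scale, and the eviction rule (evict the oldest lowest-priority row, otherwise drop the incoming row) are all assumptions.

```typescript
// Hypothetical sketch: names, priority scale, and eviction rule are assumptions.
type Priority = 0 | 1 | 2; // higher number = more important

interface QueuedRow<T> {
  priority: Priority;
  row: T;
}

class BoundedPriorityQueue<T> {
  private items: QueuedRow<T>[] = [];
  private capacity: number;
  public dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  /** Returns true if the row was accepted, false if it was dropped. */
  enqueue(row: T, priority: Priority): boolean {
    if (this.capacity <= 0) {
      this.dropped++;
      return false;
    }
    if (this.items.length < this.capacity) {
      this.items.push({ priority, row });
      return true;
    }
    // Queue is full: locate the oldest row with the lowest priority.
    let victim = 0;
    for (let i = 1; i < this.items.length; i++) {
      if (this.items[i].priority < this.items[victim].priority) victim = i;
    }
    this.dropped++;
    if (this.items[victim].priority >= priority) {
      // Nothing less important is queued, so the incoming row is dropped.
      return false;
    }
    // Evict the less important row to make room for the incoming one.
    this.items.splice(victim, 1);
    this.items.push({ priority, row });
    return true;
  }

  /** Removes and returns up to `max` rows in arrival order for a batch flush. */
  drain(max: number): T[] {
    return this.items.splice(0, max).map((item) => item.row);
  }

  get size(): number {
    return this.items.length;
  }
}
```

A caller can use the boolean result of `enqueue` to decide whether to report success or an "accepted but dropped" status, which is the distinction the route change above makes between 200 and 202.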

Agent Instrumentation

Voice runtime + tool publishing:
- src/lib/agents/realtime/voice-agent.ts
- src/lib/agents/realtime/voice-agent/tool-publishing.ts
- Ensures rewritten tool identity and full correlation/provider metadata are replayed.
Conductor / worker / fairy / canvas:
- src/lib/agents/conductor/router.ts
- src/lib/agents/conductor/worker.ts
- src/lib/fairy-intent/router.ts
- src/lib/agents/canvas-agent/server/runner.ts
Fast stewards:
- src/lib/agents/subagents/crowd-pulse-steward-fast.ts
- src/lib/agents/subagents/summary-steward-fast.ts
- src/lib/agents/subagents/debate-steward-fast.ts

Reporting / Ops

New report generator:
- scripts/observability/generate-session-chat-report.ts
- Renders:
  - transcript
  - voice/transcription/orchestration/steward/fairy/fast sections
  - system prompt/context priming
  - collapsible tool/model inline + raw blob payloads
- Graceful historical fallback when replay tables are absent/empty.
Added retention purge utility:
- scripts/admin/purge-replay-telemetry.ts

Additional hardening updates in latest commits

Persist raw fairy action arrays (actions) + actionCount in fairy quick task outputs:
- src/lib/agents/subagents/canvas-steward.ts
- src/lib/agents/conductor/worker.ts
Replay queue resiliency:
- reschedules flush after failed DB batch write (prevents idle stall)
- drops orphaned queued blob rows if parent model/tool row enqueue fails
- deterministic event-id composition for lifecycle joins
- file: src/lib/agents/shared/replay-telemetry.ts
Report generator robustness for smoke artifacts:
- skips transcript table lookup when synthetic/non-UUID session id is present
- validates --session-id as UUID
- file: scripts/observability/generate-session-chat-report.ts
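
A rough sketch of the session-id handling described above, assuming a generic 8-4-4-4-12 hex UUID shape (no version-bit check) and hypothetical function names; the actual logic in generate-session-chat-report.ts may differ.

```typescript
// Hypothetical sketch: function and type names are illustrative only.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value: string): boolean {
  return UUID_PATTERN.test(value.trim());
}

type TranscriptLookup =
  | { kind: "query"; sessionId: string }
  | { kind: "skip"; reason: string };

// Decide whether the transcript table should be queried at all.
// Synthetic ids (for example from a smoke result JSON) are skipped
// so a UUID-typed column is never queried with a non-UUID value.
function planTranscriptLookup(sessionId: string): TranscriptLookup {
  if (isUuid(sessionId)) {
    return { kind: "query", sessionId: sessionId.trim() };
  }
  return { kind: "skip", reason: "session id is not a UUID" };
}
```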

Backward Compatibility

Scope is intentionally forward-looking; historical sessions may still show no replay rows.
Route auth remains compatible with browser-origin telemetry via authenticated user fallback.
Protocol parsing remains tolerant for legacy messages while new emitters provide full correlation payloads.

Validation

Automated checks

npm test -- src/lib/agents/subagents/canvas-steward.test.ts
npm test -- src/lib/agents/conductor/__tests__/router-execute-task.test.ts
npm run typecheck:agent

Smoke/report evidence (latest)

API smoke result JSON:
- /Users/bsteinher/PRESENT/reports/showcase/api-smoke-1772084062725/result.json
- proof: actionCount=5, actionArrayLength=5, terminal succeeded
Generated report:
- HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-b3ea94cc-15cb-4de0-a443-be10d55920d3.html
- JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-b3ea94cc-15cb-4de0-a443-be10d55920d3.json
Report walkthrough capture:
- WebM: /Users/bsteinher/PRESENT/reports/agent-chat/webm/48b64abf966234713a7bf1897d7badad.webm
- Screenshot: /Users/bsteinher/PRESENT/reports/agent-chat/webm/agent-chat-report-b3ea94cc-15cb-4de0-a443-be10d55920d3.png

Historical report artifact (expected pre-cutover gap)

HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-6267d849-1696-4c12-8a01-1c0bc6dcf2f3.html
JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-6267d849-1696-4c12-8a01-1c0bc6dcf2f3.json
Note: historical session contains transcript/tasks/traces but lacks replay rows by design pre-cutover.

Independent Reviewer Lanes

Reviewer lane B (independent) completed with concrete findings; addressed medium replay queue risks in latest commits.
Reviewer lane A agent process timed out repeatedly in this environment; fallback manual pass completed on changed files (replay-telemetry.ts, generate-session-chat-report.ts, canvas-steward.ts, worker.ts) with no additional blocking findings.

Remaining Risk

Full replay of model token accounting, system prompts, and context depends on applying the new replay DB migration to the PRESENT project used by the runtime environment. Current logs still show replay insert failures where those tables are absent.

human-bee · 2026-02-26T03:19:21Z

Follow-up update (reducibility pass) is now pushed:

13de783 reduce replay telemetry/report duplication and probe optional tables

Implemented:

Unified replay row builders in replay-telemetry.ts via a generic enqueue path + shared row/context builders.
Switched replay flush to table-config-driven behavior (REPLAY_TABLE_CONFIG) for conflict keys/upsert options.
Converted report section rendering to a declarative section registry in generate-session-chat-report.ts.
Added single optional-table probe for replay tables (agent_model_io, agent_tool_io, agent_io_blobs) and routed fetches from availability map.
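
To illustrate the single optional-table probe, here is a minimal sketch in which the three replay tables are probed once and later fetches are routed from the resulting availability map. The `tableExists` callback is a stand-in for whatever database check the report generator actually performs, and treating a probe error as "unavailable" is an assumption.

```typescript
// Hypothetical sketch: only the table names come from this PR.
const REPLAY_TABLES = ["agent_model_io", "agent_tool_io", "agent_io_blobs"] as const;
type ReplayTable = (typeof REPLAY_TABLES)[number];
type Availability = Record<ReplayTable, boolean>;

// Probe every replay table once. A thrown error (for example a missing
// relation) is treated as "table unavailable" rather than a fatal failure.
async function probeReplayTables(
  tableExists: (table: ReplayTable) => Promise<boolean>,
): Promise<Availability> {
  const availability: Availability = {
    agent_model_io: false,
    agent_tool_io: false,
    agent_io_blobs: false,
  };
  await Promise.all(
    REPLAY_TABLES.map(async (table) => {
      try {
        availability[table] = await tableExists(table);
      } catch {
        availability[table] = false;
      }
    }),
  );
  return availability;
}

// Route fetches from the availability map: only available tables are queried.
function tablesToFetch(availability: Availability): ReplayTable[] {
  return REPLAY_TABLES.filter((table) => availability[table]);
}
```

Probing once keeps the historical fallback in one place: sections backed by an unavailable table can render an explicit "no replay rows" notice instead of each fetch handling its own missing-relation error.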

Parity fixes from reviewer findings:

Restored graceful fallback when canvas_session_transcripts relation is missing.
Kept steward priming output parity (canvas_runner only in steward section priming block).

Validation rerun:

npm test -- src/lib/livekit/protocol.test.ts src/lib/agents/realtime/voice-agent/__tests__/tool-publishing.test.ts
npm test -- src/lib/agents/conductor/__tests__/router-execute-task.test.ts src/lib/agents/conductor/__tests__/mutation-arbiter.test.ts
npm run typecheck:agent
npm run build

Independent reviewers (2 lanes) final pass: no findings.

human-bee · 2026-02-26T06:56:48Z

Latest incremental update pushed in 81b382c.

What changed:

Added richer mission-control replay metadata emission:
- Fairy router now persists routing skills, route/tooling snapshot (route kinds, context profiles, view events/targets, fast-lane detail schema, route_intent schema), and provider request metadata.
- Canvas runner now persists prompt/tool snapshot context priming (style instructions, few-shots, tool catalog/schema, prompt budget, viewport/screenshot summary).
Added model token usage capture for canvas model turns:
- Extended structured stream contract (usage, totalUsage, provider/request/response metadata).
- Propagated usage telemetry from AI SDK model calls into replay payloads.
Added fairy action-option provenance in steward outputs:
- Quick text/quick shapes now include actionOptions snapshots.
- Graph fallback results now preserve raw actions + actionOptions.
Upgraded report rendering:
- New “Mission Control Prompt + Tool Snapshots” section.
- Expanded prompt artifact extraction/rendering for Tool Schema, Skills/Capability Profile, and Action/Option Schema.
- Fairy action ledger now correlates downstream canvas task/tool rows by correlation IDs.
- Action-array detector now recognizes { name, params } action objects (not only TL shape object forms).
- If raw arrays are absent, report now explicitly states actionCount and that only summary fields were persisted.
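
A rough sketch of the action-array detection and the summary-only fallback described above. The { name, params } form comes from this update; the shape-like form is assumed here to be any object with a string `type` field, and the function names and message wording are hypothetical.

```typescript
// Hypothetical sketch: the shape-like check and all names are assumptions.
type UnknownRecord = Record<string, unknown>;

function isRecord(value: unknown): value is UnknownRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Action object of the form { name, params }.
function isNameParamsAction(value: unknown): boolean {
  return isRecord(value) && typeof value["name"] === "string" && isRecord(value["params"]);
}

// Assumed stand-in for the shape object form: an object with a string `type`.
function isShapeLikeAction(value: unknown): boolean {
  return isRecord(value) && typeof value["type"] === "string";
}

// A non-empty array in which every entry matches one of the known forms.
function isActionArray(value: unknown): value is UnknownRecord[] {
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every((entry) => isNameParamsAction(entry) || isShapeLikeAction(entry))
  );
}

// Describe what a task output persisted: raw actions, or summary fields only.
function describeActions(output: UnknownRecord): string {
  const actions = output["actions"];
  if (isActionArray(actions)) {
    return `${actions.length} raw action(s) persisted`;
  }
  const count = output["actionCount"];
  if (typeof count === "number") {
    return `actionCount=${count}; only summary fields were persisted`;
  }
  return "no action data persisted";
}
```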

Validation rerun:

npm run typecheck:agent
npm test -- src/lib/agents/subagents/canvas-steward.test.ts
npm test -- src/lib/agents/conductor/__tests__/router-execute-task.test.ts

Fresh artifacts:

Smoke result JSON:
- /Users/bsteinher/PRESENT/test-results/observability-smoke/smoke-1772088185349/result.json
Updated smoke report:
- HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-run-smoke-1772088185349.html
- JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-run-smoke-1772088185349.json
- WebM: /Users/bsteinher/PRESENT/reports/agent-chat/webm/bfe5426f85dbfe3eb4de197753dbe90e.webm
- Screenshot: /Users/bsteinher/PRESENT/reports/agent-chat/webm/agent-chat-report-run-smoke-1772088185349.png
Re-rendered historical target report:
- HTML: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-d781f621-ec3a-4db2-a30d-201b169f05f1.html
- JSON: /Users/bsteinher/PRESENT/reports/agent-chat/agent-chat-report-d781f621-ec3a-4db2-a30d-201b169f05f1.json
- WebM: /Users/bsteinher/PRESENT/reports/agent-chat/webm/d630ba9134857bf0cf4683a40236cbb2.webm
- Screenshot: /Users/bsteinher/PRESENT/reports/agent-chat/webm/agent-chat-report-d781f621-ec3a-4db2-a30d-201b169f05f1.png

Note:

This local runtime still reports replay tables unavailable (agent_model_io, agent_tool_io, agent_io_blobs), so mission-control snapshot counts remain zero here until migration docs/migrations/012_agent_replay_full_telemetry.sql is applied to the active PRESENT Supabase project.

human-bee · 2026-02-26T09:16:33Z

Follow-up fix wave pushed (9a66aee) after independent review lanes.

What changed:

src/app/api/agent/telemetry/route.ts
- Auth path now:
  - accepts exact ingest bearer token when configured
  - rejects mismatched bearer immediately
  - allows signed-in session fallback only when no bearer is present (and, in token mode, only when cookie header exists)
- Added bounded flush wait (AGENT_TELEMETRY_FLUSH_TIMEOUT_MS, NaN-safe default 250ms) with timer cleanup.
- Response semantics:
  - 200 {status:"ok"} when flush confirms
  - 202 {status:"queued"} when queueing/timeout
  - 202 {status:"accepted_with_loss_warning"} when flush detects irrecoverable drops in the batch
src/lib/agents/shared/replay-telemetry.ts
- flushReplayTelemetryNow() now returns explicit status: flushed | queued | dropped.
- Retains poison-row isolation, and surfaces dropped when isolate recovery had irrecoverable rows.
src/lib/agents/realtime/voice-agent.ts
- Correlation fallback now only resolves by requestId/traceId/intentId when match is unique in pendingToolReplayByCallId (avoids first-match misassociation).
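
The bounded flush wait and response semantics listed above for route.ts can be sketched as follows. The status values and the 250 ms default come from this comment; the helper names are hypothetical, and treating non-positive or non-numeric values as "use the default" is an assumption about what "NaN-safe" covers.

```typescript
// Hypothetical sketch: helper names are illustrative, not the route's actual code.
type FlushStatus = "flushed" | "queued" | "dropped";

const DEFAULT_FLUSH_TIMEOUT_MS = 250;

// Parse a raw env value such as AGENT_TELEMETRY_FLUSH_TIMEOUT_MS.
// Number(undefined) and Number("abc") are NaN, so both fall back to the default.
function parseFlushTimeoutMs(raw: string | undefined): number {
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_FLUSH_TIMEOUT_MS;
}

// Wait for the flush, but report "queued" if it does not settle in time.
// The timer is cleared on every path so it cannot keep the process alive.
function flushWithTimeout(
  flush: () => Promise<FlushStatus>,
  timeoutMs: number,
): Promise<FlushStatus> {
  return new Promise<FlushStatus>((resolve, reject) => {
    const timer = setTimeout(() => resolve("queued"), timeoutMs);
    flush().then(
      (status) => {
        clearTimeout(timer);
        resolve(status);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Map the flush outcome to the HTTP status and body described above.
function toResponse(status: FlushStatus): { httpStatus: number; body: { status: string } } {
  switch (status) {
    case "flushed":
      return { httpStatus: 200, body: { status: "ok" } };
    case "queued":
      return { httpStatus: 202, body: { status: "queued" } };
    case "dropped":
      return { httpStatus: 202, body: { status: "accepted_with_loss_warning" } };
  }
}
```

Returning 202 for both the timeout and the loss case keeps the client from retrying on a transport-level failure while still making it visible that durability was not confirmed.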

Validation rerun:

npm run typecheck:agent
npm run typecheck:app
npx jest src/lib/agents/realtime/voice-agent/__tests__/tool-publishing.test.ts src/lib/livekit/protocol.test.ts

All passed locally.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a66aee89e

chatgpt-codex-connector · 2026-02-26T09:32:10Z

src/app/api/agent/telemetry/route.ts

+  const token = normalizeOptional(process.env.AGENT_TELEMETRY_INGEST_TOKEN);
+  const bearer = readBearer(req);
+  if (token && bearer === token) return true;
+  if (token && bearer && bearer !== token) return false;


Accept Supabase bearer auth when ingest token is configured

The authorization gate returns false as soon as a bearer token is present but does not equal AGENT_TELEMETRY_INGEST_TOKEN, which prevents resolveRequestUser from running. Since resolveRequestUser explicitly supports authenticating normal Supabase user bearer tokens, authenticated clients that send their user JWT in Authorization (instead of cookie auth) will now get 401 whenever an ingest token is set, breaking the intended "ingest token OR authenticated user" behavior.
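
To make the flagged case concrete, here is a small decision-table sketch of the gate as described in the 9a66aee notes. The function and type names are hypothetical, and the `fallThroughOnMismatch` option is only an illustration of the behavior this review asks for, not a proposed patch. The behavior when no ingest token is configured is also an assumption, since the quoted diff does not show it.

```typescript
// Hypothetical sketch: a pure decision function, not the route's actual code.
type AuthDecision = "allow" | "deny" | "check_user";

interface AuthInput {
  ingestToken?: string; // configured AGENT_TELEMETRY_INGEST_TOKEN, if any
  bearer?: string; // value of the Authorization bearer, if any
  hasCookie: boolean; // whether the request carries a cookie header
}

function decideAuth(input: AuthInput, fallThroughOnMismatch = false): AuthDecision {
  const { ingestToken, bearer, hasCookie } = input;
  // Exact ingest token match is accepted outright.
  if (ingestToken && bearer === ingestToken) return "allow";
  // A bearer that does not match the ingest token is rejected immediately,
  // unless the caller opts into trying user authentication instead.
  if (ingestToken && bearer && bearer !== ingestToken) {
    return fallThroughOnMismatch ? "check_user" : "deny";
  }
  // No bearer in token mode: session fallback only when a cookie header exists.
  if (ingestToken && !hasCookie) return "deny";
  // Assumption: without an ingest token, user authentication always runs.
  return "check_user";
}
```

With an ingest token configured, `decideAuth({ ingestToken: "secret", bearer: "user-jwt", hasCookie: true })` yields "deny", which is the case this comment describes; passing `true` as the second argument yields "check_user" instead.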

human-bee added 11 commits February 26, 2026 00:32

hard cutover full replay telemetry for future sessions

87514a4

harden replay telemetry ingestion reliability and auth

98556e5

remove replay telemetry enable flag for hard cutover

80f3eba

support smoke result-json scope for trace report generation

119aa7c

reduce replay telemetry/report duplication and probe optional tables

b6c6e2b

report: add timeline/token/fairy raw telemetry fallback

2da50e7

telemetry: persist raw fairy actions in task outputs

b57d5c0

telemetry: harden replay flush retries and blob queue consistency

a2daff3

report: handle synthetic result-json session ids safely

fcf6d67

feat: expand replay mission-control snapshots and fairy action visibility

b7bee11

fix: harden replay telemetry durability and correlation

2542ed8

human-bee force-pushed the codex/hard-cutover-full-replay branch from 81b382c to 2542ed8 on February 26, 2026 08:57

fix: refine telemetry ingest auth and flush statuses

9a66aee

human-bee marked this pull request as ready for review February 26, 2026 09:19

human-bee merged commit 66aacae into main Feb 26, 2026
4 checks passed

human-bee deleted the codex/hard-cutover-full-replay branch February 26, 2026 09:20

human-bee merged 12 commits into main from codex/hard-cutover-full-replay
