Closed
45 commits
0d6578a
fix(frontend): stop agent session tab bar overflow and pin the new-se…
ardaerzin Jul 2, 2026
4284c06
feat(frontend): make the agent playground splitter resize handle disc…
ardaerzin Jul 2, 2026
daa2f93
feat(frontend): move the agent revision selector into the playground …
ardaerzin Jul 2, 2026
4682864
feat(frontend): extend shared drawer rail primitives
ardaerzin Jul 2, 2026
f3edcc5
refactor(frontend): render sandbox, Claude and MCP forms as flat Rail…
ardaerzin Jul 2, 2026
62483a5
refactor(frontend): rebuild the playground build-kit section on Confi…
ardaerzin Jul 2, 2026
23453af
feat(frontend): scoped-draft section drawers + Model & harness / Adva…
ardaerzin Jul 2, 2026
cc17ab0
Merge remote-tracking branch 'origin/big-agents' into big-agents-work
ardaerzin Jul 2, 2026
499b22f
feat(frontend): rich chat input links, code blocks, and block-style p…
ardaerzin Jul 2, 2026
4cfa7da
feat(frontend): persistent HITL approval dock with hardened queue rel…
ardaerzin Jul 2, 2026
99462f9
Merge remote-tracking branch 'origin/big-agents' into big-agents-work
ardaerzin Jul 2, 2026
56346db
fix(frontend): add breathing room between agent message toolbar and b…
ardaerzin Jul 2, 2026
4888081
docs(agent): Turn Inspector (Build-mode tooling) design spec
ardaerzin Jul 2, 2026
84f39b9
docs(agent): Turn Inspector implementation plan
ardaerzin Jul 2, 2026
d9294a3
feat(frontend): build-mode step log for agent tool calls
ardaerzin Jul 2, 2026
5647c76
feat(frontend): turn-inspector open-state atom
ardaerzin Jul 2, 2026
b4c0647
feat(frontend): turn-inspector Timeline tab
ardaerzin Jul 2, 2026
9d8f9f1
feat(frontend): turn-inspector drawer shell
ardaerzin Jul 2, 2026
d47778a
feat(frontend): mount turn inspector + inspect-turn affordance
ardaerzin Jul 2, 2026
81c727c
feat(playground): per-turn request capture + correlation helpers
ardaerzin Jul 2, 2026
0a56cd0
feat(frontend): session-scoped turn-capture store
ardaerzin Jul 2, 2026
a62c380
feat(frontend): capture outgoing agent request per send
ardaerzin Jul 2, 2026
e1c504c
feat(frontend): turn-inspector Context tab (config + messages sent)
ardaerzin Jul 2, 2026
bfd3689
feat(frontend): turn-inspector Raw tab (copyable payloads)
ardaerzin Jul 2, 2026
4496f0a
fix(frontend): turn inspector reads live messages, streams, and gates…
ardaerzin Jul 2, 2026
d4fe0d9
feat(frontend): collapsible individual steps in the build-mode step log
ardaerzin Jul 2, 2026
0829b61
feat(frontend): agent chat empty state — agent-aware in Build, warm m…
ardaerzin Jul 2, 2026
6fc73ab
fix(frontend): agent chat scroll — stop jump-to-top on stream state c…
ardaerzin Jul 2, 2026
fbe5f9f
fix(sdk): key HITL approval on stable spec name + rawInput args
ardaerzin Jul 2, 2026
e90e94c
fix(runner): HITL resume — stable-name key + non-converging loop-breaker
ardaerzin Jul 2, 2026
b9e7af6
fix(frontend): unique fallback id for id-less batch-replay turns
ardaerzin Jul 2, 2026
bbbabb0
style(frontend): calmer chat composer
ardaerzin Jul 2, 2026
4308fcf
refactor(frontend): turn inspector — siderail nav, full-round timelin…
ardaerzin Jul 2, 2026
d93d551
fix(sdk): unique vercel stream messageId per turn
ardaerzin Jul 2, 2026
818d8b7
fix(frontend): strip markdown code fences from tool error display
ardaerzin Jul 2, 2026
76d5796
fix(runner): capture tool args that arrive on tool_call_update
ardaerzin Jul 2, 2026
3142c76
refactor(frontend): turn inspector — inline side panel with animated …
ardaerzin Jul 2, 2026
941f338
fix(runner): restore emit-first tool_call; refresh input via re-emit,…
ardaerzin Jul 2, 2026
ade0cb6
fix(runner): anchor HITL approval key on the recorded tool name
ardaerzin Jul 3, 2026
44bfdd8
fix(sdk): key HITL approval on the recorded tool name, stable across …
ardaerzin Jul 3, 2026
74ed562
fix(frontend): stop agent chat re-sending after a HITL approval resolves
ardaerzin Jul 3, 2026
7f82e9f
fix(frontend): agent tool rendering — collapse resolved gate, strip o…
ardaerzin Jul 3, 2026
0fbedd5
refactor(frontend): agent playground surface system — panel-contrast …
ardaerzin Jul 3, 2026
07b5078
Merge remote-tracking branch 'origin/big-agents' into big-agents-work
ardaerzin Jul 3, 2026
8ab3070
Merge branch 'big-agents' into big-agents-work
bekossy Jul 3, 2026
146 changes: 146 additions & 0 deletions docs/design/agent-workflows/projects/agent-turn-inspector/design.md
@@ -0,0 +1,146 @@
# Project: Agent Turn Inspector (Build-mode tooling)

| | |
| --- | --- |
| **Status** | Design. Approved via brainstorming on 2026-07-02. Not yet planned/implemented. |
| **Type** | Frontend feature (agent playground, Build mode). Sequenced, test-driven. |
| **Audience** | Internal builders — the team actively building the agent system, debugging in the playground. |
| **Scope** | A dedicated, agent-native "Turn Inspector" panel for deep per-turn debugging, plus a per-turn capture of what was actually sent to the agent. The inline Build-mode step log stays as-is. |
| **Owner files (today)** | `web/oss/src/components/AgentChatSlice/components/ToolActivity.tsx`, `.../components/AgentMessage.tsx`, `.../AgentChatPanel.tsx` (transport + temp diagnostic), `web/packages/agenta-playground/src/state/execution/agentRequest.ts` (`buildAgentRequest`). |

## 1. Problem

The agent chat view was deliberately kept calm for non-technical users. That calm hides the
information the team building the system needs to debug it. Across one working session we hit
this repeatedly and could not answer the following from the UI alone:

- **Why did the agent misbehave?** e.g. a `commit_revision` loop where it re-committed and
re-answered "already done" several times in one turn.
- **What did the agent actually run with?** The effective config (instructions / model / tools)
and the exact `messages` array we sent — *as of that turn*. This is **invisible today**; it
only exists in a temporary `console.warn` (`[AgentChat OUTGOING]`) added during the loop
investigation.
- **Were the tool calls correct?** Right input in, right output back — the `{}`-input question,
malformed args, error payloads.

The root difficulty is that the agent's *actual execution* is not legible: we watch what it
produces, but not what it saw. The re-commit loop, for instance, is best explained by the FE
re-sending a config that had drifted out from under the agent after its own self-commit — a
fact no surface in the app exposes.

## 2. Goals / non-goals

**Goals** (priority order, from the builders):

1. Diagnose *why it misbehaved* (loops, wrong tool, ignored context).
2. See *what it actually ran with* (effective config + exact messages sent, accurate at that turn).
3. Verify *tool-call correctness* (full input/output/error per call).

**Non-goals:**

- Do **not** touch the existing trace drawer (avoid regressions in a working surface).
- Do **not** reduce the inline Build-mode step log — the panel is purely additive.
- No speed/cost analytics (explicitly deprioritized).
- Read-only. No re-running/editing steps from the panel (possible future work).

## 3. Surfaces

Two surfaces with a clean division of labor:

- **Inline step log** — exists (`ToolActivity` `detailed` mode, gated on Build via
`chatPanelMaximizedAtom`). The fast, in-transcript read: every step with tool I/O. **Unchanged.**
- **Turn Inspector** — new. A dedicated panel opened from a turn, for the deep dive. Its own
shell and its own Jotai state. It shares only generic UI primitives (e.g. `EnhancedDrawer` from
`@agenta/ui`); it does **not** import or mutate the trace-drawer store.

## 4. The Turn Inspector

**Gating:** Build mode only (`!chatPanelMaximizedAtom`), same signal the inline log uses.

**Open affordance:** an "Inspect turn" control on each assistant turn (near the existing
`View full trace` link / the turn's hover actions). Opens a right-side drawer focused on that
turn. Optional deep-link: clicking a step in the inline log opens the panel scrolled to that step.

**Three tabs:**

| Tab | Shows | Data source |
| --- | --- | --- |
| **Timeline** | Every interaction in order: reasoning, each tool call with full input/output/error, text, and HITL events (approval requested / approved / denied). The step log, exhaustive and un-truncated. | The turn's `UIMessage.parts` — already on the client. No new plumbing. |
| **Context** | *What the agent ran with this turn* — the effective config (instructions / model / tools) as of that turn, and the exact `messages` array sent (post-`hasAnswer` filter). When a turn made multiple requests (HITL/auto-resume), shows all of them with a diff. | The per-turn capture (§5). |
| **Raw** | The literal outgoing request body and raw response/events, for copy-paste repro and bug reports. Copy-as-JSON. | The per-turn capture (§5) + read-only reads of trace data via existing atoms (no trace-drawer UI). |

The **Timeline** tab ships essentially for free (message parts). **Context** and **Raw** carry
the real new value and the only real new engineering, because "what we actually sent" is not
persisted anywhere today.

## 5. The per-turn capture (the one new data mechanism)

**Decision: capture-at-send, not reconstruct-on-demand.** Reconstructing later re-reads the
*current* config/messages, which have already drifted (exactly the bug we are chasing). We
snapshot at the moment of sending, so Context/Raw are accurate *at that turn*.

**Where:** in the transport's `prepareSendMessagesRequest` in `AgentChatPanel` — the exact spot
the temporary `console.warn` lives now. The built `req` is already in hand. This **productizes
and replaces the temp `[AgentChat OUTGOING]` diagnostic** with a structured store write.

**Snapshot shape (per send):**

```ts
interface TurnRequestCapture {
    requestId: string            // nonce, generated at send
    at: number                   // Date.now() at send
    triggerUserMessageId: string // last role:"user" message id in the sent array
    parameters: unknown          // config-as-sent (config-at-turn)
    messages: unknown[]          // exact array sent (post-hasAnswer)
    references: unknown          // as sent
    sessionId: string
    invocationUrl: string        // body/URL only; auth headers + secrets stripped
}
```
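
A minimal sketch of how one capture could be built at send time. `BuiltAgentRequest`, `SentMessage`, `cloneJson` and `snapshotRequest` are hypothetical names for illustration; the real return type of `buildAgentRequest` may differ. The point it shows is the deep copy: the snapshot must not share references with the live config, otherwise later drift would leak into it.

```typescript
// Assumed minimal message shape; the real sent messages carry more fields.
interface SentMessage {
    id: string
    role: "user" | "assistant" | "system"
}

// Hypothetical shape of the already-built request (not the real type).
interface BuiltAgentRequest {
    url: string
    sessionId: string
    parameters: unknown
    messages: SentMessage[]
    references: unknown
}

interface TurnRequestCapture {
    requestId: string
    at: number
    triggerUserMessageId: string
    parameters: unknown
    messages: unknown[]
    references: unknown
    sessionId: string
    invocationUrl: string
}

// JSON round-trip copy. Assumes the payload is JSON-serializable,
// which holds for a request body that is sent as JSON anyway.
function cloneJson<T>(value: T): T {
    return value === undefined ? value : (JSON.parse(JSON.stringify(value)) as T)
}

// Walk backwards to find the last role:"user" message in the sent array.
function lastUserMessageId(messages: SentMessage[]): string | undefined {
    for (let i = messages.length - 1; i >= 0; i--) {
        const message = messages[i]
        if (message && message.role === "user") return message.id
    }
    return undefined
}

// Returns null when there is no user message to correlate against.
function snapshotRequest(
    req: BuiltAgentRequest,
    requestId: string,
    at: number,
): TurnRequestCapture | null {
    const triggerUserMessageId = lastUserMessageId(req.messages)
    if (triggerUserMessageId === undefined) return null
    return {
        requestId,
        at,
        triggerUserMessageId,
        parameters: cloneJson(req.parameters),
        messages: cloneJson(req.messages),
        references: cloneJson(req.references),
        sessionId: req.sessionId,
        invocationUrl: req.url, // headers are never read, so none are captured
    }
}
```

In the panel this would presumably be called from `prepareSendMessagesRequest`, where the built `req` is already in hand, with a freshly generated nonce and `Date.now()`.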

**Correlation (the subtle bit):** key each capture by `triggerUserMessageId` = the id of the
last `role:"user"` message in the sent array. The initial send **and every HITL/auto-resume of
that turn share it**, so one turn maps to a *list* of captures. Given an assistant turn, find its
preceding user message → pull all captures for that id.
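
The lookup described above can be sketched as follows. This is an illustration under assumptions, not the shipped helper: `CaptureRef`, `TranscriptMessage`, `groupCaptures` and `capturesForAssistantTurn` are hypothetical names, and the sketch assumes user message ids are the same in the rendered transcript and in the sent array.

```typescript
// Only the fields needed for correlation; a real capture carries more.
interface CaptureRef {
    requestId: string
    triggerUserMessageId: string
}

interface TranscriptMessage {
    id: string
    role: "user" | "assistant" | "system"
}

// Group captures by the user message that triggered them.
// One turn maps to a list: the initial send plus any HITL/auto-resume sends.
function groupCaptures(captures: CaptureRef[]): Map<string, CaptureRef[]> {
    const byTurn = new Map<string, CaptureRef[]>()
    for (const capture of captures) {
        const list = byTurn.get(capture.triggerUserMessageId) ?? []
        list.push(capture)
        byTurn.set(capture.triggerUserMessageId, list)
    }
    return byTurn
}

// Given an assistant turn, find its preceding user message and return
// every capture recorded for that id (empty when nothing was captured).
function capturesForAssistantTurn(
    transcript: TranscriptMessage[],
    assistantMessageId: string,
    byTurn: Map<string, CaptureRef[]>,
): CaptureRef[] {
    const index = transcript.findIndex((m) => m.id === assistantMessageId)
    if (index === -1) return []
    for (let i = index - 1; i >= 0; i--) {
        const message = transcript[i]
        if (message && message.role === "user") return byTurn.get(message.id) ?? []
    }
    return []
}
```

An empty result is the expected outcome for a turn hydrated from a prior session, since nothing was captured for it.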

**Payoff:** keeping *every* send lets the Context tab show "this turn = N requests" and diff
them — which surfaces the re-commit loop and the stale-config re-injection directly. That view
is what was missing all session.
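
One possible shape for the diff between two sends of the same turn is a top-level comparison of `parameters`. This is a deliberately simple sketch: `diffTopLevel` and `FieldChange` are hypothetical names, and comparing by `JSON.stringify` is an assumption that treats objects with reordered keys as different. A real implementation might use a proper deep-diff instead.

```typescript
type JsonRecord = Record<string, unknown>

interface FieldChange {
    key: string
    before: unknown
    after: unknown
}

// Report every top-level key whose serialized value differs between two sends.
// Keys present on only one side show up with `undefined` on the other.
function diffTopLevel(before: JsonRecord, after: JsonRecord): FieldChange[] {
    const keys = Array.from(new Set(Object.keys(before).concat(Object.keys(after)))).sort()
    const changes: FieldChange[] = []
    for (const key of keys) {
        if (JSON.stringify(before[key]) !== JSON.stringify(after[key])) {
            changes.push({key, before: before[key], after: after[key]})
        }
    }
    return changes
}
```

Running this over consecutive captures of one turn would flag, for example, an `instructions` value that changed between the initial send and a resume.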

**Storage:** a session-scoped, in-memory Jotai atom (ephemeral — this is a debugging surface),
capped to the last N turns to bound memory. Not `localStorage` (payloads can be large);
persistence is a possible later option.
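
The cap can be expressed as a pure update function that a Jotai write atom could wrap. The sketch below is an assumption about one workable design, with hypothetical names (`StoredCapture`, `CaptureState`, `addCapture`); it relies on `Map` preserving insertion order so the first key is always the oldest turn.

```typescript
interface StoredCapture {
    requestId: string
    triggerUserMessageId: string
}

// Keyed by triggerUserMessageId; each turn holds every send made for it.
type CaptureState = ReadonlyMap<string, readonly StoredCapture[]>

// Returns a new state (never mutates the input) with the capture appended
// and the oldest turns evicted once more than `maxTurns` are stored.
function addCapture(
    state: CaptureState,
    capture: StoredCapture,
    maxTurns: number,
): CaptureState {
    const next = new Map(state)
    const existing = next.get(capture.triggerUserMessageId) ?? []
    // Setting an existing key keeps its original position, so a resend
    // does not make an old turn look newer for eviction purposes.
    next.set(capture.triggerUserMessageId, [...existing, capture])
    while (next.size > maxTurns) {
        const oldest = next.keys().next().value
        if (oldest === undefined) break
        next.delete(oldest)
    }
    return next
}
```

Eviction here is per turn, not per request, so a HITL turn with many resends is kept or dropped as a whole.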

**Redaction:** capture the request *body* only; strip `Authorization` and any secret-bearing
headers. The body itself must not carry secrets.
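
If headers were ever recorded alongside the body (the design above captures the body only), a defensive filter could look like the sketch below. The deny-list pattern and the name `stripSecretHeaders` are assumptions for illustration, not an agreed list of secret-bearing headers.

```typescript
// Assumed deny-list: exact auth/cookie headers plus anything that looks
// like a key, token, or secret. Matching is case-insensitive.
const SECRET_HEADER_PATTERN =
    /^(authorization|proxy-authorization|cookie|set-cookie)$|api[-_]?key|token|secret/i

// Returns a copy of the headers without any secret-bearing entries.
function stripSecretHeaders(headers: Record<string, string>): Record<string, string> {
    const safe: Record<string, string> = {}
    for (const name of Object.keys(headers)) {
        const value = headers[name]
        if (value !== undefined && !SECRET_HEADER_PATTERN.test(name)) {
            safe[name] = value
        }
    }
    return safe
}
```

A filter like this only covers headers; it does nothing for secrets embedded in the body, which the design requires to be absent in the first place.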

## 6. Phasing

1. **Inspector shell + Timeline tab** — no new plumbing (renders `UIMessage.parts`). Ships value
immediately and is independently useful.
2. **Capture store + Context tab** — the snapshot mechanism (§5) + the config/messages view,
including the multi-send-per-turn diff. Replaces the temp diagnostic.
3. **Raw tab** — literal payloads, copy-as-JSON, read-only trace correlation.

## 7. Testing

- **Unit:** the capture reducer + correlation — `triggerUserMessageId` grouping, resends
collapsing into one turn, N-cap eviction.
- **Component:** Timeline renders reasoning / tool-I/O / HITL in order; Context renders config +
messages; empty/no-capture states (a turn hydrated from a prior session has no capture).
- **Manual:** Build-mode-only gating; a HITL/resume turn shows multiple captures and a usable diff.

## 8. Open questions / future

- Deep-link from an inline step into the panel (nice-to-have, Phase 1 or later).
- Persisting captures across reload (currently ephemeral).
- "Re-run / fork from this turn" actions (out of scope now; the panel is read-only).
- Whether Timeline should also show server-recorded events not present in `UIMessage.parts`
(would need a read-only trace correlation, same as Raw).

## See also

- `docs/design/agent-workflows/documentation/agent-configuration.md` — the schema-driven config
the Context tab surfaces.
- The `hasAnswer` filter in `web/packages/agenta-playground/src/state/execution/agentRequest.ts`
— why the sent `messages` differ from the rendered transcript.