feat(agent-platform): core Agent Platform feature branch (all agent platform services) #63988
The build command `--filter '@posthog/quill-*'` matches all quill packages including quill-charts, but the install step only installs transitive deps of agent-console (which doesn't depend on quill-charts). This causes tsc/module resolution failures. Replace the glob filter with explicit package names (tokens, primitives, components, blocks, quill) to exclude quill-charts from both the Dockerfile and CI workflow build steps.
Same effect as the previous commit but lets pnpm derive the build set
from quill's workspace dep graph. `--filter '@posthog/quill...'`
selects `@posthog/quill` plus every workspace package it depends on
(tokens, primitives, components, blocks today; whatever it depends
on tomorrow). quill-charts is excluded naturally because quill
doesn't depend on it — same outcome as the explicit allowlist, but
new quill subpackages don't silently fail to build the next time one
gets added.
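The selection rule behind `--filter '@posthog/quill...'` can be sketched as a transitive closure over workspace dependencies. This is only an illustration: the graph edges below are assumptions made up for the example, not read from the real package.json files, and `selectWithDependencies` is a hypothetical helper, not pnpm's implementation.

```typescript
// Hypothetical workspace graph: package name -> workspace deps it declares.
// The edges are illustrative assumptions, not the real manifests.
type WorkspaceGraph = Record<string, string[]>;

const graph: WorkspaceGraph = {
  '@posthog/quill': ['@posthog/quill-blocks', '@posthog/quill-components'],
  '@posthog/quill-blocks': ['@posthog/quill-components'],
  '@posthog/quill-components': ['@posthog/quill-primitives'],
  '@posthog/quill-primitives': ['@posthog/quill-tokens'],
  '@posthog/quill-tokens': [],
  '@posthog/quill-charts': ['@posthog/quill-tokens'],
};

// Selects the root package plus every workspace package reachable through
// its dependency edges, which is the behaviour the commit relies on.
function selectWithDependencies(g: WorkspaceGraph, root: string): string[] {
  const seen = new Set<string>();
  const stack: string[] = [root];
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (seen.has(name)) continue;
    seen.add(name);
    for (const dep of g[name] ?? []) stack.push(dep);
  }
  return Array.from(seen).sort();
}

// quill-charts is never reached because nothing under quill depends on it.
const buildSet = selectWithDependencies(graph, '@posthog/quill');
```

A new subpackage added as a dependency of quill would join `buildSet` without touching the filter, which is the property the explicit allowlist lacked.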
Only changes the four agent-console contexts. Left untouched:
- packages/quill/package.json — workspace root build, legitimately
builds every member when invoked from inside the quill workspace.
- services/mcp/package.json — mcp depends on `@posthog/quill-charts`
directly so its install path includes charts' deps. Building
charts there is correct.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… allowlist
Substantial batch covering the agent platform's authoring + Slack flows
plus the agent-console's session + access UX.
Slack BYO:
- Per-app SLACK_BOT_TOKEN + SLACK_SIGNING_SECRET via TRIGGER_REQUIRED_SECRETS
- Native @posthog/slack-* tools read from ctx.secret() (no team integration)
- Django serializer surfaces computed slack_events_url / slack_interactivity_url
- AGENT_INGRESS_PUBLIC_URL Django setting + ingress boot log echoes it
- bin/agent-tunnel: cloudflared wrapper that writes the URL into .env.local
and removes it on exit
- bin/mprocs.yaml: agent-ingress re-sources .env.local on each restart
Concierge native writes:
- 13 new native @posthog/agent-applications-* write tools
(create, partial-update, revisions-{create,new-draft,partial-update,
file-update,validate,freeze,promote,archive}-create, env-keys-list/get,
set-env-create) so the concierge no longer depends on the MCP transport
for authoring
- Promoted concierge bundle drops the MCP block, uses natives
- Updated agent.md, authoring-new-agents, secrets-and-integrations,
setting-up-slack-app skills with worked spec example, focus_* slug
requirement, promote-before-URL ordering for Slack
Console:
- Per-browser session history menu in DockHeader (localStorage backed,
20-entry FIFO, terminal entries kept for read-only playback)
- runner.switchToSession() to attach to a past thread without unmount
- AGENT_CONSOLE_ALLOWED_TEAM_IDS env gates OAuth callback + every
/api/auth/me refresh; defaults to "1,2"
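The 20-entry FIFO behind the session history menu could look roughly like the sketch below. The `SessionEntry` fields and the de-duplication on re-visit are assumptions; the commit only states the cap, the FIFO policy, and that terminal entries are kept. Persisting the returned array (for example as JSON in localStorage) is left out so the function stays pure.

```typescript
// Hypothetical entry shape; only the 20-entry cap comes from the commit.
interface SessionEntry {
  sessionId: string;
  label: string;
  terminal: boolean; // terminal sessions stay listed for read-only playback
}

const MAX_HISTORY = 20;

// Appends an entry and evicts the oldest ones once the cap is exceeded.
// Re-adding a known session moves it to the newest slot (an assumption).
function pushSession(history: SessionEntry[], entry: SessionEntry): SessionEntry[] {
  const withoutDuplicate = history.filter((e) => e.sessionId !== entry.sessionId);
  const next = [...withoutDuplicate, entry];
  return next.slice(Math.max(0, next.length - MAX_HISTORY));
}
```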
Branch unwedge:
- posthog/jwt.py: restored get_oidc_verification_keys + signing-key
helpers that a bad merge dropped
- posthog/api/__init__.py: sdk_doctor -> sdk_health rename
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ack_reaction
Three additions to the slack trigger config that let agents handle
channels the way real Slack bots do: without re-@-mentioning every turn,
and with instant feedback that the bot saw the message.
mention_only (now enforced):
Drop plain message events; only app_mention events seed sessions.
Default false to preserve back-compat for bots that subscribe to
message.channels by design.
auto_resume_threads:
Relaxes mention_only for replies in threads the bot already owns. When
a message event comes in with a thread_ts matching an existing
session's external_key, the trigger accepts it and continues the
conversation. Sessions seeded this way carry `mention: false` in the
[slack] envelope so the model can judge whether the message is actually
addressed to it before responding. Defense in depth: lookup is a single
indexed PG read, performed only when mention_only would have dropped.
ack_reaction:
Fire-and-forget reactions.add posted from the ingress on accept, using
the agent's SLACK_BOT_TOKEN. Authored as an emoji name (e.g. "eyes");
the ack lands in Slack within the 3s event-ack window so users see "I
saw it" even when the runner takes a moment to produce a real turn.
Fails open: missing token, revoked auth, slack.com 5xx, already_reacted
all become silent no-ops; the session still enqueues.
Wired:
- Schema: services/agent-shared/src/spec/spec.ts +
products/agent_platform/backend/spec_schema.py (Django mirror)
- Handler: services/agent-ingress/src/triggers/slack.ts; gate runs
before identity resolution so dropped events don't pay for the
AgentUser write
- Harness: pass opts.http to buildApp so tests can intercept
reactions.add
- 8 new e2e cases in slack-trigger.test.ts (mention_only=false
back-compat, mention_only=true drop, auto_resume accept-owned,
auto_resume drop-unowned, ack_reaction fires, ack_reaction unset no-op,
slack 500 fails open, no SLACK_BOT_TOKEN no-crash)
Concierge bundle tweaks for clearer set_secret guidance picked up
alongside.
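The interaction between mention_only and auto_resume_threads can be sketched as a small decision function. This is a hedged illustration, not the handler in slack.ts: `gateSlackEvent`, `GateDecision`, and the synchronous `ownsThread` callback are hypothetical (the real lookup is described as an indexed PG read), and tagging non-mention events with `mention: false` when mention_only is off is an assumption.

```typescript
interface SlackTriggerConfig {
  mention_only?: boolean;
  auto_resume_threads?: boolean;
  ack_reaction?: string;
}

interface SlackEvent {
  type: 'app_mention' | 'message';
  thread_ts?: string;
}

type GateDecision =
  | { accept: true; mention: boolean }
  | { accept: false; reason: 'mention_only' | 'no_owned_thread' };

function gateSlackEvent(
  config: SlackTriggerConfig,
  event: SlackEvent,
  ownsThread: (threadTs: string) => boolean // stand-in for the session lookup
): GateDecision {
  if (event.type === 'app_mention') return { accept: true, mention: true };
  // Back-compat default: plain messages pass when mention_only is off.
  if (!config.mention_only) return { accept: true, mention: false };
  // mention_only would drop this event; only now pay for the thread lookup.
  if (config.auto_resume_threads && event.thread_ts !== undefined) {
    return ownsThread(event.thread_ts)
      ? { accept: true, mention: false }
      : { accept: false, reason: 'no_owned_thread' };
  }
  return { accept: false, reason: 'mention_only' };
}
```

Ordering the checks this way keeps the lookup off the hot path: it runs only for non-mention events that mention_only would otherwise have dropped.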
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… racing creates
Two agent-platform pods (runner + janitor, plus replicas) call
`migrate()` on boot concurrently. The bundled node-pg-migrate's
`ensureMigrationsTable` issues a plain `CREATE TABLE` rather than
`CREATE TABLE IF NOT EXISTS`, so when two processes race past the
existence check both try to create the table — the loser crashes:
Error: Unable to ensure migrations table:
error: relation "pgmigrations" already exists
Wrap migrate() with a pg advisory lock + idempotent pre-create:
1. `pg_advisory_lock(MIGRATE_ADVISORY_LOCK)` — serializes the whole
migrate() across every process touching this database. Losers
wait, winners proceed; after the winner releases, the loser sees
a fully-migrated schema and runner() is a no-op.
2. `CREATE TABLE IF NOT EXISTS public.pgmigrations (...)` —
pre-creates the migrations table before runner() runs. Removes
node-pg-migrate's race window entirely.
3. Belt-and-braces: catch the duplicate-table error (42P07) in case
the bundled `ensureMigrationsTable` still trips on its own
existence check despite our pre-create.
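The three layers above can be sketched as a wrapper around the migration runner. This is a minimal sketch under stated assumptions: `SqlClient` is a hypothetical minimal interface rather than a specific driver type, the lock key value is invented, the column list is assumed to match what node-pg-migrate creates, and retrying once after 42P07 is a guess at the recovery behaviour, which the commit does not spell out.

```typescript
// Hypothetical minimal client surface used by this sketch.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Illustrative value; the real MIGRATE_ADVISORY_LOCK constant is not shown.
const MIGRATE_ADVISORY_LOCK = 724001;

async function migrateSerialized(
  client: SqlClient,
  runMigrations: () => Promise<void>
): Promise<void> {
  // 1. Serialize the whole migration across every process on this database.
  await client.query('SELECT pg_advisory_lock($1)', [MIGRATE_ADVISORY_LOCK]);
  try {
    // 2. Idempotent pre-create so the runner's plain CREATE TABLE is skipped.
    await client.query(
      `CREATE TABLE IF NOT EXISTS public.pgmigrations (
         id SERIAL PRIMARY KEY,
         name varchar(255) NOT NULL,
         run_on timestamp NOT NULL
       )`
    );
    try {
      await runMigrations();
    } catch (err) {
      // 3. Belt-and-braces: 42P07 is Postgres' duplicate_table error code.
      if ((err as { code?: string }).code !== '42P07') throw err;
      await runMigrations(); // assumption: one retry now that the table exists
    }
  } finally {
    // Always release, even when a migration fails.
    await client.query('SELECT pg_advisory_unlock($1)', [MIGRATE_ADVISORY_LOCK]);
  }
}
```

The lock and unlock must run on the same session, since session-level advisory locks belong to the connection that took them.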
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the three optional slack-trigger config fields (mention_only,
auto_resume_threads, ack_reaction) to the concierge's vocabulary so
"can we make it respond with an emoji" routes to the slack-app skill
instead of falling through to a generic answer.
- agent.md: paragraph in 'Trigger-required secrets' that names all
three fields and points at the slack-app skill. Always-loaded
preamble, so the pointer is visible regardless of which skill the
model loads.
- skills/setting-up-slack-app.md: 'Tuning the slack trigger' section
that walks picking between the three patterns (mention-only,
mention+thread, react-to-everything), wires the JSON snippet, and
warns about the no-op pairings. Failure-mode table extended with
three new gates.
- spec.json skill description (applied live, not in template):
broadened with explicit keyword triggers ('respond with an emoji',
'only when I @-mention', 'reply in the thread') so the load gate
fires for behavior questions, not just for 'wire it up' authoring.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Today's GET /preview-token/ returns just token / expires_in /
ingress_slug. Callers (agents, scripts, dev tools) then have to grep
agent-ingress source to figure out:
- the ingress base URL
- which paths each trigger exposes
- that the preview-token gates revision routing but spec.auth.modes
still needs satisfying alongside
That source-archaeology now lives in the response. Three new fields,
all derived from the revision's own spec:
endpoints: { trigger_type: { route_name: absolute_url } }
Only triggers the spec declares; no phantom URLs. Empty when
AGENT_INGRESS_PUBLIC_URL is unset (so the caller can detect the
unconfigured-deployment state explicitly).
auth: { preview_token_header, preview_token_query, spec_modes, notes }
spec_modes mirrors the spec's auth.modes order so the caller picks
the first one its credential satisfies. notes explicitly calls out
the live-vs-preview gate split.
preview_proxy: { base, allowed_paths, notes }
Same-origin Django proxy. notes call out the auth-stripping limit
so the caller doesn't try it for an oauth-required agent.
Helpers + per-trigger route catalogue live in api.py (mirrors the
`routes` arrays in each trigger handler). Backwards-compatible — the
original three fields are unchanged. hogli build:openapi regenerated
the TS types and the MCP tool defs so downstream consumers see the
new shape.
6 new tests in test_preview_token.py cover: only declared triggers in
endpoints, mode order preserved, proxy URL rooted in correct
team/slug, empty endpoints when public URL unset, live-revision
rejected, missing revision_id rejected.
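The `endpoints` derivation described above can be sketched as follows. The real helpers live in api.py on the Django side; this TypeScript version is only an illustration, and the route catalogue entries, the `buildEndpoints` name, and the `<base>/<ingress_slug>/<path>` URL layout are all assumptions.

```typescript
type RouteMap = Record<string, string>;

// Hypothetical per-trigger route catalogue; paths are illustrative only.
const ROUTES: Record<string, RouteMap> = {
  slack: { events: '/slack/events', interactivity: '/slack/interactivity' },
  chat: { send: '/send' },
};

// Returns { trigger_type: { route_name: absolute_url } } for declared
// triggers only, and an empty object when no public URL is configured.
function buildEndpoints(
  publicUrl: string | undefined,
  ingressSlug: string,
  declaredTriggers: string[]
): Record<string, RouteMap> {
  const endpoints: Record<string, RouteMap> = {};
  if (!publicUrl) return endpoints; // unconfigured deployment stays detectable
  const base = publicUrl.replace(/\/+$/, '');
  for (const trigger of declaredTriggers) {
    const routes = ROUTES[trigger];
    if (!routes) continue; // no phantom URLs for triggers without routes
    const urls: RouteMap = {};
    for (const name of Object.keys(routes)) {
      urls[name] = `${base}/${ingressSlug}${routes[name]}`;
    }
    endpoints[trigger] = urls;
  }
  return endpoints;
}
```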
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Every decision point gets a log line so a "why no emoji?" question is one
grep away from the answer:
slack_event_received debug per inbound event with all config flags
slack_event_dropped_mention_only info mention_only gate dropped a non-mention
slack_event_dropped_no_owned_thread info thread_ts didn't match an owned session
ack_reaction_not_configured debug trigger has no ack_reaction set
ack_reaction_no_bot_token warn SLACK_BOT_TOKEN missing from encrypted_env
ack_reaction_no_http_client warn ingress not wired with an HttpFetcher
ack_reaction_posting debug about to fire reactions.add
ack_reaction_ok info slack returned ok
ack_reaction_already_reacted debug slack retry — normal, not an error
ack_reaction_failed warn slack returned 5xx OR { ok: false, error }
ack_reaction_threw warn transport / fetch threw
Failure-mode separation: the logs now distinguish HTTP failure (5xx,
network), application failure (slack returned 200 + { ok: false,
error: ... }), and `already_reacted` (Slack's retry produced the same
event twice; the surrounding idempotency key dedupes the enqueue but not
this fire-and-forget call). All three were silently swallowed before;
now each is traceable.
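The mapping from a reactions.add outcome to one of the log lines above could be sketched like this. The event names and levels come from the table; the `AckOutcome` type, the `classifyAck` helper, and treating a missing response body as a failure are assumptions made for the illustration.

```typescript
type AckOutcome =
  | { kind: 'threw' } // transport / fetch threw before any response
  | { kind: 'response'; status: number; body?: { ok: boolean; error?: string } };

interface AckLog {
  event: string;
  level: 'debug' | 'info' | 'warn';
}

function classifyAck(outcome: AckOutcome): AckLog {
  if (outcome.kind === 'threw') {
    return { event: 'ack_reaction_threw', level: 'warn' };
  }
  // HTTP failure: 5xx, or no parseable body (the latter is an assumption).
  if (outcome.status >= 500 || outcome.body === undefined) {
    return { event: 'ack_reaction_failed', level: 'warn' };
  }
  if (outcome.body.ok) {
    return { event: 'ack_reaction_ok', level: 'info' };
  }
  // Slack retries can deliver the same event twice; not an error.
  if (outcome.body.error === 'already_reacted') {
    return { event: 'ack_reaction_already_reacted', level: 'debug' };
  }
  // Application failure: HTTP 200 with { ok: false, error }.
  return { event: 'ack_reaction_failed', level: 'warn' };
}
```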
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… is set
Previous version listed reactions:write only as conditional on the agent
using @posthog/slack-react. Now ack_reaction on the slack trigger ALSO
needs it: when the scope is missing the Slack API returns missing_scope
and the ingress logs ack_reaction_failed but the session still enqueues
(fail-open), so the user sees no in-Slack feedback and no obvious error.
Added:
- Step 1.3 (scopes): reactions:write callout covers both slack-react AND
ack_reaction; instructs the concierge to inspect spec.tools[] AND
spec.triggers[].config.ack_reaction before listing scopes; remediation
steps for the "added ack_reaction after install" case (Slack requires a
re-install for new scopes to mint a token that carries them)
- Common failure modes row for ack_reaction_failed / missing_scope with
the OAuth-page-then-reinstall recipe
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hans
The most common authoring foot-gun (especially for AI authors): write
tools/<id>/source.ts + schema.json into the bundle but skip adding the
{kind: "custom", id, path} ref in spec.tools[]. The runner only loads what's
in spec.tools[], so the model never sees the tool — it tries to call it,
fails, and reports "the tool isn't available right now" to the user. We
watched it happen twice in real concierge sessions before catching it.
ValidationReport now carries `warnings: ValidationWarning[]` alongside
`errors`. Warnings don't block freeze; freeze still gates on
`errors.length === 0`. Two codes today:
orphan_custom_tool_dir
bundle has tools/<id>/schema.json but no spec.tools[] entry with
that path. Concierge response is almost always: patch spec to add
the ref, re-validate. If genuinely WIP, delete the dir before freeze.
orphan_skill_file
bundle has skills/<id>/SKILL.md or skills/<foo>.md but no
spec.skills[] entry references it. Same shape — either wire it in or
drop the file.
Why warnings, not errors:
- Authors sometimes ship tool source they're iterating on but don't
want exposed yet. Hard error would block.
- Keeps the freeze gate clean (errors.length === 0) without two
different blocking semantics.
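The orphan_custom_tool_dir check can be sketched as a scan over bundle paths. This is a hedged illustration rather than the code in validate-spec: `findOrphanToolDirs` is a hypothetical name, and the assumption that a custom ref's `path` is the tool directory (`tools/<id>`) is not confirmed by the commit.

```typescript
interface ValidationWarning {
  code: 'orphan_custom_tool_dir' | 'orphan_skill_file';
  path: string;
}

interface ToolRef {
  kind: string;
  id: string;
  path?: string;
}

// Warns for every tools/<id>/schema.json whose directory no custom ref in
// spec.tools[] points at. Keyed on schema.json, so a bare source.ts does
// not warn, mirroring the schema-driven load semantics described above.
function findOrphanToolDirs(bundlePaths: string[], tools: ToolRef[]): ValidationWarning[] {
  const referenced = new Set<string>();
  for (const tool of tools) {
    if (tool.kind === 'custom' && tool.path) referenced.add(tool.path);
  }
  const warnings: ValidationWarning[] = [];
  for (const p of bundlePaths) {
    const match = /^tools\/([^/]+)\/schema\.json$/.exec(p);
    if (!match) continue;
    const dir = `tools/${match[1]}`;
    if (!referenced.has(dir)) warnings.push({ code: 'orphan_custom_tool_dir', path: dir });
  }
  return warnings;
}

// Freeze still gates on errors only; warnings are advisory.
function canFreeze(report: { errors: unknown[]; warnings: ValidationWarning[] }): boolean {
  return report.errors.length === 0;
}
```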
Concierge authoring-new-agents skill updated with a Phase 6 table that
maps each warning code to the right remediation, plus a "don't freeze
through warnings silently — ask the user" reminder.
Tests:
- 7 new cases in validate-spec.test.ts covering both warning codes plus
the source.ts-without-schema.json case (which should NOT warn — mirrors
the runner's schema-driven load semantics) and the warnings-coexist-
with-errors case
- All 26 validate-spec tests pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hema
Runtime services no longer call `migrate()` on boot. Migrations are
applied exactly once per chart sync by a one-shot k8s Job running as
the `agent-migrator` Aurora role (provisioned in the companion
cloud-infra PR). The Job is wired up in the companion charts PR.
Removes today's failure cascade:
- N pods racing past `ensureMigrationsTable` and tripping
`relation "pgmigrations" already exists` (advisory-lock fix
landed earlier today; this removes the root cause)
- "permission denied for schema public" / "permission denied for
table agent_session" — runtime roles no longer need DDL since
they don't run migrations
Changes:
- services/agent-{runner,janitor}/src/index.ts: drop the
`await migrate({ databaseUrl: config.agentDbUrl })` call and the
`import { migrate } from '@posthog/agent-migrations'`.
- services/agent-{runner,janitor}/package.json: move
`@posthog/agent-migrations` from `dependencies` to
`devDependencies` (still needed by tests for the reset/migrate
harness via `services/agent-{janitor,runner}/src/*.test.ts`).
- pnpm-lock.yaml: regen.
The migrate.mjs bundle entry in services/agents/scripts/build.ts is
already wired (the image's been shipping it; chart Job consumes it
via `node services/agents/dist/migrate.mjs up`).
Local dev: still works, since the local-dev Postgres uses the
superuser `posthog` role. Anyone running an agent service against
a fresh Postgres without pre-running migrations now needs
`pnpm --filter @posthog/agent-migrations migrate` first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Authors now write tools/<id>/source.ts + schema.json only; the janitor
runs esbuild at freeze to produce tools/<id>/compiled.js inside the
bundle. Validate stays pure: it parses source.ts via esbuild to catch
syntax errors early without writing anything. Freeze aborts if any
custom tool fails to compile.
Drops the prior compiled.js authoring step from the concierge's mental
model: hand-compiling TS by stripping annotations was both fragile and
now collides with the freeze-time build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er isDev()
`bin/start` running the runner against the local agent-sandbox-host
image (built per services/agent-sandbox-host/README.md) needed
SANDBOX_HOST_IMAGE set explicitly, otherwise the schema rejected the
unset value. Default it to `posthog/agent-sandbox-host:dev` under
isDev() so the local dev loop works without configuration; prod still
has to set it explicitly.
Tests cover dev default, prod must-be-set behavior, and explicit
override precedence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
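The precedence described in this commit (explicit value, then dev default, then rejection) can be sketched as below. `resolveSandboxHostImage` is a hypothetical helper; the real code expresses this through the config schema, and passing `isDev` as a boolean is a simplification.

```typescript
const DEV_SANDBOX_HOST_IMAGE = 'posthog/agent-sandbox-host:dev';

function resolveSandboxHostImage(
  env: Record<string, string | undefined>,
  isDev: boolean
): string {
  const explicit = env.SANDBOX_HOST_IMAGE;
  if (explicit) return explicit; // an explicit override always wins
  if (isDev) return DEV_SANDBOX_HOST_IMAGE; // local dev loop needs no config
  throw new Error('SANDBOX_HOST_IMAGE must be set outside dev');
}
```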
…nd-written compiled.js
Two enforcement layers on top of the compile-at-freeze step so a
syntactically-valid but runtime-incompatible custom tool can't reach
a live agent and dispatch with action_not_found mid-conversation:
1. Compile step now vm-evaluates the esbuild output and confirms
`module.exports.default ?? module.exports` is `{ actions: { default: fn } }`
— the shape the runner's sandbox loader requires. Bare-function
exports, missing `actions` map, and wrong action keys all fail
freeze with a specific message instead of going live.
2. PUT /revisions/:id/file and PUT /revisions/:id/bundle now reject
any path matching `tools/<id>/compiled.js` with 422
compiled_js_is_generated. Previously hand-written compiled.js was
silently overwritten on the next freeze, which read to the model
as "my edits aren't landing" and produced multi-turn debugging
spirals.
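Both enforcement layers can be illustrated with two small predicates. This is a sketch under assumptions: `hasRunnerShape` and `isGeneratedCompiledPath` are hypothetical names, and the real compile step evaluates the esbuild output in a vm before running a check of this kind.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Checks `module.exports.default ?? module.exports` against the required
// `{ actions: { default: fn } }` shape.
function hasRunnerShape(moduleExports: unknown): boolean {
  const viaDefault = isRecord(moduleExports) ? moduleExports.default : undefined;
  const candidate = viaDefault ?? moduleExports;
  if (!isRecord(candidate)) return false; // rejects bare-function exports
  const actions = candidate.actions;
  return isRecord(actions) && typeof actions.default === 'function';
}

// Paths the file/bundle PUT endpoints would reject with
// 422 compiled_js_is_generated.
function isGeneratedCompiledPath(path: string): boolean {
  return /^tools\/[^/]+\/compiled\.js$/.test(path);
}
```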
Concierge authoring skill updated with the canonical source.ts
template + a table of look-alike export shapes that fail, so the
model doesn't have to discover the runtime contract by trial.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A session that crashes before the agent can post (sandbox acquire, MCP
open, secret resolve, in-loop model_error) used to leave its
originating Slack thread silent — the runner marked the row failed but
never reached back out. Sessions like 99386c19-cfd6 (docker-image
fallback case) sat untouched in PG with no user-visible signal.
Wires a small FailureNotifier interface in agent-shared with a
SlackFailureNotifier impl. The slack trigger now stamps
`trigger_metadata: { type: 'slack', workspace_id, channel, ts,
thread_ts }` at enqueue; the worker's pre-runSession catch block reads
it and posts a sanitized message back to the thread after the queue
row is marked failed. Raw reasons go through `categorize()` →
`userFacingMessage()` so docker / MCP / Kafka detail never leaks into
a customer channel — raw text still lands in log_entries +
conversation.errorMessage for owner-facing debug.
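The `categorize()` then `userFacingMessage()` pipeline can be sketched as two pure functions. Only the two function names come from the commit; the category set, the matching rules, and the message wording below are all invented for illustration.

```typescript
// Hypothetical categories; the real set is not listed in the commit.
type FailureCategory = 'sandbox' | 'mcp' | 'secrets' | 'model' | 'unknown';

// Buckets a raw failure reason. Matching on substrings is an assumption.
function categorize(rawReason: string): FailureCategory {
  const reason = rawReason.toLowerCase();
  if (/docker|sandbox/.test(reason)) return 'sandbox';
  if (/mcp/.test(reason)) return 'mcp';
  if (/secret/.test(reason)) return 'secrets';
  if (/model_error/.test(reason)) return 'model';
  return 'unknown';
}

// Maps a category to fixed, sanitized copy so no raw infrastructure
// detail is ever interpolated into the customer-facing message.
function userFacingMessage(category: FailureCategory): string {
  switch (category) {
    case 'sandbox':
      return 'I could not start my working environment. Please try again shortly.';
    case 'mcp':
      return 'I could not reach one of my connected tools. Please try again shortly.';
    case 'secrets':
      return 'I am missing a credential I need. Please ask the agent owner to check its configuration.';
    case 'model':
      return 'I hit a problem generating a response. Please try again.';
    default:
      return 'Something went wrong before I could respond. Please try again.';
  }
}
```

Because the message depends only on the category, the raw reason can still be logged in full for owner-facing debugging without any risk of it reaching the thread.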
Lifts SlackSigningSecretResolver from agent-ingress → agent-shared
(both services now resolve the bot token through the same shared
EncryptedEnvSlackSecretResolver class).
Design sketch: /Users/benwhite/.claude/plans/agent-failure-notifier.md.
Deferred: the symmetric driver.ts emitFailure() hook for in-loop
failures, the ⚠️ react-on-progress fork, the janitor reaper for
runner-crash gaps, and the public session URL link. Steel thread
first.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts list
The right-hand "Live now" column wasn't earning its space. Drops it in
favour of richer per-agent rows that show live · 24h · failed (+rate) ·
spend · last run inline, fanned out from the existing per-application
stats endpoint. Layout flex-wraps so the stat group drops below the
name/description on narrow widths instead of collapsing into itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the generic file-grain bundle store (PUT /file?path=X, PUT /bundle
with mode) with typed resource endpoints:
PUT /revisions/:id/agent_md { content }
PUT /revisions/:id/spec { spec } # author-facing slice
PUT /revisions/:id/skills/:id { description, body, files? }
DELETE /revisions/:id/skills/:id
PUT /revisions/:id/tools/:id { description, args_schema, source }
DELETE /revisions/:id/tools/:id
GET /revisions/:id/bundle -> { agent_md, skills, tools, spec }
PUT /revisions/:id/bundle full replace of the typed shape
Authors no longer write file paths. spec.skills[] and spec.tools[] (custom)
are server-derived at freeze from the typed resources in the bundle, so
orphan files, dangling refs, and rename-without-spec-patch are
structurally impossible.
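Server-side derivation of spec.skills[] and spec.tools[] from the typed resources can be sketched as a pure function. The `TypedBundle` and `DerivedSpec` shapes are assumptions (the commit does not say whether skills and tools are keyed maps or arrays), and the `tools/<id>` path convention is illustrative.

```typescript
// Assumed shape of the typed bundle; keyed maps are a guess.
interface TypedBundle {
  agent_md: string;
  skills: Record<string, { description: string; body: string }>;
  tools: Record<string, { description: string; args_schema: unknown; source: string }>;
}

interface DerivedSpec {
  skills: { id: string; description: string }[];
  tools: { kind: 'custom'; id: string; path: string }[];
}

// Every ref is generated from a resource that exists in the bundle, so a
// dangling ref or an unreferenced resource cannot be produced.
function deriveSpec(bundle: TypedBundle): DerivedSpec {
  const skillIds = Object.keys(bundle.skills).sort();
  const toolIds = Object.keys(bundle.tools).sort();
  return {
    skills: skillIds.map((id) => ({ id, description: bundle.skills[id].description })),
    tools: toolIds.map((id) => ({ kind: 'custom' as const, id, path: `tools/${id}` })),
  };
}
```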
Tool upload runs AST shape check + esbuild compile synchronously inside
PUT /tools/:id. Required source shape:
export default { actions: { default: async (args, ctx) => { ... } } }
Bad shapes (bare functions, missing actions, wrong key, dynamic factory
exports) return 422 tool_compile_failed with structured diagnostics
before any S3 writes -- no runtime dispatch failure.
Performance fixes shipped alongside the API:
- clone_from now parallelises bundle.copy() (was sequential; 15-file bundles
used to time out Django's 30s read timeout mid-clone, leaving half-
written drafts).
- The freeze pipeline calls bundle.list() exactly once and threads the
result through deriveAndPersistSpec + bundles.freeze(precomputedEntries),
eliminating ~50 redundant S3 HEADs per freeze.
- The freeze endpoint is now idempotent: if .frozen is already on disk
(Django proxy timed out mid-call), it re-derives the sha from the
existing manifest and returns it so the caller can stamp the row and
recover from the inconsistent state.
Test coverage:
- 28-case e2e suite at services/agent-tests/src/cases/typed-bundle-authoring.test.ts
- 16-case AST shape unit suite at services/agent-janitor/src/compile-custom-tools.test.ts
- 3 perf regression tests in server.test.ts using a Proxy-wrapped bundle
store that counts bundle.list() calls and tracks peak bundle.copy()
concurrency -- pins parallel-copy + cached-list invariants
Companion changes:
- legacy /file + /bundle (with mode) endpoints removed from janitor + Django
- legacy file-update / file-retrieve native tools replaced with typed
equivalents
- agent-console flattens the typed bundle response into BundleFile[]
- validate-spec drops orphan_skill_file / orphan_custom_tool_dir warnings
- sandbox-inprocess drops the legacy { run: fn } wrapper shape
- concierge spec rewritten to use the new typed native tools
See docs/agent-platform/plans/typed-bundle-authoring-api.md for design,
BUILD_NOTES.md for issues encountered + decisions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Bump agent-console + agent-chat to storybook ^10.2.17 (resolved
10.4.1); drop storybook-dark-mode and @storybook/theming in favour of
v10's built-in globalTypes toggle in preview.tsx.
- Pin framework to the workspace-resolved @storybook/react-vite path in
main.ts so pnpm doesn't pick up sibling v8 installs (common/storybook,
quill/apps/storybook). Re-derive __dirname from import.meta.url for ESM
main load. Force esbuild.jsx='automatic' so component files don't need
`import React`.
- Add a tiny reactive router store (`router-store.ts`) backing the
next/navigation + next/link mocks. router.push / <Link> clicks soft-nav
by updating the store; usePathname/useSearchParams/useParams read from
it via useSyncExternalStore.
- AppShell story now hosts a <StoryRoutes> switch that resolves the
current path to the right page (agents list, agent detail w/ tab
segment, registry, billing). Three navigable entry stories
(NavigableShell, StartOnConfiguration, StartOnSessions) + the legacy
single-page stories kept for quick visual review.
- focus_* client tools the dock invokes use router.push already, so they
soft-nav in the story for free.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
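A reactive router store of the kind described above could look like the following sketch. It is not the contents of `router-store.ts`: `createRouterStore` is a hypothetical factory, and only the path is tracked here. The `subscribe` / `getSnapshot` pair is written in the form React's `useSyncExternalStore(subscribe, getSnapshot)` accepts.

```typescript
type Listener = () => void;

function createRouterStore(initialPath: string) {
  let path = initialPath;
  const listeners = new Set<Listener>();
  return {
    // Registers a listener and returns the matching unsubscribe function.
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    // Returns the same value until push() changes it, which keeps
    // snapshot comparisons stable between renders.
    getSnapshot(): string {
      return path;
    },
    // Soft navigation: update the path and notify subscribers.
    push(next: string): void {
      if (next === path) return;
      path = next;
      listeners.forEach((listener) => listener());
    },
  };
}
```

A `usePathname` mock would then reduce to `useSyncExternalStore(store.subscribe, store.getSnapshot)` inside the story environment.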
Config panel (used by RevisionsBrowser's Config card):
- Flat divider-separated sections instead of nested rounded
cards-in-cards. Each section is one `space-y-2 px-3 py-3` band; the
outer Config card hosts them via a `divide-y` ConfigPanel root, so the
dividers hug section edges and double-padding is gone.
- Tools sub-grouped by kind (native / client-fulfilled / custom /
custom_template) as inline collapsible groups with card grids. Native
and client tool cards now open the existing detail dialog; custom tools
keep linking into the bundle viewer.
- MCPs expose their curated sub-tool list inline with approval-required
markers; previously hidden behind the URL row.
- Per-section info toggle on the right of the header (InfoIcon). Open
state is `bg-primary text-primary-foreground` and the panel below uses
`bg-primary/10` + `border-l-2 border-primary` so the icon and its panel
read as one unit. Default copy describes what each section means;
tool-group info copy explains the runtime difference between native /
client / custom.
- Skills render as a 2-col card grid with truncated description + title
tooltip; handles the concierge's 13 skills without becoming a wall of
prose.
- One global filter input lifted into the Config card's header (passes
through ConfigPanel as a controlled `filter` prop). Filters tools, MCP
sub-tools, and skills in place; per-group counts show `n/total` while
filter is active.
- Model section restored at the top: chip + reasoning level.
- Replaced the `structured | raw` segmented control with a single
`<CodeIcon /> RAW` toggle on the right of the header. Inactive:
outlined; active: solid `bg-primary` button with shadow so the mode is
unmistakable.
- `highlightedSection` highlight is now a 2px primary left-accent + soft
`bg-info/5` row instead of a colored ring around a card; keeps
focus_spec_section legible in the flat layout.
- `UnstructuredFields` flattened to match: no more inner rounded card;
renders as a `border-t` band continuing the section rhythm.
- Drop the standalone ConfigPanel.stories.tsx; the navigable AppShell
stories (StartOnConfiguration) are now the review surface.
BundleTree:
- Markdown viewer supports inline emphasis (bold/italic), links,
blockquotes, ordered lists, and horizontal rules in addition to the
existing heading/paragraph/ul/code blocks.
- New regex-based TypeScript / JavaScript highlighter for `.ts` / `.tsx`
/ `.js` / `.jsx` files (and for fenced code inside markdown). Tokens:
keywords, strings, comments, numbers, function-call identifiers,
PascalCase types. No new deps.
- `compiled.js` and other `.js` files now resolve to the same
highlighter via `languageForPath()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tamps it
Django's freeze view wrapped the janitor HTTP call in transaction.atomic(),
holding the agent_revision row lock. The janitor's deriveAndPersistSpec then
tried to UPDATE the same row from a different connection and blocked for the
full Django proxy timeout (~120s, instrumented + confirmed). Concierge freezes
hit this every time.
Fix: janitor no longer writes agent_revision.spec. deriveSpec computes the
derived spec (skills + custom tools from the typed bundle) and returns it in
the freeze response. Django stamps state + sha + spec in a single save(),
no atomic block needed — the janitor's idempotent freeze covers the partial-
failure recovery path that atomic used to.
Companion changes:
- New instrument({ key, log, context }, fn) helper in agent-shared/runtime
mirroring nodejs/src/common/tracing/tracing-utils.ts:instrumentFn, minus
Prometheus + OTEL deps. Replaces inline Date.now() timing markers in the
freeze + clone_from pipeline.
- Concierge gains a `choosing-the-model` skill that walks the user through
the cost/quality tradeoff before setting spec.model, recommends per job
category, and waits for an explicit pick instead of defaulting.
- Idempotent freeze path also returns the derived spec so the recovery
caller can stamp it on a retry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`set_secret` (and any future render-style client tool whose UI needs
unbounded user time) now parks the session instead of awaiting on a
5s/60s in-process bus timeout. Mirrors the approval-gated tool pattern:
synthetic queued envelope from `execute` → loop unwinds → worker hands
the session back to the queue → user submits via `/send` → marker in
pending_inputs → runner's resume scanner injects a wake message → model
sees the real outcome on a fresh turn.
What changed:
- Spec: new `interactive: boolean` field on `kind: "client"` tools
(services/agent-shared/src/spec/spec.ts + products/agent_platform/backend/spec_schema.py).
When true, the runner skips `dispatchClientTool` and returns a
`{queued, interactive, call_id, tool_id, message}` envelope from
`execute`. timeout_ms cap raised to 600s for the non-interactive path.
- Marker: new `__POSTHOG_CLIENT_TOOL_RESULT__:<json>` shape parallel to
the approval marker, lives in agent-shared/runtime so both ingress and
runner can use it.
- Ingress `/send`: ChatSendBodySchema accepts either `{message}` or
`{client_tool_result: {call_id, result | error}}`. The latter writes
the marker into pending_inputs + re-queues — no new endpoint.
- Runner: `getSteeringMessages` scans pending_inputs for client-tool
markers (before the approval-marker check), synthesises a wake user
message carrying the real outcome envelope, and emits a
`client_tool_result` SSE event so live consumers see the closure.
- Frontend: render-style resolves now POST `/send` via the new
`sendClientToolResult` helper. Reducer recognises the queued envelope
on `tool_result` and flips the part to `fulfillment: 'client'` without
setting `result` — keeps SecretInline mounted. New `client_tool_result`
case finalises the result across any assistant turn. Conversation
reconstruction does the same on reload (`apiClient.conversationToTurns`).
- Concierge: `set_secret` marked `interactive: true`, description +
`secrets-and-integrations` skill updated to teach the park+wake loop.
Seed script updated for the new typed-bundle authoring API.
E2e: `services/agent-tests/src/cases/interactive-client-tool.test.ts`
covers the happy path (park → /send result → wake → model resumes) and
the error variant (`/send` with `error` → wake envelope carries ok:false).
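The marker round trip (ingress writes it into pending_inputs, runner parses it and builds the wake envelope) can be sketched as below. Only the `__POSTHOG_CLIENT_TOOL_RESULT__:<json>` prefix and the `{call_id, result | error}` payload come from the commit; the helper names and the exact wake-envelope fields beyond `ok` are assumptions.

```typescript
const CLIENT_TOOL_RESULT_PREFIX = '__POSTHOG_CLIENT_TOOL_RESULT__:';

interface ClientToolResult {
  call_id: string;
  result?: unknown;
  error?: string;
}

// Ingress side: serialize the /send payload into a pending_inputs marker.
function encodeClientToolResult(payload: ClientToolResult): string {
  return CLIENT_TOOL_RESULT_PREFIX + JSON.stringify(payload);
}

// Runner side: returns null for ordinary messages or malformed markers.
function decodeClientToolResult(input: string): ClientToolResult | null {
  if (input.indexOf(CLIENT_TOOL_RESULT_PREFIX) !== 0) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(input.slice(CLIENT_TOOL_RESULT_PREFIX.length));
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const candidate = parsed as Partial<ClientToolResult>;
  if (typeof candidate.call_id !== 'string') return null;
  return { call_id: candidate.call_id, result: candidate.result, error: candidate.error };
}

// Hypothetical wake envelope: ok:false when the client reported an error.
function toWakeEnvelope(r: ClientToolResult): { ok: boolean; call_id: string; result?: unknown; error?: string } {
  return r.error !== undefined
    ? { ok: false, call_id: r.call_id, error: r.error }
    : { ok: true, call_id: r.call_id, result: r.result };
}
```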
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConciergeDock used to render <FixtureConciergeDock /> for both
"resolving" and "404 / not-deployed" states. A user who typed before
`getAgent(slug)` resolved hit the fake runner and got back the fixture
fallback ("Got it. In the real build I'd take the next step — for the
v0 mock I only have a handful of scripted responses wired up. Try one
of the suggested prompts at the top of the dock?") even though the
agent was deployed and would have answered in another second.
Split the resolution into three explicit states: `pending` (loading
placeholder, no input), `not_deployed` (genuine 404 → fixture, as
before), `resolved` (real runner). Non-404 errors now stay in
`pending` rather than falling through to the fixture, so a transient
network/auth blip can't surface a mock reply either.
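The three-state resolution can be sketched as a pure transition function. The state names come from the commit; `FetchOutcome`, `nextDockState`, and the `agentId` field are hypothetical and stand in for whatever `getAgent(slug)` actually returns.

```typescript
type DockState =
  | { status: 'pending' } // loading placeholder, no input
  | { status: 'not_deployed' } // genuine 404
  | { status: 'resolved'; agentId: string }; // real runner

type FetchOutcome =
  | { kind: 'ok'; agentId: string }
  | { kind: 'error'; httpStatus?: number };

function nextDockState(outcome: FetchOutcome): DockState {
  if (outcome.kind === 'ok') {
    return { status: 'resolved', agentId: outcome.agentId };
  }
  if (outcome.httpStatus === 404) {
    return { status: 'not_deployed' };
  }
  // Transient network/auth failures stay pending so a blip can never
  // surface a mock reply.
  return { status: 'pending' };
}
```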
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production code no longer imports `useFakeRunner` or the fixture
scripts (`conciergeScripts` / `fallbackScript` / `waitingSession`):
- `useFakeRunner` moved off the main `@posthog/agent-chat` entry. It's
re-exported from `@posthog/agent-chat/fixtures` only, so anything
pulling it in is explicitly on a non-production import path
(Storybook stories, the console's `mockApi`). Anyone who reaches
for it from a real path gets a build-time miss.
- `FixtureConciergeDock` deleted from `Dock.tsx`. The dock used to
fall back to it whenever the concierge slug didn't resolve — both
in a genuine 404 and in the brief window before resolution — and
the v0-mock fallback string was reaching real users. ConciergeDock
now shows a small text stub ("Loading concierge…" / "No concierge
deployed for `<slug>` in this project.") for the pending and 404
states. No fake runner, no scripted fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion
`useSetDockConciergeAgent` runs `setConciergeAgent(null)` on layout
unmount, so navigating between two pages that both want the same
concierge transitioned the slug `agent-concierge → null →
agent-concierge`. ConciergeDock reacted to the transient null by
resetting state to `pending` and starting a fresh `getAgent` fetch,
which (a) re-mounted RealConciergeDock + lost in-flight chat state, and
(b) could leave the dock stuck on "Loading concierge…" if the two
fetches raced through their cancel logic.
Fix: ignore transient nulls, cache the resolved teamId:slug in a ref so
the same combo never refetches, and keep the last resolved state on
transient errors instead of dropping to `pending`. The dock now stays
stable across route changes and can't get stranded on the loading stub.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The concierge header had 7 chrome buttons plus three status pills, which
read as busy at typical dock widths. Consolidated:
- Focus toggle drops its "Focus" label and becomes an icon-only state
pill: the eye/eye-off + state colour already says enough at a glance,
freeing horizontal space.
- "Open in session view", "Dock to side / Float panel" and the existing
"Render markdown" toggle moved into a single settings dropdown (gear
icon). The standalone open-session and dock-mode toggles are gone; the
open-session entry only appears when there's an active session id.
- "New" stays as a labelled button; it's a primary action and deserves
the label.
- Session history dropdown, hide-dock chevron (keyboard shortcut'd), and
the status pills are unchanged.
Net: 7 buttons → 4 buttons + 1 dropdown, without losing any
functionality.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ bug fixes
Header redesign:
- The dock header is now two rows. Row 1 is mode + status pills on the left, chrome controls (focus, history, settings, hide) on the right. Row 2 is the primary "New" action on the left and the page subject, with an optional "on <agent name>" sub-line so the user always knows which agent the concierge is reasoning about, even when the page title is generic (Documentation, Sessions, etc).
- "New" is now a solid bordered/shadowed button on the far left so the primary action stands apart from the secondary chrome.
Focus indicator (new):
- A thin info-coloured bar pinned to the top edge of the viewport appears only when focus mode is on AND there is an active concierge session id. Communicates "concierge is following you" without stealing chrome. Click to pause focus mode.
- Plumbed via a new `activeConciergeSessionId` field on DockStore; RealConciergeDock pushes the live session id into it.
ConciergeDock fixes:
- Dropped the resolvedKey ref optimisation that could leave the dock stuck on "Loading concierge…" when exiting playground (in StrictMode, the ref survived across the remount while the state reset to `pending`, so the early-return guard skipped the fetch). Always fetch on a non-null slug; the functional setState still avoids the pending flash when the same slug is already resolved.
- Errors still leave the previous state untouched instead of falling back to pending, so there is no regression on the original navigation-stability fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per spec:
Row 1 (top):
- Left: a single mode pill, either "Concierge" (success-tinted bg + steady green dot) or "Playground" (primary bg + animated dot + the agent name appended). Status pills (Draft, Reconnecting) sit alongside.
- Right: config controls only: Focus toggle (concierge), Settings menu (gear), Hide dock (chevron).
Row 2 (bottom):
- Left: the session label, which is the first-message snippet from history, else a "Started Xm ago" relative timestamp, else "New conversation" when there's no active session yet.
- Right: History dropdown + New button (or Exit in playground), as a regular outline button now that it sits alongside History instead of standing alone as the primary action.
`describeContext` import dropped since the mode label is now derived locally from `context.mode` rather than the package helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
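The Row 2 session-label fallback order can be sketched as a small pure function. Everything here is an illustration under assumptions: `SessionInfo`, `sessionLabel`, the 40-character snippet cap, and the minutes-only formatting are hypothetical and not taken from the dock's code.

```typescript
// Hypothetical shape; the real session object is not shown in this PR text.
interface SessionInfo {
  firstMessage?: string
  startedAtMs: number
}

function sessionLabel(session: SessionInfo | null, nowMs: number, maxLen = 40): string {
  // No active session yet.
  if (session === null) return 'New conversation'
  // Prefer a snippet of the first message from history.
  const snippet = session.firstMessage?.trim()
  if (snippet) {
    return snippet.length > maxLen ? snippet.slice(0, maxLen - 1) + '…' : snippet
  }
  // Otherwise fall back to a relative timestamp (minutes only in this sketch).
  const minutes = Math.max(0, Math.floor((nowMs - session.startedAtMs) / 60_000))
  return `Started ${minutes}m ago`
}
```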
A `border-b border-border/60` on the mode/config row (with matching `border-primary/30` in playground mode so the divider tones with the playground tint) makes the two sections read as distinct strips instead of one busy block.
Horizontal padding moved from the container to each row, so the border between Row 1 and Row 2 runs edge-to-edge instead of being inset by the container's `px-3`.
veria-ai review on #63988:
- bump undici 7.8.0 → ^7.24.0 in agent-shared (resolves 7.24.8). 7.8.0 has OSV advisories on the tool egress path (request/response smuggling + decompression resource-exhaustion) fixed in the 7.24.x line.
- http-request: stream the response body and stop at max_response_bytes, cancelling the stream at the cap, instead of res.text()-then-slice, so an oversized or highly-compressed response is never fully materialized before truncation. Adds a streaming-path test.
Tool consolidation:
- remove @posthog/web-search: never wired in prod (the provider is only set in tests), so it always threw "web.search provider not configured".
- remove @posthog/web-fetch: a strict subset of @posthog/http-request (GET is the default method). Example specs, docs, and case tests migrated to @posthog/http-request.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
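The difference between `res.text()`-then-slice and a capped streaming read can be shown with a short sketch. It is written against a generic `AsyncIterable<Uint8Array>` rather than any particular HTTP client, and `readCapped` and `CappedBody` are hypothetical names, not the tool's actual helper.

```typescript
// Hypothetical result shape for illustration only.
interface CappedBody {
  bytes: Uint8Array
  truncated: boolean
}

async function readCapped(
  chunks: AsyncIterable<Uint8Array>,
  maxResponseBytes: number
): Promise<CappedBody> {
  const parts: Uint8Array[] = []
  let total = 0
  let truncated = false
  for await (const chunk of chunks) {
    const remaining = maxResponseBytes - total
    if (chunk.byteLength > remaining) {
      // Keep only what fits, then stop pulling from the source.
      parts.push(chunk.subarray(0, remaining))
      total += remaining
      truncated = true
      break // exiting for-await calls the iterator's return(), ending the source
    }
    parts.push(chunk)
    total += chunk.byteLength
  }
  // Concatenate the retained chunks into one buffer of exactly `total` bytes.
  const bytes = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    bytes.set(part, offset)
    offset += part.byteLength
  }
  return { bytes, truncated }
}
```

At most `maxResponseBytes` bytes are retained, and no chunk past the cap is ever requested, which is the property the commit describes. How the real tool cancels its underlying stream is not shown in this PR text.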
…ience
Rework the posthog auth/tool tenancy model so a single agent (e.g. the agent builder) can serve a whole org while acting as the calling user:
- Tools take an explicit project_id. The @posthog/* data tools no longer derive their operating team from the session principal; each project-scoped tool takes a project_id arg, and the agent resolves it via the get_context client tool or the new @posthog/list-projects tool. Removes posthogUserTeamId from ToolContext.
- Add @posthog/list-projects (minimal id/name/org) for disambiguation.
- posthog auth mode gains audience: 'project' | 'organization' (default 'project'). The ingress verifier enforces it. project: caller can access the agent's owning team (RBAC-aware probe); organization: caller is a member of the agent's owning org (resolved via a cached posthog_team lookup, since the revision store and Django DBs differ). Failures return not_in_project / not_in_org. Opening to any user across orgs is intentionally not yet expressible.
- Bind agent-concierge to its owning organization; teach its agent.md to resolve the project before project-scoped tools.
- Mirror audience in the Django spec schema (spec_schema.py).
- Tests: verifier audience cases (10/10), e2e auth modes (13/13), Django spec schema (38/38), query/_posthog-api updates.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
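A minimal sketch of the audience decision described above, assuming the caller's accessible teams and org memberships have already been looked up. `Caller`, `AgentOwner`, and `checkAudience` are hypothetical names; the real verifier's RBAC probe and cached team lookup are not reproduced here.

```typescript
type Audience = 'project' | 'organization'

type AudienceResult =
  | { ok: true }
  | { ok: false; error: 'not_in_project' | 'not_in_org' }

// Hypothetical inputs: in the real verifier these come from an RBAC-aware probe
// and a cached posthog_team lookup respectively.
interface Caller {
  accessibleTeamIds: number[]
  organizationIds: string[]
}

interface AgentOwner {
  teamId: number
  organizationId: string
}

function checkAudience(
  audience: Audience | undefined,
  caller: Caller,
  owner: AgentOwner
): AudienceResult {
  // 'project' is the default when the spec omits audience.
  const effective: Audience = audience ?? 'project'
  if (effective === 'project') {
    return caller.accessibleTeamIds.includes(owner.teamId)
      ? { ok: true }
      : { ok: false, error: 'not_in_project' }
  }
  return caller.organizationIds.includes(owner.organizationId)
    ? { ok: true }
    : { ok: false, error: 'not_in_org' }
}
```

Note that no branch admits a caller from outside the owning org, matching the statement that cross-org access is not yet expressible.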
MCP ref url/header `${SECRET}` substitution checked only that the secret
existed, never its `spec.secrets[].allowed_hosts` binding. An author could
point `mcps[].url` at a host they control and set
`headers.Authorization = "Bearer ${SLACK_BOT_TOKEN}"`, exfiltrating an
encrypted-env secret they otherwise can't read.
Mirror `@posthog/http-request`'s final-URL host binding: resolve the final
URL first, reject bare-string/unbound secrets, and substitute URL/header
placeholders only when the final host matches the secret's allowed_hosts.
Wire the lookup from spec in the worker, and add a freeze-time check in the
janitor so bare-string MCP secrets are caught before deploy.
Fail-closed tightening: existing specs referencing a bare-string secret in
an MCP url/header will report that MCP as unavailable until converted to the
object form `{ name, allowed_hosts: [...] }`.
Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
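The fail-closed substitution rule from the commit above can be sketched as follows. This is an illustration under assumptions, not the worker's code: `SecretSpec`, `substituteBoundSecrets`, exact-match host comparison, and the placeholder regex are all hypothetical, and the caller is assumed to have resolved the final URL's host already.

```typescript
// A spec secret is either a bare string (no binding) or the object form.
type SecretSpec = string | { name: string; allowed_hosts: string[] }

type Substitution = { ok: true; value: string } | { ok: false; reason: string }

function substituteBoundSecrets(
  template: string,
  finalHost: string,
  specSecrets: SecretSpec[],
  secretValues: Record<string, string | undefined>
): Substitution {
  const host = finalHost.toLowerCase()
  const placeholder = /\$\{([A-Za-z0-9_]+)\}/g
  let out = ''
  let last = 0
  let match: RegExpExecArray | null
  while ((match = placeholder.exec(template)) !== null) {
    const name = match[1]
    const spec = specSecrets.find((s) => (typeof s === 'string' ? s : s.name) === name)
    if (spec === undefined) return { ok: false, reason: `unknown secret ${name}` }
    // Bare-string secrets carry no host binding, so they are rejected outright.
    if (typeof spec === 'string') return { ok: false, reason: `secret ${name} is unbound` }
    // Assumption: exact, case-insensitive host match.
    if (!spec.allowed_hosts.some((h) => h.toLowerCase() === host)) {
      return { ok: false, reason: `host ${host} not allowed for ${name}` }
    }
    const value = secretValues[name]
    if (value === undefined) return { ok: false, reason: `secret ${name} has no value` }
    out += template.slice(last, match.index) + value
    last = match.index + match[0].length
  }
  return { ok: true, value: out + template.slice(last) }
}
```

With this shape, a header of `Bearer ${SLACK_BOT_TOKEN}` aimed at an author-controlled host fails before any secret value is read, which is the exfiltration path the commit closes.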
# Conflicts: # pnpm-lock.yaml
…udience
Commit ccdbed2 ("per-tool project_id + explicit posthog auth audience") changed the @posthog/* data tools to take an explicit project_id arg and added audience to the posthog auth mode, but left several tests asserting the old principal-derived-team behavior, leaving the branch red.
- agent-shared: per-trigger auth test now expects the audience: 'project' default on the posthog auth mode.
- agent-tests: pass project_id on @posthog/query and @posthog/agent-applications-list calls (the harness query echo only matches a numeric /api/projects/<id>/query/ path); rework posthog-tool-auth to assert the explicit-project contract.
Each affected file passes in isolation; the suites must run serially (shared real-PG test DB).
Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
Query snapshots: Backend query snapshots updated
Changes: 2 snapshots (2 modified, 0 added, 0 deleted)
Restore the quill package to master: remove the branch-added `ghost` button variant (button.tsx + button.css) and the @types/react / @types/react-dom devDeps added to the blocks and quill package.json, and resync the lockfile importers. These leaked onto the integration branch from now-removed agent-console work; the quill package should track master.
Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
Drop the leaked agent-platform explanatory comments (and stray blank line) from bin/migrate and rust/bin/migrate-entry, restoring both to master.
Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
These frontend files carried changes unrelated to the agent platform: merge drift / incidental tweaks that leaked onto the integration branch. Restore them to master:
- lib/api.ts: branch was behind master's tracingSpans pagination
- TaxonomicFilter/headless/AutocompleteInput.tsx: stray RefObject tweak
- integrations/SlackIntegration.stories.tsx: dropped story component
- vite.config.mts: unrelated @marsidev/react-turnstile optimizeDeps hint
Kept the agent-platform-essential frontend changes: the AGENT_PLATFORM feature flag, the agent_approvals API scope (scopes.tsx + types.ts), personalAPIKeysLogic flag gate, and the jest ignore for agent_platform node services.
Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
| ) | ||
| if (allowed) { | ||
| log('info', 'tool.dispatch.per_asker_authorised', { tool: id }) | ||
| return real(toolCallId, (args ?? {}) as Record<string, unknown>) |
High: Approval bypass for session-principal tools
When a gated tool’s policy includes session_principal, this branch runs the real tool immediately if the latest user sender matches the session principal. An attacker who can influence content the agent reads can steer the model into calling a gated destructive tool, and it will execute as the user without the separate approval UI or explicit decision step; this is especially risky for the agent-management tools that can promote or archive revisions.
Use session_principal to scope who is allowed to decide the approval, but still queue the approval and require an out-of-band decision before real(...) runs. If a fast path is required, it should be tied to a fresh explicit UI confirmation token, not just the model emitting the tool call after a matching user message.
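One way to picture the suggested shape, where `session_principal` only scopes who may decide and the tool never runs at dispatch time, is the sketch below. `ApprovalQueue` and its methods are hypothetical and in-memory; they are not the platform's approval store or API.

```typescript
type DecideOutcome = 'executed' | 'rejected' | 'forbidden' | 'unknown'

// Hypothetical in-memory queue for illustration only.
class ApprovalQueue {
  private nextId = 1
  private pending = new Map<number, { deciders: Set<string>; run: () => void }>()

  // Dispatch of a gated tool only enqueues; nothing executes here.
  enqueue(allowedDeciders: string[], run: () => void): number {
    const id = this.nextId++
    this.pending.set(id, { deciders: new Set(allowedDeciders), run })
    return id
  }

  // The real tool runs only after an explicit decision by an allowed decider.
  decide(id: number, decider: string, approve: boolean): DecideOutcome {
    const entry = this.pending.get(id)
    if (entry === undefined) return 'unknown'
    if (!entry.deciders.has(decider)) return 'forbidden'
    this.pending.delete(id) // each approval can be decided once
    if (!approve) return 'rejected'
    entry.run()
    return 'executed'
  }
}
```

In this sketch a model-emitted tool call after a matching user message produces only a queued entry; the session principal would appear in `allowedDeciders` rather than acting as an automatic approval.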
# Conflicts: # frontend/jest.config.ts # pnpm-lock.yaml
LGTM
This is a well-engineered feature branch. Security controls are solid: preview tokens are scoped with short TTL + audience claims, secrets use nonce-based indirection with session-scoped lifetime, approval authorization properly gates on SessionAuthentication or explicit agent_approvals:write scope, and sandbox resources are hard-capped via Zod schema. Race condition handling uses FOR UPDATE SKIP LOCKED for queue claims and transactional re-reads for idempotent elevation decisions. The defineRoute migration centralizes validation correctly.
CI Failure: E2E Hobby CI (not caused by this PR)
The "Wait for Docker image build" job timed out because the Container Images CI workflow never ran for this commit — the E2E Hobby CI waited 1 hour polling for a check that was never created. This matches a known pattern of Hobby CI failures unrelated to PR changes.
| # put a human in the loop at decide time can request it via consent. | ||
| authenticator = request.successful_authenticator | ||
| is_session = isinstance(authenticator, SessionAuthentication) | ||
| is_oauth_with_decide_scope = isinstance(authenticator, OAuthAccessTokenAuthentication) and ( |
Medium: OAuth tokens can bypass human-only approvals
This treats any OAuth access token with agent_approvals:write as equivalent to an interactive human decision. A third-party OAuth client that gets a team admin to consent once can later approve or reject queued tool requests without a live user action, including requests whose spec set allow_agent_approver: false; restrict this path to a trusted first-party client or require a per-decision proof from the interactive app.
Rebuilt onto current master after the agent-platform feature branch (#63988) merged. All underlying agent_platform/services restructuring is already in master; this commit isolates the remote-authoring deltas.
Bundles the prior PR commits (see PR description for details):
- expose authoring playbooks over MCP (agent-resolve-resource tool, scope-aware tool surface, MCP resources at posthog://agent-platform/playbooks/<id>)
- serializer fidelity for first-revision authoring (bundle_uri allow_blank + server-side fs://<slug>/ fill, preview-proxy request serializer)
- round-trip provider-safe tool names so @posthog/meta-end-turn dispatches on the first try
- migrate example specs to canonical auth.modes[]
- warn on provider-safe-name collisions
- gate preview-proxy run/send/cancel behind agents:write
Co-Authored-By: Danilo Campos <danilo@posthog.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The long-lived feature branch for the Agent Platform: the shared home for all agent platform services (agent-ingress, approvals, sandbox, real-inference, MCP stream handler, chat handlers, and the supporting Django + Node service packages). Targets `master`; service work lands here, and the branch is kept current with `master` via periodic merges.
Scope
Consolidated the original in-flight agent-platform PRs (each squashed to one commit):
#62774 · #62901 · #63903 · #63908 · #63921 · #63922 · #63929 · #63930 · #63936 · #63941 · #63943 · #63946 · #63948 · #63949 · #63950 · #63953 · #63954
Plus a fix to the pre-existing real-inference `@posthog/query` auth gap (also up as #64002). Excludes #62772 (agent-platform-coding).
Bot-feedback follow-ups
Additional commits address open veria-ai and greptile findings raised across the rolled-up PRs, landed directly on `agent-platform` rather than re-cycling each individual PR.
Security / correctness:
- `max_memory_mb` ≤ 16384 MiB, `max_cpu_cores` ≤ 8, in both the Django spec schema and the shared Zod `SpecLimitsSchema`. Closes the unbounded-limits DoS surface.
- `allow_agent_approver: False` gate on `/approvals/<id>/decide` from "reject Personal API keys" to "allowlist `SessionAuthentication`". The PAT-specific check left OAuth bearers with `agents:write` as a bypass path. From fix(agent-platform): reject PATs on approval-decide when agent approver disallowed #63953.
- `application_id` cross-checks in `approvals_retrieve` and `approvals_decide`: `getForApplication` already 404s upstream when the URL's application id doesn't own the approval. Updates the matching test to mock the janitor 404 (the real production path). From fix(agent-platform): add tenant-scoped reads for approval + revision stores #63946.
Architecture cleanup:
- `mcpStreamHandler` to the `defineRoute` pattern the chat handlers use, dropping the bespoke `safeParse(req.query)` + `invalid_query` error path.
- `agent-ingress-scoped-session-fetch` semgrep rule with a second pattern (`queue.get($ID)`) to close the destructuring escape hatch.
- `DecideElevationResult` on the `decision` field: `aclEntry` required on the `grant` arm, absent on `decline`. Removes the `result.aclEntry!` non-null assertion in `applyElevationGrant`. From fix(agent-platform): harden slack signature ts + make elevation decisions atomic #63941.
Test-style sweep (CLAUDE.md's "prefer parameterized tests"): collapsed individual `it`-per-case blocks across #63922, #63929, #63930, #63936, #63941, #63943, #63946, #63953 into `it.each` / `parameterized.expand`. Added a previously uncovered `posthog`-mode case where the introspect response has no `team` field.
Deferred / not addressed
- (`caller_id` header): the per-caller binding still rides a client header; a proper fix needs an authenticated credential (HMAC claim or trusted-proxy header strip). Held for a deliberate redesign rather than a one-shot patch.
- (`http_request` secret exfil host binding): needs a per-secret host allowlist authored in the spec, which ripples through the Django JSON-schema mirror and OpenAPI/Orval codegen. Tracked separately.
- (`jti` replay cache): fix(agent-platform): add jti + replay guard to internal JWTs #63952 closed by the original author as not worth the per-pod cache complexity over the existing 60s TTL. Accepted risk; revisit if cross-pod replay becomes a concern.
Validation
All six node service packages typecheck clean; full local agent-service suites pass (1106 unit/e2e + real-inference 14/14 across Anthropic + OpenAI); backend spec-schema tests pass.
🤖 Generated with Claude Code
Note
Canonical Agent Platform feature branch consolidating 17+ PRs into a single integration branch. Introduces agent-ingress, approvals, sandbox, runner, janitor, MCP stream handler, chat handlers, and the agent-console — plus the supporting Django backend and shared Node service packages. Includes security hardening (resource limit clamping, approval auth gate broadening, discriminated unions), migration serialization via advisory locks, Slack BYO trigger with mention_only/auto_resume/ack_reaction, freeze-time tool compilation with AST validation, typed bundle authoring API, failure notifier, and concierge skill updates.
Written by Mendral for commit 5ef5057.