
feat(agent-platform): core Agent Platform feature branch (all agent platform services)#63988

Merged: dmarticus merged 577 commits into master from agent-platform on Jun 17, 2026

@benjackwhite (Contributor) commented Jun 16, 2026

The long-lived feature branch for the Agent Platform — the shared home for all agent platform services (agent-ingress, approvals, sandbox, real-inference, MCP stream handler, chat handlers, and the supporting Django + Node service packages). Targets master; service work lands here, and the branch is kept current with master via periodic merges.

Previously stood up as a throwaway integration branch consolidating in-flight PRs into a single base. Now promoted to the canonical Agent Platform feature branch — this is where all agent platform service work converges before shipping to master.

Scope

Consolidated the original in-flight agent-platform PRs (each squashed to one commit):
#62774 · #62901 · #63903 · #63908 · #63921 · #63922 · #63929 · #63930 · #63936 · #63941 · #63943 · #63946 · #63948 · #63949 · #63950 · #63953 · #63954

Plus a fix to the pre-existing real-inference @posthog/query auth gap (also up as #64002). Excludes #62772 (agent-platform-coding).

Bot-feedback follow-ups

Additional commits address open veria-ai and greptile findings raised across the rolled-up PRs — landed directly on agent-platform rather than re-cycling each individual PR.

Security / correctness:

  • V1 clamp sandbox resource limits — max_memory_mb ≤ 16384 MiB, max_cpu_cores ≤ 8 — in both the Django spec schema and the shared Zod SpecLimitsSchema. Closes the unbounded-limits DoS surface.
  • V3 broaden the allow_agent_approver: False gate on /approvals/<id>/decide from "reject Personal API keys" to "allowlist SessionAuthentication". The PAT-specific check left OAuth bearers with agents:write as a bypass path. From fix(agent-platform): reject PATs on approval-decide when agent approver disallowed #63953.
  • G4 drop the now-dead application_id cross-checks in approvals_retrieve and approvals_decide: getForApplication already 404s upstream when the URL's application id doesn't own the approval. Updates the matching test to mock the janitor 404 (the real production path). From fix(agent-platform): add tenant-scoped reads for approval + revision stores #63946.
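
The V1 bound can be sketched as a plain validator. This is a minimal sketch, not the Zod SpecLimitsSchema or the Django spec schema: `validateSpecLimits` and `LimitsResult` are hypothetical names, and both the lower bound (> 0) and the choice to reject rather than silently lower out-of-range values are assumptions.

```typescript
// Upper bounds quoted in the PR description.
const MAX_MEMORY_MB = 16384;
const MAX_CPU_CORES = 8;

interface SpecLimits {
  max_memory_mb?: number;
  max_cpu_cores?: number;
}

// Hypothetical result type for this sketch.
type LimitsResult = { ok: true } | { ok: false; errors: string[] };

// Assumption: out-of-range values are rejected, not silently lowered.
function validateSpecLimits(limits: SpecLimits): LimitsResult {
  const errors: string[] = [];
  const check = (name: string, value: number | undefined, max: number): void => {
    if (value === undefined) return; // unset limits fall through to defaults
    if (!Number.isFinite(value) || value <= 0 || value > max) {
      errors.push(`${name} must be > 0 and <= ${max}`);
    }
  };
  check("max_memory_mb", limits.max_memory_mb, MAX_MEMORY_MB);
  check("max_cpu_cores", limits.max_cpu_cores, MAX_CPU_CORES);
  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Keeping the same two constants in both the Node and Django schemas is what closes the unbounded-limits surface; a bound present on only one side would still let an oversized spec through the other.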

Architecture cleanup:

  • G1 migrate mcpStreamHandler to the defineRoute pattern the chat handlers use, dropping the bespoke safeParse(req.query) + invalid_query error path.
  • G2 extend the agent-ingress-scoped-session-fetch semgrep rule with a second pattern (queue.get($ID)) to close the destructuring escape hatch.
  • G3 discriminate DecideElevationResult on the decision field — aclEntry required on the grant arm, absent on decline. Removes the result.aclEntry! non-null assertion in applyElevationGrant. From fix(agent-platform): harden slack signature ts + make elevation decisions atomic #63941.
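
The G3 change is a standard discriminated union. A minimal sketch, assuming the arm literals are `"grant"` and `"decline"` and using a hypothetical `AclEntry` shape (the real type is not shown in this PR):

```typescript
// Hypothetical ACL entry shape, for illustration only.
interface AclEntry {
  principal: string;
  scope: string;
}

// Discriminated on `decision`: aclEntry exists only on the grant arm.
type DecideElevationResult =
  | { decision: "grant"; aclEntry: AclEntry }
  | { decision: "decline" };

// Narrowing on `decision` makes a `result.aclEntry!` assertion unnecessary.
function applyElevationGrant(result: DecideElevationResult, acl: AclEntry[]): AclEntry[] {
  if (result.decision === "decline") {
    return acl; // nothing to apply
  }
  return [...acl, result.aclEntry]; // typed as AclEntry after narrowing
}
```

With this shape the compiler rejects a decline result that carries an `aclEntry`, and a grant result that omits one, at the construction site rather than at use.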

Test-style sweep (CLAUDE.md's "prefer parameterized tests"): collapsed individual it-per-case blocks across #63922, #63929, #63930, #63936, #63941, #63943, #63946, #63953 into it.each / parameterized.expand. Added a previously uncovered posthog-mode case where the introspect response has no team field.

Deferred / not addressed

  • V2 (fix(agent-platform): add per-caller binding to shared_secret principal #63930 spoofable shared-secret caller_id header) — the per-caller binding still rides a client header; a proper fix needs an authenticated credential (HMAC claim or trusted-proxy header strip). Held for a deliberate redesign rather than a one-shot patch.
  • F6 (http_request secret exfil host binding) — needs a per-secret host allowlist authored in the spec, which ripples through the Django JSON-schema mirror and OpenAPI/Orval codegen. Tracked separately.
  • F15 (internal JWT jti replay cache) — fix(agent-platform): add jti + replay guard to internal JWTs #63952 closed by the original author as not worth the per-pod cache complexity over the existing 60s TTL. Accepted risk; revisit if cross-pod replay becomes a concern.

Validation

All six node service packages typecheck clean; full local agent-service suites pass (1106 unit/e2e + real-inference 14/14 across Anthropic + OpenAI); backend spec-schema tests pass.

🤖 Generated with Claude Code


Note

Canonical Agent Platform feature branch consolidating 17+ PRs into a single integration branch. Introduces agent-ingress, approvals, sandbox, runner, janitor, MCP stream handler, chat handlers, and the agent-console — plus the supporting Django backend and shared Node service packages. Includes security hardening (resource limit clamping, approval auth gate broadening, discriminated unions), migration serialization via advisory locks, Slack BYO trigger with mention_only/auto_resume/ack_reaction, freeze-time tool compilation with AST validation, typed bundle authoring API, failure notifier, and concierge skill updates.

Written by Mendral for commit 5ef5057.

mendral-app Bot and others added 30 commits June 8, 2026 13:58
The build command `--filter '@posthog/quill-*'` matches all quill
packages including quill-charts, but the install step only installs
transitive deps of agent-console (which doesn't depend on quill-charts).
This causes tsc/module resolution failures.

Replace the glob filter with explicit package names (tokens, primitives,
components, blocks, quill) to exclude quill-charts from both the
Dockerfile and CI workflow build steps.
Same effect as the previous commit but lets pnpm derive the build set
from quill's workspace dep graph. `--filter '@posthog/quill...'`
selects `@posthog/quill` plus every workspace package it depends on
(tokens, primitives, components, blocks today; whatever it depends
on tomorrow). quill-charts is excluded naturally because quill
doesn't depend on it — same outcome as the explicit allowlist, but
new quill subpackages don't silently fail to build the next time one
gets added.

Only changes the four agent-console contexts. Left untouched:

  - packages/quill/package.json — workspace root build, legitimately
    builds every member when invoked from inside the quill workspace.
  - services/mcp/package.json — mcp depends on `@posthog/quill-charts`
    directly so its install path includes charts' deps. Building
    charts there is correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… allowlist

Substantial batch covering the agent platform's authoring + Slack flows
plus the agent-console's session + access UX.

Slack BYO:
- Per-app SLACK_BOT_TOKEN + SLACK_SIGNING_SECRET via TRIGGER_REQUIRED_SECRETS
- Native @posthog/slack-* tools read from ctx.secret() (no team integration)
- Django serializer surfaces computed slack_events_url / slack_interactivity_url
- AGENT_INGRESS_PUBLIC_URL Django setting + ingress boot log echoes it
- bin/agent-tunnel: cloudflared wrapper that writes the URL into .env.local
  and removes it on exit
- bin/mprocs.yaml: agent-ingress re-sources .env.local on each restart

Concierge native writes:
- 13 new native @posthog/agent-applications-* write tools
  (create, partial-update, revisions-{create,new-draft,partial-update,
  file-update,validate,freeze,promote,archive}-create, env-keys-list/get,
  set-env-create) so the concierge no longer depends on the MCP transport
  for authoring
- Promoted concierge bundle drops the MCP block, uses natives
- Updated agent.md, authoring-new-agents, secrets-and-integrations,
  setting-up-slack-app skills with worked spec example, focus_* slug
  requirement, promote-before-URL ordering for Slack

Console:
- Per-browser session history menu in DockHeader (localStorage backed,
  20-entry FIFO, terminal entries kept for read-only playback)
- runner.switchToSession() to attach to a past thread without unmount
- AGENT_CONSOLE_ALLOWED_TEAM_IDS env gates OAuth callback + every
  /api/auth/me refresh; defaults to "1,2"

Branch unwedge:
- posthog/jwt.py: restored get_oidc_verification_keys + signing-key
  helpers that a bad merge dropped
- posthog/api/__init__.py: sdk_doctor -> sdk_health rename

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ack_reaction

Three additions to the slack trigger config that let agents handle channels
the way real Slack bots do — without re-@-mentioning every turn, and with
instant feedback that the bot saw the message.

mention_only (now enforced):
  Drop plain message events; only app_mention events seed sessions. Default
  false to preserve back-compat for bots that subscribe to message.channels
  by design.

auto_resume_threads:
  Relaxes mention_only for replies in threads the bot already owns. When a
  message event comes in with a thread_ts matching an existing session's
  external_key, the trigger accepts it and continues the conversation.
  Sessions seeded this way carry `mention: false` in the [slack] envelope so
  the model can judge whether the message is actually addressed to it before
  responding. Defense in depth: the lookup is a single indexed PG read,
  performed only when mention_only would otherwise have dropped the event.

ack_reaction:
  Fire-and-forget reactions.add posted from the ingress on accept, using the
  agent's SLACK_BOT_TOKEN. Authored as an emoji name (e.g. "eyes"); the ack
  lands in Slack within the 3s event-ack window so users see "I saw it"
  even when the runner takes a moment to produce a real turn. Fails open:
  missing token, revoked auth, slack.com 5xx, already_reacted — all become
  silent no-ops; the session still enqueues.
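
The interaction between mention_only and auto_resume_threads can be sketched as a pure gate function. This is an illustration, not the handler in services/agent-ingress/src/triggers/slack.ts: `gateSlackEvent`, `GateDecision`, and the synchronous `ownsThread` callback (standing in for the indexed PG lookup) are hypothetical, and setting `mention` to false for every accepted plain message is an assumption.

```typescript
interface SlackTriggerConfig {
  mention_only?: boolean;
  auto_resume_threads?: boolean;
}

interface SlackEvent {
  type: "app_mention" | "message";
  thread_ts?: string;
}

// Hypothetical decision type; `mention` mirrors the flag in the [slack] envelope.
type GateDecision =
  | { accept: true; mention: boolean }
  | { accept: false; reason: "mention_only" | "no_owned_thread" };

// `ownsThread` stands in for matching thread_ts against an existing
// session's external_key. It is called only when mention_only would drop.
function gateSlackEvent(
  config: SlackTriggerConfig,
  event: SlackEvent,
  ownsThread: (threadTs: string) => boolean
): GateDecision {
  if (event.type === "app_mention") return { accept: true, mention: true };
  if (!config.mention_only) return { accept: true, mention: false }; // back-compat default
  if (!config.auto_resume_threads || event.thread_ts === undefined) {
    return { accept: false, reason: "mention_only" };
  }
  return ownsThread(event.thread_ts)
    ? { accept: true, mention: false }
    : { accept: false, reason: "no_owned_thread" };
}
```

The two drop reasons line up with the two drop outcomes the commit describes: a non-mention dropped by mention_only, and a thread reply whose thread_ts matches no owned session.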

Wired:
- Schema: services/agent-shared/src/spec/spec.ts +
  products/agent_platform/backend/spec_schema.py (Django mirror)
- Handler: services/agent-ingress/src/triggers/slack.ts — gate runs before
  identity resolution so dropped events don't pay for the AgentUser write
- Harness: pass opts.http to buildApp so tests can intercept reactions.add
- 8 new e2e cases in slack-trigger.test.ts (mention_only=false back-compat,
  mention_only=true drop, auto_resume accept-owned, auto_resume drop-unowned,
  ack_reaction fires, ack_reaction unset no-op, slack 500 fails open,
  no SLACK_BOT_TOKEN no-crash)

Concierge bundle tweaks for clearer set_secret guidance picked up alongside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… racing creates

Two agent-platform pods (runner + janitor, plus replicas) call
`migrate()` on boot concurrently. The bundled node-pg-migrate's
`ensureMigrationsTable` issues a plain `CREATE TABLE` rather than
`CREATE TABLE IF NOT EXISTS`, so when two processes race past the
existence check both try to create the table — the loser crashes:

  Error: Unable to ensure migrations table:
    error: relation "pgmigrations" already exists

Wrap migrate() with a pg advisory lock + idempotent pre-create:

  1. `pg_advisory_lock(MIGRATE_ADVISORY_LOCK)` — serializes the whole
     migrate() across every process touching this database. Losers
     wait, winners proceed; after the winner releases, the loser sees
     a fully-migrated schema and runner() is a no-op.
  2. `CREATE TABLE IF NOT EXISTS public.pgmigrations (...)` —
     pre-creates the migrations table before runner() runs. Removes
     node-pg-migrate's race window entirely.
  3. Belt-and-braces: catch the duplicate-table error (42P07) in case
     the bundled `ensureMigrationsTable` still trips on its own
     existence check despite our pre-create.
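
The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the shipped wrapper: `SqlClient` is a hypothetical minimal interface (a real implementation would hold one pg connection for the whole call, since advisory locks are session-scoped), the lock key value is made up, the pre-create statement is passed in rather than spelled out, and retrying `runner()` once after a 42P07 is an assumption about what "catch" means here.

```typescript
// Minimal client surface so the sketch stays self-contained.
interface SqlClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Hypothetical key; the actual MIGRATE_ADVISORY_LOCK value is not given.
const MIGRATE_ADVISORY_LOCK = 727001;
const DUPLICATE_TABLE = "42P07";

async function migrateSerialized(
  client: SqlClient,
  createMigrationsTableSql: string, // the CREATE TABLE IF NOT EXISTS statement
  runner: () => Promise<void> // stands in for node-pg-migrate's runner()
): Promise<void> {
  // 1. Session-level lock: concurrent callers block here until it is released.
  await client.query("SELECT pg_advisory_lock($1)", [MIGRATE_ADVISORY_LOCK]);
  try {
    // 2. Idempotent pre-create closes the plain CREATE TABLE race.
    await client.query(createMigrationsTableSql);
    try {
      await runner();
    } catch (err) {
      // 3. Belt-and-braces: tolerate duplicate-table, rethrow anything else.
      if ((err as { code?: string } | null)?.code !== DUPLICATE_TABLE) throw err;
      await runner(); // assumption: retry once now that the table exists
    }
  } finally {
    // Always release, even when a migration fails.
    await client.query("SELECT pg_advisory_unlock($1)", [MIGRATE_ADVISORY_LOCK]);
  }
}
```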

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the three optional slack-trigger config fields (mention_only,
auto_resume_threads, ack_reaction) to the concierge's vocabulary so
"can we make it respond with an emoji" routes to the slack-app skill
instead of falling through to a generic answer.

- agent.md: paragraph in 'Trigger-required secrets' that names all
  three fields and points at the slack-app skill. Always-loaded
  preamble, so the pointer is visible regardless of which skill the
  model loads.
- skills/setting-up-slack-app.md: 'Tuning the slack trigger' section
  that walks through picking between the three patterns (mention-only,
  mention+thread, react-to-everything), wires the JSON snippet, and
  warns about the no-op pairings. Failure-mode table extended with
  three new gates.
- spec.json skill description (applied live, not in template):
  broadened with explicit keyword triggers ('respond with an emoji',
  'only when I @-mention', 'reply in the thread') so the load gate
  fires for behavior questions, not just for 'wire it up' authoring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Today's GET /preview-token/ returns just token / expires_in /
ingress_slug. Callers (agents, scripts, dev tools) then have to grep
agent-ingress source to figure out:
  - the ingress base URL
  - which paths each trigger exposes
  - that the preview-token gates revision routing but spec.auth.modes
    still needs satisfying alongside

That source-archaeology now lives in the response. Three new fields,
all derived from the revision's own spec:

  endpoints: { trigger_type: { route_name: absolute_url } }
    Only triggers the spec declares; no phantom URLs. Empty when
    AGENT_INGRESS_PUBLIC_URL is unset (so the caller can detect the
    unconfigured-deployment state explicitly).

  auth: { preview_token_header, preview_token_query, spec_modes, notes }
    spec_modes mirrors the spec's auth.modes order so the caller picks
    the first one its credential satisfies. notes explicitly calls out
    the live-vs-preview gate split.

  preview_proxy: { base, allowed_paths, notes }
    Same-origin Django proxy. notes call out the auth-stripping limit
    so the caller doesn't try it for an oauth-required agent.

Helpers + per-trigger route catalogue live in api.py (mirrors the
`routes` arrays in each trigger handler). Backwards-compatible — the
original three fields are unchanged. hogli build:openapi regenerated
the TS types and the MCP tool defs so downstream consumers see the
new shape.
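
The `endpoints` derivation can be illustrated with a short sketch. The real helpers live in api.py (Python); this TypeScript version only shows the shape of the logic. `buildEndpoints`, the `RouteCatalogue` type, the example route names, and the `<base>/<slug>/<suffix>` URL layout are all assumptions.

```typescript
// Hypothetical per-trigger route catalogue: route name -> path suffix.
type RouteCatalogue = Record<string, Record<string, string>>;

type Endpoints = Record<string, Record<string, string>>;

function buildEndpoints(
  publicUrl: string | undefined, // AGENT_INGRESS_PUBLIC_URL
  ingressSlug: string,
  declaredTriggers: string[],
  catalogue: RouteCatalogue
): Endpoints {
  // Unset public URL yields an empty map so callers can detect
  // the unconfigured-deployment state explicitly.
  if (!publicUrl) return {};
  const base = publicUrl.replace(/\/+$/, "");
  const endpoints: Endpoints = {};
  for (const trigger of declaredTriggers) {
    const routes = catalogue[trigger];
    if (!routes) continue; // unknown trigger type: emit nothing
    const urls: Record<string, string> = {};
    for (const [name, suffix] of Object.entries(routes)) {
      // Assumption: URLs are rooted at <base>/<slug>/<suffix>.
      urls[name] = `${base}/${ingressSlug}/${suffix}`;
    }
    endpoints[trigger] = urls;
  }
  return endpoints;
}
```

Iterating over the spec's declared triggers, rather than over the catalogue, is what keeps phantom URLs out of the response.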

6 new tests in test_preview_token.py cover: only declared triggers in
endpoints, mode order preserved, proxy URL rooted in correct
team/slug, empty endpoints when public URL unset, live-revision
rejected, missing revision_id rejected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Every decision point gets a log line so a "why no emoji?" question is one
grep away from the answer:

  slack_event_received           debug  per inbound event with all config flags
  slack_event_dropped_mention_only       info   mention_only gate dropped a non-mention
  slack_event_dropped_no_owned_thread    info   thread_ts didn't match an owned session
  ack_reaction_not_configured            debug  trigger has no ack_reaction set
  ack_reaction_no_bot_token              warn   SLACK_BOT_TOKEN missing from encrypted_env
  ack_reaction_no_http_client            warn   ingress not wired with an HttpFetcher
  ack_reaction_posting                   debug  about to fire reactions.add
  ack_reaction_ok                        info   slack returned ok
  ack_reaction_already_reacted           debug  slack retry — normal, not an error
  ack_reaction_failed                    warn   slack returned 5xx OR { ok: false, error }
  ack_reaction_threw                     warn   transport / fetch threw

Failure-mode separation: HTTP failure (5xx, network), application failure
(slack returned 200 + { ok: false, error: ... }), and `already_reacted`
(Slack's retry produced the same event twice; the surrounding idempotency
key dedupes the enqueue but not this fire-and-forget call). All three
were silently swallowed before — now traceable.
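
The failure-mode separation can be sketched as a small classifier over a completed reactions.add call. The log keys come from the table above; `classifyAck` and `SlackApiBody` are hypothetical names, and treating every non-2xx status (not only 5xx) as `ack_reaction_failed` is an assumption.

```typescript
// Log keys taken from the table in this commit message.
type AckOutcome =
  | "ack_reaction_ok"
  | "ack_reaction_already_reacted"
  | "ack_reaction_failed";

interface SlackApiBody {
  ok: boolean;
  error?: string;
}

// Classifies a call that returned a response. A thrown fetch would be
// logged as ack_reaction_threw by the caller's catch block instead.
function classifyAck(status: number, body: SlackApiBody | undefined): AckOutcome {
  if (status < 200 || status >= 300 || body === undefined) {
    return "ack_reaction_failed"; // HTTP failure
  }
  if (body.ok) return "ack_reaction_ok";
  if (body.error === "already_reacted") {
    return "ack_reaction_already_reacted"; // Slack retry, not an error
  }
  return "ack_reaction_failed"; // application failure: 200 + { ok: false, error }
}
```

In every branch the caller stays fail-open: the outcome only selects a log line and never blocks the enqueue.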

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… is set

Previous version listed reactions:write only as conditional on the agent
using @posthog/slack-react. Now ack_reaction on the slack trigger ALSO
needs it — when the scope is missing the Slack API returns missing_scope
and the ingress logs ack_reaction_failed but the session still enqueues
(fail-open), so the user sees no in-Slack feedback and no obvious error.

Added:
- Step 1.3 (scopes): reactions:write callout covers both slack-react AND
  ack_reaction; instructs the concierge to inspect spec.tools[] AND
  spec.triggers[].config.ack_reaction before listing scopes; remediation
  steps for the "added ack_reaction after install" case (Slack requires
  a re-install for new scopes to mint a token that carries them)
- Common failure modes row for ack_reaction_failed / missing_scope with
  the OAuth-page-then-reinstall recipe

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hans

The most common authoring foot-gun (especially for AI authors): write
tools/<id>/source.ts + schema.json into the bundle but skip adding the
{kind: "custom", id, path} ref in spec.tools[]. The runner only loads what's
in spec.tools[], so the model never sees the tool — it tries to call it,
fails, and reports "the tool isn't available right now" to the user. We
watched it happen twice in real concierge sessions before catching it.

ValidationReport now carries `warnings: ValidationWarning[]` alongside
`errors`. Warnings don't block freeze; freeze still gates on
`errors.length === 0`. Two codes today:

  orphan_custom_tool_dir
    bundle has tools/<id>/schema.json but no spec.tools[] entry with
    that path. Concierge response is almost always: patch spec to add
    the ref, re-validate. If genuinely WIP, delete the dir before freeze.

  orphan_skill_file
    bundle has skills/<id>/SKILL.md or skills/<foo>.md but no
    spec.skills[] entry references it. Same shape — either wire it in or
    drop the file.

Why warnings, not errors:
  - Authors sometimes ship tool source they're iterating on but don't
    want exposed yet. Hard error would block.
  - Keeps the freeze gate clean (errors.length === 0) without two
    different blocking semantics.
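
A minimal sketch of the two warning codes, assuming a custom tool ref's `path` is the tool directory (`tools/<id>`) and a skill ref points at the skill file itself. `findOrphans`, `SpecRefs`, and `canFreeze` are hypothetical names; the real ValidationReport fields beyond `errors` and `warnings` are not shown in this commit.

```typescript
interface ValidationWarning {
  code: "orphan_custom_tool_dir" | "orphan_skill_file";
  path: string;
}

interface SpecRefs {
  toolPaths: string[]; // assumption: `path` of each custom ref in spec.tools[]
  skillPaths: string[]; // assumption: file path referenced by each spec.skills[] entry
}

function findOrphans(bundlePaths: string[], refs: SpecRefs): ValidationWarning[] {
  const warnings: ValidationWarning[] = [];
  const toolDirs = new Set(refs.toolPaths);
  const skillFiles = new Set(refs.skillPaths);
  for (const path of bundlePaths) {
    // Only schema.json marks a tool dir; source.ts alone does not warn.
    const dir = /^(tools\/[^/]+)\/schema\.json$/.exec(path)?.[1];
    if (dir !== undefined && !toolDirs.has(dir)) {
      warnings.push({ code: "orphan_custom_tool_dir", path: dir });
    }
    const isSkill = /^skills\/[^/]+\/SKILL\.md$/.test(path) || /^skills\/[^/]+\.md$/.test(path);
    if (isSkill && !skillFiles.has(path)) {
      warnings.push({ code: "orphan_skill_file", path });
    }
  }
  return warnings;
}

// Freeze gates on errors only; warnings never block.
function canFreeze(report: { errors: unknown[]; warnings: ValidationWarning[] }): boolean {
  return report.errors.length === 0;
}
```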

Concierge authoring-new-agents skill updated with a Phase 6 table that
maps each warning code to the right remediation, plus a "don't freeze
through warnings silently — ask the user" reminder.

Tests:
- 7 new cases in validate-spec.test.ts covering both warning codes plus
  the source.ts-without-schema.json case (which should NOT warn — mirrors
  the runner's schema-driven load semantics) and the warnings-coexist-
  with-errors case
- All 26 validate-spec tests pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hema

Runtime services no longer call `migrate()` on boot. Migrations are
applied exactly once per chart sync by a one-shot k8s Job running as
the `agent-migrator` Aurora role (provisioned in the companion
cloud-infra PR). The Job is wired up in the companion charts PR.

Removes today's failure cascade:
  - N pods racing past `ensureMigrationsTable` and tripping
    `relation "pgmigrations" already exists` (advisory-lock fix
    landed earlier today; this removes the root cause)
  - "permission denied for schema public" / "permission denied for
    table agent_session" — runtime roles no longer need DDL since
    they don't run migrations

Changes:
  - services/agent-{runner,janitor}/src/index.ts: drop the
    `await migrate({ databaseUrl: config.agentDbUrl })` call and the
    `import { migrate } from '@posthog/agent-migrations'`.
  - services/agent-{runner,janitor}/package.json: move
    `@posthog/agent-migrations` from `dependencies` to
    `devDependencies` (still needed by tests for the reset/migrate
    harness via `services/agent-{janitor,runner}/src/*.test.ts`).
  - pnpm-lock.yaml: regen.

The migrate.mjs bundle entry in services/agents/scripts/build.ts is
already wired (the image's been shipping it; chart Job consumes it
via `node services/agents/dist/migrate.mjs up`).

Local dev: still works, since the local-dev Postgres uses the
superuser `posthog` role. Anyone running an agent service against
a fresh Postgres without pre-running migrations now needs
`pnpm --filter @posthog/agent-migrations migrate` first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Authors now write tools/<id>/source.ts + schema.json only — the janitor
runs esbuild at freeze to produce tools/<id>/compiled.js inside the
bundle. Validate stays pure: it parses source.ts via esbuild to catch
syntax errors early without writing anything. Freeze aborts if any
custom tool fails to compile.

Drops the prior compiled.js authoring step from the concierge's
mental model — hand-compiling TS by stripping annotations was both
fragile and now collides with the freeze-time build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er isDev()

`bin/start` running the runner against the local agent-sandbox-host
image (built per services/agent-sandbox-host/README.md) needed
SANDBOX_HOST_IMAGE set explicitly; otherwise the schema rejected the
unset value. Default it to `posthog/agent-sandbox-host:dev` under
isDev() so the local dev loop works without configuration; prod
still has to set it explicitly.

Tests cover dev default, prod must-be-set behavior, and explicit
override precedence.
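
The three tested behaviors can be sketched in a few lines. `resolveSandboxHostImage` is a hypothetical name, and the env map and dev flag are passed in as arguments so the sketch does not depend on the real config schema or on `isDev()` itself.

```typescript
const DEV_SANDBOX_HOST_IMAGE = "posthog/agent-sandbox-host:dev";

// `env` and `isDev` are injected so the sketch has no process dependency.
function resolveSandboxHostImage(
  env: Record<string, string | undefined>,
  isDev: boolean
): string {
  const explicit = env["SANDBOX_HOST_IMAGE"];
  if (explicit) return explicit; // explicit override always wins
  if (isDev) return DEV_SANDBOX_HOST_IMAGE; // local dev loop needs no config
  throw new Error("SANDBOX_HOST_IMAGE must be set outside dev");
}
```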

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd-written compiled.js

Two enforcement layers on top of the compile-at-freeze step so a
syntactically-valid but runtime-incompatible custom tool can't reach
a live agent and dispatch with action_not_found mid-conversation:

1. Compile step now vm-evaluates the esbuild output and confirms
   `module.exports.default ?? module.exports` is `{ actions: { default: fn } }`
   — the shape the runner's sandbox loader requires. Bare-function
   exports, missing `actions` map, and wrong action keys all fail
   freeze with a specific message instead of going live.
2. PUT /revisions/:id/file and PUT /revisions/:id/bundle now reject
   any path matching `tools/<id>/compiled.js` with 422
   compiled_js_is_generated. Previously hand-written compiled.js was
   silently overwritten on the next freeze, which read to the model
   as "my edits aren't landing" and produced multi-turn debugging
   spirals.
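
The export-shape check in step 1 can be sketched as a pure function over whatever evaluating the compiled output left on `module.exports`. `checkToolExportShape` and the error messages are hypothetical; the vm evaluation itself is omitted.

```typescript
type ShapeCheck = { ok: true } | { ok: false; message: string };

// Treats objects and functions as property bags; everything else is undefined.
function asRecord(v: unknown): Record<string, unknown> | undefined {
  return (typeof v === "object" || typeof v === "function") && v !== null
    ? (v as Record<string, unknown>)
    : undefined;
}

function checkToolExportShape(moduleExports: unknown): ShapeCheck {
  // Mirrors `module.exports.default ?? module.exports`.
  const candidate = asRecord(moduleExports)?.["default"] ?? moduleExports;
  if (typeof candidate === "function") {
    return { ok: false, message: "bare function export; expected { actions: { default: fn } }" };
  }
  const actions = asRecord(asRecord(candidate)?.["actions"]);
  if (!actions) return { ok: false, message: "missing `actions` map" };
  if (typeof actions["default"] !== "function") {
    return { ok: false, message: "`actions.default` must be a function" };
  }
  return { ok: true };
}
```

Each look-alike shape gets its own message, so a freeze failure tells the author which part of the contract was missed rather than surfacing later as action_not_found.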

Concierge authoring skill updated with the canonical source.ts
template + a table of look-alike export shapes that fail, so the
model doesn't have to discover the runtime contract by trial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A session that crashes before the agent can post (sandbox acquire, MCP
open, secret resolve, in-loop model_error) used to leave its
originating Slack thread silent — the runner marked the row failed but
never reached back out. Sessions like 99386c19-cfd6 (docker-image
fallback case) sat untouched in PG with no user-visible signal.

Wires a small FailureNotifier interface in agent-shared with a
SlackFailureNotifier impl. The slack trigger now stamps
`trigger_metadata: { type: 'slack', workspace_id, channel, ts,
thread_ts }` at enqueue; the worker's pre-runSession catch block reads
it and posts a sanitized message back to the thread after the queue
row is marked failed. Raw reasons go through `categorize()` →
`userFacingMessage()` so docker / MCP / Kafka detail never leaks into
a customer channel — raw text still lands in log_entries +
conversation.errorMessage for owner-facing debug.
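
The `categorize()` then `userFacingMessage()` pipeline can be illustrated as below. Only the two function names come from the commit; the category list, the substring matching, and the message copy are all hypothetical.

```typescript
// Hypothetical categories; the real list is not shown in the commit.
type FailureCategory = "sandbox" | "mcp" | "secrets" | "model" | "unknown";

function categorize(rawReason: string): FailureCategory {
  const r = rawReason.toLowerCase();
  if (r.includes("docker") || r.includes("sandbox")) return "sandbox";
  if (r.includes("mcp")) return "mcp";
  if (r.includes("secret")) return "secrets";
  if (r.includes("model_error")) return "model";
  return "unknown";
}

// Fixed strings only: the raw reason is never interpolated into the
// output, so infrastructure detail cannot leak into the channel.
function userFacingMessage(category: FailureCategory): string {
  switch (category) {
    case "sandbox":
      return "I couldn't start my workspace. Please try again in a moment.";
    case "mcp":
      return "I couldn't reach one of my tools. Please try again in a moment.";
    case "secrets":
      return "I'm missing a credential I need. Please ask the agent's owner to check its setup.";
    case "model":
      return "I hit an error while thinking about this. Please try again.";
    case "unknown":
      return "Something went wrong before I could respond. Please try again.";
  }
}
```

Routing through a closed set of categories is the sanitization step: the raw text can still go to logs, while the Slack thread only ever sees one of a fixed set of strings.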

Lifts SlackSigningSecretResolver from agent-ingress → agent-shared
(both services now resolve the bot token through the same shared
EncryptedEnvSlackSecretResolver class).

Design sketch: /Users/benwhite/.claude/plans/agent-failure-notifier.md.

Deferred: the symmetric driver.ts emitFailure() hook for in-loop
failures, the ⚠️ react-on-progress fork, the janitor reaper for
runner-crash gaps, and the public session URL link. Steel thread
first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts list

The right-hand "Live now" column wasn't earning its space. Drops it in
favour of richer per-agent rows that show live · 24h · failed (+rate) ·
spend · last run inline, fanned out from the existing per-application
stats endpoint. Layout flex-wraps so the stat group drops below the
name/description on narrow widths instead of collapsing into itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the generic file-grain bundle store (PUT /file?path=X, PUT /bundle
with mode) with typed resource endpoints:

  PUT  /revisions/:id/agent_md          { content }
  PUT  /revisions/:id/spec              { spec }              # author-facing slice
  PUT  /revisions/:id/skills/:id        { description, body, files? }
  DELETE /revisions/:id/skills/:id
  PUT  /revisions/:id/tools/:id         { description, args_schema, source }
  DELETE /revisions/:id/tools/:id
  GET  /revisions/:id/bundle            -> { agent_md, skills, tools, spec }
  PUT  /revisions/:id/bundle            full replace of the typed shape

Authors no longer write file paths. spec.skills[] and spec.tools[] (custom)
are server-derived at freeze from the typed resources in the bundle, so
orphan files, dangling refs, and rename-without-spec-patch are
structurally impossible.

Tool upload runs AST shape check + esbuild compile synchronously inside
PUT /tools/:id. Required source shape:

    export default { actions: { default: async (args, ctx) => { ... } } }

Bad shapes (bare functions, missing actions, wrong key, dynamic factory
exports) return 422 tool_compile_failed with structured diagnostics
before any S3 writes -- no runtime dispatch failure.

Performance fixes shipped alongside the API:
- clone_from now parallelises bundle.copy() (was sequential; 15-file bundles
  used to time out Django's 30s read timeout mid-clone, leaving half-
  written drafts).
- The freeze pipeline calls bundle.list() exactly once and threads the
  result through deriveAndPersistSpec + bundles.freeze(precomputedEntries),
  eliminating ~50 redundant S3 HEADs per freeze.
- The freeze endpoint is now idempotent: if .frozen is already on disk
  (Django proxy timed out mid-call), it re-derives the sha from the
  existing manifest and returns it so the caller can stamp the row and
  recover from the inconsistent state.

Test coverage:
- 28-case e2e suite at services/agent-tests/src/cases/typed-bundle-authoring.test.ts
- 16-case AST shape unit suite at services/agent-janitor/src/compile-custom-tools.test.ts
- 3 perf regression tests in server.test.ts using a Proxy-wrapped bundle
  store that counts bundle.list() calls and tracks peak bundle.copy()
  concurrency -- pins parallel-copy + cached-list invariants
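
The parallel-copy change and the peak-concurrency test can be sketched together. `BundleStore`, `cloneBundle`, and `countingStore` are hypothetical stand-ins, and the wrapper here is a plain object rather than the Proxy the perf tests use.

```typescript
interface BundleStore {
  copy(path: string): Promise<void>;
}

// The sequential version awaited each copy in turn; this fans them out.
async function cloneBundle(store: BundleStore, paths: string[]): Promise<void> {
  await Promise.all(paths.map((path) => store.copy(path)));
}

// Wrapper that tracks the peak number of in-flight copy() calls.
function countingStore(inner: BundleStore): { store: BundleStore; peak: () => number } {
  let inFlight = 0;
  let peak = 0;
  const store: BundleStore = {
    async copy(path) {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      try {
        await inner.copy(path);
      } finally {
        inFlight -= 1;
      }
    },
  };
  return { store, peak: () => peak };
}
```

A sequential implementation would pin the peak at 1, so asserting a peak above 1 is a cheap way to catch a regression back to one-at-a-time copies. An unbounded `Promise.all` is fine for bundles of roughly 15 files; a much larger bundle might want a concurrency cap, which this sketch does not include.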

Companion changes:
- legacy /file + /bundle (with mode) endpoints removed from janitor + Django
- legacy file-update / file-retrieve native tools replaced with typed
  equivalents
- agent-console flattens the typed bundle response into BundleFile[]
- validate-spec drops orphan_skill_file / orphan_custom_tool_dir warnings
- sandbox-inprocess drops the legacy { run: fn } wrapper shape
- concierge spec rewritten to use the new typed native tools

See docs/agent-platform/plans/typed-bundle-authoring-api.md for design,
BUILD_NOTES.md for issues encountered + decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Bump agent-console + agent-chat to storybook ^10.2.17 (resolved 10.4.1); drop storybook-dark-mode and @storybook/theming in favour of v10's built-in globalTypes toggle in preview.tsx.
- Pin framework to the workspace-resolved @storybook/react-vite path in main.ts so pnpm doesn't pick up sibling v8 installs (common/storybook, quill/apps/storybook). Re-derive __dirname from import.meta.url for ESM main load. Force esbuild.jsx='automatic' so component files don't need `import React`.
- Add a tiny reactive router store (`router-store.ts`) backing the next/navigation + next/link mocks. router.push / <Link> clicks soft-nav by updating the store; usePathname/useSearchParams/useParams read from it via useSyncExternalStore.
- AppShell story now hosts a <StoryRoutes> switch that resolves the current path to the right page (agents list, agent detail w/ tab segment, registry, billing). Three navigable entry stories (NavigableShell, StartOnConfiguration, StartOnSessions) + the legacy single-page stories kept for quick visual review.
- focus_* client tools the dock invokes use router.push already, so they soft-nav in the story for free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Config panel (used by RevisionsBrowser's Config card):
- Flat divider-separated sections instead of nested rounded cards-in-cards. Each section is one `space-y-2 px-3 py-3` band; the outer Config card hosts them via a `divide-y` ConfigPanel root, so the dividers hug section edges and double-padding is gone.
- Tools sub-grouped by kind (native / client-fulfilled / custom / custom_template) as inline collapsible groups with card grids. Native and client tool cards now open the existing detail dialog; custom tools keep linking into the bundle viewer.
- MCPs expose their curated sub-tool list inline with approval-required markers — previously hidden behind the URL row.
- Per-section info toggle on the right of the header (InfoIcon). Open state is `bg-primary text-primary-foreground` and the panel below uses `bg-primary/10` + `border-l-2 border-primary` so the icon and its panel read as one unit. Default copy describes what each section means; tool-group info copy explains the runtime difference between native / client / custom.
- Skills render as a 2-col card grid with truncated description + title tooltip — handles the concierge's 13 skills without becoming a wall of prose.
- One global filter input lifted into the Config card's header (passes through ConfigPanel as a controlled `filter` prop). Filters tools, MCP sub-tools, and skills in place; per-group counts show `n/total` while filter is active.
- Model section restored at the top — chip + reasoning level.
- Replaced the `structured | raw` segmented control with a single `<CodeIcon /> RAW` toggle on the right of the header. Inactive: outlined; active: solid `bg-primary` button with shadow so the mode is unmistakable.
- `highlightedSection` highlight is now a 2px primary left-accent + soft `bg-info/5` row instead of a colored ring around a card — keeps focus_spec_section legible in the flat layout.
- `UnstructuredFields` flattened to match — no more inner rounded card; renders as a `border-t` band continuing the section rhythm.
- Drop the standalone ConfigPanel.stories.tsx; the navigable AppShell stories (StartOnConfiguration) are now the review surface.

BundleTree:
- Markdown viewer supports inline emphasis (bold/italic), links, blockquotes, ordered lists, and horizontal rules in addition to the existing heading/paragraph/ul/code blocks.
- New regex-based TypeScript / JavaScript highlighter for `.ts` / `.tsx` / `.js` / `.jsx` files (and for fenced code inside markdown). Tokens: keywords, strings, comments, numbers, function-call identifiers, PascalCase types. No new deps.
- `compiled.js` and other `.js` files now resolve to the same highlighter via `languageForPath()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tamps it

Django's freeze view wrapped the janitor HTTP call in transaction.atomic(),
holding the agent_revision row lock. The janitor's deriveAndPersistSpec then
tried to UPDATE the same row from a different connection and blocked for the
full Django proxy timeout (~120s, instrumented + confirmed). Concierge freezes
hit this every time.

Fix: janitor no longer writes agent_revision.spec. deriveSpec computes the
derived spec (skills + custom tools from the typed bundle) and returns it in
the freeze response. Django stamps state + sha + spec in a single save(),
no atomic block needed — the janitor's idempotent freeze covers the partial-
failure recovery path that atomic used to.

Companion changes:
- New instrument({ key, log, context }, fn) helper in agent-shared/runtime
  mirroring nodejs/src/common/tracing/tracing-utils.ts:instrumentFn, minus
  Prometheus + OTEL deps. Replaces inline Date.now() timing markers in the
  freeze + clone_from pipeline.
- Concierge gains a `choosing-the-model` skill that walks the user through
  the cost/quality tradeoff before setting spec.model, recommends per job
  category, and waits for an explicit pick instead of defaulting.
- Idempotent freeze path also returns the derived spec so the recovery
  caller can stamp it on a retry.
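
A helper with the `instrument({ key, log, context }, fn)` shape might look roughly like the sketch below. Only the call shape comes from the commit message; the `LogFn` signature, the `.ok` / `.failed` event suffixes, and the `duration_ms` field are assumptions for illustration.

```typescript
// Assumed logger shape: (level, event, fields).
type LogFn = (level: "info" | "error", event: string, fields: Record<string, unknown>) => void;

interface InstrumentOptions {
    key: string;
    log: LogFn;
    context?: Record<string, unknown>;
}

// Runs fn, logs how long it took, and rethrows any failure unchanged.
async function instrument<T>(opts: InstrumentOptions, fn: () => Promise<T> | T): Promise<T> {
    const startedAt = Date.now();
    try {
        const result = await fn();
        opts.log("info", `${opts.key}.ok`, { ...opts.context, duration_ms: Date.now() - startedAt });
        return result;
    } catch (error) {
        opts.log("error", `${opts.key}.failed`, {
            ...opts.context,
            duration_ms: Date.now() - startedAt,
            error: error instanceof Error ? error.message : String(error),
        });
        throw error;
    }
}
```

Wrapping a pipeline step then replaces a pair of hand-written `Date.now()` markers with one call, and the failure path is timed as well.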

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`set_secret` (and any future render-style client tool whose UI needs
unbounded user time) now parks the session instead of awaiting on a
5s/60s in-process bus timeout. Mirrors the approval-gated tool pattern:
synthetic queued envelope from `execute` → loop unwinds → worker hands
the session back to the queue → user submits via `/send` → marker in
pending_inputs → runner's resume scanner injects a wake message → model
sees the real outcome on a fresh turn.

What changed:

- Spec: new `interactive: boolean` field on `kind: "client"` tools
  (services/agent-shared/src/spec/spec.ts + products/agent_platform/backend/spec_schema.py).
  When true, the runner skips `dispatchClientTool` and returns a
  `{queued, interactive, call_id, tool_id, message}` envelope from
  `execute`. timeout_ms cap raised to 600s for the non-interactive path.
- Marker: new `__POSTHOG_CLIENT_TOOL_RESULT__:<json>` shape parallel to
  the approval marker, lives in agent-shared/runtime so both ingress and
  runner can use it.
- Ingress `/send`: ChatSendBodySchema accepts either `{message}` or
  `{client_tool_result: {call_id, result | error}}`. The latter writes
  the marker into pending_inputs + re-queues — no new endpoint.
- Runner: `getSteeringMessages` scans pending_inputs for client-tool
  markers (before the approval-marker check), synthesises a wake user
  message carrying the real outcome envelope, and emits a
  `client_tool_result` SSE event so live consumers see the closure.
- Frontend: render-style resolves now POST `/send` via the new
  `sendClientToolResult` helper. Reducer recognises the queued envelope
  on `tool_result` and flips the part to `fulfillment: 'client'` without
  setting `result` — keeps SecretInline mounted. New `client_tool_result`
  case finalises the result across any assistant turn. Conversation
  reconstruction does the same on reload (`apiClient.conversationToTurns`).
- Concierge: `set_secret` marked `interactive: true`, description +
  `secrets-and-integrations` skill updated to teach the park+wake loop.
  Seed script updated for the new typed-bundle authoring API.
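
The marker round trip can be sketched as follows. The prefix and the `{call_id, result | error}` payload come from the description above; the function names, the `string` type for `error`, and the exact wake-envelope fields other than `ok` are assumptions.

```typescript
const CLIENT_TOOL_RESULT_PREFIX = "__POSTHOG_CLIENT_TOOL_RESULT__:";

type ClientToolResult =
    | { call_id: string; result: unknown }
    | { call_id: string; error: string };

// Ingress side: turn a /send body into the string stored in pending_inputs.
function encodeClientToolResultMarker(payload: ClientToolResult): string {
    return CLIENT_TOOL_RESULT_PREFIX + JSON.stringify(payload);
}

// Runner side: returns null for ordinary messages or malformed markers.
function parseClientToolResultMarker(input: string): ClientToolResult | null {
    if (!input.startsWith(CLIENT_TOOL_RESULT_PREFIX)) return null;
    let parsed: unknown;
    try {
        parsed = JSON.parse(input.slice(CLIENT_TOOL_RESULT_PREFIX.length));
    } catch {
        return null;
    }
    if (typeof parsed !== "object" || parsed === null) return null;
    const obj = parsed as Record<string, unknown>;
    const callId = obj["call_id"];
    const error = obj["error"];
    if (typeof callId !== "string") return null;
    if (typeof error === "string") return { call_id: callId, error };
    if ("result" in obj) return { call_id: callId, result: obj["result"] };
    return null;
}

// Hypothetical wake envelope: ok is false when the client reported an error.
function toWakeEnvelope(r: ClientToolResult): { ok: boolean; call_id: string; result?: unknown; error?: string } {
    return "error" in r
        ? { ok: false, call_id: r.call_id, error: r.error }
        : { ok: true, call_id: r.call_id, result: r.result };
}
```

Because the marker is an ordinary string in pending_inputs, the resume scanner can tell it apart from user text with a prefix check before attempting to parse it.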

E2e: `services/agent-tests/src/cases/interactive-client-tool.test.ts`
covers the happy path (park → /send result → wake → model resumes) and
the error variant (`/send` with `error` → wake envelope carries ok:false).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConciergeDock used to render <FixtureConciergeDock /> for both
"resolving" and "404 / not-deployed" states. A user who typed before
`getAgent(slug)` resolved hit the fake runner and got back the fixture
fallback ("Got it. In the real build I'd take the next step — for the
v0 mock I only have a handful of scripted responses wired up. Try one
of the suggested prompts at the top of the dock?") even though the
agent was deployed and would have answered in another second.

Split the resolution into three explicit states: `pending` (loading
placeholder, no input), `not_deployed` (genuine 404 → fixture, as
before), `resolved` (real runner). Non-404 errors now stay in
`pending` rather than falling through to the fixture, so a transient
network/auth blip can't surface a mock reply either.
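
The three-state split can be pictured as a small discriminated union plus a mapping from fetch outcome to state. The state names are from the commit message; `FetchOutcome`, `resolveConcierge`, and the field names are hypothetical.

```typescript
type ConciergeResolution =
    | { status: "pending" }
    | { status: "not_deployed"; slug: string }
    | { status: "resolved"; slug: string; agentId: string };

// Assumed summary of what a getAgent(slug) call can produce.
type FetchOutcome =
    | { kind: "ok"; agentId: string }
    | { kind: "http_error"; statusCode: number }
    | { kind: "network_error" };

function resolveConcierge(slug: string, outcome: FetchOutcome): ConciergeResolution {
    if (outcome.kind === "ok") {
        return { status: "resolved", slug, agentId: outcome.agentId };
    }
    if (outcome.kind === "http_error" && outcome.statusCode === 404) {
        // Only a genuine 404 means "no concierge deployed".
        return { status: "not_deployed", slug };
    }
    // Auth failures, 5xx responses, and network errors stay on the loading placeholder.
    return { status: "pending" };
}
```

With this shape the fixture path is reachable only through `not_deployed`, so a transient failure cannot render a scripted reply.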

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production code no longer imports `useFakeRunner` or the fixture
scripts (`conciergeScripts` / `fallbackScript` / `waitingSession`):

- `useFakeRunner` moved off the main `@posthog/agent-chat` entry. It's
  re-exported from `@posthog/agent-chat/fixtures` only, so anything
  pulling it in is explicitly on a non-production import path
  (Storybook stories, the console's `mockApi`). Importing it from a
  production path now fails at build time.
- `FixtureConciergeDock` deleted from `Dock.tsx`. The dock used to
  fall back to it whenever the concierge slug didn't resolve — both
  in a genuine 404 and in the brief window before resolution — and
  the v0-mock fallback string was reaching real users. ConciergeDock
  now shows a small text stub ("Loading concierge…" / "No concierge
  deployed for `<slug>` in this project.") for the pending and 404
  states. No fake runner, no scripted fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

`useSetDockConciergeAgent` runs `setConciergeAgent(null)` on layout
unmount, so navigating between two pages that both want the same
concierge transitioned the slug `agent-concierge → null →
agent-concierge`. ConciergeDock reacted to the transient null by
resetting state to `pending` and starting a fresh `getAgent` fetch,
which (a) re-mounted RealConciergeDock + lost in-flight chat state,
and (b) could leave the dock stuck on "Loading concierge…" if the
two fetches raced through their cancel logic.

Fix: ignore transient nulls, cache the resolved teamId:slug in a ref
so the same combo never refetches, and keep the last resolved state on
transient errors instead of dropping to `pending`. The dock now stays
stable across route changes and can't get stranded on the loading
stub.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The concierge header had 7 chrome buttons plus three status pills,
which read as busy at typical dock widths. Consolidated:

- Focus toggle drops its "Focus" label and becomes an icon-only state
  pill — the eye/eye-off + state colour already says enough at a
  glance, freeing horizontal space.
- "Open in session view", "Dock to side / Float panel" and the existing
  "Render markdown" toggle moved into a single settings dropdown
  (gear icon). The standalone open-session and dock-mode toggles are
  gone; the open-session entry only appears when there's an active
  session id.
- "New" stays as a labelled button — it's a primary action and
  deserves the label.
- Session history dropdown, hide-dock chevron (which has a keyboard
  shortcut), and the status pills are unchanged.

Net: 7 buttons → 4 buttons + 1 dropdown, without losing any
functionality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ bug fixes

Header redesign:
- The dock header is now two rows. Row 1 is mode + status pills on the
  left, chrome controls (focus, history, settings, hide) on the right.
  Row 2 is the primary "New" action on the left and the page subject —
  with an optional "on <agent name>" sub-line so the user always knows
  which agent the concierge is reasoning about, even when the page
  title is generic (Documentation, Sessions, etc).
- "New" is now a solid bordered/shadowed button on the far left so the
  primary action stands apart from the secondary chrome.

Focus indicator (new):
- A thin info-coloured bar pinned to the top edge of the viewport
  appears only when focus mode is on AND there is an active concierge
  session id. Communicates "concierge is following you" without
  stealing chrome. Click to pause focus mode.
- Plumbed via a new `activeConciergeSessionId` field on DockStore;
  RealConciergeDock pushes the live session id into it.

ConciergeDock fixes:
- Dropped the resolvedKey ref optimisation that could leave the dock
  stuck on "Loading concierge…" when exiting playground (in StrictMode,
  the ref survived across the remount while the state reset to
  `pending`, so the early-return guard skipped the fetch). Always
  fetch on a non-null slug; the functional setState still avoids the
  pending flash when the same slug is already resolved.
- Errors still leave the previous state untouched instead of falling
  back to pending — no regression on the original navigation-stability
  fix.
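
The state transitions described here can be written as pure updater functions of the kind passed to a functional `setState`. This is a sketch under assumptions: `DockState`, `onSlugRequested`, and `onFetchError` are illustrative names, not the component's own.

```typescript
type DockState =
    | { status: "pending" }
    | { status: "not_deployed"; slug: string }
    | { status: "resolved"; slug: string };

// Applied when the requested slug changes. A null slug is treated as transient and ignored.
function onSlugRequested(prev: DockState, slug: string | null): DockState {
    if (slug === null) return prev;
    // Same slug already resolved: keep the state so there is no pending flash,
    // even though the caller still issues a fresh fetch.
    if (prev.status === "resolved" && prev.slug === slug) return prev;
    return { status: "pending" };
}

// Applied when a fetch fails with a non-404 error: keep whatever was shown before.
function onFetchError(prev: DockState): DockState {
    return prev;
}
```

Returning the same object reference lets React skip the re-render, which is what keeps the mounted dock and its chat state intact.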

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per spec:

Row 1 (top):
- Left: a single mode pill — "Concierge" (success-tinted bg + steady
  green dot) or "Playground" (primary bg + animated dot + the agent
  name appended). Status pills (Draft, Reconnecting) sit alongside.
- Right: config controls only — Focus toggle (concierge), Settings
  menu (gear), Hide dock (chevron).

Row 2 (bottom):
- Left: the session label — first-message snippet from history,
  else "Started Xm ago" relative timestamp, else "New conversation"
  when there's no active session yet.
- Right: History dropdown + New button (or Exit in playground), as
  a regular outline button now that it sits alongside History
  instead of standing alone as the primary action.
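
The Row 2 label fallback chain could be expressed as below. The three fallbacks are from the spec above; `SessionInfo`, `sessionLabel`, and the 40-character snippet limit are assumptions.

```typescript
interface SessionInfo {
    firstMessage?: string;
    startedAt?: Date;
}

function sessionLabel(session: SessionInfo | null, now: Date, maxLen: number = 40): string {
    if (session === null) return "New conversation";

    // 1. Prefer a snippet of the first message from history.
    const snippet = session.firstMessage ? session.firstMessage.trim() : "";
    if (snippet.length > 0) {
        return snippet.length > maxLen ? snippet.slice(0, maxLen - 1) + "…" : snippet;
    }

    // 2. Otherwise fall back to a relative timestamp in whole minutes.
    if (session.startedAt) {
        const elapsedMs = now.getTime() - session.startedAt.getTime();
        const minutes = Math.max(0, Math.floor(elapsedMs / 60000));
        return `Started ${minutes}m ago`;
    }

    // 3. No usable history and no start time.
    return "New conversation";
}
```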

`describeContext` import dropped since the mode label is now derived
locally from `context.mode` rather than the package helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A `border-b border-border/60` on the mode/config row (with matching
`border-primary/30` in playground mode so the divider tones with the
playground tint) makes the two sections read as distinct strips
instead of one busy block.
Horizontal padding moved from the container to each row, so the
border between Row 1 and Row 2 runs edge-to-edge instead of being
inset by the container's `px-3`.
Comment thread products/agent_platform/services/agent-tools/src/tools/web-fetch.v1.ts Outdated
Comment thread products/agent_platform/services/agent-shared/package.json Outdated
benjackwhite and others added 2 commits June 17, 2026 11:51
veria-ai review on #63988:
- bump undici 7.8.0 → ^7.24.0 in agent-shared (resolves to 7.24.8). 7.8.0 has
  OSV advisories on the tool egress path (request/response smuggling +
  decompression resource-exhaustion) fixed in the 7.24.x line.
- http-request: stream the response body and stop at max_response_bytes,
  cancelling the stream at the cap, instead of res.text()-then-slice — so an
  oversized or highly-compressed response is never fully materialized before
  truncation. Adds a streaming-path test.
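
The difference from `res.text()`-then-slice can be illustrated with a capped reader. This sketch assumes the body is available as an async iterable of byte chunks (which should hold for Node readable streams); `readCapped` and its return shape are hypothetical, not the tool's actual code.

```typescript
interface CappedBody {
    bytes: Uint8Array;
    truncated: boolean;
}

async function readCapped(body: AsyncIterable<Uint8Array>, maxBytes: number): Promise<CappedBody> {
    const chunks: Uint8Array[] = [];
    let total = 0;
    let truncated = false;

    for await (const chunk of body) {
        const remaining = maxBytes - total;
        if (chunk.byteLength > remaining) {
            // Keep only what fits, then stop. Leaving the loop calls the
            // iterator's return(), so the source is not read any further.
            chunks.push(chunk.subarray(0, remaining));
            total += remaining;
            truncated = true;
            break;
        }
        chunks.push(chunk);
        total += chunk.byteLength;
    }

    // Concatenate the retained chunks into one buffer of exactly `total` bytes.
    const bytes = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
        bytes.set(chunk, offset);
        offset += chunk.byteLength;
    }
    return { bytes, truncated };
}
```

Memory use is bounded by the cap plus one chunk, regardless of how large the response would have been once fully decompressed.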

Tool consolidation:
- remove @posthog/web-search — never wired in prod (the provider is only set
  in tests), so it always threw "web.search provider not configured".
- remove @posthog/web-fetch — a strict subset of @posthog/http-request
  (GET is the default method). Example specs, docs, and case tests migrated
  to @posthog/http-request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread products/agent_platform/services/agent-runner/src/loop/mcp-clients.ts Outdated
benjackwhite and others added 3 commits June 17, 2026 12:36
…ience

Rework the posthog auth/tool tenancy model so a single agent (e.g. the
agent builder) can serve a whole org while acting as the calling user:

- Tools take an explicit project_id. The @posthog/* data tools no longer
  derive their operating team from the session principal; each
  project-scoped tool takes a project_id arg, and the agent resolves it
  via the get_context client tool or the new @posthog/list-projects tool.
  Removes posthogUserTeamId from ToolContext.
- Add @posthog/list-projects (minimal id/name/org) for disambiguation.
- posthog auth mode gains audience: 'project' | 'organization' (default
  'project'). The ingress verifier enforces it — project: caller can
  access the agent's owning team (RBAC-aware probe); organization: caller
  is a member of the agent's owning org (resolved via a cached posthog_team
  lookup, since the revision store and Django DBs differ). Failures return
  not_in_project / not_in_org. Opening to any user across orgs is
  intentionally not yet expressible.
- Bind agent-concierge to its owning organization; teach its agent.md to
  resolve the project before project-scoped tools.
- Mirror audience in the Django spec schema (spec_schema.py).
- Tests: verifier audience cases (10/10), e2e auth modes (13/13), Django
  spec schema (38/38), query/_posthog-api updates.
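
The audience rule can be sketched as a pure check. The `audience` values, the `'project'` default, and the `not_in_project` / `not_in_org` failure codes are from the description above. Everything else is an assumption, in particular the precomputed membership sets, which stand in for the RBAC-aware probe and the cached `posthog_team` lookup.

```typescript
type Audience = "project" | "organization";
type AudienceResult = { ok: true } | { ok: false; reason: "not_in_project" | "not_in_org" };

interface AgentOwner {
    teamId: number;
    organizationId: string;
}

// Hypothetical caller view: what the verified principal can reach.
interface Caller {
    accessibleTeamIds: ReadonlySet<number>;
    organizationIds: ReadonlySet<string>;
}

function verifyAudience(audience: Audience | undefined, owner: AgentOwner, caller: Caller): AudienceResult {
    const effective: Audience = audience ?? "project";
    if (effective === "project") {
        return caller.accessibleTeamIds.has(owner.teamId)
            ? { ok: true }
            : { ok: false, reason: "not_in_project" };
    }
    return caller.organizationIds.has(owner.organizationId)
        ? { ok: true }
        : { ok: false, reason: "not_in_org" };
}
```

There is deliberately no third branch: an audience wider than the owning organization has no representation in the type.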

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MCP ref url/header `${SECRET}` substitution checked only that the secret
existed, never its `spec.secrets[].allowed_hosts` binding. An author could
point `mcps[].url` at a host they control and set
`headers.Authorization = "Bearer ${SLACK_BOT_TOKEN}"`, exfiltrating an
encrypted-env secret they otherwise can't read.

Mirror `@posthog/http-request`'s final-URL host binding: resolve the final
URL first, reject bare-string/unbound secrets, and substitute URL/header
placeholders only when the final host matches the secret's allowed_hosts.
Wire the lookup from spec in the worker, and add a freeze-time check in the
janitor so bare-string MCP secrets are caught before deploy.

Fail-closed tightening: existing specs referencing a bare-string secret in
an MCP url/header will report that MCP as unavailable until converted to the
object form `{ name, allowed_hosts: [...] }`.
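
A minimal sketch of the host-bound substitution for a header value, assuming the final URL has already been resolved. The `${SECRET}` syntax and the `{ name, allowed_hosts }` object form are from the commit message; `substituteSecrets`, `ResolvedSecret`, the exact-match host comparison, and the error messages are assumptions.

```typescript
type SecretSpec = string | { name: string; allowed_hosts: string[] };

interface ResolvedSecret {
    spec: SecretSpec;
    value: string;
}

function substituteSecrets(template: string, finalUrl: string, secrets: Map<string, ResolvedSecret>): string {
    const host = new URL(finalUrl).hostname;
    return template.replace(/\$\{([A-Za-z0-9_]+)\}/g, (_match: string, name: string): string => {
        const secret = secrets.get(name);
        if (secret === undefined) {
            throw new Error(`unknown secret ${name}`);
        }
        if (typeof secret.spec === "string") {
            // Bare-string secrets carry no host binding, so fail closed.
            throw new Error(`secret ${name} has no allowed_hosts binding`);
        }
        if (!secret.spec.allowed_hosts.includes(host)) {
            throw new Error(`secret ${name} is not allowed for host ${host}`);
        }
        return secret.value;
    });
}
```

Checking against the final host rather than the authored URL is what stops an author from pointing `mcps[].url` at a host they control while still referencing a bound secret.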

Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
@tests-posthog tests-posthog Bot disabled auto-merge June 17, 2026 11:03
tests-posthog Bot and others added 3 commits June 17, 2026 11:04
…udience

Commit ccdbed2 ("per-tool project_id + explicit posthog auth
audience") changed the @posthog/* data tools to take an explicit
project_id arg and added audience to the posthog auth mode, but several
tests still asserted the old principal-derived-team behavior, which left
the branch red.

- agent-shared: per-trigger auth test now expects the audience: 'project'
  default on the posthog auth mode.
- agent-tests: pass project_id on @posthog/query and
  @posthog/agent-applications-list calls (the harness query echo only
  matches a numeric /api/projects/<id>/query/ path); rework
  posthog-tool-auth to assert the explicit-project contract.

Each affected file passes in isolation; the suites must run serially
(shared real-PG test DB).

Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5

tests-posthog Bot commented Jun 17, 2026

Query snapshots: Backend query snapshots updated

Changes: 2 snapshots (2 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change


Restore the quill package to master: remove the branch-added `ghost`
button variant (button.tsx + button.css) and the @types/react /
@types/react-dom devDeps added to the blocks and quill package.json,
and resync the lockfile importers. These leaked onto the integration
branch from now-removed agent-console work; the quill package should
track master.

Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
Drop the leaked agent-platform explanatory comments (and stray blank
line) from bin/migrate and rust/bin/migrate-entry, restoring both to
master.

Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
These frontend files carried changes unrelated to the agent platform —
merge drift / incidental tweaks that leaked onto the integration branch.
Restore them to master:

- lib/api.ts — branch was behind master's tracingSpans pagination
- TaxonomicFilter/headless/AutocompleteInput.tsx — stray RefObject tweak
- integrations/SlackIntegration.stories.tsx — dropped story component
- vite.config.mts — unrelated @marsidev/react-turnstile optimizeDeps hint

Kept the agent-platform-essential frontend changes: the AGENT_PLATFORM
feature flag, the agent_approvals API scope (scopes.tsx + types.ts),
personalAPIKeysLogic flag gate, and the jest ignore for agent_platform
node services.

Generated-By: PostHog Code
Task-Id: 2fd40826-ab6a-49b7-9e56-f88bf4cc90c5
)
if (allowed) {
    log('info', 'tool.dispatch.per_asker_authorised', { tool: id })
    return real(toolCallId, (args ?? {}) as Record<string, unknown>)

High: Approval bypass for session-principal tools

When a gated tool’s policy includes session_principal, this branch runs the real tool immediately if the latest user sender matches the session principal. An attacker who can influence content the agent reads can steer the model into calling a gated destructive tool, and it will execute as the user without the separate approval UI or explicit decision step; this is especially risky for the agent-management tools that can promote or archive revisions.

Use session_principal to scope who is allowed to decide the approval, but still queue the approval and require an out-of-band decision before real(...) runs. If a fast path is required, it should be tied to a fresh explicit UI confirmation token, not just the model emitting the tool call after a matching user message.
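
One way to picture the suggested shape: dispatch only enqueues, and the real tool runs from a separate decide step restricted to the allowed deciders. This is a hypothetical sketch of the reviewer's proposal, not code from the PR; `ApprovalQueue` and every field name are assumptions.

```typescript
interface PendingApproval {
    id: string;
    toolId: string;
    args: Record<string, unknown>;
    allowedDeciders: ReadonlySet<string>;
}

type Decision =
    | { status: "executed"; result: unknown }
    | { status: "rejected" }
    | { status: "forbidden" }
    | { status: "not_found" };

class ApprovalQueue {
    private readonly pending = new Map<string, PendingApproval>();
    private nextId = 1;

    // Called from tool dispatch: records the request and never runs the tool.
    enqueue(toolId: string, args: Record<string, unknown>, allowedDeciders: string[]): PendingApproval {
        const approval: PendingApproval = {
            id: `apr_${this.nextId++}`,
            toolId,
            args,
            allowedDeciders: new Set(allowedDeciders),
        };
        this.pending.set(approval.id, approval);
        return approval;
    }

    // Called from the approval UI or API, outside the model loop.
    decide(id: string, decider: string, approve: boolean, run: (args: Record<string, unknown>) => unknown): Decision {
        const approval = this.pending.get(id);
        if (approval === undefined) return { status: "not_found" };
        if (!approval.allowedDeciders.has(decider)) return { status: "forbidden" };
        this.pending.delete(id);
        return approve ? { status: "executed", result: run(approval.args) } : { status: "rejected" };
    }
}
```

Here `session_principal` would only populate `allowedDeciders`; a model-emitted tool call alone can never reach `run`.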

# Conflicts:
#	frontend/jest.config.ts
#	pnpm-lock.yaml

@mendral-app mendral-app Bot left a comment

LGTM

This is a well-engineered feature branch. Security controls are solid: preview tokens are scoped with short TTL + audience claims, secrets use nonce-based indirection with session-scoped lifetime, approval authorization properly gates on SessionAuthentication or explicit agent_approvals:write scope, and sandbox resources are hard-capped via Zod schema. Race condition handling uses FOR UPDATE SKIP LOCKED for queue claims and transactional re-reads for idempotent elevation decisions. The defineRoute migration centralizes validation correctly.

CI Failure: E2E Hobby CI (not caused by this PR)

The "Wait for Docker image build" job timed out because the Container Images CI workflow never ran for this commit — the E2E Hobby CI waited 1 hour polling for a check that was never created. This matches a known pattern of Hobby CI failures unrelated to PR changes.

# put a human in the loop at decide time can request it via consent.
authenticator = request.successful_authenticator
is_session = isinstance(authenticator, SessionAuthentication)
is_oauth_with_decide_scope = isinstance(authenticator, OAuthAccessTokenAuthentication) and (

Medium: OAuth tokens can bypass human-only approvals

This treats any OAuth access token with agent_approvals:write as equivalent to an interactive human decision. A third-party OAuth client that gets a team admin to consent once can later approve or reject queued tool requests without a live user action, including requests whose spec set allow_agent_approver: false. Restrict this path to a trusted first-party client, or require a per-decision proof from the interactive app.

@dmarticus dmarticus merged commit d436e80 into master Jun 17, 2026
285 of 289 checks passed
@dmarticus dmarticus deleted the agent-platform branch June 17, 2026 15:37

deployment-status-posthog Bot commented Jun 17, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-06-17 16:36 UTC | Run |
| prod-us | ✅ Deployed | 2026-06-17 16:50 UTC | Run |
| prod-eu | ✅ Deployed | 2026-06-17 16:55 UTC | Run |

dmarticus added a commit that referenced this pull request Jun 17, 2026
Rebuilt onto current master after the agent-platform feature branch
(#63988) merged — all underlying agent_platform/services restructuring
is already in master; this commit isolates the remote-authoring deltas.

Bundles the prior PR commits (see PR description for details):
- expose authoring playbooks over MCP (agent-resolve-resource tool,
  scope-aware tool surface, MCP resources at posthog://agent-platform/
  playbooks/<id>)
- serializer fidelity for first-revision authoring (bundle_uri allow_blank
  + server-side fs://<slug>/ fill, preview-proxy request serializer)
- round-trip provider-safe tool names so @posthog/meta-end-turn dispatches
  on the first try
- migrate example specs to canonical auth.modes[]
- warn on provider-safe-name collisions
- gate preview-proxy run/send/cancel behind agents:write

Co-Authored-By: Danilo Campos <danilo@posthog.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>