Embed agent terminal UI via xterm.js + container PTY proxy #450

@jrf0110

Overview

Embed a live terminal view of agent sessions into the Gastown dashboard using xterm.js connected to the container's existing PTY infrastructure. This gives users direct visibility into what agents are doing — the same interface a local user sees — and optionally lets them interact with agents (send follow-ups, answer questions, abort).

Parent: #204

Background

The container runs agents via createOpencode() (from @kilocode/sdk). Each in-process kilo serve instance already exposes /pty/* routes on its internal port via bun-pty. The full PTY infrastructure exists but is unused: agents currently run headless, so nothing connects to it.

The Kilo desktop app uses ghostty-web (WASM terminal) connected via WebSocket to these same PTY endpoints. The cloud needs the same connection, just plumbed through the Cloudflare Worker → container boundary.

Architecture

Browser (xterm.js React component)
  ↕ WebSocket (raw PTY bytes)
Gastown Worker (WS upgrade + proxy)
  ↕ containerFetch / WS proxy
TownContainerDO
  ↕ WebSocket proxy to container port 8080
Container control-server (:8080)
  ↕ Internal WS proxy (new routes)
kilo serve (:4096+) — already has /pty/* routes
  ↕ bun-pty
Shell in agent worktree

Implementation

1. Container control-server: PTY proxy routes (~50 lines)

Add routes to proxy PTY operations to the internal kilo serve instances:

// POST /agents/:agentId/pty — create a PTY session
// GET  /agents/:agentId/pty — list PTY sessions
// PUT  /agents/:agentId/pty/:ptyId — resize
// DELETE /agents/:agentId/pty/:ptyId — destroy

// WS /agents/:agentId/pty/:ptyId/connect — bidirectional PTY byte stream

Each route looks up the agent's serverPort from the agents Map and proxies to http://127.0.0.1:${serverPort}/pty/.... The WebSocket route does bidirectional byte forwarding.
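
The lookup-and-rewrite step described above could be sketched as follows. This is a minimal illustration, not the control-server's actual code: AgentEntry and resolvePtyUpstream are hypothetical names, and the sketch assumes the path suffix after /pty maps one-to-one onto the kilo serve /pty/... routes.

```typescript
// Hypothetical shape of an entry in the control-server's agents Map;
// only serverPort is named in the issue.
interface AgentEntry {
  serverPort: number;
}

// Map an incoming control-server path such as /agents/a1/pty/p1/connect
// to a URL on that agent's kilo serve instance.
// Assumption: the suffix after /pty is forwarded unchanged.
// Returns null when the path is not a PTY route or the agent is unknown.
function resolvePtyUpstream(
  agents: Map<string, AgentEntry>,
  pathname: string,
): string | null {
  const match = /^\/agents\/([^/]+)\/pty(\/.*)?$/.exec(pathname);
  if (!match) return null;

  const [, agentId = "", rest = ""] = match;
  const agent = agents.get(agentId);
  if (!agent) return null;

  return `http://127.0.0.1:${agent.serverPort}/pty${rest}`;
}
```

Returning null for both unknown agents and non-PTY paths lets the caller answer with a 404 in one place. The WebSocket route would use the same lookup before opening the upstream socket.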

2. TownContainerDO: PTY WebSocket proxying

Extend the existing WebSocket proxying (used for agent event streams) to support PTY connections. The PTY WebSocket carries raw bytes (not JSON events), so the proxy is a simple bidirectional pipe — no parsing or transformation needed.
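
A bidirectional pipe of this kind might look like the sketch below. SocketLike, SocketEvent, and pipeSockets are illustrative names modelled on the standard WebSocket event API; the real TownContainerDO code may wire sockets differently.

```typescript
// Minimal structural types covering the parts of a WebSocket-style API
// used below (hypothetical; not taken from the Gastown codebase).
interface SocketEvent {
  data?: string | ArrayBuffer;
  code?: number;
  reason?: string;
}

interface SocketLike {
  send(data: string | ArrayBuffer): void;
  close(code?: number, reason?: string): void;
  addEventListener(type: "message" | "close", listener: (ev: SocketEvent) => void): void;
}

// Forward every message from each side to the other without inspecting it,
// and close the opposite side once either side closes.
function pipeSockets(client: SocketLike, upstream: SocketLike): void {
  let closed = false;

  const forward = (from: SocketLike, to: SocketLike): void => {
    from.addEventListener("message", (ev) => {
      if (!closed && ev.data !== undefined) to.send(ev.data);
    });
    from.addEventListener("close", () => {
      if (closed) return;
      closed = true;
      // Always close with 1000: some codes a peer reports (e.g. 1006)
      // are not valid arguments to close().
      to.close(1000, "peer closed");
    });
  };

  forward(client, upstream);
  forward(upstream, client);
}
```

Because the payload is never parsed, the same function serves both text and binary frames.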

3. Gastown Worker: PTY route + WS upgrade

Add a route that upgrades to WebSocket and forwards to the TownContainerDO:

WS /api/towns/:townId/agents/:agentId/pty/:ptyId/connect

Auth uses the existing town auth middleware (CF Access + org membership check).
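
As a rough sketch of the matching logic such a route needs, the helpers below extract the path parameters and check for a WebSocket handshake. matchPtyConnect and isWebSocketUpgrade are hypothetical helpers; the Worker's actual router and the auth middleware are not shown.

```typescript
interface PtyConnectParams {
  townId: string;
  agentId: string;
  ptyId: string;
}

// Extract route parameters from
// /api/towns/:townId/agents/:agentId/pty/:ptyId/connect,
// or return null when the path does not match.
function matchPtyConnect(pathname: string): PtyConnectParams | null {
  const m = /^\/api\/towns\/([^/]+)\/agents\/([^/]+)\/pty\/([^/]+)\/connect$/.exec(pathname);
  if (!m) return null;
  const [, townId = "", agentId = "", ptyId = ""] = m;
  return { townId, agentId, ptyId };
}

// A WebSocket handshake carries "Upgrade: websocket" (RFC 6455);
// the token is compared case-insensitively.
function isWebSocketUpgrade(upgradeHeader: string | null): boolean {
  return upgradeHeader?.toLowerCase() === "websocket";
}
```

A handler would typically reject requests that match the path but fail the upgrade check (for example with status 426) before forwarding anything to the TownContainerDO.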

4. Frontend: xterm.js React component

A self-contained React component (~200 lines):

interface AgentTerminalProps {
  townId: string;   // town that owns the agent
  agentId: string;  // agent whose PTY session is shown
  ptyId?: string;   // existing PTY session; auto-created if not provided
}

  • Uses @xterm/xterm with @xterm/addon-fit for auto-sizing
  • Connects to wss://gastown-api/api/towns/${townId}/agents/${agentId}/pty/${ptyId}/connect
  • Handles resize events (sends resize to the PTY via PUT)
  • Reconnection with backfill on disconnect
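
Two small pieces of the component's connection logic can be sketched in isolation: building the connect URL and choosing a reconnect delay. Both helpers are hypothetical; baseUrl is an assumed parameter (for example "wss://gastown-api"), and exponential backoff is one possible reconnect policy rather than something the issue specifies. Backfill is left out because the issue does not describe its protocol.

```typescript
// Build the PTY connect URL from the route given in the issue.
// baseUrl is an assumed parameter, e.g. "wss://gastown-api".
function ptyConnectUrl(
  baseUrl: string,
  townId: string,
  agentId: string,
  ptyId: string,
): string {
  const enc = encodeURIComponent;
  return `${baseUrl}/api/towns/${enc(townId)}/agents/${enc(agentId)}/pty/${enc(ptyId)}/connect`;
}

// Exponential backoff with a cap: 500 ms, 1 s, 2 s, ... up to maxMs.
// attempt is the number of consecutive failed reconnects so far.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 10000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

The component would reset the attempt counter to zero after a successful open, so a brief network blip does not leave later reconnects waiting the full capped delay.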

5. Dashboard integration

The terminal component embeds in two places:

  • Agent detail panel: Click "Terminal" tab to see the live agent session
  • Agent stream panel: The existing "Watch" button on agent cards opens the terminal view

The terminal coexists with the structured event stream — users can switch between "Events" (the JSON-based message/tool stream) and "Terminal" (the raw PTY view). Both show the same agent session from different angles.

What this enables

  • See exactly what agents see — the real kilo CLI output, not a filtered/reformatted version
  • Interactive agent sessions — type into the terminal to send follow-up messages, answer permission prompts, or course-correct
  • Debugging — when an agent is stuck, see the raw terminal state (error messages, prompts, hung processes)
  • Familiar interface — anyone who's used kilo locally knows exactly what they're looking at

xterm.js vs ghostty-web

The desktop app uses ghostty-web (Ghostty's WASM build). For the cloud:

| Aspect | ghostty-web | xterm.js |
| --- | --- | --- |
| Size | Larger WASM binary | ~200KB JS |
| React integration | Manual DOM mounting | Native @xterm/xterm package |
| Ecosystem | Smaller | Mature, widely used |
| Protocol | Same (raw bytes over WS) | Same (raw bytes over WS) |

Both consume the same WebSocket protocol. xterm.js is the better fit for a React/Next.js app — smaller bundle, native integration, large ecosystem of addons.

Acceptance Criteria

  • Container control-server has PTY proxy routes (create, list, resize, destroy, WS connect)
  • TownContainerDO proxies PTY WebSocket connections
  • Gastown worker exposes PTY WS upgrade endpoint with auth
  • React component renders xterm.js terminal connected to agent PTY
  • Terminal auto-fits to container dimensions and handles resize
  • Terminal is accessible from agent detail panel in the dashboard
  • User can type into the terminal to interact with the agent session
  • Terminal reconnects gracefully on WebSocket disconnect


Labels: enhancement, kilo-auto-fix, kilo-triaged
