Overview
A collection of capabilities that are either impossible in local Gastown or dramatically better in the cloud. These are not prioritized for any specific phase — they're ideas to inform architecture decisions and inspire future work.
Parent: #204
Multi-Tenancy and Shared Towns
Local Gastown is single-player — one human, one machine, one Mayor. The cloud makes towns collaborative. Multiple engineers share a town, all talking to the same Mayor, watching the same dashboard, seeing each other's convoys land. The Mayor has context about what everyone is doing. "Don't touch the auth module, Sarah's convoy is refactoring it right now" becomes something the Mayor can say because it has the full picture.
This extends to org-level fleet management — a VP of Engineering sees aggregate metrics across all towns (cost, velocity, quality), and can spot patterns like "the payments team's polecats have a 40% rework rate on the billing repo, but the platform team's are at 8%."
See plans/gastown-org-level-architecture.md for the full org-level spec.
Agent Marketplace and Shared Formulas
Local Gastown's Mol Mall is a concept in the docs but barely implemented. The cloud can make this real — a hosted formula registry where users publish and install workflow templates. "Here's a formula for migrating a React codebase from class components to hooks" or "Here's a 12-step molecule for adding comprehensive test coverage to an untested module." Formulas become a product surface, not just an internal tool.
Beyond formulas, agent configurations become shareable — system prompts, quality gate configs, model selections that produce good results for specific kinds of work. "This polecat config with Claude Opus and this system prompt produces 95% first-pass merge rate on TypeScript refactors" becomes community knowledge.
Cross-Session Intelligence at Scale
Local Gastown's seance system lets an agent query one predecessor. The cloud has every agent event stored in AgentDOs. Build cross-agent, cross-session search: "Which agent last touched the payment processing module, and what did they change?" The cloud can answer this by querying across all agent event histories in a town. This is institutional memory that scales — every agent's work is indexed and searchable.
When a polecat is assigned work on a file, the system can automatically surface relevant context from previous agents who touched that file. "Toast worked on this module 3 days ago and left a note about a tricky race condition in the checkout flow." The polecat gets this in its prime context without anyone asking for it. Local Gastown can't do this because session histories aren't centrally indexed.
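A minimal sketch of the "who last touched this file" query over centrally indexed agent events. The `AgentEvent` shape and the `lastTouch` helper are assumptions for illustration, not the actual AgentDO schema:

```typescript
// Hypothetical shape of an indexed agent event; the real AgentDO
// event schema may differ.
type AgentEvent = {
  agent: string;
  timestamp: string; // ISO 8601, so string comparison orders correctly
  files: string[];
  note?: string;
};

// Find the most recent event that touched a given file, across all agents.
function lastTouch(events: AgentEvent[], file: string): AgentEvent | undefined {
  return events
    .filter((e) => e.files.includes(file))
    .sort((a, b) => b.timestamp.localeCompare(a.timestamp))[0];
}
```

The same query, run automatically when a bead is slung, is what lets the system surface "Toast worked on this module 3 days ago" into the new polecat's prime context.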
Real-Time Cost Attribution and Budget Controls
The cloud routes all LLM calls through the Kilo gateway, which means per-bead cost tracking in real time. "This convoy cost $47 across 12 beads and 3 rigs." Local Gastown tracks costs retroactively via gt costs record on session stop, but the cloud can do it live — show a cost ticker on the dashboard that updates as agents stream tokens.
This enables budget guardrails: "This rig has a $200/day budget. When polecats approach the limit, stop slinging new work and notify the Mayor." Or per-convoy budgets: "This refactoring convoy is capped at $500. If we're at $450 and 3 beads are still open, escalate to the human to decide if it's worth continuing." Local Gastown has no concept of cost limits — agents run until the work is done or the human intervenes.
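A budget guardrail could be a simple pre-sling check. This is a sketch under assumptions: `RigBudget`, `BudgetAction`, and the 90% warning threshold are all hypothetical names and values, not part of the current design:

```typescript
type RigBudget = { dailyLimitUsd: number; spentTodayUsd: number };
type BudgetAction = "proceed" | "pause_slinging" | "escalate";

// Decide, before slinging new work, whether today's projected spend
// fits the rig's daily budget. Thresholds here are illustrative.
function checkBudget(budget: RigBudget, estimatedBeadCostUsd: number): BudgetAction {
  const projected = budget.spentTodayUsd + estimatedBeadCostUsd;
  if (projected > budget.dailyLimitUsd) return "escalate"; // would blow the cap
  if (projected > budget.dailyLimitUsd * 0.9) return "pause_slinging"; // within 10% of cap
  return "proceed";
}
```

Because the gateway sees every token as it streams, this check can run on live numbers rather than on costs recorded after the session stops.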
Speculative Execution and A/B Testing
With the container model, sling the same bead to multiple polecats with different configurations — different models, different system prompts, different temperatures — and let the Refinery pick the best result. "Give this task to three polecats: one with Opus, one with Sonnet, one with Gemini. Merge whichever passes quality gates first, discard the rest."
This turns agent orchestration into an experimentation platform. Over time, the system accumulates data: "For TypeScript refactoring tasks, Sonnet 4.6 produces first-pass merges 82% of the time at $0.40/bead. Opus produces 94% at $2.10/bead. For this task complexity, Sonnet is the better value." The cloud can make this recommendation automatically because it has the data. Local Gastown can compare models via the gt-model-eval framework, but it's manual and offline.
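The "better value" recommendation above can be made precise: divide the cost of one attempt by the probability it merges first-pass to get an expected cost per successful merge. A sketch, with `ModelStats` and `recommend` as hypothetical names:

```typescript
type ModelStats = { model: string; firstPassRate: number; costPerBeadUsd: number };

// Expected cost per merged bead: one attempt's cost divided by the
// probability that attempt merges first-pass.
function costPerMerge(s: ModelStats): number {
  return s.costPerBeadUsd / s.firstPassRate;
}

// Recommend the config with the lowest expected cost per successful merge.
function recommend(stats: ModelStats[]): ModelStats {
  return stats.reduce((best, s) => (costPerMerge(s) < costPerMerge(best) ? s : best));
}
```

With the figures quoted above, Sonnet's expected cost per merge is roughly $0.49 versus Opus's $2.23, so the recommendation falls out of the data automatically.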
Webhook-Driven Beads
The cloud sits behind a web server with existing GitHub/GitLab webhook infrastructure. GitHub issues can automatically become Gastown beads. A new PR from an external contributor triggers a review bead. A failing CI run creates an escalation. A Slack message creates a bead. The Mayor sees it all in its inbox and can decide to act.
Local Gastown is pull-only — agents check for work on their hooks. The cloud can be push-driven, reacting to external events in real time. A GitHub issue labeled gastown auto-creates a bead, the Mayor assesses it, slings it if appropriate, and the human who filed the issue sees a comment: "A polecat is working on this. Track progress at [dashboard link]."
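The issue-to-bead mapping above could be a pure function over the GitHub webhook payload. The `Bead` shape and `beadFromIssueEvent` are assumptions for illustration; the payload fields (`action`, `issue.title`, `issue.labels`, `issue.html_url`) follow GitHub's documented `issues` event:

```typescript
type Bead = { title: string; body: string; source: string; externalUrl: string };

// Subset of GitHub's "issues" webhook payload that this mapping needs.
type IssueWebhook = {
  action: string;
  issue: {
    title: string;
    body: string | null;
    html_url: string;
    labels: { name: string }[];
  };
};

// Only issues carrying the gastown label become beads; all other
// events are ignored and return null.
function beadFromIssueEvent(evt: IssueWebhook): Bead | null {
  const labeled = evt.issue.labels.some((l) => l.name === "gastown");
  if (!labeled || (evt.action !== "opened" && evt.action !== "labeled")) return null;
  return {
    title: evt.issue.title,
    body: evt.issue.body ?? "",
    source: "github-issue",
    externalUrl: evt.issue.html_url,
  };
}
```

The worker's webhook route would call this, drop nulls, and hand surviving beads to the Mayor's inbox for assessment.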
Persistent Agent Reputation
Local Gastown tracks agent CVs (work history per polecat identity), but it's per-machine and per-town. The cloud can build cross-town agent reputation at the platform level. Not just "Toast completed 47 beads" but "Across all Kilo Cloud Gastown towns, Claude Opus polecats with the standard polecat prompt have a 91% first-pass merge rate on Python repos, dropping to 73% on Rust repos." This becomes a data product — model performance benchmarks derived from real production work across the entire user base.
Always-On with Zero Maintenance
The most obvious but most impactful: the cloud town is always running. No laptop that needs to stay open, no tmux sessions to babysit, no gt mayor attach to check on things. Tell the Mayor to refactor a module at 5 PM, close your laptop, and check the dashboard from your phone at dinner. The convoy landed, the PR is merged, the tests pass. The entire watchdog chain (alarm-driven health checks, triage agents for ambiguous situations) runs without any human presence.
Local CLI Bridge (Stretch Goal)
Connect your local Kilo CLI to your cloud Gastown instance. kilo gastown connect <town-url> authenticates and registers your local instance as a crew-like agent in the cloud town. You get all the coordination benefits (beads, mail, identity, attribution, convoy tracking) while running locally with full filesystem access and your own dev environment. Your local session appears in the cloud dashboard as an active agent.
This bridges "fully cloud-hosted" and "I want to work locally but with cloud coordination." The tool plugin's HTTP API surface is the same whether the agent runs in a Cloudflare Container or on someone's laptop.
SolidJS Dashboard: Native Integration with Kilo's Web UI
The Gastown dashboard currently lives in the Next.js (React) app. The Kilo desktop/web app (packages/app/) is built with SolidJS — including all the session UI, message rendering, tool execution cards, diff viewer, file tabs, terminal panel, and prompt dock. These are mature, production-quality components that represent significant engineering investment.
A future direction is to migrate the Gastown dashboard to SolidJS and serve it directly from the gastown Cloudflare Worker, making the UI a first-class part of the gastown service rather than a feature embedded in the main Next.js app. This enables:
- Native reuse of Kilo's UI components. The session viewer, message timeline, tool execution cards, diff viewer, and terminal panel from @opencode-ai/ui and packages/app/ can be used directly — no framework bridging, no React ports, no iframe isolation. The same <SessionTurn>, <BasicTool>, <Code>, and <Diff> components that render the desktop app render the cloud agent streams.
- Two interaction modes for agents. The xterm.js terminal (Phase 2.5) provides raw PTY access to agent sessions. The SolidJS session viewer provides the rich, structured view — markdown message bubbles, expandable tool call cards with input/output, inline diffs, file tabs. Both show the same agent session. Users pick the view that suits their workflow. The terminal is power-user/debugging; the session viewer is the polished product experience.
- Independent deployment. The gastown dashboard deploys with the gastown worker on its own release cycle, decoupled from the main Next.js app. UI changes to the gastown experience don't require rebuilding and deploying the entire Kilo web app.
- Server connection is already compatible. The Kilo app connects to kilo serve via HTTP REST + SSE (via @kilocode/sdk). The cloud container runs kilo serve instances. The SDK client is framework-agnostic. The SolidJS dashboard would use the same createOpencodeClient() to connect to agent sessions through the gastown worker proxy — the same code path, just routed through a different network layer.
What this looks like architecturally:
Browser
│
├── Main Kilo App (Next.js/React) — account, billing, integrations, cloud agent
│   └── /gastown/* routes → redirect to gastown UI
│
└── Gastown UI (SolidJS, served from gastown worker)
    ├── Town dashboard, rig workbench, bead board, convoy tracker
    ├── Mayor chat (using Kilo app's session components)
    ├── Agent stream viewer (using Kilo app's message timeline)
    ├── Agent terminal (xterm.js — kept from Phase 2.5)
    └── All backed by gastown worker API + TownDO
The main Next.js app handles everything outside of Gastown (account management, billing, integrations, standalone cloud agent sessions). Gastown owns its own UI surface, served from its own worker, using the same component library as the rest of Kilo.
This is a significant architectural shift — not a near-term task. It depends on the Kilo app's component library being extractable as a standalone SolidJS package, and on the gastown worker being able to serve static assets (or using Cloudflare Pages alongside the worker). But it's the natural end state: the gastown UI is a SolidJS app that shares components with the Kilo desktop/web app, giving both the same quality of agent interaction rendering.
UI-Aware Mayor: Dashboard Context Injection
The Mayor should know what the user is looking at. When the user is staring at a failed bead, the Mayor shouldn't need to be told "bead gt-abc failed" — it should already know, because the dashboard is feeding the user's navigation context into the Mayor's session.
How it works
The dashboard client maintains a user activity stream — a rolling log of what the user has done and what they're currently viewing. On every Mayor message submission, this context is injected as a system block alongside the user's message:
<user-context>
<current-view page="rig-detail" rig="frontend" />
<viewing-bead bead-id="gt-abc" title="Fix token refresh" status="failed"
assignee="Toast" failure-reason="Tests failed: auth.test.ts line 42" />
<recent-actions>
- 2m ago: Viewed convoy "auth-refactor" (3/5 beads closed)
- 5m ago: Viewed agent Toast (status: dead, last activity: 8m ago)
- 8m ago: Opened bead "Fix token refresh" detail panel
- 12m ago: Navigated to rig "frontend"
</recent-actions>
</user-context>
The Mayor sees this context and can respond intelligently without the user having to explain: "I see you're looking at bead gt-abc which failed because of a test failure in auth.test.ts. The agent Toast that was working on it is dead. Would you like me to re-sling this to a new polecat, or should I look at the test failure first?"
What gets tracked
| Signal | Source | Example |
|---|---|---|
| Current page/view | Router location | rig-detail, town-home, convoy-detail |
| Currently viewed object | Active slide-over / detail panel | Bead, agent, convoy, escalation |
| Recent navigation | Page view history (last 10-15 actions, 30 min window) | "Viewed rig X", "Opened bead Y", "Watched agent Z" |
| Recent user actions | Dashboard interactions | "Acknowledged escalation", "Created bead", "Clicked Watch on agent" |
| Selected objects | Multi-select or highlighted items | Beads selected on the kanban board |
What does NOT get tracked
- Exact mouse movements or scroll position
- Time spent on each view (beyond ordering)
- Anything from outside the Gastown dashboard
Implementation
The frontend maintains a bounded circular buffer of UserActivityEvent objects:
type UserActivityEvent = {
  timestamp: string;
  action: 'page_view' | 'object_view' | 'object_action';
  page?: string;
  object?: { type: string; id: string; summary: string };
  detail?: string;
};

When sending a Mayor message, the frontend includes the current context and recent activity in a context field on the message payload. The TownDO resolves the referenced objects (fetches current bead status, agent state, etc.) and injects the resolved context into the Mayor's prompt as a system block.
Progressive richness
The context injection can start simple and get richer over time:
- Phase 1: Current page + currently viewed object (just the bead/agent/rig the user has open)
- Phase 2: Recent navigation trail (last 10 actions)
- Phase 3: Inferred intent — "the user has been looking at failed beads for the last 5 minutes, they're probably trying to diagnose a problem"
- Phase 4: Proactive Mayor suggestions — the Mayor notices the user's pattern and offers help before they ask. "I see you've been looking at 3 failed beads in the frontend rig. They all failed on the same test. Want me to investigate the common root cause?"
Why this matters
Without UI context, the Mayor is a blank-slate chatbot that requires the user to explain everything. With UI context, the Mayor becomes a copilot that shares the user's visual field. The conversation goes from:
User: "The auth fix failed, can you look at bead gt-abc and re-sling it?"
to:
User: "Can you fix this?"
Mayor: "The bead you're viewing (gt-abc: Fix token refresh) failed because auth.test.ts:42 expected a 401 but got a 500. Toast, the agent that worked on it, is dead. I'll create a new bead with the test failure context and sling it to a fresh polecat. The new agent will know exactly what went wrong."
The user's dashboard becomes a shared workspace where the Mayor can see what the user sees.
Code Diffs Linked to Beads
Every bead that results in code changes should have its diff viewable directly from the dashboard. When a polecat works on a bead and pushes a branch, the diff between the branch and its base (main) is the artifact of that work. It should be a first-class, clickable object on the bead detail panel — not something you have to go to GitHub to find.
How it works
When a polecat calls gt_done and pushes its branch, the TownDO records the branch name and base commit on the bead (or its review_metadata satellite). The dashboard can then fetch and render the diff on demand.
Two sources for the diff data:
- Container git: The container's git manager can run git diff main...<branch> in the rig's repo and return the result via a new API endpoint (GET /agents/:agentId/diff or GET /git/diff?rig=X&branch=Y). This works while the container is running and the branch exists locally.
- GitHub/GitLab API: For diffs that outlive the container session (or when the container has slept), fetch the diff via the platform API using the org's integration token. GET /repos/{owner}/{repo}/compare/{base}...{head} on GitHub returns the full diff. This is the durable path — it works as long as the branch exists on the remote.
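A sketch of the durable path. The compare endpoint and its `files` array (with `filename`, `additions`, `deletions`, `patch`) are GitHub's documented shape; `fetchCompareDiff`, `summarize`, and the injectable `FetchLike` parameter (which keeps the function testable without a network) are assumptions:

```typescript
type DiffFile = { filename: string; additions: number; deletions: number; patch?: string };

// Minimal fetch-shaped dependency so the function can be stubbed in tests.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Fetch the branch-vs-base diff from GitHub's compare API.
async function fetchCompareDiff(
  owner: string,
  repo: string,
  base: string,
  head: string,
  token: string,
  fetchFn: FetchLike,
): Promise<DiffFile[]> {
  const res = await fetchFn(
    `https://api.github.com/repos/${owner}/${repo}/compare/${base}...${head}`,
    { headers: { Authorization: `Bearer ${token}`, Accept: "application/vnd.github+json" } },
  );
  if (!res.ok) throw new Error(`compare failed: ${res.status}`);
  const data = (await res.json()) as { files?: DiffFile[] };
  return data.files ?? [];
}

// Roll the per-file stats up into the bead-level summary shown on the panel.
function summarize(files: DiffFile[]): { files: number; additions: number; deletions: number } {
  return files.reduce(
    (acc, f) => ({
      files: acc.files + 1,
      additions: acc.additions + f.additions,
      deletions: acc.deletions + f.deletions,
    }),
    { files: 0, additions: 0, deletions: 0 },
  );
}
```

In production the worker would pass the platform `fetch` and the org's integration token; the dashboard renders `summarize` output in the Diff tab header and the per-file `patch` bodies below it.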
What the user sees
On the bead detail panel, a "Diff" tab appears when the bead has associated code changes:
- File-level diff summary (files changed, insertions, deletions)
- Expandable per-file unified diff with syntax highlighting
- Click a file to see the full diff for that file
For beads that went through the Refinery and merged, the diff shows what landed on main (the squash-merge commit). For beads still in progress, the diff shows the current state of the polecat's branch vs main.
What the Mayor sees
When the Mayor checks on a bead or convoy, diff summaries can be included in the context: "Toast's bead changed 4 files (+120 -45): src/auth.ts, src/middleware.ts, test/auth.test.ts, package.json." This gives the Mayor a sense of the scope and nature of the work without reading the full diff.
Convoy-level diffs
At the convoy level, aggregate the diffs across all tracked beads: total files changed, total lines added/removed, and a merged file list. "The auth refactor convoy touched 12 files across 2 rigs, +340 -180." This is the high-level summary for the user who wants to know the blast radius of a batch of work.
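The convoy-level rollup described above can be sketched as an aggregation over per-bead diff stats. `BeadDiff`, `convoySummary`, and the dedup-by-path rule (a file touched by two beads counts once in the file list) are assumptions for illustration:

```typescript
type BeadDiff = {
  rig: string;
  files: { path: string; additions: number; deletions: number }[];
};

// Aggregate per-bead diffs into the convoy's blast-radius summary:
// distinct rigs, distinct files, and total lines added/removed.
function convoySummary(beads: BeadDiff[]): {
  rigs: number;
  files: number;
  additions: number;
  deletions: number;
} {
  const rigs = new Set<string>();
  const paths = new Set<string>();
  let additions = 0;
  let deletions = 0;
  for (const b of beads) {
    rigs.add(b.rig);
    for (const f of b.files) {
      paths.add(f.path); // a file touched by multiple beads counts once
      additions += f.additions;
      deletions += f.deletions;
    }
  }
  return { rigs: rigs.size, files: paths.size, additions, deletions };
}
```

The dashboard's convoy tracker would render this as the one-line summary ("touched N files across M rigs, +A -D") above the merged file list.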