Upgrade openclaw and remove bespoke gateway proxy/supervisor code #583

@jrf0110

Overview

OpenClaw is pinned at 2026.2.22 in the Dockerfile (npm install -g openclaw@2026.2.22). The latest version (2026.2.24) ships Kilo Gateway as a built-in, first-class model provider (openclaw/openclaw#20212). This means the bespoke KiloCode provider configuration code in start-openclaw.sh can be replaced by OpenClaw's native onboarding and env var support.

What the PR adds to OpenClaw

The Kilo Gateway provider PR adds:

  • kilocode as a first-class provider option (auth, onboarding, model definition, config application)
  • KILOCODE_API_KEY env var recognition for auth resolution
  • Non-interactive onboarding: openclaw onboard --non-interactive --kilocode-api-key <key> (with auto-inference from --auth-choice kilocode-api-key)
  • Default model: kilocode/anthropic/claude-opus-4.6
  • Provider uses openai-completions API routed through https://api.kilo.ai/api/gateway/
  • 18 unit tests covering provider config, model definition, env var resolution, alias handling

Important: This PR is purely about the model provider — it does NOT change OpenClaw's gateway runtime, process lifecycle, supervision, or proxy behavior. OpenClaw's "gateway" (the openclaw gateway process) is a separate concept from "Kilo Gateway" (the model routing API).

Context: OpenClaw's gateway terminology

OpenClaw has its own gateway concept — a single always-on process for routing, control plane, and channel connections. This is what our controller supervises and proxies to. The "Kilo Gateway" added in the PR is a model provider (like OpenRouter), not a replacement for OpenClaw's gateway process. Our controller, supervisor, and proxy infrastructure remain necessary.

What can be simplified

start-openclaw.sh — KiloCode provider patching (lines 197–247)

Currently, start-openclaw.sh has a ~50-line Node.js block that manually builds the models.providers.kilocode config object:

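// Excerpt: providerName, baseUrl, models, and defaultModel are defined elsewhere in the same block.
config.models.providers[providerName] = {
  baseUrl: baseUrl,
  apiKey: process.env.KILOCODE_API_KEY,
  api: "openai-completions",
  models: models,
};
config.agents.defaults.model = { primary: defaultModel };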

With native kilocode provider support in OpenClaw, the openclaw onboard command now handles this. We can pass --kilocode-api-key during onboard and OpenClaw will configure the provider, base URL, and default model natively. The manual JSON patching for the provider config, model list, and agent defaults can be removed.

What stays in start-openclaw.sh: The non-provider patching is still needed — gateway auth token, channel config (Telegram/Discord/Slack), exec policy, allowed origins, controlUi.allowInsecureAuth, bind/port settings. These are KiloClaw-specific operational config, not provider config.

KILOCODE_MODELS_JSON — remove the full model catalog env var

The control plane currently sends the entire model catalog (~300-600 models, ~24-48KB) as a single JSON-serialized environment variable. The full flow:

  1. Browser fetches all models from /api/openrouter/models via useOpenRouterModels() hook (src/app/api/openrouter/hooks.ts:75-87)
  2. The UI maps the result to [{ id, name }, ...] and sends every model (not a user-selected subset) via tRPC, from both CreateInstanceCard.tsx:93 and SettingsTab.tsx:190
  3. tRPC router (src/routers/kiloclaw-router.ts) forwards kilocodeModels to the KiloClaw worker via provision() or patchKiloCodeConfig()
  4. Platform routes (kiloclaw/src/routes/platform.ts:44,115,131) validate and pass to the DO
  5. DO (kiloclaw/src/durable-objects/kiloclaw-instance.ts:350,496,583) stores kilocodeModels in SQLite
  6. buildEnvVars() (kiloclaw/src/gateway/env.ts:119-121) serializes userConfig.kilocodeModels with JSON.stringify() into KILOCODE_MODELS_JSON
  7. This lands in the Fly Machine's config.env as a single plaintext env var
  8. start-openclaw.sh (lines 210-233) parses it back out and patches it into config.models.providers.kilocode.models

With native Kilo Gateway provider support, OpenClaw can resolve available models from the kilocode provider at runtime rather than having the control plane pre-bake the full catalog into an env var. The entire chain can then be removed: kilocodeModels in the DO, KILOCODE_MODELS_JSON in buildEnvVars(), the model list patching in start-openclaw.sh, and the UI sending the model list on provision/config-patch.
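
For reference, the serialize/parse round trip in steps 6 and 8 amounts to roughly the following. This is an illustrative sketch, not the code in the repository: the helper names toModelsEnvVar and fromModelsEnvVar are hypothetical, the sample catalog entry is synthetic, and the tolerant handling of a missing or malformed value is an assumption about what the startup script should do. Only the { id, name } entry shape and the KILOCODE_MODELS_JSON name come from the flow above.

```javascript
// Illustrative only: helper names are hypothetical and the sample data is synthetic.

// Step 6 (control plane): serialize the whole catalog into one env var value.
function toModelsEnvVar(kilocodeModels) {
  return JSON.stringify(kilocodeModels);
}

// Step 8 (machine startup): parse the value back into a model list.
// Assumption: a missing or malformed value is treated as an empty list.
function fromModelsEnvVar(value) {
  if (!value) return [];
  try {
    const parsed = JSON.parse(value);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Synthetic one-entry catalog; the real catalog holds hundreds of entries.
const sampleCatalog = [
  { id: "anthropic/claude-opus-4.6", name: "Claude Opus 4.6" },
];

const envValue = toModelsEnvVar(sampleCatalog);
console.log(
  Buffer.byteLength(envValue, "utf8"),
  "bytes for",
  sampleCatalog.length,
  "model(s)"
);
console.log(fromModelsEnvVar(envValue)[0].id);
```

Measuring the serialized size this way against the real catalog is a quick check on the ~24-48KB figure before and after the removal.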

Local development: base URL override

Problem: The openclaw PR hardcodes the kilocode provider's base URL to https://api.kilo.ai/api/gateway/ in OpenClaw's source. It is not configurable via env var or CLI flag. This breaks local development.

Currently, local dev works because start-openclaw.sh manually writes the baseUrl into the provider config, reading from KILOCODE_API_BASE_URL. In dev, this points at a Cloudflare tunnel URL (https://<random>.trycloudflare.com/api/openrouter/) that routes back to the local Next.js app at localhost:3000. Without the ability to override the base URL, Fly machines spawned during local dev would hit production api.kilo.ai instead of the local app.

The current dev flow:

kiloclaw/.dev.vars:
  KILOCODE_API_BASE_URL=https://<tunnel>.trycloudflare.com/api/openrouter/

→ buildEnvVars() passes it as plaintext env var
→ Fly machine receives it
→ start-openclaw.sh writes it into config.models.providers.kilocode.baseUrl
→ OpenClaw routes model requests through the tunnel to localhost:3000

Plan: Keep a minimal post-onboard config patch in start-openclaw.sh that overwrites config.models.providers.kilocode.baseUrl only when KILOCODE_API_BASE_URL is set. Production leaves it unset and falls through to OpenClaw's default; dev sets it to the tunnel URL. The provider patching block is therefore not fully eliminated, but it shrinks from ~50 lines to a small conditional override.
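
A minimal sketch of that conditional override, assuming openclaw onboard has already written a config object containing models.providers.kilocode. The function name applyBaseUrlOverride is hypothetical, and loading and saving the config file is left out because its path is not given in this issue. Only the KILOCODE_API_BASE_URL variable and the config.models.providers.kilocode.baseUrl path come from the plan.

```javascript
// Hypothetical helper: overwrite the kilocode provider's baseUrl only when
// KILOCODE_API_BASE_URL is set. Production leaves the variable unset, so the
// value written by `openclaw onboard` is kept unchanged.
function applyBaseUrlOverride(config, env) {
  const override = (env.KILOCODE_API_BASE_URL || "").trim();
  const provider = config?.models?.providers?.kilocode;

  // Assumption: if onboard did not create the provider, do nothing rather
  // than write a partial provider entry that only has a baseUrl.
  if (!override || !provider) {
    return config;
  }

  provider.baseUrl = override;
  return config;
}

// Usage sketch with an in-memory config and a placeholder tunnel URL.
const devConfig = applyBaseUrlOverride(
  { models: { providers: { kilocode: { api: "openai-completions" } } } },
  { KILOCODE_API_BASE_URL: "https://example.trycloudflare.com/api/openrouter/" }
);
console.log(devConfig.models.providers.kilocode.baseUrl);
```

Skipping the patch when the provider entry is missing keeps a failed or changed onboard step visible instead of masking it with a half-written provider.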

Future improvement: The cleaner long-term fix is to upstream KILOCODE_API_BASE_URL env var support into OpenClaw's kilocode provider so the base URL can be overridden at runtime without config patching. OpenClaw already supports baseUrl as a config field on providers, so it's a matter of checking the env var before falling back to the hardcoded default. This would let us remove the last remaining provider patch from start-openclaw.sh.
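
The upstream change could look roughly like the following. This is not OpenClaw's actual code: resolveKilocodeBaseUrl and DEFAULT_KILOCODE_BASE_URL are hypothetical names, and where such a check would live in the provider is an open question for the upstream PR. The default value is the URL the provider PR hardcodes.

```javascript
// Hypothetical sketch, not OpenClaw source.
const DEFAULT_KILOCODE_BASE_URL = "https://api.kilo.ai/api/gateway/";

// Check the env var first, then fall back to the hardcoded default.
// Empty or whitespace-only values are treated as unset.
function resolveKilocodeBaseUrl(env = process.env) {
  const override = (env.KILOCODE_API_BASE_URL || "").trim();
  return override || DEFAULT_KILOCODE_BASE_URL;
}

// With no override, the production default is used.
console.log(resolveKilocodeBaseUrl({}));

// With an override (placeholder tunnel URL), the override wins.
console.log(
  resolveKilocodeBaseUrl({
    KILOCODE_API_BASE_URL: "https://example.trycloudflare.com/api/openrouter/",
  })
);
```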

Onboard command update

The current onboard invocation:

openclaw onboard --non-interactive --accept-risk \
    --mode local \
    --gateway-port 3001 \
    --gateway-bind loopback \
    --skip-channels \
    --skip-skills \
    --skip-health

Should become:

openclaw onboard --non-interactive --accept-risk \
    --mode local \
    --gateway-port 3001 \
    --gateway-bind loopback \
    --skip-channels \
    --skip-skills \
    --skip-health \
    --kilocode-api-key "$KILOCODE_API_KEY"

This tells OpenClaw to configure the kilocode provider natively during onboard.

What does NOT change

  • Controller (controller/) — Our control plane interface to the machine. Supervisor, proxy, auth, management endpoints, health — all stay. These manage OpenClaw's gateway process, which is unrelated to the Kilo Gateway model provider.
  • Worker proxy (src/index.ts) — HTTP/WebSocket proxying through Fly is our infrastructure.
  • Gateway token derivation (src/auth/gateway-token.ts) — Per-sandbox HMAC tokens for authenticating with the OpenClaw gateway process.
  • Platform API routes (src/routes/platform.ts) — Gateway management RPCs through the DO to the controller.
  • DO gateway controller calls (src/durable-objects/kiloclaw-instance.ts) — RPC to the controller's /_kilo/gateway/* endpoints.
  • Starting-up page (src/pages/starting-up.ts) — Our friendly 503 page for when the machine is booting.
  • JWT auth, Fly Machine lifecycle, env var encryption, DO state management — All independent of OpenClaw version.

Tasks

  • Upgrade openclaw from 2026.2.22 to 2026.2.24 (or latest) in the Dockerfile
  • Update start-openclaw.sh onboard command to pass --kilocode-api-key
  • Remove the manual KiloCode provider JSON patching from start-openclaw.sh (lines 197–247: models.providers.kilocode, agents.defaults.model, model list loading)
  • Remove KILOCODE_MODELS_JSON env var from the control plane pipeline:
    • Remove kilocodeModels from UserConfig type and buildEnvVars() in kiloclaw/src/gateway/env.ts
    • Remove kilocodeModels storage from the DO (kiloclaw/src/durable-objects/kiloclaw-instance.ts)
    • Remove kilocodeModels from platform route schemas (kiloclaw/src/routes/platform.ts)
    • Remove kilocodeModels from instance config schema (kiloclaw/src/schemas/instance-config.ts)
    • Remove kilocodeModels from the tRPC router (src/routers/kiloclaw-router.ts)
    • Remove kilocodeModels from the UI provision/config flows (CreateInstanceCard.tsx, SettingsTab.tsx)
    • Remove kilocodeModels from src/lib/kiloclaw/types.ts
  • Verify KILOCODE_DEFAULT_MODEL still works (may still need to be patched, or OpenClaw may handle it natively)
  • Keep a minimal baseUrl config patch in start-openclaw.sh for local dev (overwrite config.models.providers.kilocode.baseUrl only when KILOCODE_API_BASE_URL is set)
  • Update tests for changed components
  • Verify end-to-end: provision → start → proxy HTTP/WebSocket → chat through Kilo Gateway → gateway management → stop → restart → destroy
  • Update Dockerfile cache bust comment

Labels: enhancement, kilo-auto-fix, kilo-triaged