> [!WARNING]
> ⚠️ Experimental: This project is still in development. Use with caution in production.
Makes kimi-k2-thinking usable across multiple LLM providers by normalizing API formats, fixing tool call and thinking format issues, and optionally ensuring the model always uses a tool call for agentic workflows.
The proxy and transformation pipelines are generic by design and can easily be extended to support any model and any provider.
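As a mental model, every upstream response flows through a chain of small transformers. The sketch below only illustrates that shape; the interface and function names are hypothetical, not the project's actual types:

```ts
// Hypothetical pipeline shape, for illustration only.
interface ResponseTransformer {
  // Should this transformer run for the given client model and provider?
  applies(model: string, provider: string): boolean;
  // Rewrite the upstream response into the format clients expect.
  transform(response: Record<string, unknown>): Record<string, unknown>;
}

function runPipeline(
  transformers: ResponseTransformer[],
  model: string,
  provider: string,
  response: Record<string, unknown>,
): Record<string, unknown> {
  return transformers
    .filter((t) => t.applies(model, provider))
    .reduce((acc, t) => t.transform(acc), response);
}
```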
Multi-provider proxy for kimi-k2-thinking and other models
Seamlessly route requests to OpenAI-compatible APIs, OpenRouter, or Vertex AI using a unified client model name.
Format fixes for tool calls and thinking blocks
For some providers, kimi-k2-thinking returns tool calls and thinking content in non-standard formats. The proxy normalizes these to the standard Anthropic format that clients expect.
Example: Tool call normalization from content
What the kimi-k2 provider returns (tool calls embedded in content with `<|tool_call_begin|>` markers):

```json
{
  "content": "Let me search for that. <|tool_call_begin|> functions.lookup:42 <|tool_call_argument_begin|> {\"term\":\"express\"} <|tool_call_end|> "
}
```

What clients receive (normalized):
```json
{
  "content": "Let me search for that.",
  "tool_calls": [
    {
      "id": "42",
      "type": "function",
      "function": {
        "name": "lookup",
        "arguments": "{\"term\":\"express\"}"
      }
    }
  ],
  "finish_reason": "tool_calls"
}
```
Example: Thinking tags extraction and cleanup
What kimi-k2 returns:
```
<think> Let me break down... </think> The answer is 42.
```
What clients receive:
```json
{
  "content": "The answer is 42.",
  "thinking": "Let me break down..."
}
```
Tool call enforcement (optional) for reliable agentic workflows
Enable with `ensure_tool_call: true` in the model config. The proxy detects missing tool calls and re-prompts the model with a reminder.
Example enforcement flow:
```
System: You are a helpful assistant with access to tools.
        Always reply with at least one tool call so the client can continue.
User: What's the weather in SF?
Assistant: Let me check that for you.
System: Reminder: The client will not continue unless you reply with a tool call.
Assistant: {
  "tool_calls": [{
    "id": "get_weather:0",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}
```
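Under the hood, this is a check-and-retry loop around the upstream call. A minimal sketch, with placeholder types and a placeholder `callUpstream` function rather than the proxy's real internals:

```ts
// Sketch only: when ensure_tool_call is enabled, re-prompt if the upstream
// response contains no tool call.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}
interface AssistantResponse {
  content: string;
  tool_calls?: unknown[];
}

async function completeWithToolCall(
  messages: Message[],
  callUpstream: (messages: Message[]) => Promise<AssistantResponse>,
  maxRetries = 1,
): Promise<AssistantResponse> {
  let response = await callUpstream(messages);
  for (let i = 0; i < maxRetries && !response.tool_calls?.length; i++) {
    response = await callUpstream([
      ...messages,
      { role: "assistant", content: response.content },
      {
        role: "system",
        content: "Reminder: The client will not continue unless you reply with a tool call.",
      },
    ]);
  }
  return response;
}
```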
Request/response logging with web dashboard
All requests and responses are logged to SQLite and viewable through a built-in web dashboard at the root path.
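Roughly, every exchange is written to a local SQLite file that the dashboard reads back. The snippet below uses better-sqlite3 purely as an illustration; the project's actual driver, schema, and table names may differ:

```ts
import Database from "better-sqlite3";

// Illustrative schema only; the proxy's real logging table may look different.
const db = new Database("requests.db");
db.exec(`CREATE TABLE IF NOT EXISTS logs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  model TEXT,
  request_json TEXT,
  response_json TEXT
)`);

export function logExchange(model: string, request: unknown, response: unknown): void {
  db.prepare("INSERT INTO logs (model, request_json, response_json) VALUES (?, ?, ?)")
    .run(model, JSON.stringify(request), JSON.stringify(response));
}
```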
Load balancing with multiple strategies
Distribute traffic across providers using round-robin, weighted random, random, or first strategies.
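For intuition, strategy selection can look roughly like the sketch below (the strategy identifiers and field names are assumptions based on model-config.yaml, not necessarily the exact keys the project uses):

```ts
// Sketch only: choose an upstream for a client model according to a strategy.
interface Upstream {
  provider: string;
  model: string;
  weight?: number;
}

let rrIndex = 0;

function pickUpstream(upstreams: Upstream[], strategy: string): Upstream {
  switch (strategy) {
    case "first":
      return upstreams[0];
    case "random":
      return upstreams[Math.floor(Math.random() * upstreams.length)];
    case "weighted_random": {
      const total = upstreams.reduce((sum, u) => sum + (u.weight ?? 1), 0);
      let r = Math.random() * total;
      for (const u of upstreams) {
        r -= u.weight ?? 1;
        if (r <= 0) return u;
      }
      return upstreams[upstreams.length - 1];
    }
    case "round_robin":
    default:
      return upstreams[rrIndex++ % upstreams.length];
  }
}
```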
- Extensible architecture for adding new models and providers
- Provider support: OpenAI-compatible APIs, OpenRouter, Vertex AI
```bash
pnpm install
cp .env.example .env
cp model-config.example.yaml model-config.yaml
# Edit .env and model-config.yaml with your provider keys and models
pnpm run dev
```

The API runs on http://127.0.0.1:8000 and serves the dashboard at `/`.
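A hypothetical client call against the running proxy (the endpoint path and payload shape are assumptions for illustration; check the project's routes for the exact API):

```ts
// Hypothetical request; "kimi-k2-thinking" is the unified client model name
// mapped in model-config.yaml. The endpoint path is an assumption.
const res = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "kimi-k2-thinking",
    messages: [{ role: "user", content: "What's the weather in SF?" }],
  }),
});
console.log(await res.json());
```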
Set environment variables in `.env`:

- Generic OpenAI: `OPENAI_BASE_URL`, `OPENAI_API_KEY`
- OpenRouter: `OPENROUTER_API_KEY`, `OPENROUTER_PROVIDERS` (optional), `OPENROUTER_ORDER` (optional)
- Vertex AI: `VERTEX_PROJECT_ID`, `VERTEX_LOCATION`, `GOOGLE_APPLICATION_CREDENTIALS`
Edit `model-config.yaml` to map client model names to upstream providers:

```yaml
default_strategy: round_robin
models:
  - name: kimi-k2-thinking
    provider: vertex
    model: moonshotai/kimi-k2-thinking-maas
    # Optional: enforce tool call consistency for reliable agentic workflows
    ensure_tool_call: true
  - name: kimi-k2-thinking
    provider: openrouter
    model: moonshot-ai/kimi-k2-thinking
    weight: 2
```

The web dashboard shows request/response logs and metrics. Access it at the root path when running the proxy.
```bash
pnpm run dev     # Run with hot reload
pnpm run test    # Run unit tests
pnpm run build   # TypeScript build
```

```bash
docker compose up --build -d                      # Production stack with web dashboard
docker compose -f docker-compose.dev.yml watch    # Development with hot reload
```