> [!WARNING]
> ⚠️ Experimental: This project is still in development. Use with caution in production.
Makes kimi-k2-thinking usable across multiple LLM providers by normalizing API formats, fixing tool call and thinking format issues, and optionally ensuring the model always uses a tool call for agentic workflows.
The proxy and transformation pipelines are generic by design and can easily be extended to support any model and any provider.
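As a mental model, every upstream response flows through a chain of small transformers. The sketch below only illustrates that shape; the interface and function names are hypothetical, not the project's actual types:

```ts
// Hypothetical pipeline shape, for illustration only.
interface ResponseTransformer {
  // Should this transformer run for the given client model and provider?
  applies(model: string, provider: string): boolean;
  // Rewrite the upstream response into the format clients expect.
  transform(response: Record<string, unknown>): Record<string, unknown>;
}

function runPipeline(
  transformers: ResponseTransformer[],
  model: string,
  provider: string,
  response: Record<string, unknown>,
): Record<string, unknown> {
  return transformers
    .filter((t) => t.applies(model, provider))
    .reduce((acc, t) => t.transform(acc), response);
}
```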
Multi-provider proxy for kimi-k2-thinking and other models
Seamlessly route requests to OpenAI-compatible APIs, OpenRouter, or Vertex AI using a unified client model name.
Format fixes for tool calls and thinking blocks
For some providers, kimi-k2-thinking returns tool calls and thinking content in non-standard formats. The proxy normalizes these to the standard Anthropic format that clients expect.
Example: Tool call normalization from content
What the kimi-k2 provider returns (tool calls embedded in content with `<|tool_call_begin|>` markers):

```json
{
  "content": "Let me search for that. <|tool_call_begin|> functions.lookup:42 <|tool_call_argument_begin|> {\"term\":\"express\"} <|tool_call_end|> "
}
```

What clients receive (normalized):
```json
{
  "content": "Let me search for that.",
  "tool_calls": [
    {
      "id": "42",
      "type": "function",
      "function": {
        "name": "lookup",
        "arguments": "{\"term\":\"express\"}"
      }
    }
  ],
  "finish_reason": "tool_calls"
}
```
Example: Thinking tags extraction and cleanup
What kimi-k2 returns:
```
<think> Let me break down... </think> The answer is 42.
```
What clients receive:
```json
{
  "content": "The answer is 42.",
  "thinking": "Let me break down..."
}
```
Tool call enforcement (optional) for reliable agentic workflows
Enable with `ensure_tool_call: true` in the model config. The proxy detects missing tool calls and re-prompts the model with a reminder.
Example enforcement flow:
```
System: You are a helpful assistant with access to tools.
        Always reply with at least one tool call so the client can continue.
User: What's the weather in SF?
Assistant: Let me check that for you.
System: Reminder: The client will not continue unless you reply with a tool call.
Assistant: {
  "tool_calls": [{
    "id": "get_weather:0",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}
```
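Under the hood, this is a check-and-retry loop around the upstream call. A minimal sketch, with placeholder types and a placeholder `callUpstream` function rather than the proxy's real internals:

```ts
// Sketch only: when ensure_tool_call is enabled, re-prompt if the upstream
// response contains no tool call.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}
interface AssistantResponse {
  content: string;
  tool_calls?: unknown[];
}

async function completeWithToolCall(
  messages: Message[],
  callUpstream: (messages: Message[]) => Promise<AssistantResponse>,
  maxRetries = 1,
): Promise<AssistantResponse> {
  let response = await callUpstream(messages);
  for (let i = 0; i < maxRetries && !response.tool_calls?.length; i++) {
    response = await callUpstream([
      ...messages,
      { role: "assistant", content: response.content },
      {
        role: "system",
        content: "Reminder: The client will not continue unless you reply with a tool call.",
      },
    ]);
  }
  return response;
}
```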
Request/response logging with web dashboard
All requests and responses are logged to SQLite and viewable through a built-in web dashboard at the root path.
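Roughly, every exchange is written to a local SQLite file that the dashboard reads back. The snippet below uses better-sqlite3 purely as an illustration; the project's actual driver, schema, and table names may differ:

```ts
import Database from "better-sqlite3";

// Illustrative schema only; the proxy's real logging table may look different.
const db = new Database("requests.db");
db.exec(`CREATE TABLE IF NOT EXISTS logs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  model TEXT,
  request_json TEXT,
  response_json TEXT
)`);

export function logExchange(model: string, request: unknown, response: unknown): void {
  db.prepare("INSERT INTO logs (model, request_json, response_json) VALUES (?, ?, ?)")
    .run(model, JSON.stringify(request), JSON.stringify(response));
}
```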
Load balancing with multiple strategies
Distribute traffic across providers using round-robin, weighted random, random, or first strategies.
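For intuition, strategy selection can look roughly like the sketch below (the strategy identifiers and field names are assumptions based on model-config.yaml, not necessarily the exact keys the project uses):

```ts
// Sketch only: choose an upstream for a client model according to a strategy.
interface Upstream {
  provider: string;
  model: string;
  weight?: number;
}

let rrIndex = 0;

function pickUpstream(upstreams: Upstream[], strategy: string): Upstream {
  switch (strategy) {
    case "first":
      return upstreams[0];
    case "random":
      return upstreams[Math.floor(Math.random() * upstreams.length)];
    case "weighted_random": {
      const total = upstreams.reduce((sum, u) => sum + (u.weight ?? 1), 0);
      let r = Math.random() * total;
      for (const u of upstreams) {
        r -= u.weight ?? 1;
        if (r <= 0) return u;
      }
      return upstreams[upstreams.length - 1];
    }
    case "round_robin":
    default:
      return upstreams[rrIndex++ % upstreams.length];
  }
}
```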
- Extensible architecture for adding new models and providers
- Provider support: OpenAI-compatible APIs, OpenRouter, Vertex AI
```bash
pnpm install
cp .env.example .env
cp model-config.example.yaml model-config.yaml
# Edit .env and model-config.yaml with your provider keys and models
pnpm run dev
```

The API runs on http://127.0.0.1:8000 and serves the dashboard at `/`.
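A hypothetical client call against the running proxy (the endpoint path and payload shape are assumptions for illustration; check the project's routes for the exact API):

```ts
// Hypothetical request; "kimi-k2-thinking" is the unified client model name
// mapped in model-config.yaml. The endpoint path is an assumption.
const res = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "kimi-k2-thinking",
    messages: [{ role: "user", content: "What's the weather in SF?" }],
  }),
});
console.log(await res.json());
```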
Set environment variables in `.env`:

- Generic OpenAI: `OPENAI_BASE_URL`, `OPENAI_API_KEY`
- OpenRouter: `OPENROUTER_API_KEY`, `OPENROUTER_PROVIDERS` (optional), `OPENROUTER_ORDER` (optional)
- Vertex AI: `VERTEX_PROJECT_ID`, `VERTEX_LOCATION`, `GOOGLE_APPLICATION_CREDENTIALS`
Edit `model-config.yaml` to map client model names to upstream providers:

```yaml
default_strategy: round_robin
models:
  - name: kimi-k2-thinking
    provider: vertex
    model: moonshotai/kimi-k2-thinking-maas
    # Optional: enforce tool call consistency for reliable agentic workflows
    ensure_tool_call: true
  - name: kimi-k2-thinking
    provider: openrouter
    model: moonshot-ai/kimi-k2-thinking
    weight: 2
```

The web dashboard shows request/response logs and metrics. Access it at the root path when running the proxy.
```bash
pnpm run dev     # Run with hot reload
pnpm run test    # Run unit tests
pnpm run build   # TypeScript build
```

```bash
docker compose up --build -d                      # Production stack with web dashboard
docker compose -f docker-compose.dev.yml watch    # Development with hot reload
```