This project is a Swiss Army knife for anyone working with language models and agentic workflows. It sits between any LLM-aware client and any LLM backend, presenting multiple front-end APIs (OpenAI, Anthropic, Gemini) while routing to whichever provider you choose. With the proxy you can translate, reroute, and augment requests on the fly, execute chat-embedded commands, override models, rotate API keys, prevent leaks, and inspect traffic, all from a single drop-in gateway.
- Use Cases
- Killer Features
- Supported APIs (Front-Ends) and Providers (Back-Ends)
- Gemini Backends Overview
- Quick Start
- Using It Day-To-Day
- Security
- Debugging (Wire Capture)
- Optional Capabilities (Short List)
- Example Config (minimal)
- Popular Scenarios
- Errors and Troubleshooting
- Support
- License
- Changelog
## Use Cases

- Connect Any App to Any Model: Seamlessly route requests from any LLM-powered application to any model, even across different protocols. Use clients like Anthropic's Claude Code CLI with a Gemini 2.5 Pro model, or Codex CLI with a Kimi K2 model.
- Override Hardcoded Models: Force an application to use a model of your choice, even if the developers didn't provide an option to change it.
- Inspect and Debug Prompts: Capture and analyze the exact prompts your agent sends to the LLM provider to debug and refine interactions.
- Customize System Prompts: Rewrite or modify an agent's system prompt to better suit your specific needs and improve its performance.
- Leverage Your LLM Subscriptions: Use your personal subscriptions, like OpenAI Plus/Pro or Anthropic Pro/MAX plans, with any third-party application, not just those developed by the LLM vendor.
- Automated Model Tuning for Precision: The proxy automatically detects when a model struggles with tasks like precise file edits and adjusts its parameters to improve accuracy on subsequent attempts.
- Automatic Tool Call Repair: If a model generates invalid tool calls, the proxy automatically corrects them before they can cause errors in your agent.
- Automated Error Detection and Steering: Detect when an LLM is stuck in a loop or fails to follow instructions, and automatically generate steering commands to get it back on track.
- Block Harmful Tool Calls: Prevent potentially destructive actions, such as deleting your git repository, by detecting and blocking harmful tool calls at the proxy level.
- Maximize Free Tiers with API Key Rotation: Aggregate all your API keys and use auto-rotation to seamlessly switch between them, allowing you to take full advantage of multiple free-tier allowances.
## Killer Features

- Multiple front-ends, many providers: exposes OpenAI, Anthropic, and Gemini APIs while routing to OpenAI, Anthropic, Gemini, OpenRouter, ZAI, Qwen, and more
- OpenAI compatibility: drop-in `/v1/chat/completions` for most clients and coding agents
- Streaming everywhere: consistent streaming and non-streaming support across providers
- Gemini OAuth personal gateway: use Google's free personal OAuth (CLI-style) through an OpenAI-compatible endpoint
- Failover routing: fall back to alternate models/providers on rate limits or outages
- Automated API key rotation: rotate across multiple keys to reduce throttling and extend free-tier allowances
- Rate limits and context: lightweight rate limiting and per-model context window enforcement
- Loop detection: detect repeated patterns and halt infinite loops
- Dangerous-command prevention: steer away from destructive shell actions
- Key hygiene: redact API keys in prompts and logs
- Repair helpers: tool-call and JSON repair to fix malformed model outputs
- In-chat switching: change back-end and model on the fly with `!/backend(...)` and `!/model(...)`
- Force model override: make clients use the model you choose without changing client code
- Wire capture and audit: optional request/response capture file plus usage tracking
## Supported APIs (Front-Ends) and Providers (Back-Ends)

These are ready out of the box. Front-ends are the client-facing APIs the proxy exposes; back-ends are the providers the proxy calls.

| API surface | Path(s) | Typical clients | Notes |
|---|---|---|---|
| OpenAI Chat Completions | `/v1/chat/completions` | Most OpenAI SDKs/tools, coding agents | Default front-end |
| Anthropic Messages | `/anthropic/v1/messages` (plus `/anthropic/v1/models`, `/health`, `/info`) | Claude Code, Anthropic SDK | Also available on a dedicated port (see Setup) |
| Google Gemini v1beta | `/v1beta/models`, `:generateContent`, `:streamGenerateContent` | Gemini-compatible tools/SDKs | Translates to your chosen provider |
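As a concrete example of the Anthropic front-end, a raw `requests` call can post an Anthropic-style Messages body straight to the proxy. This is a hedged sketch: the headers mirror Anthropic's own API conventions, and whether the proxy requires `anthropic-version` is an assumption; the model name is a placeholder.

```python
# Sketch: posting an Anthropic Messages request to the proxy's main port.
# Header handling mirrors Anthropic's API (x-api-key); whether the proxy
# checks anthropic-version is an assumption.
import requests

resp = requests.post(
    "http://localhost:8000/anthropic/v1/messages",
    headers={
        "x-api-key": "your-proxy-key",  # placeholder proxy key
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",  # example model name
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello via the Anthropic front-end"}],
    },
    timeout=60,
)
print(resp.json())
```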
| Backend ID | Provider | Authentication | Notes |
|---|---|---|---|
| `openai` | OpenAI | `OPENAI_API_KEY` | Standard OpenAI API |
| `openai-oauth` | OpenAI (ChatGPT/Codex OAuth) | Local `.codex/auth.json` | Uses ChatGPT login token instead of an API key |
| `anthropic` | Anthropic | `ANTHROPIC_API_KEY` | Claude models via the Messages API |
| `anthropic-oauth` | Anthropic (OAuth) | Local OAuth token | Claude via the OAuth credential flow |
| `gemini` | Google Gemini | `GEMINI_API_KEY` | Metered API key |
| `gemini-cli-oauth-personal` | Google Gemini (CLI) | OAuth (no key) | Free-tier personal OAuth like the Gemini CLI |
| `gemini-cli-cloud-project` | Google Gemini (GCP) | OAuth + `GOOGLE_CLOUD_PROJECT` (+ ADC) | Bills to your GCP project |
| `openrouter` | OpenRouter | `OPENROUTER_API_KEY` | Access to many hosted models |
| `zai` | ZAI | `ZAI_API_KEY` | Zhipu/Z.ai access (OpenAI-compatible) |
| `zai-coding-plan` | ZAI Coding Plan | `ZAI_API_KEY` | Works with any supported front-end and coding agent |
| `qwen-oauth` | Alibaba Qwen | Local `oauth_creds.json` | Qwen CLI OAuth; OpenAI-compatible endpoint |
## Gemini Backends Overview

Choose the Gemini integration that fits your environment.

| Backend | Authentication | Cost | Best for |
|---|---|---|---|
| `gemini` | API key (`GEMINI_API_KEY`) | Metered (pay-per-use) | Production apps, high-volume usage |
| `gemini-cli-oauth-personal` | OAuth (no API key) | Free tier with limits | Local development, testing, personal use |
| `gemini-cli-cloud-project` | OAuth + `GOOGLE_CLOUD_PROJECT` (ADC/service account) | Billed to your GCP project | Enterprise, team workflows, central billing |
Notes:

- Personal OAuth uses credentials from the local Google CLI/Code Assist-style flow and does not require a `GEMINI_API_KEY`.
- Cloud Project requires `GOOGLE_CLOUD_PROJECT` and Application Default Credentials (or a service account file).
### Quick setup

For `gemini` (API key):

```bash
export GEMINI_API_KEY="AIza..."
python -m src.core.cli --default-backend gemini
```

For `gemini-cli-oauth-personal` (free personal OAuth):

```bash
# Install and authenticate with the Google Gemini CLI (one-time):
gemini auth
# Then start the proxy using the personal OAuth backend
python -m src.core.cli --default-backend gemini-cli-oauth-personal
```

For `gemini-cli-cloud-project` (GCP-billed):

```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"
# Provide Application Default Credentials via one of the following:
# Option A: User credentials (interactive)
gcloud auth application-default login
# Option B: Service account file
export GOOGLE_APPLICATION_CREDENTIALS="/absolute/path/to/service-account.json"
python -m src.core.cli --default-backend gemini-cli-cloud-project
```
## Quick Start

1. Export provider keys (only for the back-ends you plan to use):

   ```bash
   export OPENAI_API_KEY=...
   export ANTHROPIC_API_KEY=...
   export GEMINI_API_KEY=...
   export OPENROUTER_API_KEY=...
   export ZAI_API_KEY=...
   # GCP-based Gemini back-end
   export GOOGLE_CLOUD_PROJECT=your-project-id
   ```

2. Start the proxy:

   ```bash
   python -m src.core.cli --default-backend openai
   ```

   Useful flags:

   - `--host 0.0.0.0` and `--port 8000` to change the bind address
   - `--config config/config.example.yaml` to load a saved config
   - `--capture-file wire.log` to record requests/replies (see Debugging)
   - `--disable-auth` for local use only (forces host=127.0.0.1)
3. Point your client at the proxy (see the Python sketch below):

   - OpenAI-compatible tools: set `OPENAI_API_BASE=http://localhost:8000/v1` and, if proxy auth is enabled, `OPENAI_API_KEY` to your proxy key
   - Claude Code (Anthropic): set `ANTHROPIC_API_URL=http://localhost:8001` and `ANTHROPIC_API_KEY` to your proxy key
   - Gemini clients: call the `/v1beta/...` endpoints on `http://localhost:8000`

Tip: Anthropic compatibility is exposed both at `/anthropic/...` on the main port and, if configured, on a dedicated Anthropic port (defaults to main port + 1). Override via `ANTHROPIC_PORT`.
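For OpenAI-compatible clients, here is a minimal sketch using the official OpenAI Python SDK pointed at the proxy. The model name and proxy key are placeholders; use any model your selected back-end supports.

```python
# Minimal sketch: the official OpenAI Python SDK talking to the proxy.
# "gpt-4o-mini" and "your-proxy-key" are placeholders, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-proxy-key",  # any non-empty string if proxy auth is disabled
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello through the proxy."}],
)
print(resp.choices[0].message.content)
```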
## Using It Day-To-Day

- Switch back-end or model on the fly in the chat input (see the sketch after this list):
  - `!/backend(openai)`
  - `!/model(gpt-4o-mini)`
  - `!/oneoff(openrouter:qwen/qwen3-coder)`
- Keep your existing tools; just point them at the proxy endpoint.
- The proxy handles streaming, retries/failover (if enabled), and output repair.
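Because the commands travel as ordinary chat text, any client can issue them. A hedged sketch of a one-off override follows; whether a command can share a message with other prompt text is an assumption here, and the model/route names are illustrative.

```python
# Sketch: a one-off back-end/model override sent as plain chat text.
# Assumption: the command may be combined with prompt text in one message;
# if not, send it as its own message first.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-proxy-key")
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # session default; !/oneoff overrides routing once
    messages=[
        {
            "role": "user",
            "content": "!/oneoff(openrouter:qwen/qwen3-coder)\nRefactor this function for clarity.",
        },
    ],
)
print(resp.choices[0].message.content)
```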
## Security

- Do not store provider API keys in config files; use environment variables only.
- Common keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, `ZAI_API_KEY`, `GOOGLE_CLOUD_PROJECT`.
- Optional proxy auth: set `LLM_INTERACTIVE_PROXY_API_KEY` and require clients to send `Authorization: Bearer <key>` (see the sketch below).
- Built-in redaction masks API keys in prompts and logs.
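When proxy auth is enabled, clients must present the bearer token on every request. A minimal sketch with `requests`, assuming the proxy runs on its default port and the key was exported before startup:

```python
# Sketch: calling the proxy with proxy-level auth enabled. Assumes the key
# was exported as LLM_INTERACTIVE_PROXY_API_KEY before starting the proxy.
import os
import requests

proxy_key = os.environ["LLM_INTERACTIVE_PROXY_API_KEY"]
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Authorization": f"Bearer {proxy_key}"},
    json={
        "model": "gpt-4o-mini",  # placeholder; use a model your back-end supports
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```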
## Debugging (Wire Capture)

Write outbound requests and inbound replies/streams to a rotating file for troubleshooting.

- CLI: `--capture-file wire.jsonl`, plus optional rotation caps: `--capture-rotate-interval SECONDS`, `--capture-total-max-bytes N`, `--capture-max-files N`
- The capture records source/destination, headers, and payloads, and keeps secrets redacted when prompt redaction is enabled.
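The capture record schema is not specified in this README. If the `.jsonl` extension means one JSON object per line, a quick skim script looks like this; treat it as a sketch under that assumption, not a documented format.

```python
# Sketch for skimming a capture file, assuming one JSON object per line
# (suggested by the .jsonl name; the actual record schema is undocumented here).
import json

with open("wire.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or rotated lines
        # Print whatever top-level keys the record carries.
        print(sorted(record.keys()))
```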
## Optional Capabilities (Short List)

- Failover and retries: route requests to the next-best model when one fails
- JSON repair: fix common JSON formatting issues (streaming and non-streaming)
- Tool-call repair: convert textual tool calls into proper `tool_calls` entries (illustrated below)
- Loop detection: stop repeated identical tool calls
- Dangerous-command prevention: steer away from destructive shell actions
- Identity header override: control X-Title/Referer/User-Agent per back-end
- Content rewriting: REPLACE/PREPEND/APPEND rules on inbound/outbound content
- Context window enforcement: per-model token limits with friendly errors
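To make the tool-call repair bullet concrete, here is an illustrative before/after in OpenAI message terms. The textual formats the proxy actually recognizes are not documented here; only the target `tool_calls` shape below follows the OpenAI schema.

```python
# Illustrative only: the before/after shape of tool-call repair.
# The exact textual formats the proxy detects are an assumption.

# What a model might emit as plain text (unusable by most clients):
raw_content = 'I will read the file now: {"name": "read_file", "arguments": {"path": "src/main.py"}}'

# What an OpenAI-style client expects instead: a structured tool_calls entry.
repaired_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",  # hypothetical ID; real IDs come from the proxy/provider
            "type": "function",
            "function": {
                "name": "read_file",
                "arguments": '{"path": "src/main.py"}',  # arguments are a JSON string
            },
        }
    ],
}
```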
## Example Config (minimal)

```yaml
# config.yaml
backends:
  openai:
    type: openai
default_backend: openai
proxy:
  host: 0.0.0.0
  port: 8000
auth:
  # Set LLM_INTERACTIVE_PROXY_API_KEY env var to enable
  disable_auth: false
```

Run: `python -m src.core.cli --config config.yaml`
## Popular Scenarios

### Claude Code with any back-end

1. Start the proxy with your preferred back-end (e.g., OpenAI or OpenRouter).
2. Ensure the Anthropic front-end is reachable (main port `/anthropic/...` or `ANTHROPIC_PORT`).
3. Set:

   ```bash
   export ANTHROPIC_API_URL=http://localhost:8001
   export ANTHROPIC_API_KEY=<your-proxy-key>
   ```

Then launch `claude`. You can switch models during a session:

```
!/backend(openrouter)
!/model(claude-3-5-sonnet-20241022)
```
### ZAI Coding Plan with any agent

- Use the `zai-coding-plan` back-end; it works with any supported front-end and any coding agent.
- Point OpenAI-compatible tools at `http://localhost:8000/v1`.
### Gemini back-ends

- Metered API key (`gemini`), free personal OAuth (`gemini-cli-oauth-personal`), or GCP-billed (`gemini-cli-cloud-project`): pick one and set the required env vars (see Gemini Backends Overview).
## Errors and Troubleshooting

- 401/403 from proxy: missing or invalid `Authorization` header while proxy auth is enabled
- 400 Bad Request: malformed payload; ensure you send an OpenAI-, Anthropic-, or Gemini-compatible body
- 422 Unprocessable Entity: validation error; check the error details for the offending field
- 503 Service Unavailable: upstream provider is unreachable; try another model or enable failover
- Model not found: ensure the model name exists for the selected back-end
- Enable wire capture for tricky issues: `--capture-file wire.jsonl`
- Use in-chat `!/backend(...)` and `!/model(...)` to isolate provider/model problems
- Check that environment variables are set for the back-end you selected
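A quick connectivity probe can also separate proxy issues from provider issues. This sketch assumes default ports; the front-end table lists `/health` and `/info` alongside the Anthropic paths, so the exact prefix is uncertain and the script probes a few candidates.

```python
# Sketch: probe likely status endpoints on the proxy. The prefix for
# /health and /info is an assumption, so several candidates are tried.
import requests

base = "http://localhost:8000"
for path in ("/health", "/info", "/anthropic/health", "/anthropic/v1/models"):
    try:
        r = requests.get(base + path, timeout=10)
        print(path, r.status_code)
    except requests.RequestException as exc:
        print(path, "unreachable:", exc)
```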
## Support

- Issues: open a ticket in the repository's issue tracker
## License

This project is licensed under the AGPL-3.0-or-later (GNU Affero General Public License v3.0 or later); see the LICENSE file for details.
## Changelog

See the full change history in `CHANGELOG.md`.