
LLM Interactive Proxy


This project is a Swiss Army knife for anyone working with language models and agentic workflows. It sits between any LLM-aware client and any LLM backend, presenting multiple front-end APIs (OpenAI, Anthropic, Gemini) while routing to whichever provider you choose. With the proxy you can translate, reroute, and augment requests on the fly, execute chat-embedded commands, override models, rotate API keys, prevent key leaks, and inspect traffic, all from a single drop-in gateway.

Use Cases

  • Connect Any App to Any Model: Seamlessly route requests from any LLM-powered application to any model, even across different protocols. Use clients like Anthropic's Claude Code CLI with a Gemini 2.5 Pro model, or Codex CLI with a Kimi K2 model.
  • Override Hardcoded Models: Force an application to use a model of your choice, even if the developers didn't provide an option to change it.
  • Inspect and Debug Prompts: Capture and analyze the exact prompts your agent sends to the LLM provider to debug and refine interactions.
  • Customize System Prompts: Rewrite or modify an agent's system prompt to better suit your specific needs and improve its performance.
  • Leverage Your LLM Subscriptions: Use your personal subscriptions, like OpenAI Plus/Pro or Anthropic Pro/MAX plans, with any third-party application, not just those developed by the LLM vendor.
  • Automated Model Tuning for Precision: The proxy automatically detects when a model struggles with tasks like precise file edits and adjusts its parameters to improve accuracy on subsequent attempts.
  • Automatic Tool Call Repair: If a model generates invalid tool calls, the proxy automatically corrects them before they can cause errors in your agent.
  • Automated Error Detection and Steering: Detect when an LLM is stuck in a loop or fails to follow instructions, and automatically generate steering commands to get it back on track.
  • Block Harmful Tool Calls: Prevent potentially destructive actions, such as deleting your git repository, by detecting and blocking harmful tool calls at the proxy level.
  • Maximize Free Tiers with API Key Rotation: Aggregate all your API keys and use auto-rotation to seamlessly switch between them, allowing you to take full advantage of multiple free-tier allowances.

Killer Features

Compatibility

  • Multiple front-ends, many providers: exposes OpenAI, Anthropic, and Gemini APIs while routing to OpenAI, Anthropic, Gemini, OpenRouter, ZAI, Qwen, and more
  • OpenAI compatibility: drop-in /v1/chat/completions for most clients and coding agents
  • Streaming everywhere: consistent streaming and non‑streaming support across providers
  • Gemini OAuth personal gateway: use Google's free personal OAuth (CLI-style) through an OpenAI-compatible endpoint

Reliability

  • Failover routing: fall back to alternate models/providers on rate limits or outages
  • Automated API key rotation: rotate across multiple keys to reduce throttling and extend free-tier allowances
  • Rate limits and context: lightweight rate limiting and per-model context window enforcement

Safety & Integrity

  • Loop detection: detect repeated patterns and halt infinite loops
  • Dangerous-command prevention: steer away from destructive shell actions
  • Key hygiene: redact API keys in prompts and logs
  • Repair helpers: tool-call and JSON repair to fix malformed model outputs

Control & Ergonomics

  • In-chat switching: change back-end and model on the fly with !/backend(...) and !/model(...)
  • Force model override: make clients use the model you choose without changing client code

Observability

  • Wire capture and audit: optional request/response capture file plus usage tracking

Supported APIs (Front-Ends) and Providers (Back-Ends)

These are ready out of the box. Front-ends are the client-facing APIs the proxy exposes; back-ends are the providers the proxy calls.

Front-ends

| API surface | Path(s) | Typical clients | Notes |
|---|---|---|---|
| OpenAI Chat Completions | /v1/chat/completions | Most OpenAI SDKs/tools, coding agents | Default front-end |
| Anthropic Messages | /anthropic/v1/messages (+ /anthropic/v1/models, /health, /info) | Claude Code, Anthropic SDK | Also available on a dedicated port (see Setup) |
| Google Gemini v1beta | /v1beta/models, :generateContent, :streamGenerateContent | Gemini-compatible tools/SDKs | Translates to your chosen provider |
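
For example, any OpenAI SDK can be pointed at the default front-end by overriding its base URL. A minimal sketch (Python, openai package), assuming a default local run on port 8000; the model name is purely illustrative:

from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at the proxy instead of api.openai.com.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="unused",  # use your proxy key here if proxy auth is enabled
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; use any model your chosen back-end serves
    messages=[{"role": "user", "content": "Hello through the proxy!"}],
)
print(response.choices[0].message.content)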

Back-ends

| Backend ID | Provider | Authentication | Notes |
|---|---|---|---|
| openai | OpenAI | OPENAI_API_KEY | Standard OpenAI API |
| openai-oauth | OpenAI (ChatGPT/Codex OAuth) | Local .codex/auth.json | Uses ChatGPT login token instead of API key |
| anthropic | Anthropic | ANTHROPIC_API_KEY | Claude models via Messages API |
| anthropic-oauth | Anthropic (OAuth) | Local OAuth token | Claude via OAuth credential flow |
| gemini | Google Gemini | GEMINI_API_KEY | Metered API key |
| gemini-cli-oauth-personal | Google Gemini (CLI) | OAuth (no key) | Free-tier personal OAuth like the Gemini CLI |
| gemini-cli-cloud-project | Google Gemini (GCP) | OAuth + GOOGLE_CLOUD_PROJECT (+ ADC) | Bills to your GCP project |
| openrouter | OpenRouter | OPENROUTER_API_KEY | Access to many hosted models |
| zai | ZAI | ZAI_API_KEY | Zhipu/Z.ai access (OpenAI-compatible) |
| zai-coding-plan | ZAI Coding Plan | ZAI_API_KEY | Works with any supported front-end and coding agent |
| qwen-oauth | Alibaba Qwen | Local oauth_creds.json | Qwen CLI OAuth; OpenAI-compatible endpoint |

Gemini Backends Overview

Choose the Gemini integration that fits your environment.

| Backend | Authentication | Cost | Best for |
|---|---|---|---|
| gemini | API key (GEMINI_API_KEY) | Metered (pay-per-use) | Production apps, high-volume usage |
| gemini-cli-oauth-personal | OAuth (no API key) | Free tier with limits | Local development, testing, personal use |
| gemini-cli-cloud-project | OAuth + GOOGLE_CLOUD_PROJECT (ADC/service account) | Billed to your GCP project | Enterprise, team workflows, central billing |

Notes

  • Personal OAuth uses credentials from the local Google CLI/Code Assist-style flow and does not require a GEMINI_API_KEY.
  • Cloud Project requires GOOGLE_CLOUD_PROJECT and Application Default Credentials (or a service account file).

Quick Setup

For gemini (API key)

export GEMINI_API_KEY="AIza..."
python -m src.core.cli --default-backend gemini

For gemini-cli-oauth-personal (free personal OAuth)

# Install and authenticate with the Google Gemini CLI (one-time):
gemini auth

# Then start the proxy using the personal OAuth backend
python -m src.core.cli --default-backend gemini-cli-oauth-personal

For gemini-cli-cloud-project (GCP-billed)

export GOOGLE_CLOUD_PROJECT="your-project-id"

# Provide Application Default Credentials via one of the following:
# Option A: User credentials (interactive)
gcloud auth application-default login

# Option B: Service account file
export GOOGLE_APPLICATION_CREDENTIALS="/absolute/path/to/service-account.json"

python -m src.core.cli --default-backend gemini-cli-cloud-project
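
To sanity-check whichever Gemini back-end you pick, you can list models through the proxy's Gemini front-end. A minimal sketch using the requests library, assuming a default local run; the path comes from the front-end table above:

import requests

# List models exposed by the Gemini-compatible front-end (v1beta surface).
resp = requests.get("http://localhost:8000/v1beta/models", timeout=10)
resp.raise_for_status()
print(resp.json())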

Quick Start

  1. Export provider keys (only for the back-ends you plan to use)
export OPENAI_API_KEY=...
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
export OPENROUTER_API_KEY=...
export ZAI_API_KEY=...
# GCP-based Gemini back-end
export GOOGLE_CLOUD_PROJECT=your-project-id
  2. Start the proxy
python -m src.core.cli --default-backend openai

Useful flags

  • --host 0.0.0.0 and --port 8000 to change bind address
  • --config config/config.example.yaml to load a saved config
  • --capture-file wire.jsonl to record requests/replies (see Debugging)
  • --disable-auth for local-only use (forces host=127.0.0.1)
  3. Point your client at the proxy
  • OpenAI-compatible tools: set OPENAI_API_BASE=http://localhost:8000/v1 and OPENAI_API_KEY to your proxy key if auth is enabled
  • Claude Code (Anthropic): set ANTHROPIC_API_URL=http://localhost:8001 and ANTHROPIC_API_KEY to your proxy key
  • Gemini clients: call the /v1beta/... endpoints on http://localhost:8000

Tip: Anthropic compatibility is exposed both at /anthropic/... on the main port and, if configured, on a dedicated Anthropic port (defaults to main port + 1). Override via ANTHROPIC_PORT.

Using It Day-To-Day

  • Switch back-end or model on the fly in the chat input:
    • !/backend(openai)
    • !/model(gpt-4o-mini)
    • !/oneoff(openrouter:qwen/qwen3-coder)
  • Keep your existing tools; just point them to the proxy endpoint.
  • The proxy handles streaming, retries/failover (if enabled), and output repair.
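
Since commands travel as ordinary chat text, any client can issue them. A minimal sketch, assuming an OpenAI-compatible client already pointed at the proxy and a hypothetical proxy key:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-proxy-key")

# A !/command is sent as a regular user message; the proxy recognizes and
# executes it instead of treating it as ordinary prompt text.
client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": "!/backend(openrouter)"}],
)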

Security

  • Do not store provider API keys in config files; use environment variables only.
  • Common keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY, ZAI_API_KEY, GOOGLE_CLOUD_PROJECT.
  • Optional proxy auth: set LLM_INTERACTIVE_PROXY_API_KEY and require clients to send Authorization: Bearer <key>.
  • Built-in redaction masks API keys in prompts and logs.
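
With proxy auth enabled, a client simply presents the proxy key as a bearer token. A minimal sketch using the requests library; the model name is illustrative:

import os
import requests

# The proxy expects "Authorization: Bearer <key>" when auth is enabled.
headers = {"Authorization": f"Bearer {os.environ['LLM_INTERACTIVE_PROXY_API_KEY']}"}
payload = {
    "model": "gpt-4o-mini",  # illustrative
    "messages": [{"role": "user", "content": "ping"}],
}
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json=payload,
    headers=headers,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])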

Debugging (Wire Capture)

Write outbound requests and inbound replies/streams to a rotating file for troubleshooting.

  • CLI: --capture-file wire.jsonl plus optional rotation caps:
    • --capture-rotate-interval SECONDS
    • --capture-total-max-bytes N
    • --capture-max-files N
  • The capture records source/destination, headers and payloads, and keeps secrets redacted when prompt redaction is enabled.
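
The .jsonl name suggests one JSON object per line, though the exact record layout isn't documented here; assuming JSON lines, a quick way to skim a capture:

import json

# Assumes one JSON object per line, as the .jsonl name suggests; adjust if
# your capture layout differs.
with open("wire.jsonl", encoding="utf-8") as f:
    for line in f:
        entry = json.loads(line)
        print(sorted(entry))  # skim top-level fields of each captured record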

Optional Capabilities (Short List)

  • Failover and retries: route requests to a next-best model when one fails
  • JSON repair: fix common JSON formatting issues (streaming and non‑streaming)
  • Tool-call repair: convert textual tool calls to proper tool_calls
  • Loop detection: stop repeated identical tool calls
  • Dangerous-command prevention: steer away from destructive shell actions
  • Identity header override: control X-Title/Referer/User-Agent per back-end
  • Content rewriting: REPLACE/PREPEND/APPEND rules on inbound/outbound content
  • Context window enforcement: per-model token limits with friendly errors

Example Config (minimal)

# config.yaml
backends:
  openai:
    type: openai
default_backend: openai
proxy:
  host: 0.0.0.0
  port: 8000
auth:
  # Set LLM_INTERACTIVE_PROXY_API_KEY env var to enable
  disable_auth: false

Run: python -m src.core.cli --config config.yaml

Popular Scenarios

Claude Code with any model/provider

  1. Start the proxy with your preferred back-end (e.g., OpenAI or OpenRouter)
  2. Ensure the Anthropic front-end is reachable (main port /anthropic/... or ANTHROPIC_PORT)
  3. Set
export ANTHROPIC_API_URL=http://localhost:8001
export ANTHROPIC_API_KEY=<your-proxy-key>

Then launch claude. You can switch models during a session:

!/backend(openrouter)
!/model(claude-3-5-sonnet-20241022)
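
The same front-end also works with the Anthropic SDK directly, not just the claude CLI. A minimal sketch, assuming the dedicated Anthropic port from step 2 and a hypothetical proxy key:

import anthropic  # pip install anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8001",  # dedicated Anthropic port (main port + 1 by default)
    api_key="your-proxy-key",          # the proxy key, not a real Anthropic key
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # the proxy routes this to your chosen back-end
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello via the proxy"}],
)
print(message.content[0].text)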

Z.AI Coding Plan with coding agents

  • Use back-end zai-coding-plan; it works with any supported front-end and any coding agent
  • Point OpenAI-compatible tools at http://localhost:8000/v1

Gemini options

  • Metered API key (gemini), free personal OAuth (gemini-cli-oauth-personal), or GCP‑billed (gemini-cli-cloud-project). Pick one and set the required env vars.

Errors and Troubleshooting

  • 401/403 from proxy: missing/invalid Authorization header when proxy auth is enabled
  • 400 Bad Request: malformed payload; ensure you send an OpenAI/Anthropic/Gemini-compatible body
  • 422 Unprocessable Entity: validation error; check error details for the field
  • 503 Service Unavailable: upstream provider is unreachable; try another model or enable failover
  • Model not found: ensure the model name exists for the selected back-end
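
To separate proxy problems from upstream ones, first confirm the proxy itself responds. A minimal sketch hitting a front-end listing endpoint from the table above:

import requests

# /anthropic/v1/models is listed in the front-end table; a 200 here means the
# proxy is up and answering, so remaining errors are likely upstream or config.
resp = requests.get("http://localhost:8000/anthropic/v1/models", timeout=5)
print(resp.status_code)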

Tips

  • Enable wire capture for tricky issues: --capture-file wire.jsonl
  • Use in-chat !/backend(...) and !/model(...) to isolate provider/model problems
  • Check environment variables are set for the back-end you selected

Support

  • Issues: open a ticket in the repository's issue tracker

License

This project is licensed under the AGPL-3.0-or-later (GNU Affero General Public License v3.0 or later) — see the LICENSE file for details.

Changelog

See the full change history in CHANGELOG.md