A Swiss Army knife proxy that sits between your LLM client and provider—giving you a universal adapter, cost optimization, and full visibility with zero code changes.
```bash
git clone https://github.com/matdev83/llm-interactive-proxy.git
cd llm-interactive-proxy

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -e ".[dev]"

export OPENAI_API_KEY="your-key-here"
python -m src.core.cli --default-backend openai:gpt-4o
```

```python
# Instead of direct API calls:
from openai import OpenAI

client = OpenAI(api_key="your-key")
```

```python
# Use the proxy (base_url only):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy-key"  # Proxy handles real authentication
)

# Now use normally - requests go through the proxy
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

That's it. All your existing code works unchanged—the proxy handles routing, translation, and monitoring transparently.
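Streaming works the same way: because the proxy presents a standard OpenAI-compatible endpoint, you keep using the SDK's normal streaming interface. A minimal sketch, assuming the backend you route to supports streaming (model and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy-key")

# Stream tokens as they arrive, exactly as you would against the provider directly.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```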
See Quick Start Guide for detailed configuration.
One configuration. Any client. Any provider.
Stop rewriting your code every time you want to try a different LLM. Stop managing API keys in a dozen different tools. Stop wondering why your agent is stuck in an infinite loop or why your API bill suddenly spiked.
Tired of juggling multiple LLM subscriptions?
Connect all your premium accounts—GPT Plus/Pro, Gemini Advanced, Qwen, GLM Code, and more—through one endpoint. Use them all without switching tools.
Worried about agent misbehavior?
Fix stuck agents with automatic loop detection. Reduce token costs with intelligent context compression. Get a second opinion mid-conversation by switching models seamlessly.
Need more control over what LLMs actually do?
Rewrite prompts and responses on-the-fly without touching client code. Block dangerous git commands before they execute. Add a "guardian angel" model that monitors and helps when your primary model drifts off track.
Want visibility into what's happening?
Capture every request and response in CBOR format. Debug issues, audit usage, and understand exactly what your LLM apps are doing.
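Captured traffic can be inspected offline with any CBOR library. The sketch below uses the third-party `cbor2` package; the file path is hypothetical and the layout (a stream of concatenated CBOR items) is an assumption rather than the documented capture format:

```python
import io
from pathlib import Path

import cbor2  # third-party: pip install cbor2

# Hypothetical path -- check your wire-capture configuration for the real location.
capture_path = Path("captures/session.cbor")

data = capture_path.read_bytes()
buf = io.BytesIO(data)
decoder = cbor2.CBORDecoder(buf)

# Assumption: the file holds concatenated CBOR items, one per captured event.
while buf.tell() < len(data):
    record = decoder.decode()
    print(record)
```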
Zero changes to your client code. Just point it at the proxy and gain control.
- Protocol Translation — Use OpenAI SDK with Anthropic, Claude client with Gemini, any combination
- Subscription Consolidation — Leverage all your premium LLM accounts through one endpoint
- Flexible Deployment — Single-user mode for development, multi-user mode for production
- Smart Routing — Rotate API keys to maximize free tiers, automatically fall back to cheaper models
- Context Window Compression — Reduce token usage and improve inference speed without losing quality
- Full Observability — Wire capture, usage tracking, token counting, performance metrics
- Loop Detection — Automatically detect and resolve infinite loops and repetitive patterns
- Dynamic Model Switching — Change models mid-conversation for diverse perspectives without losing context
- Quality Verifier — Deploy a secondary model to verify responses when the primary model struggles
- Prompt & Response Rewriting — Modify content on-the-fly to fine-tune agent behavior
- Tool Call Reactors — Override and intercept tool calls to suppress unwanted behaviors
- Usage Limits — Enforce quotas and control resource consumption
- Key Isolation — Configure API keys once, never expose them to clients
- Directory Sandboxing — Restrict LLM tool access to designated safe directories
- Command Protection — Block harmful operations like aggressive git commands
- Tool Access Control — Fine-grained control over which tools LLMs can invoke
- B2BUA Session Isolation — Internal session identity generation and strict trust boundaries (enabled by default; use `--disable-b2bua-session-handling` to opt out)
See User Guide for the complete feature list.
```mermaid
graph TD
    subgraph "Clients"
        A[OpenAI Client]
        B[OpenAI Responses API Client]
        C[Anthropic Client]
        D[Gemini Client]
        E[Any LLM App]
    end

    subgraph "LLM Interactive Proxy"
        FE["Front-end APIs<br/>(OpenAI, Anthropic, Gemini)"]
        Core["Core Proxy Logic<br/>(Routing, Translation, Safety)"]
        BE["Back-end Connectors<br/>(OpenAI, Anthropic, Gemini, etc.)"]
        FE --> Core --> BE
    end

    subgraph "Providers"
        P1[OpenAI API]
        P2[Anthropic API]
        P3[Google Gemini API]
        P4[OpenRouter API]
    end

    A --> FE
    B --> FE
    C --> FE
    D --> FE
    E --> FE

    BE --> P1
    BE --> P2
    BE --> P3
    BE --> P4
```
- User Guide - Feature documentation, configuration, backends, debugging
- Development Guide - Architecture, building, testing, contributing
- Configuration Guide - Complete parameter reference
- CHANGELOG - Version history and updates
- CONTRIBUTING - Contribution guidelines
The proxy exposes multiple standard API surfaces, allowing you to use your favorite clients with any backend:
- OpenAI Chat Completions (`/v1/chat/completions`) - Compatible with OpenAI SDKs and most tools.
- OpenAI Responses (`/v1/responses`) - Optimized for structured output generation.
- OpenAI Models (`/v1/models`) - Unified model discovery across all backends.
- Anthropic Messages (`/anthropic/v1/messages`) - Native support for Claude clients/SDKs.
- Dedicated Anthropic Server (`http://host:8001/v1/messages`) - Drop-in replacement for Anthropic API on a separate port (default: 8001).
- Google Gemini v1beta (`/v1beta/models`, `:generateContent`) - Native support for Gemini tools.
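For example, the Anthropic surface lets the official Anthropic SDK talk to the proxy by changing only its base URL. A minimal sketch, assuming the SDK appends `/v1/messages` to the base URL you provide and that your configured backend serves the model named below:

```python
from anthropic import Anthropic

# Point the Anthropic SDK at the proxy's /anthropic/v1/messages surface.
client = Anthropic(
    base_url="http://localhost:8000/anthropic",
    api_key="dummy-key",  # The proxy holds the real provider credentials.
)

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # Placeholder: use a model your backend actually serves.
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
```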
See Front-End APIs Overview for more details.
- OpenAI (Legacy) (GPT-4, GPT-4o, o1, standard Chat Completions)
- OpenAI Responses API (Optimized for structured output generation)
- Anthropic (Claude 3.5 Sonnet, Opus, Haiku)
- Google Gemini (API Key, OAuth, GCP, Vertex AI, Auto-OAuth)
- OpenRouter (Access to 100+ models)
- ZAI (Zhipu AI) (GLM models, including support for the GLM Coding Plan)
- Alibaba Qwen (Coding-optimized LLM models)
- MiniMax (Hailuo AI reasoning models)
- InternLM (InternLM AI models with API key rotation)
- ZenMux (Unified model aggregator)
- Moonshot AI (Kimi models, including Kimi Code for coding)
- Cline (Specialized debugging backend)
- Hybrid (Virtual backend for two-phase reasoning)
- Antigravity (Internal debugging backends for Gemini/Claude)
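Which of these are reachable at runtime depends on your configuration; the unified `/v1/models` endpoint described above is a quick way to check. A minimal sketch with the OpenAI SDK (the exact identifiers returned depend on the backends you have configured):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy-key")

# Enumerate every model the proxy can currently route to, across all configured backends.
for model in client.models.list():
    print(model.id)
```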
See Backends Overview for full details and configuration.
The proxy supports two operational modes to enforce appropriate security boundaries:
- Single User Mode (default): For local development. Allows OAuth connectors, optional authentication, localhost-only binding.
- Multi User Mode: For production/shared deployments. Blocks OAuth connectors, requires authentication for remote access, allows any IP binding.
```bash
# Single User Mode (default) - local development
./.venv/Scripts/python.exe -m src.core.cli

# Multi User Mode - production deployment
./.venv/Scripts/python.exe -m src.core.cli --multi-user-mode --host=0.0.0.0 --api-keys key1,key2
```

See Access Modes User Guide for detailed documentation.
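In Multi User Mode, clients authenticate with one of the keys passed via `--api-keys`. A minimal sketch, assuming the proxy accepts those keys as standard OpenAI-style bearer tokens and that the host name below is replaced with your deployment's address (see the Access Modes User Guide for the authoritative details):

```python
from openai import OpenAI

# "key1" must match one of the values passed to --api-keys when the proxy was started.
client = OpenAI(base_url="http://proxy-host:8000/v1", api_key="key1")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Ping"}],
)
print(response.choices[0].message.content)
```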
- GitHub Issues - Report bugs or request features
- Discussions - Ask questions and share ideas
This project is licensed under the GNU AGPL v3.0 or later.
```bash
# Run tests
python -m pytest

# Run linter
python -m ruff check --fix .

# Format code
python -m black .
```

See Development Guide for more details.