A production-ready, OpenAI-compatible API proxy for Qwen models with intelligent model routing, automatic tool injection, and comprehensive validation.
- Smart Aliases: Use friendly names like `Qwen_Research`, `Qwen_Think`, and `Qwen_Code`
- Auto-Tool Injection: Web search automatically added server-side
- Default Fallback: Unknown models → `qwen3-max-latest` with web search
- Backward Compatible: Direct Qwen model names work unchanged
- OpenAPI Validation: All requests validated against official OpenAI spec
- Anonymous Mode: Works without API keys (or any key works)
- Bearer Token Caching: Automated authentication with Playwright
- Request/Response Sanitization: Full validation middleware
- Async/Await: Non-blocking I/O for high concurrency
- Streaming Support: Full SSE streaming for real-time responses
- Request Tracking: Built-in monitoring and analytics
- Health Checks: `/health` and `/v1/models` endpoints
- Alias for: `qwen3-max-latest`
- Auto-Tools: `web_search` (always applied)
- Max Tokens: Provider default
- Use Case: General purpose, unknown model names
- Example:

```python
# These all route to qwen3-max-latest + web_search:
model="gpt-4"
model="claude-3-opus"
model="random-model-name"
```
- Routes to: `qwen-deep-research`
- Auto-Tools: NONE (clean research mode)
- Max Tokens: Provider default
- Use Case: Deep research without tool interference
- Example:

```python
client = OpenAI(api_key="sk-any", base_url="http://localhost:8096/v1")
response = client.chat.completions.create(
    model="Qwen_Research",  # Case-insensitive
    messages=[{"role": "user", "content": "Research quantum computing"}]
)
```
- Routes to: `qwen3-235b-a22b-2507`
- Auto-Tools: `web_search` (always applied)
- Max Tokens: 81,920 (extended context)
- Use Case: Complex reasoning with web access
- Example:

```python
response = client.chat.completions.create(
    model="Qwen_Think",
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)
# Server automatically adds web_search tool + 81920 token limit
```
- Routes to: `qwen3-coder-plus`
- Auto-Tools: `web_search` (always applied)
- Max Tokens: Provider default
- Use Case: Code generation with web documentation access
- Example:

```python
response = client.chat.completions.create(
    model="Qwen_Code",
    messages=[{"role": "user", "content": "Write a Python REST API"}]
)
# Web search helps with latest library documentation
```
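Taken together, the alias rules above reduce to a small case-insensitive lookup with a default fallback. The sketch below is illustrative only: the real logic lives in `app/model_router.py`, and the abbreviated pass-through set is an assumption, not the full list.

```python
# Illustrative sketch of alias resolution; not the repo's actual code.
DEFAULT_MODEL = "qwen3-max-latest"

MODEL_CONFIGS = {
    "qwen_research": {"actual_model": "qwen-deep-research", "tools": [], "max_tokens": None},
    "qwen_think": {"actual_model": "qwen3-235b-a22b-2507", "tools": ["web_search"], "max_tokens": 81920},
    "qwen_code": {"actual_model": "qwen3-coder-plus", "tools": ["web_search"], "max_tokens": None},
}

# Abbreviated for illustration; the proxy recognizes the full direct-model list.
KNOWN_QWEN_MODELS = {"qwen2.5-max", "qwen-deep-research", "qwen3-max-latest"}

def resolve(model: str) -> dict:
    """Map a requested model name to its routing config."""
    key = model.lower()
    if key in MODEL_CONFIGS:            # friendly alias (case-insensitive)
        return MODEL_CONFIGS[key]
    if model in KNOWN_QWEN_MODELS:      # direct Qwen name: pass through untouched
        return {"actual_model": model, "tools": [], "max_tokens": None}
    # anything else falls back to the default model with web search
    return {"actual_model": DEFAULT_MODEL, "tools": ["web_search"], "max_tokens": None}
```

Called with `"gpt-4"`, this returns the default-fallback config; called with `"QWEN_THINK"`, it resolves the alias regardless of case.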
These models pass through without transformation:
```python
# Backward compatibility - these work as expected:
model="qwen2.5-max"
model="qwen2.5-turbo"
model="qwen-deep-research"
model="qwen-max-latest"
model="qwen3-max-latest"
model="qwen3-235b-a22b-2507"
model="qwen3-coder-plus"
model="qwen-math-plus"
model="qwen-math-turbo"
model="qwen-coder-turbo"
model="qwen-vl-max"
model="qwen-vl-plus"
```

```bash
# Set your Qwen credentials
export QWEN_EMAIL="your-email@example.com"
export QWEN_PASSWORD="your-password"

# Deploy everything (setup + auth + server + tests)
curl -sSL https://raw.githubusercontent.com/Zeeeepa/qwen-api/main/deploy_qwen_api.sh | bash
```

```bash
# Clone repository
git clone https://github.com/Zeeeepa/qwen-api.git
cd qwen-api

# Set credentials
export QWEN_EMAIL="your-email@example.com"
export QWEN_PASSWORD="your-password"

# Run deployment script
bash scripts/all.sh
```

```bash
# 1. Setup environment
bash scripts/setup.sh

# 2. Extract authentication token
python3 scripts/extract_bearer_token.py

# 3. Start server
bash scripts/start.sh

# 4. Test API (optional)
bash scripts/send_request.sh
```

- Python: 3.11 or higher
- OS: Linux, macOS, Windows (WSL2)
- Memory: 512MB minimum, 2GB recommended
- Disk: 500MB for dependencies + browsers
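The Python floor above can be verified before installing anything; a tiny helper (illustrative, not part of the repo):

```python
import sys

def check_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the proxy's 3.11+ requirement."""
    return tuple(version_info[:2]) >= (3, 11)

if not check_python():
    print("Python 3.11+ required, found", sys.version.split()[0])
```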
```bash
# Core dependencies (auto-installed by setup script)
pip install -r requirements.txt

# Key packages:
# - fastapi + granian (async web server)
# - playwright (browser automation)
# - httpx (HTTP client)
# - pydantic (data validation)
```

```bash
# 1. Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 3. Install Playwright browsers
playwright install --with-deps chromium

# 4. Create .env file
cat > .env << EOF
LISTEN_PORT=8096
ANONYMOUS_MODE=true
EOF

# 5. Create directories
mkdir -p logs cache
```

The deployment script automatically:
- Launches headless Chromium browser
- Logs into Qwen with your credentials
- Extracts Bearer token from network traffic
- Caches token to `.qwen_bearer_token`
- Reuses cached token until expiration
```bash
# Run Playwright authentication
python3 scripts/extract_bearer_token.py

# Token saved to: .qwen_bearer_token
# Format: Bearer eyJ...
```

No credentials needed! The server works in anonymous mode:

```python
from openai import OpenAI

# Any API key works:
client = OpenAI(api_key="sk-anything", base_url="http://localhost:8096/v1")
client = OpenAI(api_key="fake-key-123", base_url="http://localhost:8096/v1")
client = OpenAI(api_key="", base_url="http://localhost:8096/v1")
```

```python
from openai import OpenAI

# Initialize client
client = OpenAI(
    api_key="sk-any",  # Any key works!
    base_url="http://localhost:8096/v1"
)

# Example 1: Unknown model → default fallback
response = client.chat.completions.create(
    model="gpt-4",  # Routes to qwen3-max-latest + web_search
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Example 2: Research mode (no tools)
response = client.chat.completions.create(
    model="Qwen_Research",  # qwen-deep-research, no tools
    messages=[{"role": "user", "content": "Research topic..."}]
)

# Example 3: Thinking mode (extended context)
response = client.chat.completions.create(
    model="Qwen_Think",  # qwen3-235b-a22b-2507 + web_search + 81920 tokens
    messages=[{"role": "user", "content": "Complex reasoning..."}]
)

# Example 4: Code generation
response = client.chat.completions.create(
    model="Qwen_Code",  # qwen3-coder-plus + web_search
    messages=[{"role": "user", "content": "Write FastAPI endpoint"}]
)

# Example 5: Streaming
stream = client.chat.completions.create(
    model="Qwen_Think",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

```bash
# Test with any model name
curl -X POST http://localhost:8096/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-any" \
  -d '{
    "model": "Qwen_Think",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Streaming
curl -X POST http://localhost:8096/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-any" \
  -d '{
    "model": "Qwen_Code",
    "messages": [{"role": "user", "content": "Write Python code"}],
    "stream": true
  }'
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-any',
  baseURL: 'http://localhost:8096/v1'
});

// Use any model alias
const response = await client.chat.completions.create({
  model: 'Qwen_Think',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
```

```bash
# Test all routing scenarios + tool integration
python3 test_all_routing_scenarios.py
```

```text
COMPREHENSIVE ROUTING & TOOL INTEGRATION TESTS
================================================================================
Testing against: http://localhost:8096/v1
Total scenarios: 5 routing + 2 web search tests

✅ SCENARIO 1: Default Fallback (gpt-5 → qwen3-max-latest + web_search)
✅ SCENARIO 2: Qwen_Research (→ qwen-deep-research, no tools)
✅ SCENARIO 3: Qwen_Think (→ qwen3-235b-a22b-2507 + web_search + 81920 tokens)
✅ SCENARIO 4: Qwen_Code (→ qwen3-coder-plus + web_search)
✅ SCENARIO 5: Direct Model (qwen2.5-max → qwen2.5-max, no changes)
✅ WEB SEARCH TEST 1: gpt-4 with web search
✅ WEB SEARCH TEST 2: Qwen_Think with web search

TEST SUMMARY
Total Tests: 7
Passed: 7
Failed: 0
Pass Rate: 100.0%

ALL TESTS PASSED!
```
```bash
# Health check
curl http://localhost:8096/health

# List models
curl http://localhost:8096/v1/models

# Simple request
bash scripts/send_request.sh
```

```text
qwen-api/
├── app/
│   ├── core/
│   │   └── openai.py              # OpenAI endpoints (/chat/completions)
│   ├── middleware/
│   │   └── openapi_validator.py   # Request/response validation
│   ├── model_router.py            # Intelligent routing + tool injection
│   ├── providers/
│   │   ├── base.py
│   │   ├── provider_factory.py
│   │   └── qwen_simple_proxy.py
│   └── utils/
│       ├── logger.py
│       └── request_tracker.py
├── scripts/
│   ├── setup.sh                   # Environment setup
│   ├── extract_bearer_token.py    # Playwright authentication
│   ├── start.sh                   # Start server
│   ├── deploy.sh                  # All-in-one deployment
│   └── send_request.sh            # Test script
├── start.py                       # Server entry point (replaces main.py)
├── test_all_routing_scenarios.py  # Comprehensive test suite
├── requirements.txt
├── qwen.json                      # OpenAPI spec for validation
└── README.md                      # This file
```
```bash
# .env file configuration
LISTEN_PORT=8096       # Server port
ANONYMOUS_MODE=true    # Allow any API key
LOG_LEVEL=INFO         # DEBUG, INFO, WARNING, ERROR

# Optional: Runtime settings
QWEN_EMAIL=your-email@example.com
QWEN_PASSWORD=your-password
```

Edit `start.py` to customize:

```python
# Port binding
port = int(os.getenv("LISTEN_PORT", "8096"))

# Log level
log_level = os.getenv("LOG_LEVEL", "INFO").upper()

# Worker configuration (Granian)
workers = 1  # Increase for production
threads = 1  # HTTP/1.1 threads
```

Edit `app/model_router.py` to customize aliases:
```python
MODEL_CONFIGS = {
    "qwen_research": {
        "actual_model": "qwen-deep-research",
        "tools": [],  # No tools
        "max_tokens": None,
    },
    "qwen_think": {
        "actual_model": "qwen3-235b-a22b-2507",
        "tools": ["web_search"],
        "max_tokens": 81920,  # Extended context
    },
    # Add your custom aliases here...
}
```

```bash
# Check port availability
lsof -i :8096

# Check logs
tail -f logs/server.log

# Verify Python version
python3 --version  # Should be 3.11+

# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```

```bash
# Re-extract token
rm .qwen_bearer_token
python3 scripts/extract_bearer_token.py

# Check token validity
cat .qwen_bearer_token
```

```bash
# Ensure server is running
curl http://localhost:8096/health

# Check server logs for errors
tail -20 logs/server.log

# Restart server
pkill -f "python3 start.py"
bash scripts/start.sh
```

```bash
# Check model router logs
grep "Auto-injecting tools" logs/server.log

# Verify model alias resolution
grep "Model transformation" logs/server.log

# Expected output:
# Model transformation: gpt-4 → qwen3-max-latest
# Auto-injecting tools for gpt-4: ['web_search']
```

`POST /v1/chat/completions`
Request:

```json
{
  "model": "Qwen_Think",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "max_tokens": 1000,
  "temperature": 0.7
}
```

Response:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "qwen3-235b-a22b-2507",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }]
}
```

`GET /v1/models`
Response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "qwen3-max-latest",
      "object": "model",
      "created": 1234567890,
      "owned_by": "qwen"
    },
    {
      "id": "qwen-deep-research",
      "object": "model",
      "created": 1234567890,
      "owned_by": "qwen"
    }
  ]
}
```

`GET /health`
Response:

```json
{
  "status": "ok",
  "service": "qwen-ai2api-server",
  "version": "0.2.0"
}
```

Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request
MIT License - see LICENSE file for details.
- Qwen Team - For the amazing language models
- OpenAI - For the API specification
- FastAPI - For the excellent web framework
- Playwright - For browser automation
- Added intelligent model routing
- Implemented 4 model aliases (Qwen, Qwen_Research, Qwen_Think, Qwen_Code)
- Auto-tool injection (web_search)
- OpenAPI validation middleware
- Comprehensive test suite
- Fixed streaming response handling
- Complete documentation
- Initial release
- OpenAI-compatible endpoints
- Bearer token authentication
- Basic request/response handling
Made with ❤️ for the AI community