A high-performance, cost-optimized LiteLLM Proxy deployment for Google Cloud Run. This setup is designed for enterprise-grade applications requiring model routing, caching, and observability.
- Scalable Serverless: Deploys to Google Cloud Run with optimized 2 vCPU / 4GB RAM specs.
- Enterprise Caching: Built-in Redis integration with TLS support for sub-second responses on cache hits and significant cost savings.
- Full Observability: Pre-configured for Langfuse, Context7, and Tavily.
- MCP & Skills Support: Ready for Model Context Protocol (MCP) servers and Anthropic-compatible `/skills` endpoints.
- Secure by Default:
  - Zero hardcoded secrets (environment variables injected from Secret Manager).
  - Encrypted database storage with a custom salt key.
  - IAM-based invocation control.
- Multi-Model Support: Gemini, Kimi, Z-AI (GLM), MiniMax, Deepinfra, NVIDIA NIM, and GitHub Models.
Choose the deployment path that matches your needs:
Option 1. Best for: Getting started quickly, testing, proofs of concept
- ✅ One-click deployment
- ✅ No pre-configuration required
- ✅ Guides you through a setup wizard

⚠️ Secrets are stored as environment variables (see Option 2 for production).
Option 2. Guide: Production with Secret Manager
Best for: Production environments, enterprise deployments
- ✅ Secrets stored in Google Secret Manager
- ✅ IAM-based access control
- ✅ Full audit trails
- ✅ Recommended for sensitive workloads
Option 3. Guide: `deploy_gcloud.sh`
Best for: Developers who prefer command-line control
- ✅ Full control over deployment
- ✅ Integrates with CI/CD pipelines
- ✅ Custom deployment scripts
Ensure you have Docker installed and a `.env` file based on `.env.example`.

```bash
docker-compose up
```

This will start:
- LiteLLM Proxy on `http://localhost:4000`
- PostgreSQL (local) on port 5432
- Redis (local) on port 6379
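Before running `docker-compose up`, it can help to confirm that `.env` defines every variable listed in `.env.example`. The sketch below is a hypothetical helper (`check_env_keys` is not part of this repository) and assumes both files use plain `KEY=value` lines, optionally prefixed with `export`.

```bash
# Hypothetical preflight helper (not part of this repository).
# Usage: check_env_keys [example_file] [env_file]
# Prints each variable name defined in the example file but not in the env
# file, one per line, and returns 1 if anything is missing.
check_env_keys() {
  local example="${1:-.env.example}"
  local actual="${2:-.env}"
  local missing
  if [ ! -f "$actual" ]; then
    echo "missing file: $actual (create it from $example)" >&2
    return 1
  fi
  missing=$(awk -F= '
    # Drop leading whitespace and an optional export prefix.
    { sub(/^[ \t]+/, ""); sub(/^export[ \t]+/, "") }
    # Only KEY=value lines count; comments and blank lines are ignored.
    /^[A-Za-z_][A-Za-z0-9_]*=/ {
      if (FILENAME == ARGV[1]) want[$1] = 1; else have[$1] = 1
    }
    END { for (k in want) if (!(k in have)) print k }
  ' "$example" "$actual" | sort)
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

For example, `check_env_keys .env.example .env && docker-compose up` starts the stack only when no keys are missing.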
Note: Ensure your Google Cloud project is active and billing is enabled.

We use a streamlined deployment script (`deploy_gcloud.sh`) for production updates.

Prerequisites:
- Google Cloud SDK (`gcloud`) installed.
- An authenticated session (`gcloud auth login`).
- An active project set (`gcloud config set project YOUR_PROJECT_ID`).
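The three prerequisites can be checked in one step. This is a minimal sketch: `check_gcloud_prereqs` is a hypothetical helper name, and it relies only on the standard `gcloud auth list` and `gcloud config get-value` commands.

```bash
# Hypothetical helper: verify the gcloud prerequisites before deploying.
check_gcloud_prereqs() {
  local account project
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found: install the Google Cloud SDK first" >&2
    return 1
  fi
  # Non-empty only if an account is active (set up by `gcloud auth login`).
  account=$(gcloud auth list --filter=status:ACTIVE --format="value(account)" 2>/dev/null)
  if [ -z "$account" ]; then
    echo "no active account: run 'gcloud auth login'" >&2
    return 1
  fi
  # Prints the project set with `gcloud config set project`; empty if unset.
  project=$(gcloud config get-value project 2>/dev/null)
  if [ -z "$project" ]; then
    echo "no active project: run 'gcloud config set project YOUR_PROJECT_ID'" >&2
    return 1
  fi
  echo "account=$account project=$project"
}
```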
The project includes an automated provisioning and deployment workflow using Google Cloud Secret Manager and Cloud SQL.
Run the interactive provisioning script to set up Cloud SQL, Redis, and secrets:

```bash
./scripts/provision_gcloud.sh
```

This will create or verify:
- Cloud SQL Instance (Postgres)
- Memorystore for Redis
- Required Secrets in Secret Manager
- Service Account and IAM roles
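After provisioning, you can spot-check that the expected secrets exist. In the sketch below, `check_secrets` is a hypothetical helper and you pass the secret names yourself; the names in the usage line (`LITELLM_MASTER_KEY`, `LITELLM_SALT_KEY`) are an assumption based on the environment variables described in this README, not names confirmed from `provision_gcloud.sh`.

```bash
# Hypothetical helper: report which of the named secrets exist.
# Usage: check_secrets PROJECT_ID SECRET_NAME [SECRET_NAME ...]
# Returns 1 if any secret is missing or cannot be read.
check_secrets() {
  local project="$1"
  local name missing=0
  shift
  for name in "$@"; do
    if gcloud secrets describe "$name" --project="$project" >/dev/null 2>&1; then
      echo "ok: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_secrets YOUR_PROJECT_ID LITELLM_MASTER_KEY LITELLM_SALT_KEY`. The database and cache instances can be listed with `gcloud sql instances list` and `gcloud redis instances list --region=us-central1` (adjust the region if you provisioned elsewhere).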
- Ensure you are authenticated: `gcloud auth login`
- Set your active project: `gcloud config set project YOUR_PROJECT_ID`
- Run the deployment script, which builds the image via Cloud Build and deploys it to Cloud Run with secret references:

```bash
./deploy_gcloud.sh
```
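Once the deploy finishes, a quick smoke test can confirm the service is reachable. This sketch assumes the service name `litellm-proxy` and region `us-central1` used in the IAM example in this README, and that the proxy exposes LiteLLM's `/health/liveliness` route. Because invocation is IAM-restricted, it sends a Google identity token from `gcloud auth print-identity-token`, so your account needs the `roles/run.invoker` role. `smoke_test_service` is a hypothetical helper name.

```bash
# Hypothetical post-deploy check.
# Usage: smoke_test_service [SERVICE] [REGION]
smoke_test_service() {
  local service="${1:-litellm-proxy}"
  local region="${2:-us-central1}"
  local url token
  url=$(gcloud run services describe "$service" --region="$region" \
    --format='value(status.url)') || return 1
  if [ -z "$url" ]; then
    echo "could not resolve URL for $service" >&2
    return 1
  fi
  token=$(gcloud auth print-identity-token) || return 1
  # -f makes curl exit non-zero on HTTP errors such as 403 (missing run.invoker).
  curl -fsS -H "Authorization: Bearer $token" "$url/health/liveliness"
}
```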
For more details on production security, see docs/PRODUCTION-SECRETS.md.
- Production with Secret Manager - Secure production deployment guide
- Production Best Practices - Machine and worker optimization
- Cost Optimization - Google Cloud Run pricing and savings
- Caching Implementation - Redis caching configuration and verification
- Agent Overview - High-level overview of agent support
- Agent Integration - Connect LangChain, DSPy, AutoGen, etc.
- Quick Start Guide - Step-by-step deployment walkthrough
- Deployment Logic - Deep dive into the automated deployment script
Click the "Use this template" button at the top of the repository to create your own copy. This allows you to customize the configuration and deployment scripts for your specific needs.
The core configuration is split into three files in the config/ directory:
| File | Purpose |
|---|---|
| `config/local.yaml` | Local testing with manual key injection |
| `config/prod.yaml` | Cloud Run production; reads secrets via `os.environ` |
| `config/dev.yaml` | Docker Compose local development |
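LiteLLM configs can read a value from the environment with the `os.environ/VAR_NAME` syntax (for example, `api_key: os.environ/GEMINI_API_KEY`), which is how `config/prod.yaml` avoids hardcoded secrets. A hypothetical lint helper, `find_hardcoded_keys`, can catch regressions before a deploy; it is a line-based grep rather than a YAML parser, and it only checks `api_key` and `master_key` fields.

```bash
# Hypothetical lint: flag api_key/master_key values that are not read from
# the environment via os.environ/. Prints offending lines with line numbers.
# Usage: find_hardcoded_keys [CONFIG_FILE]
find_hardcoded_keys() {
  local config="${1:-config/prod.yaml}"
  local hits
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 2
  fi
  # Match keys with a non-empty value, then drop the ones using os.environ/.
  hits=$(grep -nE '^[[:space:]]*(api_key|master_key):[[:space:]]*[^[:space:]#]' "$config" \
    | grep -v 'os\.environ/')
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```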
- Resources: 2 vCPU / 4 GB RAM
- Workers: 8 uvicorn workers (`--num_workers 8`), sized for the 2 vCPU allocation
- Database Pooling: Connection pool limit set to 20 to prevent connection exhaustion
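The pool limit interacts with the worker count. Assuming each worker process holds its own pool (check this against your LiteLLM version), one instance with 8 workers and a limit of 20 could open up to 8 × 20 = 160 Postgres connections, and that figure multiplies with every Cloud Run instance. The hypothetical helper below sketches this arithmetic; the maximum instance count and the database's `max_connections` are inputs you supply, not values taken from this README.

```bash
# Worst-case Postgres connections, assuming every worker in every instance
# fills its own pool of POOL_LIMIT connections.
# Usage: db_connection_budget WORKERS POOL_LIMIT MAX_INSTANCES [MAX_CONNECTIONS]
db_connection_budget() {
  local workers="$1" pool="$2" instances="$3" max_conn="${4:-}"
  local total=$(( workers * pool * instances ))
  echo "$total"
  # Optionally compare against the database's max_connections setting.
  if [ -n "$max_conn" ] && [ "$total" -gt "$max_conn" ]; then
    echo "warning: $total exceeds max_connections=$max_conn" >&2
    return 1
  fi
  return 0
}
```

For example, `db_connection_budget 8 20 3 500` prints 480 and returns 0; with a `max_connections` of 400 it also prints a warning and returns 1.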
- Secrets are managed via Google Secret Manager in production (see `docs/PRODUCTION-SECRETS.md`).
- `LITELLM_SALT_KEY` is used for internal database encryption.
- `LITELLM_MASTER_KEY` is required for authenticated proxy access.
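Both keys should be long random strings. The sketch below generates them from `/dev/urandom`; the function names are hypothetical, the `sk-` prefix on the master key follows LiteLLM's documented convention for master keys, and the secret name in the usage line is an assumption. Because the salt key protects values already stored in the database, avoid rotating it casually once data has been encrypted with it.

```bash
# Hypothetical helpers for generating the two LiteLLM keys.
# 32 random bytes rendered as 64 lowercase hex characters.
random_hex() {
  od -An -v -N32 -tx1 /dev/urandom | tr -d ' \t\n'
}

# Master key with the sk- prefix LiteLLM expects.
generate_master_key() { printf 'sk-%s\n' "$(random_hex)"; }

# Salt key: keep it stable once encrypted data exists.
generate_salt_key() { printf '%s\n' "$(random_hex)"; }
```

To store a generated key as a new version of an existing secret: `generate_master_key | tr -d '\n' | gcloud secrets versions add LITELLM_MASTER_KEY --data-file=-`.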
Grant access to the service using the Google Cloud SDK:
```bash
gcloud run services add-iam-policy-binding litellm-proxy \
  --member="user:NAME@DOMAIN.COM" \
  --role="roles/run.invoker" \
  --region="us-central1" \
  --project="YOUR_PROJECT_ID"
```

Built with ❤️ by Qredence.