litellm-cloudrun-deploy πŸš€

License: MIT

A high-performance, cost-optimized LiteLLM Proxy deployment for Google Cloud Run. This setup is designed for enterprise-grade applications requiring model routing, caching, and observability.

🌟 Key Features

  • Scalable Serverless: Deploys to Google Cloud Run with optimized 2 vCPU / 4GB RAM specs.
  • Enterprise Caching: Built-in Redis integration with TLS support for sub-second latent responses and heavy cost savings.
  • Full Observability: Pre-configured for Langfuse, Context7, and Tavily.
  • MCP & Skills Support: Ready for Model Context Protocol (MCP) servers and Anthropic-compatible /skills endpoints.
  • Secure by Default:
    • Zero hardcoded secrets (Env-var injection with Secret Manager).
    • Encrypted database storage with custom salt keys.
    • IAM-based invocation control.
  • Multi-Model Support: Gemini, Kimi, Z-AI (GLM), MiniMax, Deepinfra, NVIDIA NIM, and GitHub Models.

πŸš€ Deployment Options

Choose the deployment path that matches your needs:

Option 1: One-Click Deploy (Recommended for Testing & Evaluation)

[One-click deploy button]

Best for: Getting started quickly, testing, proof-of-concept

  • βœ… One-click deployment
  • βœ… No pre-configuration required
  • βœ… Guides you through setup wizard
  • ⚠️ Secrets stored as environment variables (see Option 2 for production)

Option 2: Production Deployment (Recommended for Production)

Guide: Production with Secret Manager

Best for: Production environments, enterprise deployments

  • βœ… Secrets stored in Google Secret Manager
  • βœ… IAM-based access control
  • βœ… Full audit trails
  • βœ… Recommended for sensitive workloads

Option 3: Manual CLI Deployment

Guide: deploy_gcloud.sh

Best for: Developers who prefer command-line control

  • βœ… Full control over deployment
  • βœ… Integrates with CI/CD pipelines
  • βœ… Custom deployment scripts

πŸš€ Quick Start

1. Local Run (Docker)

Ensure Docker is installed and create a .env file based on .env.example (a hedged example .env follows the service list below).

docker-compose up

This will start:

  • LiteLLM Proxy on http://localhost:4000
  • PostgreSQL (local) on port 5432
  • Redis (local) on port 6379

2. One-Click Deploy to Cloud Run

(Note: Ensure your Google Cloud project is active and billing is enabled)

3. CLI Deployment

We use a streamlined deployment script (deploy_gcloud.sh) for production updates.

Prerequisites:

  • Google Cloud SDK installed (gcloud).
  • Authenticated session (gcloud auth login).
  • Active project set (gcloud config set project YOUR_PROJECT_ID).

Cloud Deployment

The project includes an automated provisioning and deployment workflow using Google Cloud Secret Manager and Cloud SQL.

1. Provision Infrastructure

Run the interactive provisioning script to set up Cloud SQL, Redis, and Secrets:

./scripts/provision_gcloud.sh

This will create or verify the following (a rough gcloud-level sketch follows this list):

  • Cloud SQL Instance (Postgres)
  • Memorystore for Redis
  • Required Secrets in Secret Manager
  • Service Account and IAM roles

2. Deploy

  1. Ensure you are authenticated: gcloud auth login
  2. Set your active project: gcloud config set project YOUR_PROJECT_ID
  3. Run the deployment script:
    ./deploy_gcloud.sh
    Note: This script builds the image via Cloud Build and deploys to Cloud Run with secret references; a hedged sketch of the equivalent manual commands follows below.
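For orientation, the script's effect is roughly equivalent to the manual commands below (image name, region, and secret mappings are illustrative; deploy_gcloud.sh remains the source of truth):

# Build the container image with Cloud Build
gcloud builds submit --tag "gcr.io/YOUR_PROJECT_ID/litellm-proxy"

# Deploy to Cloud Run, exposing secrets as environment variables
gcloud run deploy litellm-proxy \
  --image="gcr.io/YOUR_PROJECT_ID/litellm-proxy" \
  --region="us-central1" \
  --cpu=2 --memory=4Gi \
  --set-secrets="LITELLM_MASTER_KEY=LITELLM_MASTER_KEY:latest,LITELLM_SALT_KEY=LITELLM_SALT_KEY:latest"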

For more details on production security, see docs/PRODUCTION-SECRETS.md.


πŸ“– Documentation

πŸ“¦ Use as GitHub Template

Click the "Use this template" button at the top of the repository to create your own copy. This allows you to customize the configuration and deployment scripts for your specific needs.

πŸ›  Configuration

The core configuration is split into three files in the config/ directory:

  β€’ config/local.yaml: Local testing with manual key injection
  β€’ config/prod.yaml: Cloud Run production, uses os.environ for secrets
  β€’ config/dev.yaml: Docker Compose local development
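As a hypothetical excerpt in the style of config/prod.yaml (the model alias and env-var names are placeholders; consult the actual file), LiteLLM's os.environ/ prefix resolves values from environment variables at runtime:

model_list:
  - model_name: gemini-flash                  # placeholder alias
    litellm_params:
      model: gemini/gemini-2.0-flash          # placeholder provider/model id
      api_key: os.environ/GEMINI_API_KEY      # resolved from the environment at runtime

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY   # injected from Secret Manager on Cloud Run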

Infrastructure Specs (Cloud Run)

  • Resources: 2 vCPU / 4GB RAM
  • Workers: 8 workers (--num_workers 8) to match uvicorn to CPU allocation
  • Database Pooling: Limit set to 20 to prevent connection exhaustion

πŸ”’ Security

  • Secrets managed via Google Secret Manager in production (see docs/PRODUCTION-SECRETS.md)
  • LITELLM_SALT_KEY used for internal database encryption
  • LITELLM_MASTER_KEY for authenticated proxy access

πŸ” Permissions

Grant access to the service using the Google Cloud SDK:

gcloud run services add-iam-policy-binding litellm-proxy \
    --member="user:NAME@DOMAIN.COM" \
    --role="roles/run.invoker" \
    --region="us-central1" \
    --project="YOUR_PROJECT_ID"
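Once a member has roles/run.invoker, access can be verified with a Google identity token; the service URL is a placeholder, and /health/liveliness is assumed here as the LiteLLM proxy's liveness endpoint:

# Call the IAM-protected service with an identity token
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  "https://YOUR-SERVICE-URL.run.app/health/liveliness"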

Built with ❀️ by Qredence.
