A production-ready multi-LLM agent using Vertex AI Agent Builder (ADK) with Gemini as the orchestrator and support for OpenAI, Claude, and Grok, including real-time token + cost tracking.
Most examples show single-model agents.
This project demonstrates a real-world multi-model architecture:
- Use Gemini (Vertex AI) as the default (fast + cost-efficient)
- Call OpenAI / Claude / Grok only when needed
- Track token usage + cost per call
- Deploy to Vertex AI Agent Engine
- Develop locally with the ADK Dev UI

This is a practical production pattern for modern AI systems.
```
User
  ↓
Vertex Agent Engine
  ↓
Gemini (primary orchestrator)
  ↓
Tool routing layer
  ├── OpenAI
  ├── Claude
  └── Grok
  ↓
Response + cost tracking
```
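The routing layer's decision can be as simple as keyword dispatch inside a tool. A minimal sketch of the idea — the trigger phrases and model choices below are illustrative assumptions, not the project's actual rules:

```python
# Illustrative routing rule: Gemini handles everything by default, and an
# external model is selected only for task types where it adds value.
# The keyword triggers are assumptions for demonstration.
def route_model(task: str) -> str:
    task_lower = task.lower()
    if "code review" in task_lower:
        return "OpenAI"
    if "long document" in task_lower:
        return "Claude"
    if "real-time" in task_lower:
        return "Grok"
    return "Gemini"  # default: fast + cost-efficient

print(route_model("Summarize this long document"))  # → Claude
print(route_model("What is 2 + 2?"))                # → Gemini
```

In practice the orchestrator model makes this decision itself via tool descriptions, but an explicit function like this is useful for testing and for enforcing hard policies.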
- Multi-LLM orchestration (Gemini + OpenAI + Claude + Grok)
- Tool-based routing
- Per-call token + cost tracking
- Local development UI (`adk web`)
- Vertex Agent Engine deployment
- Clean, extensible Python structure
```
Here is your rewritten email...
[Claude: 812 tokens | $0.0021]
```
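A cost line like the one above can be produced by multiplying the call's token count by a per-model rate. A minimal sketch — the per-1K-token rates here are made up for illustration; check each provider's current pricing:

```python
# Hypothetical blended per-1K-token rates; real pricing differs by model
# version and by input/output token split.
PRICES_PER_1K = {"OpenAI": 0.005, "Claude": 0.003, "Grok": 0.002}

def usage_line(model: str, tokens: int) -> str:
    """Format a per-call usage line like the one shown above."""
    cost = tokens / 1000 * PRICES_PER_1K[model]
    return f"[{model}: {tokens} tokens | ${cost:.4f}]"

print(usage_line("Claude", 812))  # → [Claude: 812 tokens | $0.0024]
```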
```
multi_llm/
├── __init__.py
├── agent.py          # shim → app.agent
├── app/
│   ├── agent.py
│   ├── tools.py
│   └── config.py
├── requirements.txt
└── .env
```
- Python 3.10+
- Google Cloud project with billing enabled
- Vertex AI + Cloud Storage APIs enabled
- GCS staging bucket (e.g. `gs://my-gcp-bucket`)
- Authenticated locally: `gcloud auth application-default login`
- API keys with available credits / billing enabled for:
  - OpenAI
  - Anthropic (Claude)
  - xAI (Grok)
To deploy and run this project, the following roles are required:
- `roles/aiplatform.user` (Vertex AI access)
- `roles/storage.admin` on the staging bucket (e.g. `gs://ai_fnol`)
If you need to create the bucket:
- `roles/storage.admin` at the project level (temporary is fine)
By default, Vertex uses a managed service agent:
`service-PROJECT_NUMBER@gcp-sa-aiplatform-re.iam.gserviceaccount.com`
This works out of the box.
For production use, you should migrate to a custom service account with least-privilege access.
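One way to do that migration — a sketch only, with placeholder names; verify the role set against what your agent actually needs:

```shell
# Create a dedicated service account (name is a placeholder).
gcloud iam service-accounts create multi-llm-agent \
  --project="PROJECT_NAME" \
  --display-name="Multi-LLM Agent"

# Grant it only the Vertex AI role; add bucket-scoped storage access
# separately rather than project-wide storage.admin.
gcloud projects add-iam-policy-binding PROJECT_NAME \
  --member="serviceAccount:multi-llm-agent@PROJECT_NAME.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```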
```
python -m venv ~/multi_llm_venv
source ~/multi_llm_venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```

Fill in `.env`:

```
GOOGLE_CLOUD_PROJECT=PROJECT_NAME
GOOGLE_CLOUD_LOCATION=GCP_REGION
GOOGLE_CLOUD_STAGING_BUCKET=gs://GCP_BUCKET
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
XAI_API_KEY=...
```

Start the Dev UI:

```
adk web multi_llm
```

Open: http://127.0.0.1:8000/dev-ui
```
adk deploy agent_engine \
  --project="PROJECT_NAME" \
  --region="GCP_REGION" \
  --display_name="multi_llm" \
  --staging_bucket="gs://GCP_BUCKET" \
  multi_llm
```

Query the deployed agent:

```python
import vertexai
from vertexai import agent_engines

vertexai.init(project="PROJECT_NAME", location="GCP_REGION")

agent = agent_engines.get(
    "projects/PROJECT_NAME/locations/GCP_REGION/reasoningEngines/Id"
)

print(agent.query(input="Compare Gemini vs OpenAI vs Claude"))
```

Each external model call returns:
```
[OpenAI: 496 tokens | $0.0042]
```

A running total is also available via `get_usage_summary()`.
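A summary helper like `get_usage_summary()` can be backed by a small per-model accumulator. A sketch of the idea — the class and field names are assumptions, not the project's actual implementation:

```python
from collections import defaultdict

class UsageTracker:
    """Accumulates call counts, token totals, and cost per model."""

    def __init__(self):
        self._calls = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})

    def record(self, model: str, tokens: int, cost: float) -> None:
        entry = self._calls[model]
        entry["calls"] += 1
        entry["tokens"] += tokens
        entry["cost"] += cost

    def get_usage_summary(self) -> dict:
        total = sum(e["cost"] for e in self._calls.values())
        return {"per_model": dict(self._calls), "total_cost": round(total, 4)}

tracker = UsageTracker()
tracker.record("OpenAI", 496, 0.0042)
tracker.record("Claude", 812, 0.0021)
print(tracker.get_usage_summary()["total_cost"])  # → 0.0063
```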
- Gemini is used as the default model
- External models are used selectively
- Keep `.venv` outside the repo to avoid deployment issues
- Session memory (conversation state)
- Long-term memory (user preferences)
- Cost-aware routing
- RAG (retrieval)
- Secret Manager integration
- Frontend / API integrations
- Migrate deployment and runtime authentication to a dedicated least-privilege service account instead of the local user IAM account
Active / Iterating toward production-grade system