
gemini-eval-agent

An LLM-evaluation auditor agent built on Google Cloud Agent Builder (ADK), Gemini 2.5, and the Arize Phoenix MCP server.

Live demo: https://gemini-eval-agent-1029931682737.us-central1.run.app
Demo video: https://youtu.be/3q9SFoMsAhE (1:41)
License: Apache 2.0

What it does

You ask "is checkout-rag-v2 hallucinating?" or "show me the slowest 10 traces from yesterday." The agent uses the Arize Phoenix MCP tools to inspect projects, traces, experiments, and datasets, then returns a structured verdict (PASS / FAIL / NEEDS REVIEW) with cited trace IDs, evaluator scores, and a concrete next step.
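The verdict described above can be pictured as a small record. The sketch below is a minimal, hypothetical illustration and not the agent's actual logic. The field names, the `decide` helper, the 0.8 / 0.5 thresholds, and the assumption that every evaluator score lies in [0, 1] with higher meaning better are all invented for this example.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical verdict type; mirrors the three outcomes named in this README.
Verdict = Literal["PASS", "FAIL", "NEEDS REVIEW"]


@dataclass
class AuditResult:
    """Illustrative shape of one audit result (field names are assumptions)."""
    verdict: Verdict
    trace_ids: list[str]      # cited Phoenix trace IDs backing the verdict
    scores: dict[str, float]  # evaluator name -> score, assumed in [0, 1], higher is better
    next_step: str            # one concrete recommended action


def decide(scores: dict[str, float], pass_at: float = 0.8, fail_below: float = 0.5) -> Verdict:
    """Map evaluator scores to a verdict using illustrative thresholds.

    The worst (lowest) score drives the outcome: PASS if it is at least
    pass_at, FAIL if it is below fail_below, NEEDS REVIEW in between.
    """
    if not scores:
        return "NEEDS REVIEW"  # no evidence at all, so escalate to a human
    worst = min(scores.values())
    if worst >= pass_at:
        return "PASS"
    if worst < fail_below:
        return "FAIL"
    return "NEEDS REVIEW"


# Example: one weak faithfulness score is enough to fail the audit.
scores = {"faithfulness": 0.42, "relevance": 0.91}
result = AuditResult(
    verdict=decide(scores),
    trace_ids=["trace-0001"],  # placeholder ID, not real data
    scores=scores,
    next_step="Inspect the retrieved context for trace-0001",
)
```

Letting the worst score drive the outcome keeps the sketch conservative: a single failing evaluator cannot be averaged away by good scores elsewhere.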

The agent uses the standard Phoenix MCP tool surface (list_projects, list_traces, get_trace_detail, list_experiments, list_datasets, run_evaluation), the same one exposed by the official @arizeai/phoenix-mcp package. A local stub MCP server ships with the repo, so demos run without a Phoenix tenant. Flip one flag (the "Use stub Phoenix MCP" toggle in the dashboard sidebar) and the same agent code targets a real tenant.
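To make the stub idea concrete, here is a minimal sketch of what canned stand-ins for three of the tools might look like in plain Python. The data, the argument names (`project`, `limit`, `order_by`), and the return shapes are assumptions for illustration. They are not the repo's stub fixtures or the real Phoenix MCP tool schemas. In an actual stub server, functions like these would be registered as MCP tools, for example with the `tool()` decorator on a `FastMCP` server from the `mcp` Python SDK.

```python
# Hypothetical canned data; the repo's real stub fixtures are not shown in this README.
_PROJECTS = [{"name": "checkout-rag-v2"}]
_TRACES = [
    {"trace_id": "t-001", "project": "checkout-rag-v2", "latency_ms": 820, "faithfulness": 0.91},
    {"trace_id": "t-002", "project": "checkout-rag-v2", "latency_ms": 2450, "faithfulness": 0.38},
    {"trace_id": "t-003", "project": "checkout-rag-v2", "latency_ms": 1310, "faithfulness": 0.77},
]


def list_projects() -> list[dict]:
    """Return every known project (copies, so callers cannot mutate the fixtures)."""
    return [dict(p) for p in _PROJECTS]


def list_traces(project: str, limit: int = 10, order_by: str = "latency_ms") -> list[dict]:
    """Return up to `limit` traces for `project`, largest `order_by` value first."""
    rows = [t for t in _TRACES if t["project"] == project]
    rows.sort(key=lambda t: t[order_by], reverse=True)
    return [dict(t) for t in rows[:limit]]


def get_trace_detail(trace_id: str) -> dict:
    """Look up one trace by ID, failing loudly on an unknown ID."""
    for t in _TRACES:
        if t["trace_id"] == trace_id:
            return dict(t)
    raise KeyError(f"unknown trace_id: {trace_id}")


# Name -> callable table, roughly what a stub server would expose as its tool list.
TOOLS = {
    "list_projects": list_projects,
    "list_traces": list_traces,
    "get_trace_detail": get_trace_detail,
}
```

With this shape, `list_traces("checkout-rag-v2", limit=10)` returns the slowest traces first, which is what a question like "show me the slowest 10 traces" needs. The sketch omits the time-window filter that "from yesterday" implies.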

Architecture

┌─────────────┐  user question      ┌─────────────────────────────┐
│  Streamlit  │ ──────────────────▶ │  ADK LlmAgent (Gemini 2.5)  │
│  dashboard  │                     │  on Vertex AI               │
└─────────────┘ ◀── verdict + cites └────┬────────────────────────┘
                                         │ MCPToolset / stdio
                                         ▼
                                    ┌─────────────────────────┐
                                    │  Arize Phoenix MCP      │
                                    │  (stub by default,      │
                                    │  real tenant via flag)  │
                                    └─────────────────────────┘
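The `MCPToolset / stdio` edge in the diagram carries JSON-RPC 2.0 messages, one JSON object per line, over the MCP server process's stdin and stdout. The standard-library sketch below shows how a `tools/call` request for one of the Phoenix tools is framed on that pipe. It skips the `initialize` handshake that an MCP client performs before calling tools, and the `arguments` keys are hypothetical. Inside the agent, ADK's MCPToolset does this framing for you.

```python
import json
from itertools import count

# Monotonic request IDs so each response can be matched to its request.
_ids = count(1)


def tools_call_frame(name: str, arguments: dict) -> bytes:
    """Encode one MCP `tools/call` request as a newline-delimited JSON-RPC frame."""
    msg = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line,
    # which the stdio transport requires.
    line = json.dumps(msg, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def parse_frame(frame: bytes) -> dict:
    """Decode one frame back into a Python dict."""
    return json.loads(frame.decode("utf-8"))


frame = tools_call_frame("list_traces", {"project": "checkout-rag-v2", "limit": 10})
```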

Try it locally (no Phoenix tenant needed)

git clone https://github.com/MukundaKatta/gemini-eval-agent
cd gemini-eval-agent
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

gcloud auth application-default login
export GOOGLE_CLOUD_PROJECT=your-project
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_LOCATION=us-central1

PYTHONPATH=src streamlit run app/dashboard.py
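Before launching the dashboard, it can help to confirm that the three Vertex AI variables above are actually set in the current shell. This is a hypothetical preflight helper, not part of the repo. It treats `true` or `1` (any case) as a valid value for `GOOGLE_GENAI_USE_VERTEXAI`, and it does not check whether `gcloud auth application-default login` succeeded.

```python
import os

# Variable name -> accepted values (None means any non-empty value is fine).
REQUIRED = {
    "GOOGLE_CLOUD_PROJECT": None,                # your GCP project ID
    "GOOGLE_GENAI_USE_VERTEXAI": {"true", "1"},  # route Gemini calls through Vertex AI
    "GOOGLE_CLOUD_LOCATION": None,               # e.g. us-central1
}


def missing_vertex_env(env=None) -> list[str]:
    """Return a human-readable problem per missing or unexpected variable."""
    env = os.environ if env is None else env
    problems = []
    for name, accepted in REQUIRED.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif accepted is not None and value.lower() not in accepted:
            problems.append(f"{name}={value!r} is not one of {sorted(accepted)}")
    return problems


if __name__ == "__main__":
    for problem in missing_vertex_env():
        print("preflight:", problem)
```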

Try it against a real Arize Phoenix tenant

export PHOENIX_BASE_URL=https://your-tenant.phoenix.arize.com
export PHOENIX_API_KEY=...

In the dashboard sidebar, untick "Use stub Phoenix MCP". The agent then spawns the official @arizeai/phoenix-mcp npm package via npx, so Node.js (which provides npx) must be installed.
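The stub/real switch amounts to choosing which process the agent spawns as its stdio MCP server. Here is a minimal sketch, assuming a hypothetical stub module name (`stub_phoenix_mcp`) and the `--baseUrl` / `--apiKey` flags shown in the @arizeai/phoenix-mcp README (check them against the version you install). The repo's own wiring may differ.

```python
import os
import sys
from dataclasses import dataclass


@dataclass
class ServerCommand:
    """Everything a stdio MCP client needs to spawn a server process."""
    command: str
    args: list[str]
    env: dict[str, str]


def phoenix_server_command(use_stub: bool, env=None) -> ServerCommand:
    env = dict(os.environ if env is None else env)
    if use_stub:
        # Hypothetical module path for the bundled stub; the real entry point may differ.
        return ServerCommand(sys.executable, ["-m", "stub_phoenix_mcp"], env)
    base_url = env.get("PHOENIX_BASE_URL")
    api_key = env.get("PHOENIX_API_KEY")
    if not base_url or not api_key:
        raise RuntimeError("PHOENIX_BASE_URL and PHOENIX_API_KEY must be set for a real tenant")
    # Flag names assumed from the @arizeai/phoenix-mcp README.
    return ServerCommand(
        "npx",
        ["-y", "@arizeai/phoenix-mcp@latest", "--baseUrl", base_url, "--apiKey", api_key],
        env,
    )
```

Returning a plain command/args/env record keeps the choice independent of any particular MCP client. The real-tenant branch also fails fast when `PHOENIX_BASE_URL` or `PHOENIX_API_KEY` is missing, instead of surfacing the error from inside npx.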

Tests

PYTHONPATH=src pytest -q

11 tests cover the stub server and agent wiring.

License

Apache 2.0. Mukunda Katta, independent developer.
