This repository documents my personal learning journey exploring Braintrust end-to-end: simple evals → multi-step agents → observability with OpenTelemetry → CI/CD.
The vision for this repo comes from a lab I created with the help of ChatGPT and then worked through myself: FabsBraintrustE2ELabFromBasicToAdvanced.pdf. Read that doc for context; it covers the WHY and the WHAT, and this repo is the HOW.
- Build and evaluate AI agents using Braintrust evals
- Compare local (Ollama) vs frontier (OpenAI, Anthropic) models (see the client sketch after this list)
- Add observability with OpenTelemetry (`BraintrustSpanProcessor` + project scoping)
- Optionally mirror telemetry to Azure Application Insights
- Automate evals in GitHub Actions CI/CD
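The local-vs-frontier comparison comes down to pointing an OpenAI-compatible client at either OpenAI or Ollama. Here is a minimal sketch of how the `USE_LOCAL_MODEL` / `LOCAL_OPENAI_MODEL` switches used below could be wired up; the helper name and the frontier model are illustrative, and Ollama's default OpenAI-compatible endpoint is assumed:

```python
import os

from openai import OpenAI


def get_client_and_model() -> tuple[OpenAI, str]:
    """Return an OpenAI-compatible client plus the model name to use.

    Honors the same env vars as the quick-start commands:
      USE_LOCAL_MODEL=true -> talk to Ollama's OpenAI-compatible endpoint
      LOCAL_OPENAI_MODEL   -> which local model to run (e.g. "llama3.3:70b")
    """
    if os.getenv("USE_LOCAL_MODEL", "").lower() == "true":
        client = OpenAI(
            base_url="http://localhost:11434/v1",  # Ollama's default OpenAI-compatible API
            api_key="ollama",  # Ollama ignores the key, but the SDK requires one
        )
        return client, os.getenv("LOCAL_OPENAI_MODEL", "llama3.3:70b")
    # Frontier path: uses OPENAI_API_KEY from .env; the model name is just an example.
    return OpenAI(), "gpt-4o-mini"
```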
```bash
git clone https://github.com/fabianwilliams/braintrustdevdeepdive
cd braintrustdevdeepdive
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.template .env  # then fill values
```

Create a `.env` file:
```
BRAINTRUST_API_KEY=sk-...
PROJECT_NAME=Fabs27Sep25DeepDive
OPENAI_API_KEY=your_openai_key
AZURE_MONITOR_CONNECTION_STRING="your_conn_str_here"
```

Run the hello eval in LLM mode (the eval file itself is sketched below):

```bash
export HELLO_MODE=llm
braintrust eval evals/eval_hello.py
```

Run it against a local Ollama model:

```bash
export USE_LOCAL_MODEL=true
export LOCAL_OPENAI_MODEL="llama3.3:70b"
braintrust eval evals/eval_hello.py
```

Run the multi-step agent eval:

```bash
braintrust eval evals/eval_trip.py
```

Set up observability:

```bash
python observability/otel_setup.py
```

Enable CI/CD:

- Add `BRAINTRUST_API_KEY` to GitHub Secrets
- Open a PR or push to `main` to trigger the eval workflow
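For reference, the hello eval those commands run follows Braintrust's standard `Eval(...)` shape: a dataset, a task, and scorers. A minimal sketch; the repo's actual dataset, scorer, and `HELLO_MODE` handling may differ:

```python
import os

from autoevals import Levenshtein
from braintrust import Eval


def hello_task(input: str) -> str:
    # The real eval switches on HELLO_MODE to call an LLM instead of this plain
    # string template; the template keeps the sketch self-contained.
    return f"Hello {input}"


Eval(
    os.getenv("PROJECT_NAME", "Fabs27Sep25DeepDive"),  # project the run is logged to
    data=lambda: [{"input": "World", "expected": "Hello World"}],
    task=hello_task,
    scores=[Levenshtein],  # string-match scorer from autoevals
)
```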
Runs and agents are logged to Braintrust (scoped to $PROJECT_NAME). Optionally forward to Azure Monitor via AZURE_MONITOR_CONNECTION_STRING.
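A rough sketch of what `observability/otel_setup.py` does with those settings. The `BraintrustSpanProcessor` import path and the Azure exporter (from `azure-monitor-opentelemetry-exporter`) are assumptions here; check the actual file for the exact wiring:

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from braintrust.otel import BraintrustSpanProcessor  # assumed import path

provider = TracerProvider()

# Ship spans to Braintrust; scoping to $PROJECT_NAME happens through Braintrust's
# env-based configuration (BRAINTRUST_API_KEY plus the project parent setting).
provider.add_span_processor(BraintrustSpanProcessor())

# Optionally mirror the same spans to Azure Application Insights.
conn_str = os.getenv("AZURE_MONITOR_CONNECTION_STRING")
if conn_str:
    from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

    provider.add_span_processor(
        BatchSpanProcessor(AzureMonitorTraceExporter.from_connection_string(conn_str))
    )

trace.set_tracer_provider(provider)
```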
- Explore traces in Braintrust UI or Azure Monitor.
- Submit PRs to see GitHub Actions CI run evals automatically.
This repo is meant to document experiments and invite collaboration. Feedback and forks are welcome!
```
evals/
  eval_hello.py    # Hello evaluation (string match + LLM variant)
  eval_trip.py     # Multi-step agent eval with LLMClassifier
agents/
  plan_trip.py     # Agent logic (decision -> tool -> judge -> compose; see sketch below)
observability/
  otel_setup.py    # OTel config (Braintrust + optional Azure)
.github/workflows/run-evals.yml
requirements.txt
.env.template
.gitignore
AGENT_GUIDE.md
```
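The agent in `agents/plan_trip.py` follows that decision -> tool -> judge -> compose flow. Purely as an illustration of the shape, not the repo's actual code (every function body below is a stub, and the `traced` decorator is assumed from the Braintrust SDK):

```python
from braintrust import traced  # assumed: Braintrust's tracing decorator


@traced
def decide(request: str) -> str:
    # Decision step: pick a tool based on the request (toy keyword routing).
    return "weather" if "weather" in request.lower() else "itinerary"


@traced
def call_tool(tool: str, request: str) -> dict:
    # Tool step: call the chosen tool; stubbed here with canned data.
    return {"tool": tool, "data": f"results for: {request}"}


@traced
def judge(result: dict) -> bool:
    # Judge step: the repo uses an LLM judge; stubbed as a simple sanity check.
    return bool(result.get("data"))


@traced
def compose(request: str, result: dict) -> str:
    # Compose step: turn the tool output into the final answer.
    return f"Plan for '{request}' using {result['tool']}: {result['data']}"


def plan_trip(request: str) -> str:
    tool = decide(request)
    result = call_tool(tool, request)
    if not judge(result):
        raise RuntimeError("tool result failed the judge step")
    return compose(request, result)
```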
This document brings coding assistants up to speed on the current plan.
- Repo: `braintrustdevdeepdive`
- Language: Python 3.10+
- Environment: macOS, VS Code, Ollama local models, Docker Desktop
- Core libraries: `braintrust`, `autoevals`, `openai`, `opentelemetry-sdk`
- Create & run evals (Hello World → multi-step agent).
- Integrate local (Ollama) and frontier (OpenAI/Anthropic) models.
- Add OpenTelemetry observability (Braintrust backend + Azure Monitor).
- Automate evals in GitHub Actions CI/CD.
- Always use `.env` for secrets and `.gitignore` to exclude it.
- Eval scripts should be named `eval_*.py`.
- Instrument spans with OTel semantic conventions (`ai.model.id`, `ai.prompt`, `ai.response`); see the sketch after this list.
- For multi-step agents, log custom traces (`braintrust.trace()`).
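A small sketch of those span conventions in practice; the attribute names come straight from the list above, and the tracer provider is assumed to be configured by `observability/otel_setup.py`:

```python
from opentelemetry import trace

tracer = trace.get_tracer("braintrustdevdeepdive")


def ask_model(prompt: str, model_id: str = "llama3.3:70b") -> str:
    with tracer.start_as_current_span("llm.call") as span:
        span.set_attribute("ai.model.id", model_id)
        span.set_attribute("ai.prompt", prompt)
        response = "stubbed response"  # call the model here
        span.set_attribute("ai.response", response)
        return response
```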
- Generate eval datasets (synthetic or task-specific).
- Wrap OpenAI / local models with the Braintrust proxy (see the sketch after this list).
- Add/modify CI workflows (`.github/workflows/run-evals.yml`).
- Extend observability by tagging spans with context (`gen_ai.conversation.id`, `gen_ai.feedback`).
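On the proxy point: the Braintrust SDK's `wrap_openai` helper logs each OpenAI-compatible call, and the same client can be pointed at the Braintrust AI proxy (or at Ollama, as in the client sketch near the top). The proxy URL below is the one Braintrust documents, not something defined in this repo, so treat it as an assumption:

```python
import os

from braintrust import wrap_openai
from openai import OpenAI

# Route through the Braintrust AI proxy so frontier and local calls share one
# endpoint, then wrap the client so every completion is logged to Braintrust.
client = wrap_openai(
    OpenAI(
        base_url="https://api.braintrust.dev/v1/proxy",  # assumed proxy endpoint
        api_key=os.environ["BRAINTRUST_API_KEY"],
    )
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; any proxy-supported model name works
    messages=[{"role": "user", "content": "Plan a weekend trip to Lisbon."}],
)
print(reply.choices[0].message.content)
```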
- I will experiment interactively and may diverge from this plan.
- Coding assistants should suggest eval cases, observability improvements, and error handling.
- Assume all work should be CI/CD ready with eval regression checks.