Braintrust Dev Deep Dive (Personal Lab)

This repository documents my personal learning journey exploring Braintrust end-to-end: simple evals → multi-step agents → observability with OpenTelemetry → CI/CD.

⚠️ Note: This is not an official Microsoft or Braintrust project.

The vision for this stems from a lab I created with the help of ChatGPT for me to run through. Do read it for context on what this GitHub repo is here for: this repo is the HOW, while the doc covers the WHY and the WHAT. See FabsBraintrustE2ELabFromBasicToAdvanced.pdf.

Goals

  • Build and evaluate AI agents using Braintrust evals
  • Compare local (Ollama) vs frontier (OpenAI, Anthropic) models
  • Add observability with OpenTelemetry (BraintrustSpanProcessor + project scoping)
  • Optionally mirror telemetry to Azure Application Insights
  • Automate evals in GitHub Actions CI/CD

Setup

git clone https://github.com/fabianwilliams/braintrustdevdeepdive
cd braintrustdevdeepdive
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.template .env  # then fill values

Fill in the .env file:

BRAINTRUST_API_KEY=sk-...
PROJECT_NAME=Fabs27Sep25DeepDive
OPENAI_API_KEY=your_openai_key 
AZURE_MONITOR_CONNECTION_STRING="your_conn_str_here" 

Usage

LLM Mode

export HELLO_MODE=llm
braintrust eval evals/eval_hello.py
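For reference, a minimal Braintrust eval has this shape (a sketch following the Braintrust quickstart; the actual eval_hello.py in this repo may differ):

# evals/eval_hello.py (sketch)
import os
from braintrust import Eval
from autoevals import Levenshtein

Eval(
    os.environ.get("PROJECT_NAME", "Fabs27Sep25DeepDive"),
    data=lambda: [{"input": "World", "expected": "Hello World"}],
    task=lambda input: "Hello " + input,  # swap in an LLM call when HELLO_MODE=llm
    scores=[Levenshtein],
)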

Local Model

export USE_LOCAL_MODEL=true
export LOCAL_OPENAI_MODEL="llama3.3:70b"
braintrust eval evals/eval_hello.py
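How these variables might be consumed (an assumed sketch, not verbatim repo code): Ollama exposes an OpenAI-compatible endpoint, so one client can route to either backend.

import os
from openai import OpenAI

if os.environ.get("USE_LOCAL_MODEL") == "true":
    # Ollama serves an OpenAI-compatible API at this address by default
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    model = os.environ.get("LOCAL_OPENAI_MODEL", "llama3.3:70b")
else:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    model = "gpt-4o-mini"  # hypothetical default; pick your frontier model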

Multi-Step Agent Eval

braintrust eval evals/eval_trip.py
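eval_trip.py scores the agent with an LLMClassifier (see the folder structure below). A sketch of such a judge, assuming autoevals' standard constructor:

from autoevals import LLMClassifier

trip_quality = LLMClassifier(
    name="TripQuality",
    prompt_template=(
        "Does the itinerary satisfy the request?\n"
        "Request: {{input}}\nItinerary: {{output}}\n"
        "Answer Y or N."
    ),
    choice_scores={"Y": 1, "N": 0},
    use_cot=True,  # ask the judge to reason before answering
)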

Observability

python observability/otel_setup.py
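The wiring inside otel_setup.py is roughly this (a sketch; the BraintrustSpanProcessor options are assumed from Braintrust's OTel docs, so verify against your SDK version):

import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from braintrust.otel import BraintrustSpanProcessor

provider = TracerProvider()
# Scope exported spans to the Braintrust project from .env
provider.add_span_processor(
    BraintrustSpanProcessor(parent=f"project_name:{os.environ['PROJECT_NAME']}")
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("braintrustdevdeepdive")
with tracer.start_as_current_span("hello") as span:
    span.set_attribute("ai.model.id", "llama3.3:70b")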

CI/CD

Add BRAINTRUST_API_KEY to GitHub Secrets, then open a PR or push to main to trigger the eval workflow.
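An illustrative sketch of .github/workflows/run-evals.yml (the committed workflow may differ):

name: run-evals
on:
  push:
    branches: [main]
  pull_request:
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: braintrust eval evals/eval_hello.py
        env:
          BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}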

Runs and agents are logged to Braintrust (scoped to $PROJECT_NAME). Optionally forward to Azure Monitor via AZURE_MONITOR_CONNECTION_STRING.

  • Explore traces in Braintrust UI or Azure Monitor.
  • Submit PRs to see GitHub Actions CI run evals automatically.

This repo is meant to document experiments and invite collaboration. Feedback and forks are welcome!

Folder Structure

evals/
  eval_hello.py        # Hello evaluation (string match + LLM variant)
  eval_trip.py         # Multi-step agent eval with LLMClassifier
agents/
  plan_trip.py         # Agent logic (decision -> tool -> judge -> compose)
observability/
  otel_setup.py        # OTel config (Braintrust + optional Azure)
.github/workflows/run-evals.yml
requirements.txt
.env.template
.gitignore
AGENT_GUIDE.md

Agent Coding Guide: Braintrust Lab Context

This document brings coding assistants up to speed on the current plan.

Project Context

  • Repo: braintrustdevdeepdive
  • Language: Python 3.10+
  • Environment: macOS, VS Code, Ollama local models, Docker Desktop
  • Core libraries: braintrust, autoevals, openai, opentelemetry-sdk

Objectives

  1. Create & run evals (Hello World → multi-step agent).
  2. Integrate local (Ollama) and frontier (OpenAI/Anthropic) models.
  3. Add OpenTelemetry observability (Braintrust backend + Azure Monitor).
  4. Automate evals in GitHub Actions CI/CD.

Development Guidelines

  • Always use .env for secrets and .gitignore to exclude it.
  • Eval scripts should be named eval_*.py.
  • Instrument spans with OTel semantic conventions (ai.model.id, ai.prompt, ai.response).
  • For multi-step agents, log custom traces with braintrust.trace() (see the sketch after this list).
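A sketch of that pattern using the SDK's traced decorator and init_logger (both from the Braintrust Python SDK; the agent logic itself is illustrative):

from braintrust import init_logger, traced

logger = init_logger(project="Fabs27Sep25DeepDive")

@traced
def decide(query: str) -> str:
    # decision step: choose a tool for the request
    return "weather_tool"

@traced
def plan_trip(query: str) -> str:
    # each @traced call becomes a span nested under its caller
    tool = decide(query)
    return f"Planned trip for {query!r} using {tool}"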

Coding Tasks for Agents

  • Generate eval datasets (synthetic or task-specific).
  • Wrap OpenAI / local models with the Braintrust proxy or SDK wrapper (see the sketch after this list).
  • Add/modify CI workflows (.github/workflows/run-evals.yml).
  • Extend observability by tagging spans with context (gen_ai.conversation.id, gen_ai.feedback).
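A sketch of the wrapping task using the SDK's wrap_openai helper, which instruments client calls so they appear as Braintrust spans (proxy routing would instead point the client's base_url at Braintrust's proxy endpoint; how this repo does it is an assumption):

from braintrust import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[{"role": "user", "content": "Say hello"}],
)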

Collaboration Mode

  • I will experiment interactively and may diverge from this plan.
  • Coding assistants should suggest eval cases, observability improvements, and error handling.
  • Assume all work should be CI/CD ready with eval regression checks.
