Clean, pluggable safeguards for LLM responses. Verify outputs, enforce budgets, escalate to a stronger second opinion, and log everything for observability.
## Features

- Budgets: max tokens, cost (USD), latency, reasoning steps.
- Verifiers: categorical judges (hallucination/no_hallucination) with confidence.
- Second opinion: ensemble referee with strict/majority/weighted policies and optional corrected answer.
- Adapters: Verdict (judges, judge→verify) and DSPy (categorical/yes-no).
- Telemetry: JSONL logs + eval2otel-compatible conversion.
- Examples: Ollama-ready judge→verify and ensemble demos.
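The core loop is generate → verify → escalate on failure. Here is a minimal, self-contained sketch of that flow — not the package's actual API; `keyword_verifier`, `cheap`, and `strong` are made up for illustration:

```python
# Toy sketch of the generate -> verify -> escalate loop.
# All names here are illustrative; the real package wires verifiers,
# budgets, and referees around the same basic flow.

def keyword_verifier(prompt: str, response: str) -> tuple[str, float]:
    """Crude heuristic judge: flag responses that ignore the prompt."""
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    if overlap == 0:
        return "hallucination", 0.9
    return "no_hallucination", min(1.0, 0.5 + 0.1 * overlap)

def breaker(prompt: str, generate_fn, escalate_fn, gamma: float = 0.7) -> str:
    """Accept the cheap answer only if the judge passes with confidence >= gamma."""
    response = generate_fn(prompt)
    verdict, confidence = keyword_verifier(prompt, response)
    if verdict == "no_hallucination" and confidence >= gamma:
        return response
    return escalate_fn(prompt)  # second opinion from a stronger model

cheap = lambda p: "Widgets are small composable UI elements."
strong = lambda p: "[strong model] " + cheap(p)
print(breaker("Explain widgets", cheap, strong))
```

With the default `gamma=0.7` the low-confidence cheap answer is rejected and the stronger model is consulted; lowering the threshold accepts it.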
## Install

```bash
pip install -e .

# Optional extras
pip install verdict    # Verdict adapters / examples
pip install dspy-ai    # DSPy adapters / examples
```

## Quickstart

Run a simple demo with heuristic verifiers:

```bash
python -m examples.demo
```

Run tests:
```bash
python -m unittest -q
```

## CLI

```bash
python -m scripts.cb_cli --prompt "Explain widgets" --log logs/cb.jsonl \
  --budget-yaml budgets.yaml --eval2otel-json out/eval.json

# Optional DSPy judge
python -m scripts.cb_cli --prompt "Explain widgets" --use-dspy --context "widgets" --log logs/cb.jsonl

# Second-opinion (Verdict + Ollama)
python -m scripts.cb_cli --prompt "Explain widgets" --second-opinion \
  --model ollama/mistral:7b --api-base http://localhost:11434 \
  --temp-judge-a 0.2 --temp-judge-b 0.4 --temp-verify 0.0 --policy strict_pass
```

## Verdict + Ollama

Pre-reqs:
```bash
ollama serve && ollama pull mistral:7b
pip install verdict

MODEL=ollama/mistral:7b OLLAMA_API_BASE=http://localhost:11434 \
  python -m examples.verdict_ollama_demo
```

Or build programmatically:
```python
from examples.verdict_integration import build_judge_then_verify_verifier
from circuit_breaker import CircuitBreaker

v = build_judge_then_verify_verifier(model_name="ollama/mistral:7b", api_base="http://localhost:11434")
breaker = CircuitBreaker(generate_fn=your_generate_fn, verifiers=[v], gamma=0.7)
print(breaker("Use the context to answer..."))
```

## Second opinion

Combine multiple verifiers for a stronger decision; strict acceptance requires every judge to pass:
```python
from circuit_breaker import EnsembleReferee, CircuitBreaker
from examples.verdict_integration import build_judge_then_verify_verifier

v1 = build_judge_then_verify_verifier(model_name="ollama/mistral:7b", api_base="http://localhost:11434")
v2 = build_judge_then_verify_verifier(model_name="ollama/mistral:7b", api_base="http://localhost:11434")

referee = EnsembleReferee(
    verifiers=[v1, v2],
    policy="strict_pass",
    min_confidence=0.7,
    corrected_answer_fn=lambda r, c: r + "\n[Corrected or caveated answer here]",
)
breaker = CircuitBreaker(generate_fn=your_generate_fn, verifiers=[], referee=referee)
```

CLI demo:

```bash
MODEL=ollama/mistral:7b OLLAMA_API_BASE=http://localhost:11434 \
  python -m examples.second_opinion_demo
```

Policies:
- strict_pass: all judges must say no_hallucination
- majority: simple majority
- weighted_majority: compare summed confidences
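As a rough sketch of how these policies can combine per-judge `(verdict, confidence)` pairs — illustrative decision logic, not the package's `EnsembleReferee` implementation:

```python
# Illustrative policy logic; the real EnsembleReferee may differ
# in tie-breaking and threshold handling.

Verdicts = list[tuple[str, float]]  # (verdict, confidence) per judge

def decide(results: Verdicts, policy: str = "strict_pass") -> bool:
    passes = [c for v, c in results if v == "no_hallucination"]
    fails = [c for v, c in results if v != "no_hallucination"]
    if policy == "strict_pass":
        return len(fails) == 0               # every judge must pass
    if policy == "majority":
        return len(passes) > len(fails)      # simple head count
    if policy == "weighted_majority":
        return sum(passes) > sum(fails)      # compare summed confidences
    raise ValueError(f"unknown policy: {policy}")

judges = [("no_hallucination", 0.9), ("hallucination", 0.6), ("no_hallucination", 0.4)]
```

With these judges, `strict_pass` rejects (one judge failed) while `majority` and `weighted_majority` (1.3 vs 0.6) accept.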
## Budgets

```python
from circuit_breaker.budgets import load_budget_from_yaml

budgets = load_budget_from_yaml("budgets.yaml")
breaker = CircuitBreaker(..., budgets=budgets)
```

Supported keys: `max_tokens`, `max_cost_usd`, `latency_threshold_s`, `max_reasoning_steps`, `skip_escalation_if_over_budget`.
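An illustrative `budgets.yaml` using those keys, assuming a flat key layout; the values are placeholders, not recommended defaults:

```yaml
# Illustrative values only
max_tokens: 1024
max_cost_usd: 0.05
latency_threshold_s: 10
max_reasoning_steps: 8
skip_escalation_if_over_budget: true
```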
## Telemetry

```python
from circuit_breaker.telemetry import decision_to_eval2otel

eval_obj = decision_to_eval2otel(record, result, operation="chat", system="your-system", request_model="llama3")
# Forward to your eval2otel converter / OpenTelemetry pipeline
```

## Layout

- `circuit_breaker/core.py`: `CircuitBreaker`, models, decisions
- `circuit_breaker/verifiers.py`: base + heuristic verifiers, callable adapter
- `circuit_breaker/referees.py`: `EnsembleReferee` (second opinion)
- `circuit_breaker/dspy_adapter.py`: DSPy judges (Hallucination, YesNo)
- `circuit_breaker/verdict_adapter.py`: Verdict adapters (categorical, unit)
- `circuit_breaker/budgets.py`: YAML loader
- `circuit_breaker/logging.py`: JSONL logger
- `circuit_breaker/telemetry.py`: eval2otel mapping
- `examples/*`: demos (heuristics, Verdict+Ollama, second opinion)
- `scripts/cb_cli.py`: CLI with budgets, DSPy, second-opinion, eval2otel export
- `tests/*`: unit tests
## License

MIT © EvalOps