Skip to content

orvi2014/Baar-Core

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Baar-Core

Semantic routing + a hard financial kill-switch for LLM agents.

Never get surprised by another OpenAI or Anthropic bill.

PyPI version License: MIT Python 3.10+ Scientific Validation

pip install baar-core

baar-core is the PyPI package name. Baar-Core is the project.


Why Baar-Core?

Production LLM agents have a dangerous habit:

  • Simple queries still get sent to expensive models.
  • One runaway loop turns your $0.10 budget into $8+ overnight.
  • The invoice lands before you know which step burned the budget.

Most routers optimize averages. Baar-Core ships a hard Zero-Call Financial Kill-Switch: enforce a strict USD cap, score complexity, route cheap vs capable — and if the next safe call would exceed what’s left, reject locally before a single provider request. $0 spent. Zero network calls.

What you get

  • Smart semantic routing — Easy work → cheap model; hard work → capable model.
  • Budget-constrained downgrade — If the big model would break the budget, fall back to the small one so the turn can still finish.
  • True zero-call kill-switch — Even the cheap model unaffordable? Fail fast — no completion call, no surprise line item.
  • Offline Safety — If your budget is $0, baar-core won't even attempt a DNS lookup for the LLM provider. It fails instantly in your local environment.

No surprise invoices. Stronger stance against runaway and adversarial “denial of wallet” patterns. Quality where it matters (reasoning, coding, agents) because hard tasks still reach the capable tier when the budget allows.


How it works

graph TD
    A[User task] --> B{Semantic complexity router}
    B -- Low complexity --> C[Cheap model]
    B -- High complexity --> D{Budget check}
    D -- Affordable --> E[Capable model]
    D -- Too expensive --> F[Downgrade to cheap]
    C --> G[Spend tracking]
    E --> G
    F --> G
    G --> H[Response]
Loading
  1. Complexity scoring — Fast signal for cheap vs expensive route.
  2. Budget-aware choice — Remaining budget checked before committing to the expensive path.
  3. Local rejection — Exhausted or unsafe to call? Stop before the wire.

Benchmarks

Mock benchmark (deterministic routing sanity)

Command:

baar-bench \
  --dataset all \
  --limit 200 \
  --budget 10 \
  --mock \
  --value-policy none \
  --complexity-threshold 0.80 \
  --coding-threshold 0.75 \
  --small-exploration-rate 0.0 \
  --seed 42
Dataset Strategy Accuracy Total cost vs always-big
MMLU Always big 50.5% $1.000500
MMLU Baar-Core 69.5% $0.157000 92.8% cheaper
GSM8K Always big 50.5% $1.000500
GSM8K Baar-Core 71.5% $0.128500 93.3% cheaper
HumanEval Always big 61.6% $1.000500
HumanEval Baar-Core 79.9% $0.614000 65.3% cheaper

Live benchmark (small subset sanity check)

Command:

baar-bench \
  --dataset all \
  --limit 10 \
  --budget 2 \
  --value-policy none \
  --complexity-threshold 0.80 \
  --coding-threshold 0.75 \
  --small-exploration-rate 0.0 \
  --seed 42
Dataset Strategy Accuracy Total cost vs always-big
MMLU Always big 50.0% $0.002337
MMLU Baar-Core 60.0% $0.000137 93.3% cheaper
GSM8K Always big 60.0% $0.027615
GSM8K Baar-Core 20.0% $0.002097 93.3% cheaper
HumanEval Always big 0.0% $0.032125
HumanEval Baar-Core 0.0% $0.002743 93.3% cheaper

Live results can vary significantly by provider/model quality, API reliability, and prompt behavior. Use live runs as environment-specific checks, and use mock runs for reproducible routing/cost trade-off iteration. In mock mode, routing classifier calls are simulated separately from execution calls so routing and spend behavior can be measured without external APIs.


Quick start

from baar import BAARRouter

router = BAARRouter(budget=0.10)
print(router.chat("What is the capital of France?"))          # → usually cheap model
print(router.chat("Write an optimized CUDA matmul kernel."))  # → capable model if affordable

# Kill-switch: budget too low for any safe call → blocked before the API
tight = BAARRouter(budget=0.00001, min_cost_threshold=0.001)
try:
    tight.chat("Any prompt")
except RuntimeError as e:
    print("Blocked safely:", e)  # zero completion calls, $0 spent

Works with any LiteLLM-supported provider (OpenAI, Anthropic, Groq, Together, Ollama, OpenRouter, …).


Resilience

baar-stress

Adversarial-style checks (complexity games, tight budget). Baar-Core is designed with OWASP LLM Top 10 style risks in mind — including unbounded consumption. Details: RESEARCH.md.


Telemetry Summary CLI

If you enable telemetry_jsonl_path on BAARRouter, summarize logs with:

baar-telemetry path/to/telemetry.jsonl

This prints reject rate, failover rate, total spend, and per-model spend distribution.


Configuration

Default complexity_threshold=0.80 routes more traffic to the cheap model than 0.65 did; the effective threshold also rises with budget utilization so BIG is harder to justify as spend accumulates. Tighten or loosen with complexity_threshold if your workload skews very easy or very hard.

Additional tuning knobs:

  • min_cost_threshold — hard local floor for zero-call rejection before routing/completion.
  • routing_task_char_limit — routing view budget for long prompts; router samples head + middle + tail segments.
  • --coding-threshold (benchmark CLI) — separate threshold for coding-heavy sets like HumanEval.
router = BAARRouter(
    budget=0.10,
    small_model="gpt-4o-mini",
    big_model="gpt-4o",
    complexity_threshold=0.80,
    min_cost_threshold=0.001,
    routing_task_char_limit=900,
)

License & research

MITLICENSE.

Architecture, validation notes, and security mapping: RESEARCH.md.

About

Budget-Aware Agentic Routing (BAAR) — Intelligent LLM model selection with a zero-call financial kill-switch. Save 90% on costs without losing accuracy.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Languages