Runtime containment for LLM systems. Enforce cost, step, and retry limits before the call reaches the model.
veronica-core is the kernel. veronica is the control plane.
```
pip install veronica-core
```

```python
from veronica_core import ExecutionContext, ExecutionConfig, WrapOptions

def simulated_llm_call(prompt: str) -> str:
    return f"response to: {prompt}"

config = ExecutionConfig(
    max_cost_usd=1.00,     # hard cost ceiling per chain
    max_steps=50,          # hard step ceiling
    max_retries_total=10,
    timeout_ms=0,
)

with ExecutionContext(config=config) as ctx:
    for i in range(3):
        decision = ctx.wrap_llm_call(
            fn=lambda: simulated_llm_call(f"prompt {i}"),
            options=WrapOptions(
                operation_name=f"generate_{i}",
                cost_estimate_hint=0.04,
            ),
        )
        if decision.name == "HALT":
            break

    snap = ctx.get_graph_snapshot()
    print(snap["aggregates"])
    # {"total_cost_usd": 0.12, "total_llm_calls": 3, ...}
```

SDK-level injection (no per-call changes):
```python
from veronica_core.patch import patch_openai
from veronica_core import veronica_guard

patch_openai()  # patches openai.chat.completions.create

@veronica_guard(max_cost_usd=1.0, max_steps=20)
def run_agent(prompt: str) -> str:
    from openai import OpenAI
    return OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```

- Budget enforcement -- hard cost ceiling per chain, HALT before the call is made
- Step limits -- bounded recursion depth per entity
- Circuit breaker -- per-entity fail counts, COOLDOWN state, configurable threshold
- Distributed circuit breaker -- Redis-backed cross-process failure isolation with Lua-atomic transitions
- Failure classification -- predicate-based exception filtering (ignore 400s, count 500s)
- Token budget -- cumulative output/total token ceiling with DEGRADE zone
- Retry containment -- amplification control with jitter and backoff
- Adaptive ceiling -- auto-adjusts budget based on SafetyEvent history
- Time-aware policy -- weekend/off-hours budget multipliers
- Semantic loop detection -- word-level Jaccard similarity, no ML dependencies
- Input compression -- gates oversized inputs before they reach the model
- Execution graph -- typed node lifecycle, amplification metrics, divergence detection
- Degradation ladder -- 4-tier graceful degradation (model_downgrade, context_trim, rate_limit, halt)
- Multi-agent context -- parent-child ExecutionContext hierarchy with cost propagation
- SafetyEvent -- structured evidence for every non-ALLOW decision (SHA-256 hashed, no raw prompts)
- Security containment -- PolicyEngine, AuditLog, ed25519 policy signing, red-team regression suite
- ASGI/WSGI middleware -- per-request ExecutionContext via ContextVar, 429 on HALT
- Auto cost calculation -- pricing table for OpenAI, Anthropic, Google models
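The SafetyEvent hashing guarantee above can be illustrated with a stdlib-only sketch. This is a toy model of the idea, not veronica-core's actual SafetyEvent schema; `make_safety_event` and its field names are hypothetical:

```python
import hashlib
import time

def make_safety_event(decision: str, prompt: str, reason: str) -> dict:
    """Structured evidence for a non-ALLOW decision.
    The prompt is stored only as a SHA-256 digest, never raw."""
    return {
        "decision": decision,
        "reason": reason,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "ts": time.time(),
    }

event = make_safety_event("HALT", "summarize the confidential report", "budget ceiling")
print(event["decision"], event["reason"], event["prompt_sha256"][:12])
```

The digest lets operators correlate repeated offending prompts across events without ever persisting prompt text.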
No required dependencies. Works with any LLM provider.
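The semantic loop detector listed above uses plain word-level Jaccard similarity, which needs nothing beyond the stdlib. A minimal sketch of the technique (the 0.9 threshold and helper names are illustrative assumptions, not veronica-core's implementation):

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B| over word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def looks_like_loop(history: list[str], candidate: str, threshold: float = 0.9) -> bool:
    """Flag a prompt that is near-identical to any recent prompt."""
    return any(jaccard(candidate, prev) >= threshold for prev in history)

history = ["summarize the report", "now compare both drafts"]
print(looks_like_loop(history, "summarize the report"))  # True: near-duplicate prompt
```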
| Framework | Adapter | Example |
|---|---|---|
| OpenAI SDK | `patch_openai()` | `examples/integrations/openai_sdk/` |
| Anthropic SDK | `patch_anthropic()` | -- |
| LangChain | `VeronicaCallbackHandler` | `examples/integrations/langchain/` |
| AG2 (AutoGen) | `CircuitBreakerCapability` | `examples/ag2_circuit_breaker.py` |
| LlamaIndex | `VeronicaLlamaIndexHandler` | -- |
| CrewAI | `VeronicaCrewAIListener` | `examples/integrations/crewai/` |
| LangGraph | `VeronicaLangGraphListener` | -- |
| ASGI/WSGI | `VeronicaASGIMiddleware` | `docs/middleware.md` |
| ROS2 | `SafetyMonitor` / `OperatingMode` | `examples/ros2/` |
veronica-core integrates with AG2 via AgentCapability. CircuitBreakerCapability wraps AG2 agents with failure detection and automatic recovery.
Working example: PR #2430
The current integration uses monkey-patching because AG2 does not expose before/after hooks on generate_reply. See the PR thread for context.
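Monkey-patching here means swapping generate_reply for a guarded wrapper at runtime. A generic stdlib-only sketch of the pattern (the `Agent` class and `guard_generate_reply` helper are stand-ins, not AG2 or veronica-core APIs):

```python
class Agent:
    """Stand-in for an AG2 agent with a generate_reply method."""
    def generate_reply(self, prompt: str) -> str:
        return f"reply to {prompt}"

def guard_generate_reply(agent: Agent, max_calls: int) -> Agent:
    """Replace agent.generate_reply with a wrapper that opens a
    simple circuit after max_calls invocations."""
    original = agent.generate_reply
    state = {"calls": 0}

    def wrapped(prompt: str) -> str:
        if state["calls"] >= max_calls:
            raise RuntimeError("circuit open: call limit reached")
        state["calls"] += 1
        return original(prompt)

    agent.generate_reply = wrapped  # the monkey-patch
    return agent

agent = guard_generate_reply(Agent(), max_calls=2)
print(agent.generate_reply("hi"))  # reply to hi
print(agent.generate_reply("hi"))  # reply to hi
# a third call would raise RuntimeError
```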
| File | Description |
|---|---|
| basic_usage.py | Budget enforcement and step limits |
| execution_context_demo.py | Step limit, budget, abort, circuit, divergence |
| adaptive_demo.py | Adaptive ceiling, cooldown, direction lock, anomaly, replay |
| ag2_circuit_breaker.py | AG2 agent-level circuit breaker |
| runaway_loop_demo.py | Runaway execution containment |
| budget_degrade_demo.py | DEGRADE before HALT |
| token_budget_minimal_demo.py | Token ceiling enforcement |
```
Application / Agent Framework
            |
      veronica-core        <-- enforcement boundary
            |
LLM Provider (OpenAI, Anthropic, etc.)
```
Each call passes through a ShieldPipeline of registered hooks. Any hook may emit DEGRADE or HALT. A HALT blocks the call and emits a SafetyEvent. The caller receives the decision.
veronica-core does not prescribe how the caller handles DEGRADE or HALT. It enforces that the evaluation occurs, the decision is recorded, and the call does not proceed past HALT.
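As a mental model only (not the real veronica_core API), the pipeline can be pictured as hooks evaluated in order, where HALT short-circuits immediately and the most severe remaining decision wins. The hook signatures and metadata keys below are illustrative assumptions:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = 0
    DEGRADE = 1
    HALT = 2

def run_pipeline(hooks, call_meta: dict) -> Decision:
    """Evaluate hooks in order; HALT blocks the call immediately,
    DEGRADE is sticky, ALLOW only if no hook objects."""
    worst = Decision.ALLOW
    for hook in hooks:
        decision = hook(call_meta)
        if decision is Decision.HALT:
            return Decision.HALT  # call blocked; a SafetyEvent would be emitted here
        if decision.value > worst.value:
            worst = decision
    return worst

# Two toy hooks: a hard budget check and a soft step check.
def budget_hook(m: dict) -> Decision:
    return Decision.HALT if m["cost_so_far"] + m["cost_hint"] > m["max_cost"] else Decision.ALLOW

def step_hook(m: dict) -> Decision:
    return Decision.DEGRADE if m["step"] > m["soft_step_limit"] else Decision.ALLOW

meta = {"cost_so_far": 0.98, "cost_hint": 0.04, "max_cost": 1.00,
        "step": 3, "soft_step_limit": 50}
print(run_pipeline([budget_hook, step_hook], meta))  # Decision.HALT: would exceed $1.00
```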
For detailed architecture, see docs/architecture.md.
Policy enforcement is at the process boundary (argv-level). This is not an OS-level sandbox.
Includes a 20-scenario red-team regression suite covering exfiltration, credential hunt, workflow poisoning, and persistence attacks. All scenarios blocked on every CI run.
Details: docs/SECURITY_CONTAINMENT_PLAN.md | docs/THREAT_MODEL.md | docs/SECURITY_CLAIMS.md
1780 tests, 92% coverage, zero required dependencies. Python 3.10+.
Adaptive budget control: docs/adaptive-control.md
- Formal containment guarantee documentation
- ExecutionGraph extensibility hooks for external integrations
- PlannerProtocol: minimal Python Protocol defining the Planner/Executor contract
```
pip install veronica-core
```

Optional extras:

```
pip install veronica-core[redis]   # DistributedCircuitBreaker, RedisBudgetBackend
pip install veronica-core[otel]    # OpenTelemetry export
```

Development:

```
git clone https://github.com/amabito/veronica-core
cd veronica-core
pip install -e ".[dev]"
pytest
```

See CHANGELOG.md.
MIT