Feature Request: Integrate Headroom for Token-Aware Context Compression
Summary
Integrate a Headroom-compatible compression layer into Agent Zero to reduce excessive token usage from:
- memory recalls
- tool outputs
- browser annotations
- code execution logs
- MCP schemas
- long conversation history
This would improve:
- context window stability
- self-hosted/local LLM usability
- LiteLLM reliability
- long-running agent performance
Motivation
Agent Zero currently experiences several context/token related issues:
As Agent Zero evolves into a long-running autonomous runtime, context growth becomes a critical engineering problem.
Proposed Solution
Introduce a token-aware compression middleware before LiteLLM requests.
Potential integration points:
memory recall
↓
compression layer
↓
token validation
↓
LiteLLM request
Compression targets:
- recalled memory chunks
- tool outputs
- browser DOM dumps
- execution logs
- MCP schemas
Why Headroom
Headroom already provides:
- context compression
- token-aware filtering
- JSON/log/code compression
- LiteLLM compatibility
- proxy mode integration
- semantic compression pipelines
This makes it a strong fit for Agent Zero's runtime architecture.
Possible Integration Approaches
Option 1 — Proxy Mode
Run Headroom as an OpenAI-compatible proxy in front of LiteLLM.
Option 2 — Plugin-Level Compression
Compress memory/tool outputs before injection into runtime state.
Option 3 — LiteLLM Middleware
Add compression hooks directly before LiteLLM completion calls.
Expected Benefits
- fewer ContextWindowExceededError failures
- reduced token cost
- improved local/self-hosted model support
- smaller memory recalls
- better browser automation scalability
- more stable long-running agents
Related Issues
Additional Ideas
Potential observability metrics:
- compression ratio
- tokens saved
- pre/post context size
- memory truncation stats
- tool output compaction stats
Notes
I am currently experimenting with a prototype integration using:
- Agent Zero
- LiteLLM
- Headroom proxy/compression middleware
and would like feedback on the preferred integration direction.
Feature Request: Integrate Headroom for Token-Aware Context Compression
Summary
Integrate a Headroom-compatible compression layer into Agent Zero to reduce excessive token usage from:
This would improve:
Motivation
Agent Zero currently experiences several context/token related issues:
As Agent Zero evolves into a long-running autonomous runtime, context growth becomes a critical engineering problem.
Proposed Solution
Introduce a token-aware compression middleware before LiteLLM requests.
Potential integration points:
Compression targets:
Why Headroom
Headroom already provides:
This makes it a strong fit for Agent Zero's runtime architecture.
Possible Integration Approaches
Option 1 — Proxy Mode
Run Headroom as an OpenAI-compatible proxy in front of LiteLLM.
Option 2 — Plugin-Level Compression
Compress memory/tool outputs before injection into runtime state.
Option 3 — LiteLLM Middleware
Add compression hooks directly before LiteLLM completion calls.
Expected Benefits
Related Issues
Additional Ideas
Potential observability metrics:
Notes
I am currently experimenting with a prototype integration using:
and would like feedback on the preferred integration direction.