# Feature Request: Integrate Headroom for Token-Aware Context Compression (#1673)

## Summary

Integrate a Headroom-compatible compression layer into Agent Zero to reduce excessive token usage from:

* memory recalls
* tool outputs
* browser annotations
* code execution logs
* MCP schemas
* long conversation history

This would improve:

* context window stability
* self-hosted/local LLM usability
* LiteLLM reliability
* long-running agent performance

---

## Motivation

Agent Zero currently experiences several context- and token-related issues:

* ContextWindowExceededError in memory extensions (#1413)
* oversized tool outputs (#833)
* embedding failures caused by oversized memory chunks (#1436)
* LiteLLM token overflow during utility model calls
* high token usage with browser automation and MCP tools

As Agent Zero evolves into a long-running autonomous runtime, context growth becomes a critical engineering problem.

---

## Proposed Solution

Introduce token-aware compression middleware that runs before each LiteLLM request.

Proposed request flow, using memory recall as an example:

```text
memory recall
    ↓
compression layer
    ↓
token validation
    ↓
LiteLLM request
```
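The flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not Agent Zero or Headroom code: `prepare_context`, `truncate`, `CONTEXT_LIMIT` and `RESPONSE_RESERVE` are hypothetical names, the compressor is supplied by the caller, and token counts use a crude ~4 characters per token estimate in place of the model's real tokenizer.

```python
from typing import Callable, Iterable

# Hypothetical numbers for illustration; real limits depend on the model in use.
CONTEXT_LIMIT = 8192      # model context window, in tokens
RESPONSE_RESERVE = 1024   # tokens kept free for the model's reply


def estimate_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token (assumption).

    A real integration should count with the target model's tokenizer,
    e.g. litellm.token_counter(model=..., messages=...).
    """
    return (len(text) + 3) // 4


def prepare_context(
    chunks: Iterable[str],
    compress: Callable[[str], str],
    limit: int = CONTEXT_LIMIT,
    reserve: int = RESPONSE_RESERVE,
) -> str:
    """memory recall -> compression layer -> token validation."""
    # 1. Compression layer: shrink each recalled chunk independently.
    context = "\n\n".join(compress(chunk) for chunk in chunks)
    # 2. Token validation: refuse to build a request that cannot fit.
    budget = limit - reserve
    used = estimate_tokens(context)
    if used > budget:
        raise ValueError(f"context needs ~{used} tokens but the budget is {budget}")
    # 3. The caller passes the validated context on to the LiteLLM request.
    return context


def truncate(text: str, max_chars: int = 200) -> str:
    """Placeholder compressor (hypothetical): hard truncation with a marker."""
    return text if len(text) <= max_chars else text[:max_chars] + " [...]"
```

For example, `prepare_context(["short memory", "x" * 1000], truncate)` returns a 220-character context, while an oversized chunk with no effective compression raises `ValueError` before any tokens are spent on a doomed request.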

Compression targets:

* recalled memory chunks
* tool outputs
* browser DOM dumps
* execution logs
* MCP schemas
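Different targets compress differently. The sketch below shows two simple, content-aware reductions for JSON payloads (such as tool outputs or MCP schemas) and execution logs. These are illustrative techniques chosen for this example, not Headroom's actual algorithms; `compress_json` and `compress_log` are hypothetical helper names.

```python
import json
from itertools import groupby
from typing import Any


def compress_json(raw: str, max_items: int = 5) -> str:
    """Minify JSON and cap every array at `max_items` entries."""

    def shrink(node: Any) -> Any:
        if isinstance(node, list):
            kept = [shrink(item) for item in node[:max_items]]
            if len(node) > max_items:
                kept.append(f"... {len(node) - max_items} more items")
            return kept
        if isinstance(node, dict):
            return {key: shrink(value) for key, value in node.items()}
        return node

    # separators=(",", ":") drops the whitespace json.dumps adds by default.
    return json.dumps(shrink(json.loads(raw)), separators=(",", ":"))


def compress_log(raw: str, head: int = 20, tail: int = 20) -> str:
    """Collapse consecutive duplicate lines, then keep only head and tail."""
    collapsed = []
    # groupby groups runs of identical consecutive lines.
    for line, run in groupby(raw.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  [repeated {count}x]")
    if len(collapsed) > head + tail:
        omitted = len(collapsed) - head - tail
        collapsed = (
            collapsed[:head]
            + [f"... {omitted} lines omitted ..."]
            + collapsed[len(collapsed) - tail:]
        )
    return "\n".join(collapsed)
```

Keeping the head and tail of a log preserves both the command's setup and its final error, which are usually the parts the agent needs.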

---

## Why Headroom

Headroom already provides:

* context compression
* token-aware filtering
* JSON/log/code compression
* LiteLLM compatibility
* proxy mode integration
* semantic compression pipelines

These capabilities line up with the compression targets listed above, which makes Headroom a strong fit for Agent Zero's runtime architecture.

---

## Possible Integration Approaches

### Option 1 — Proxy Mode

Run Headroom as an OpenAI-compatible proxy in front of LiteLLM.
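When Agent Zero calls LiteLLM as a library, one way to route traffic through an OpenAI-compatible proxy is to point LiteLLM's `api_base` at it and use the `openai/` model prefix. A sketch, assuming the proxy forwards requests upstream and handles provider credentials itself; the `HEADROOM_PROXY_URL` variable, the port, and `proxied_completion_kwargs` are hypothetical.

```python
import os

# Hypothetical default address for a locally running Headroom proxy;
# adjust to wherever the proxy actually listens.
HEADROOM_PROXY_URL = os.environ.get("HEADROOM_PROXY_URL", "http://localhost:8787/v1")


def proxied_completion_kwargs(model: str, messages: list, **extra) -> dict:
    """Build litellm.completion() arguments that route through the proxy."""
    return {
        # The "openai/" prefix makes LiteLLM use the OpenAI wire format
        # against a custom api_base, as an OpenAI-compatible proxy expects.
        "model": f"openai/{model}",
        "api_base": HEADROOM_PROXY_URL,
        "messages": messages,
        **extra,
    }
```

Usage would be `litellm.completion(**proxied_completion_kwargs("gpt-4o-mini", messages))`. The appeal of this option is that Agent Zero's code barely changes; the cost is an extra network hop and a separate process to operate.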

### Option 2 — Plugin-Level Compression

Compress memory recalls and tool outputs before they are injected into the agent's runtime state.
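At the plugin level, a memory extension could enforce a token budget on recalled memories before injecting them. The sketch below assumes each memory carries a relevance score from the vector search; `Memory`, `pack_memories`, and the ~4 characters per token estimate are hypothetical stand-ins, not Agent Zero's actual memory API.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    score: float  # relevance score from the vector search (assumed field)


def estimate_tokens(text: str) -> int:
    # Crude ~4 characters/token heuristic (assumption); swap in a real tokenizer.
    return (len(text) + 3) // 4


def pack_memories(
    memories: list, budget_tokens: int, per_item_tokens: int = 200
) -> list:
    """Pick the most relevant memories that fit the budget, clipping each one."""
    packed = []
    used = 0
    max_chars = per_item_tokens * 4
    for mem in sorted(memories, key=lambda m: m.score, reverse=True):
        text = mem.text
        if len(text) > max_chars:
            text = text[:max_chars] + " [truncated]"
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # too big for what is left; a smaller memory may still fit
        packed.append(text)
        used += cost
    return packed
```

Clipping each memory before packing keeps one oversized chunk from crowding out every other recall, which is the failure mode behind embedding and context-window errors on large memories.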

### Option 3 — LiteLLM Middleware

Add compression hooks directly before LiteLLM completion calls.
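A middleware hook can be as simple as wrapping the completion function. The sketch assumes calls pass `messages` as a keyword argument, as in `litellm.completion(model=..., messages=...)`; `with_compression` and the `min_chars` threshold are hypothetical, and `compress` is any text compressor.

```python
from typing import Any, Callable


def with_compression(
    completion_fn: Callable[..., Any],
    compress: Callable[[str], str],
    min_chars: int = 2000,
) -> Callable[..., Any]:
    """Wrap a completion function so long message bodies are compressed first."""

    def wrapped(*, messages: list, **kwargs: Any) -> Any:
        prepared = []
        for msg in messages:
            content = msg.get("content")
            # Only plain-text content above the threshold is touched;
            # multimodal (list) content is passed through unchanged.
            if isinstance(content, str) and len(content) > min_chars:
                msg = {**msg, "content": compress(content)}  # copy, don't mutate the caller's dict
            prepared.append(msg)
        return completion_fn(messages=prepared, **kwargs)

    return wrapped
```

In Agent Zero this might look like `completion = with_compression(litellm.completion, my_compressor)`; an async variant wrapping `litellm.acompletion` would follow the same pattern.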

---

## Expected Benefits

* fewer ContextWindowExceededError failures
* reduced token cost
* improved local/self-hosted model support
* smaller memory recalls
* better browser automation scalability
* more stable long-running agents

---

## Related Issues

* #1413
* #833
* #1436
* #1382
* #1522

---

## Additional Ideas

Potential observability metrics:

* compression ratio
* tokens saved
* pre/post context size
* memory truncation stats
* tool output compaction stats
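These metrics could be collected per source (memory, tool output, browser, and so on). A minimal sketch; `SourceStats` and `CompressionMetrics` are hypothetical names, and the compression ratio is defined here as tokens before divided by tokens after, so 4.0 means the context shrank to a quarter of its original size.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SourceStats:
    calls: int = 0
    tokens_before: int = 0
    tokens_after: int = 0

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def compression_ratio(self) -> float:
        # before / after; infinite if everything was dropped.
        return self.tokens_before / self.tokens_after if self.tokens_after else float("inf")


class CompressionMetrics:
    """Accumulates pre/post token counts per content source."""

    def __init__(self) -> None:
        self._by_source = defaultdict(SourceStats)

    def record(self, source: str, tokens_before: int, tokens_after: int) -> None:
        stats = self._by_source[source]
        stats.calls += 1
        stats.tokens_before += tokens_before
        stats.tokens_after += tokens_after

    def get(self, source: str) -> SourceStats:
        return self._by_source[source]
```

Aggregating per source makes it easy to see which target (for example browser DOM dumps versus memory recalls) benefits most from compression.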

---

## Notes

I am currently experimenting with a prototype integration using:

* Agent Zero
* LiteLLM
* Headroom proxy/compression middleware

and would like feedback on the preferred integration direction.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature Request: Integrate Headroom for Token-Aware Context Compression #1673

Feature Request: Integrate Headroom for Token-Aware Context Compression

Summary

Motivation

Proposed Solution

Why Headroom

Possible Integration Approaches

Option 1 — Proxy Mode

Option 2 — Plugin-Level Compression

Option 3 — LiteLLM Middleware

Expected Benefits

Related Issues

Additional Ideas

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Feature Request: Integrate Headroom for Token-Aware Context Compression #1673

Description

Feature Request: Integrate Headroom for Token-Aware Context Compression

Summary

Motivation

Proposed Solution

Why Headroom

Possible Integration Approaches

Option 1 — Proxy Mode

Option 2 — Plugin-Level Compression

Option 3 — LiteLLM Middleware

Expected Benefits

Related Issues

Additional Ideas

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions