Problem
The current summarizer produces thin, low-quality summaries. In testing, a 142-message conversation produced only a 474-character summary. The summarizer input is also truncated crudely: tool results capped at 200 chars, a 60k total cap, and a drop-the-middle strategy.
With observational memory carrying the structured facts, the summary's job becomes simpler: it only needs to provide narrative coherence. But even for that reduced role, the current implementation has issues:
- Summarizing at 80% capacity means the input is already huge and may exceed the summarizer's own context window
- Single-pass summarization of 100+ messages loses important narrative threads
- No quality validation of the summary output
- No option to use a different (faster/cheaper) model for summarization
Improvements
1. Lower compaction threshold to 60%
Summarize earlier, while the input is smaller and more manageable. A 60k-token conversation produces better summaries than a 190k-token one. The trade-off is more frequent compaction, but each compaction is higher quality.
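The check itself is trivial; a minimal sketch, assuming a hypothetical `shouldCompact` helper (these names are illustrative, not the actual `compaction.ts` API):

```typescript
// Lowered from 0.8: trigger compaction at 60% of the context window.
const COMPACTION_THRESHOLD = 0.6;

function shouldCompact(tokensUsed: number, contextWindow: number): boolean {
  return tokensUsed / contextWindow >= COMPACTION_THRESHOLD;
}

// With a 200k window, compaction now triggers at 120k tokens instead of 160k.
console.log(shouldCompact(130_000, 200_000)); // true
console.log(shouldCompact(110_000, 200_000)); // false
```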
2. Aider-style split-head/preserve-tail
Instead of summarizing the entire conversation, split at ~50% and only summarize the older head. Recent messages stay verbatim. This preserves immediate context while compressing the past.
[system prompt] + [summary of older half] + [working memory block] + [recent messages verbatim]
Aider recurses up to depth=3 for very long conversations (summary of summaries).
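The split step can be sketched as below; the `Message` type and function names are assumptions for illustration, and summarization of the head happens elsewhere:

```typescript
interface Message {
  role: string;
  content: string;
}

// Split the conversation: the older head gets summarized, the recent tail
// is preserved verbatim. Default cut point is 50%, matching Aider.
function splitForCompaction(
  messages: Message[],
  headRatio = 0.5,
): { head: Message[]; tail: Message[] } {
  const cut = Math.floor(messages.length * headRatio);
  return { head: messages.slice(0, cut), tail: messages.slice(cut) };
}

// The compacted prompt is then assembled as:
// [system prompt] + summarize(head) + [working memory block] + tail (verbatim)
```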
3. Configurable summarizer model
New config option: compaction.model — defaults to the main model but can be set to a faster/cheaper model. Aider uses weak_model for summarization. For Ollama, this could be a smaller local model that handles summarization well (e.g., qwen2.5:7b even if the main model is glm-5:cloud).
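A minimal sketch of the resolution logic; only the `model` field is proposed in this issue, so the surrounding config shape is an assumption about `CompactionSchema`:

```typescript
interface CompactionConfig {
  threshold: number; // e.g. 0.6
  model?: string;    // summarizer model; undefined means "use the main model"
}

// Fall back to the main model when no summarizer model is configured.
function resolveSummarizerModel(cfg: CompactionConfig, mainModel: string): string {
  return cfg.model ?? mainModel;
}

console.log(resolveSummarizerModel({ threshold: 0.6 }, "glm-5:cloud")); // "glm-5:cloud"
console.log(
  resolveSummarizerModel({ threshold: 0.6, model: "qwen2.5:7b" }, "glm-5:cloud"),
); // "qwen2.5:7b"
```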
4. Summary quality validation
After summarization, check that the summary mentions key entities from the conversation:
- File paths that were modified
- The current task/goal
- Recent errors or blockers
If the summary fails validation (missing key entities), fall back to keeping more recent messages verbatim rather than trusting the bad summary.
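One way the validation could look, as a sketch: entity extraction here is deliberately naive (a regex for file paths only), and the real heuristics for tasks and errors are an open design choice.

```typescript
// Pull file-path-looking tokens out of the conversation text.
function extractFilePaths(text: string): string[] {
  return text.match(/\b[\w./-]+\.(ts|js|md|json)\b/g) ?? [];
}

// Require every file path mentioned in the conversation to appear in the
// summary; a miss means the summary dropped a key entity.
function validateSummary(summary: string, conversation: string): boolean {
  const paths = new Set(extractFilePaths(conversation));
  for (const p of paths) {
    if (!summary.includes(p)) return false;
  }
  return true;
}
```

On failure, the caller would discard the summary and keep more recent messages verbatim instead.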
Key files
- src/agent/compaction.ts — summarizeConversation(), buildSummarizerInput()
- src/config/schema.ts — CompactionSchema (add model field)
Research references
docs/context-compaction-research.md — Aider, Cline, opencode comparison
- Aider: split at 50%, summarize head, preserve tail, recursive up to depth=3, uses weak model
- Cline: 9-section structured summary prompt, file read deduplication (30%+ savings)
- opencode: whole-conversation single-pass (same as our current approach)
Related
- Follows from #61, "fix: redesign compaction — never alter chat history, fix token accuracy, use subagent for summarization" (compaction redesign)
- Benefits from observational memory (structured facts reduce pressure on summary quality)