Feature Request: Background Task Output Distillation
Pre-submission checklist
- I have searched existing issues and feature requests for duplicates
- I have read the README
Feature description
When a background task completes and the parent agent calls background_output, formatTaskResult() fetches all messages from the child session and concatenates every text, reasoning, and tool_result block into a single blob with no size cap. This raw output is injected directly into the parent session's context.
With parallel background agents (the intended use pattern), this causes rapid context window exhaustion. In a real session I debugged, 4 parallel explore agents returned ~60K tokens of combined output in a single batch, contributing to a 260K token spike that hit the 200K provider limit.
The problem in code
formatTaskResult() (dist/index.js, ~line 48776) does the following (sketched in code below):
- Calls client.session.messages() for the child session
- Filters for assistant + tool messages
- Extracts ALL text/reasoning/tool_result parts
- Joins with \n\n — no truncation, no summarization, no cap
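A minimal sketch of that behavior, assuming a simplified client shape and part types (illustrative only, not the actual dist/index.js code):

```ts
// Sketch only: the types and client shape are assumptions, not the real opencode SDK.
interface Part {
  type: string;    // "text" | "reasoning" | "tool_result" | ...
  text?: string;   // text / reasoning content
  output?: string; // tool_result content
}
interface Message {
  role: "assistant" | "tool" | "user";
  parts: Part[];
}
interface Client {
  session: { messages(sessionID: string): Promise<Message[]> };
}

async function formatTaskResultSketch(client: Client, childSessionID: string): Promise<string> {
  const messages = await client.session.messages(childSessionID);
  return messages
    .filter((m) => m.role === "assistant" || m.role === "tool")
    .flatMap((m) => m.parts)
    .filter((p) => p.type === "text" || p.type === "reasoning" || p.type === "tool_result")
    .map((p) => p.text ?? p.output ?? "")
    .join("\n\n"); // no truncation, no summarization, no cap
}
```

Whatever the real implementation looks like internally, the relevant property is the unbounded join at the end.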
background_output is not in the TRUNCATABLE_TOOLS allowlist, so the tool-output-truncator hook skips it by default.
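For context, an allowlist gate of this kind usually amounts to a membership check; the set contents and function name below are assumptions, not the actual hook code:

```ts
// Assumed shape of the allowlist check -- illustrative only.
const TRUNCATABLE_TOOLS = new Set(["read", "grep", "bash"]); // background_output is absent

function maybeTruncate(toolName: string, output: string, maxChars: number): string {
  if (!TRUNCATABLE_TOOLS.has(toolName)) return output; // skipped: passes through at full size
  return output.length <= maxChars ? output : output.slice(0, maxChars) + "\n[truncated]";
}
```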
Existing partial workarounds
1. experimental.truncate_all_tool_outputs: true — caps each tool output at 50K tokens. Helps but doesn't prevent the blowup: 4 parallel background agents × 50K = 200K tokens before system prompt and history are counted. Still exceeds a 200K limit.
2. DCP recovery (when it triggers) — after a 400 error, the recovery hook finds tool results on disk sorted by size and replaces each with a truncation notice, then auto-retries. This recovers the session, but the original tool outputs are replaced on disk and lost. The agent has to redo all background research from scratch.
Proposed solution
There are two complementary improvements, ranked by feasibility:
Improvement 1: Non-destructive recovery (minimal change, plugin-layer only)
Currently truncateToolResult() overwrites original tool output on disk with a truncation notice — the data is destroyed. A minimal fix: write the original content to a file before overwriting, and include the file path in the replacement message:
```js
// Current (destructive):
part.state.output = TRUNCATION_MESSAGE;
writeFileSync(partPath, JSON.stringify(part));

// Proposed (non-destructive):
const preservePath = join(storageDir, sessionID, `${partId}_output.txt`);
writeFileSync(preservePath, part.state.output); // preserve first
part.state.output = `[TOOL OUTPUT PRESERVED: ${preservePath}]\nUse read tool to access full output if needed.`;
writeFileSync(partPath, JSON.stringify(part));
```

The agent can then read the preserved file on demand without it consuming context. The plugin already uses writeFileSync — this is one additional write call. No Go engine changes needed.
This also enables a multi-step recovery pipeline (replacing the current binary truncate-or-summarize); a code sketch follows the list:
- Deduplicate — existing DCP logic
- Offload large outputs to files — preserve content, replace with file reference (not destructive)
- Retroactive offload — if still over limit, iterate backward through older turns, offload their large tool outputs too
- Full compaction — session.summarize as absolute last resort
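A sketch of that ordering, with hypothetical helper names standing in for plugin logic (only session.summarize corresponds to an existing call):

```ts
// Hypothetical helpers -- stand-ins, not existing plugin functions.
declare function deduplicate(sessionID: string): Promise<void>;
declare function fitsInContext(sessionID: string): Promise<boolean>;
declare function offloadLargeOutputs(sessionID: string, scope: "latest-turn" | "older-turns"): Promise<void>;
declare function summarizeSession(sessionID: string): Promise<void>; // wraps session.summarize

async function recoverFromOverflow(sessionID: string): Promise<void> {
  await deduplicate(sessionID);                        // 1. existing DCP logic
  if (await fitsInContext(sessionID)) return;

  await offloadLargeOutputs(sessionID, "latest-turn"); // 2. preserve to files, leave references
  if (await fitsInContext(sessionID)) return;

  await offloadLargeOutputs(sessionID, "older-turns"); // 3. retroactive offload, walking backward
  if (await fitsInContext(sessionID)) return;

  await summarizeSession(sessionID);                   // 4. full compaction as absolute last resort
}
```

Each stage runs only if the previous one did not bring the session back under the limit, so the destructive options are reached as late as possible.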
Improvement 2: Proactive distillation at formatTaskResult time
Instead of reacting after the context blows up, distill background task output before injection into the parent context.
Important constraint identified via Oracle consultation: the plugin recovery layer cannot make async LLM calls (only session.summarize is available, and it's all-or-nothing for the entire session). So distillation must happen at formatTaskResult time, not during recovery.
Proposed flow (a code sketch follows this outline):
Background agent completes
→ Full output stays in session storage (already the case, no change needed)
→ At formatTaskResult time: spawn a cheap distillation call (flash model)
→ Distillation produces: key findings + decisions + file changes summary
→ Parent receives:
- Distilled summary (~2-5K tokens)
- session_id reference for full output
- Instruction: "use session_read(session_id=ses_xxx) for full details"
→ Full output accessible on-demand via session_read (no data loss)
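A sketch of what this could look like at formatTaskResult time; collectChildOutput() and distill() are hypothetical helpers, and the model name and token budget are taken from the config sketch below:

```ts
// Hypothetical helpers -- not existing opencode/plugin APIs.
declare function collectChildOutput(childSessionID: string): Promise<string>; // today's full concatenation
declare function distill(opts: { model: string; maxOutputTokens: number; prompt: string }): Promise<string>;

async function formatTaskResultDistilled(childSessionID: string): Promise<string> {
  // Full output stays in the child session's storage untouched.
  const fullOutput = await collectChildOutput(childSessionID);

  // Cheap distillation call: key findings + decisions + file changes.
  const summary = await distill({
    model: "github-copilot/grok-code-fast-1",
    maxOutputTokens: 3000,
    prompt: `Summarize the key findings, decisions, and file changes:\n\n${fullOutput}`,
  });

  // Parent receives the distilled summary plus a pointer to the full output.
  return [
    summary, // ~2-5K tokens instead of the raw blob
    `Full output preserved in session ${childSessionID}.`,
    `Use session_read(session_id=${childSessionID}) for full details.`,
  ].join("\n\n");
}
```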
Possible config shape:

```json
{
  "background_task": {
    "distill_output": {
      "enabled": true,
      "model": "github-copilot/grok-code-fast-1",
      "max_summary_tokens": 3000
    }
  }
}
```
Why not just truncation?
| Approach | Pros | Cons |
|---|---|---|
| truncate_all_tool_outputs | Works today, simple | Loses signal — truncation is position-based, not relevance-based |
| Hard cap on formatTaskResult | Simple to implement | Same problem — arbitrary cutoff |
| Distillation subagent | Preserves all important signal, reference to full output | Adds a small latency + cheap model call per task |
Distillation retains what matters and discards what doesn't, rather than blindly chopping at a byte boundary.
Use case
Any workflow with parallel background agents — which is the primary oh-my-opencode orchestration pattern. The more agents you run in parallel, the faster you exhaust context. This is especially acute with Antigravity models that have a 200K token limit.
Related issues
- oh-my-opencode#96 — DCP (reactive recovery, doesn't prevent the blowup)
- opencode-antigravity-auth#411 — antigravity-claude-opus-4-6-thinking exceeds 200k limit
- opencode-antigravity-auth#407 — Do compaction earlier
Alternatives considered
- Add background_output to TRUNCATABLE_TOOLS — simple but same truncation problem
- Reduce DEFAULT_MAX_TOKENS globally — hurts legitimate large tool outputs
- Let the parent agent request full_session=true with message_limit / include_tool_results=false — shifts the burden to prompt engineering; agents don't consistently use these params
- Configure compaction agent with a larger context model — tried setting agents.compaction.model to google/antigravity-gemini-3-pro (1M context). The model does get used for compaction when it triggers, but it doesn't prevent the "prompt is too long" error because the issue isn't compaction capacity — it's that compaction doesn't trigger early enough, and background output injection happens upstream of compaction entirely
Additional context
Session forensics from a real failure: 260K tokens in under 5 minutes. Breakdown:
- 4 parallel background agents: ~60K tokens (raw formatTaskResult output)
- 8 sequential file reads: ~80K tokens
- System prompt + instructions: ~30K tokens
- Conversation history: ~90K tokens
{ "background_task": { "distill_output": { "enabled": true, "model": "github-copilot/grok-code-fast-1", "max_summary_tokens": 3000 } } }