[Feature]: Background Task Output Distillation + Non-Destructive Recovery #1734

@rothnic

Description

Feature Request: Background Task Output Distillation

Pre-submission checklist

  • I have searched existing issues and feature requests for duplicates
  • I have read the README

Feature description

When a background task completes and the parent agent calls background_output, formatTaskResult() fetches all messages from the child session and concatenates every text, reasoning, and tool_result block into a single blob with no size cap. This raw output is injected directly into the parent session's context.

With parallel background agents (the intended use pattern), this causes rapid context window exhaustion. In a real session I debugged, 4 parallel explore agents returned ~60K tokens of combined output in a single batch, contributing to a 260K token spike that hit the 200K provider limit.

The problem in code

formatTaskResult() (dist/index.js, ~line 48776):

  • Calls client.session.messages() for the child session
  • Filters for assistant + tool messages
  • Extracts ALL text/reasoning/tool_result parts
  • Joins with \n\n — no truncation, no summarization, no cap

background_output is not in the TRUNCATABLE_TOOLS allowlist, so the tool-output-truncator hook skips it by default.
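For reference, the concatenation behavior reads roughly like the following sketch. The names, message shapes, and part types here are a hypothetical reconstruction from the description above, not the actual dist/index.js source:

```javascript
// Hypothetical reconstruction of formatTaskResult's concatenation step.
// `messages` stands in for the result of client.session.messages();
// the real message/part shapes may differ.
function formatTaskResult(messages) {
  const chunks = [];
  for (const msg of messages) {
    // Only assistant and tool messages are considered
    if (msg.role !== "assistant" && msg.role !== "tool") continue;
    for (const part of msg.parts) {
      // ALL text/reasoning/tool_result parts are extracted
      if (["text", "reasoning", "tool_result"].includes(part.type)) {
        chunks.push(part.text);
      }
    }
  }
  // Joined with \n\n -- no truncation, no summarization, no cap
  return chunks.join("\n\n");
}
```

The output size is therefore unbounded: it grows linearly with everything the child session produced.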

Existing partial workarounds

1. experimental.truncate_all_tool_outputs: true — caps each tool output at 50K tokens. Helps but doesn't prevent the blowup: 4 parallel background agents × 50K = 200K tokens before system prompt and history are counted. Still exceeds a 200K limit.

2. DCP recovery (when it triggers) — after a 400 error, the recovery hook finds tool results on disk sorted by size and replaces each with a truncation notice, then auto-retries. This recovers the session, but the original tool outputs are replaced on disk and lost. The agent has to redo all background research from scratch.

Proposed solution

There are two complementary improvements, ranked by feasibility:

Improvement 1: Non-destructive recovery (minimal change, plugin-layer only)

Currently truncateToolResult() overwrites original tool output on disk with a truncation notice — the data is destroyed. A minimal fix: write the original content to a file before overwriting, and include the file path in the replacement message:

// Current (destructive):
part.state.output = TRUNCATION_MESSAGE;
writeFileSync(partPath, JSON.stringify(part));

// Proposed (non-destructive):
const preservePath = join(storageDir, sessionID, `${partId}_output.txt`);
writeFileSync(preservePath, part.state.output);  // preserve first
part.state.output = `[TOOL OUTPUT PRESERVED: ${preservePath}]\nUse read tool to access full output if needed.`;
writeFileSync(partPath, JSON.stringify(part));

The agent can then read the preserved file on demand without it consuming context. The plugin already uses writeFileSync — this is one additional write call. No Go engine changes needed.

This also enables a multi-step recovery pipeline (replacing the current binary truncate-or-summarize):

  1. Deduplicate — existing DCP logic
  2. Offload large outputs to files — preserve content, replace with file reference (not destructive)
  3. Retroactive offload — if still over limit, iterate backward through older turns, offload their large tool outputs too
  4. Full compaction via session.summarize as the absolute last resort
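The steps above could be sketched as an escalation pipeline that stops as soon as the session fits. Every name below is hypothetical (only session.summarize exists in the real plugin layer); the fake steps just model each stage freeing tokens:

```javascript
// Illustrative escalation pipeline; stops as soon as the session fits.
// `session` is a plain token-accounting stand-in, not a real session object.
async function recoverSession(session, tokenLimit, steps) {
  for (const step of steps) {
    if (session.tokens <= tokenLimit) break; // recovered, skip costlier steps
    await step(session);                     // mutate session in place
  }
  return session;
}

// Steps ordered cheapest-first, matching the list above:
const recoverySteps = [
  async (s) => { s.tokens -= s.duplicateTokens; s.duplicateTokens = 0; },   // 1. dedupe
  async (s) => { s.tokens -= s.offloadable; s.offloadable = 0; },           // 2. offload to files
  async (s) => { s.tokens -= s.olderOffloadable; s.olderOffloadable = 0; }, // 3. retroactive offload
  async (s) => { s.tokens = s.summaryTokens; },                             // 4. session.summarize
];
```

In the forensics case below (260K tokens against a 200K limit), deduplication plus offloading would typically suffice, and the destructive full compaction would never run.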

Improvement 2: Proactive distillation at formatTaskResult time

Instead of reacting after the context blows up, distill background task output before injection into the parent context.

Important constraint identified via Oracle consultation: the plugin recovery layer cannot make async LLM calls (only session.summarize is available, and it's all-or-nothing for the entire session). So distillation must happen at formatTaskResult time, not during recovery.

Proposed flow:

Background agent completes
  → Full output stays in session storage (already the case, no change needed)
  → At formatTaskResult time: spawn a cheap distillation call (flash model)
  → Distillation produces: key findings + decisions + file changes summary
  → Parent receives:
      - Distilled summary (~2-5K tokens)
      - session_id reference for full output
      - Instruction: "use session_read(session_id=ses_xxx) for full details"
  → Full output accessible on-demand via session_read (no data loss)
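In plugin-layer code, the flow might look like the following sketch. distillTaskResult and callModel are assumptions about a possible implementation, not existing APIs:

```javascript
// Hypothetical distillation step at formatTaskResult time.
// `callModel` stands in for a cheap flash-model completion call.
async function distillTaskResult(sessionID, fullOutput, callModel, maxSummaryTokens) {
  const summary = await callModel({
    prompt: `Summarize key findings, decisions, and file changes:\n${fullOutput}`,
    maxTokens: maxSummaryTokens,
  });
  // Parent receives the distilled summary plus a pointer back to the full output
  return [
    summary,
    `Full output available: use session_read(session_id=${sessionID}) for details.`,
  ].join("\n\n");
}
```

The full output never leaves session storage, so nothing is lost; the parent only pays for the summary unless it explicitly asks for more.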

Possible config shape:

{
  "background_task": {
    "distill_output": {
      "enabled": true,
      "model": "github-copilot/grok-code-fast-1",
      "max_summary_tokens": 3000
    }
  }
}
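A consumer of this config might merge it with defaults along these lines. The default values shown are suggestions for discussion, not decided behavior:

```javascript
// Hypothetical defaults; actual values would be decided during implementation.
const DISTILL_DEFAULTS = {
  enabled: false,          // opt-in, to avoid surprising existing setups
  model: null,             // null = fall back to a configured cheap model
  max_summary_tokens: 3000,
};

function resolveDistillConfig(userConfig = {}) {
  const user = userConfig.background_task?.distill_output ?? {};
  return { ...DISTILL_DEFAULTS, ...user };
}
```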

Why not just truncation?

| Approach | Pros | Cons |
| --- | --- | --- |
| truncate_all_tool_outputs | Works today, simple | Loses signal: truncation is position-based, not relevance-based |
| Hard cap on formatTaskResult | Simple to implement | Same problem: arbitrary cutoff |
| Distillation subagent | Preserves all important signal, with a reference to full output | Adds a small latency and a cheap model call per task |

Distillation retains what matters and discards what doesn't, rather than blindly chopping at a byte boundary.

Use case

Any workflow with parallel background agents — which is the primary oh-my-opencode orchestration pattern. The more agents you run in parallel, the faster you exhaust context. This is especially acute with Antigravity models that have a 200K token limit.

Related issues

Alternatives considered

  1. Add background_output to TRUNCATABLE_TOOLS — simple but same truncation problem
  2. Reduce DEFAULT_MAX_TOKENS globally — hurts legitimate large tool outputs
  3. Let the parent agent request full_session=true with message_limit/include_tool_results=false — shifts the burden to prompt engineering; agents don't consistently use these params
  4. Configure the compaction agent with a larger-context model — tried setting agents.compaction.model to google/antigravity-gemini-3-pro (1M context). The model does get used when compaction triggers, but this doesn't prevent the "prompt is too long" error, because the issue isn't compaction capacity: compaction doesn't trigger early enough, and background output injection happens upstream of compaction entirely

Additional context

Session forensics from a real failure: 260K tokens in under 5 minutes. Breakdown:

  • 4 parallel background agents: ~60K tokens (raw formatTaskResult output)
  • 8 sequential file reads: ~80K tokens
  • System prompt + instructions: ~30K tokens
  • Conversation history: ~90K tokens
