Feature Request: Background Task Output Distillation
Pre-submission checklist
- I have searched existing issues and feature requests for duplicates
- I have read the README
Feature description
When a background task completes and the parent agent calls background_output, formatTaskResult() fetches all messages from the child session and concatenates every text, reasoning, and tool_result block into a single blob with no size cap. This raw output is injected directly into the parent session's context.
With parallel background agents (the intended use pattern), this causes rapid context window exhaustion. In a real session I debugged, 4 parallel explore agents returned ~60K tokens of combined output in a single batch, contributing to a 260K token spike that hit the 200K provider limit.
The problem in code
formatTaskResult() (dist/index.js, ~line 48776) does the following (sketched in code below):
- Calls client.session.messages() for the child session
- Filters for assistant + tool messages
- Extracts ALL text/reasoning/tool_result parts
- Joins with \n\n — no truncation, no summarization, no cap
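A minimal sketch of that behavior, assuming a simplified client shape and part types (illustrative only, not the actual dist/index.js code):

```ts
// Sketch only: the types and client shape are assumptions, not the real opencode SDK.
interface Part {
  type: string;    // "text" | "reasoning" | "tool_result" | ...
  text?: string;   // text / reasoning content
  output?: string; // tool_result content
}
interface Message {
  role: "assistant" | "tool" | "user";
  parts: Part[];
}
interface Client {
  session: { messages(sessionID: string): Promise<Message[]> };
}

async function formatTaskResultSketch(client: Client, childSessionID: string): Promise<string> {
  const messages = await client.session.messages(childSessionID);
  return messages
    .filter((m) => m.role === "assistant" || m.role === "tool")
    .flatMap((m) => m.parts)
    .filter((p) => p.type === "text" || p.type === "reasoning" || p.type === "tool_result")
    .map((p) => p.text ?? p.output ?? "")
    .join("\n\n"); // no truncation, no summarization, no cap
}
```

Whatever the real implementation looks like internally, the relevant property is the unbounded join at the end.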
background_output is not in the TRUNCATABLE_TOOLS allowlist, so the tool-output-truncator hook skips it by default.
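For context, an allowlist gate of this kind usually amounts to a membership check; the set contents and function name below are assumptions, not the actual hook code:

```ts
// Assumed shape of the allowlist check -- illustrative only.
const TRUNCATABLE_TOOLS = new Set(["read", "grep", "bash"]); // background_output is absent

function maybeTruncate(toolName: string, output: string, maxChars: number): string {
  if (!TRUNCATABLE_TOOLS.has(toolName)) return output; // skipped: passes through at full size
  return output.length <= maxChars ? output : output.slice(0, maxChars) + "\n[truncated]";
}
```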
Existing partial workarounds
1. experimental.truncate_all_tool_outputs: true — caps each tool output at 50K tokens. Helps but doesn't prevent the blowup: 4 parallel background agents × 50K = 200K tokens before system prompt and history are counted. Still exceeds a 200K limit.
2. DCP recovery (when it triggers) — after a 400 error, the recovery hook finds tool results on disk sorted by size and replaces each with a truncation notice, then auto-retries. This recovers the session, but the original tool outputs are replaced on disk and lost. The agent has to redo all background research from scratch.
Proposed solution
There are two complementary improvements, ranked by feasibility:
Improvement 1: Non-destructive recovery (minimal change, plugin-layer only)
Currently truncateToolResult() overwrites original tool output on disk with a truncation notice — the data is destroyed. A minimal fix: write the original content to a file before overwriting, and include the file path in the replacement message:
```js
// Current (destructive):
part.state.output = TRUNCATION_MESSAGE;
writeFileSync(partPath, JSON.stringify(part));

// Proposed (non-destructive):
const preservePath = join(storageDir, sessionID, `${partId}_output.txt`);
writeFileSync(preservePath, part.state.output); // preserve first
part.state.output = `[TOOL OUTPUT PRESERVED: ${preservePath}]\nUse read tool to access full output if needed.`;
writeFileSync(partPath, JSON.stringify(part));
```

The agent can then read the preserved file on demand without it consuming context. The plugin already uses writeFileSync — this is one additional write call. No Go engine changes needed.
This also enables a multi-step recovery pipeline (replacing the current binary truncate-or-summarize); a code sketch follows the list:
- Deduplicate — existing DCP logic
- Offload large outputs to files — preserve content, replace with file reference (not destructive)
- Retroactive offload — if still over limit, iterate backward through older turns, offload their large tool outputs too
- Full compaction — session.summarize as absolute last resort
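A sketch of that ordering, with hypothetical helper names standing in for plugin logic (only session.summarize corresponds to an existing call):

```ts
// Hypothetical helpers -- stand-ins, not existing plugin functions.
declare function deduplicate(sessionID: string): Promise<void>;
declare function fitsInContext(sessionID: string): Promise<boolean>;
declare function offloadLargeOutputs(sessionID: string, scope: "latest-turn" | "older-turns"): Promise<void>;
declare function summarizeSession(sessionID: string): Promise<void>; // wraps session.summarize

async function recoverFromOverflow(sessionID: string): Promise<void> {
  await deduplicate(sessionID);                        // 1. existing DCP logic
  if (await fitsInContext(sessionID)) return;

  await offloadLargeOutputs(sessionID, "latest-turn"); // 2. preserve to files, leave references
  if (await fitsInContext(sessionID)) return;

  await offloadLargeOutputs(sessionID, "older-turns"); // 3. retroactive offload, walking backward
  if (await fitsInContext(sessionID)) return;

  await summarizeSession(sessionID);                   // 4. full compaction as absolute last resort
}
```

Each stage runs only if the previous one did not bring the session back under the limit, so the destructive options are reached as late as possible.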
Improvement 2: Proactive distillation at formatTaskResult time
Instead of reacting after the context blows up, distill background task output before injection into the parent context.
Important constraint identified via Oracle consultation: the plugin recovery layer cannot make async LLM calls (only session.summarize is available, and it's all-or-nothing for the entire session). So distillation must happen at formatTaskResult time, not during recovery.
Proposed flow (a code sketch follows this outline):
Background agent completes
→ Full output stays in session storage (already the case, no change needed)
→ At formatTaskResult time: spawn a cheap distillation call (flash model)
→ Distillation produces: key findings + decisions + file changes summary
→ Parent receives:
- Distilled summary (~2-5K tokens)
- session_id reference for full output
- Instruction: "use session_read(session_id=ses_xxx) for full details"
→ Full output accessible on-demand via session_read (no data loss)
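A sketch of what this could look like at formatTaskResult time; collectChildOutput() and distill() are hypothetical helpers, and the model name and token budget are taken from the config sketch below:

```ts
// Hypothetical helpers -- not existing opencode/plugin APIs.
declare function collectChildOutput(childSessionID: string): Promise<string>; // today's full concatenation
declare function distill(opts: { model: string; maxOutputTokens: number; prompt: string }): Promise<string>;

async function formatTaskResultDistilled(childSessionID: string): Promise<string> {
  // Full output stays in the child session's storage untouched.
  const fullOutput = await collectChildOutput(childSessionID);

  // Cheap distillation call: key findings + decisions + file changes.
  const summary = await distill({
    model: "github-copilot/grok-code-fast-1",
    maxOutputTokens: 3000,
    prompt: `Summarize the key findings, decisions, and file changes:\n\n${fullOutput}`,
  });

  // Parent receives the distilled summary plus a pointer to the full output.
  return [
    summary, // ~2-5K tokens instead of the raw blob
    `Full output preserved in session ${childSessionID}.`,
    `Use session_read(session_id=${childSessionID}) for full details.`,
  ].join("\n\n");
}
```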
Possible config shape:

```json
{
  "background_task": {
    "distill_output": {
      "enabled": true,
      "model": "github-copilot/grok-code-fast-1",
      "max_summary_tokens": 3000
    }
  }
}
```
Why not just truncation?
| Approach | Pros | Cons |
|---|---|---|
| truncate_all_tool_outputs | Works today, simple | Loses signal — truncation is position-based, not relevance-based |
| Hard cap on formatTaskResult | Simple to implement | Same problem — arbitrary cutoff |
| Distillation subagent | Preserves all important signal, reference to full output | Adds a small latency + cheap model call per task |
Distillation retains what matters and discards what doesn't, rather than blindly chopping at a byte boundary.
Use case
Any workflow with parallel background agents — which is the primary oh-my-opencode orchestration pattern. The more agents you run in parallel, the faster you exhaust context. This is especially acute with Antigravity models that have a 200K token limit.
Related issues
- oh-my-opencode#96 — DCP (reactive recovery, doesn't prevent the blowup)
- opencode-antigravity-auth#411 — antigravity-claude-opus-4-6-thinking exceeds 200k limit
- opencode-antigravity-auth#407 — Do compaction earlier
Alternatives considered
- Add background_output to TRUNCATABLE_TOOLS — simple but same truncation problem
- Reduce DEFAULT_MAX_TOKENS globally — hurts legitimate large tool outputs
- Let the parent agent request full_session=true with message_limit / include_tool_results=false — shifts the burden to prompt engineering; agents don't consistently use these params
- Configure compaction agent with a larger context model — tried setting agents.compaction.model to google/antigravity-gemini-3-pro (1M context). The model does get used for compaction when it triggers, but it doesn't prevent the "prompt is too long" error because the issue isn't compaction capacity — it's that compaction doesn't trigger early enough, and background output injection happens upstream of compaction entirely
Additional context
Session forensics from a real failure: 260K tokens in under 5 minutes. Breakdown:
- 4 parallel background agents: ~60K tokens (raw formatTaskResult output)
- 8 sequential file reads: ~80K tokens
- System prompt + instructions: ~30K tokens
- Conversation history: ~90K tokens
{ "background_task": { "distill_output": { "enabled": true, "model": "github-copilot/grok-code-fast-1", "max_summary_tokens": 3000 } } }