[Feature]: Native long-run execution mode (plan sharding + state + Top-N window + checkpoints)

### Pre-submission checklist

- [x] I have searched existing issues and feature requests for duplicates
- [x] I have read the README and docs

### Feature description

Add a first-class **Long-Run Execution Mode** for `/start-work` (or Sisyphus-like agents) to prevent context blowups in 40+ step tasks.

Current behavior tends to accumulate too much context by repeatedly carrying large plans, logs, and history into prompts. In practice this leads to token-limit failures and unstable long workflows.

This request is specifically about **plan/state orchestration**, not background tool output formatting.

### Problem

In long tasks, the agent often keeps too much text in active context:

- full plan markdown
- long history replay
- long execution logs

Even if each turn succeeds, context grows monotonically, and the run eventually fails with token-limit errors.

### Proposed behavior (v2-style)

Implement these guardrails in core orchestration:

1. **Plan sharding**
- Split one large plan into multiple shard files (e.g. 4 files, 10-15 tasks each).
- Keep stable task IDs and completion state.
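
A minimal sketch of the sharding step, assuming a plan is already parsed into a list of tasks. The `Task` shape, the `shard_plan` name, and the `T001`-style IDs are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: str            # stable ID, assigned once and never renumbered
    title: str
    done: bool = False


def shard_plan(tasks: list[Task], shard_size: int = 10) -> list[list[Task]]:
    """Split one large plan into consecutive shards of at most shard_size tasks."""
    if shard_size < 1:
        raise ValueError("shard_size must be >= 1")
    return [tasks[i:i + shard_size] for i in range(0, len(tasks), shard_size)]


# A 40-task plan becomes 4 shards of 10 tasks; IDs and completion flags travel with each task.
plan = [Task(id=f"T{n:03d}", title=f"step {n}") for n in range(1, 41)]
shards = shard_plan(plan, shard_size=10)
```

Because tasks are only partitioned, never renumbered, a task keeps the same ID regardless of which shard file it lands in.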

2. **State-first loop**
- Persist minimal state (e.g. `completed_count`, `total_count`, `active_plan`, `current_window`, `checkpoint_seq`).
- At turn start, load only state + active shard (not all plan files).
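
One way the persisted state could look, as a sketch only. The field names come from the list above; the JSON file layout, the `RunState` class, and the `save_state` / `load_state` helpers are assumptions:

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunState:
    completed_count: int = 0
    total_count: int = 0
    active_plan: str = ""                                     # path of the active shard file
    current_window: list[str] = field(default_factory=list)   # task IDs in the current window
    checkpoint_seq: int = 0


def save_state(state: RunState, path: Path) -> None:
    # Write to a temp file first, then swap it in, so an interrupted
    # write cannot leave a half-written state file behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")
    tmp.replace(path)


def load_state(path: Path) -> RunState:
    # A missing file is treated as a fresh run.
    if not path.exists():
        return RunState()
    return RunState(**json.loads(path.read_text(encoding="utf-8")))
```

At turn start the orchestrator would call `load_state`, then read only the file named by `active_plan`.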

3. **Top-N execution window (default N=3)**
- Extract only first N pending tasks from active shard into `current_window`.
- Execute only this window per round.
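
Window extraction could be as small as the sketch below. It assumes shard files use Markdown checkboxes of the form `- [ ] T001: description`; that line format and the `next_window` name are illustrative, not part of this proposal:

```python
import re

# Assumed (hypothetical) shard line format: "- [ ] T001: ..." or "- [x] T001: ..."
TASK_LINE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<id>[A-Za-z0-9_-]+):")


def next_window(shard_text: str, n: int = 3) -> list[str]:
    """Return the IDs of the first n pending tasks, in file order."""
    if n < 1:
        return []
    window: list[str] = []
    for line in shard_text.splitlines():
        m = TASK_LINE.match(line)
        if m and m.group("mark") == " ":
            window.append(m.group("id"))
            if len(window) == n:
                break   # stop scanning once the window is full
    return window
```

Only the returned IDs (plus their task lines) need to enter the prompt; the rest of the shard stays on disk.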

4. **Checkpoint policy (default every K=3 tasks)**
- After every K completed tasks, write checkpoint and stop the round.
- Also checkpoint early when token usage crosses a soft threshold (e.g. 60k soft, 90k hard).
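
The checkpoint decision can be expressed as a pure function, sketched here under assumptions: `should_checkpoint` and its parameter names are hypothetical, and token usage is assumed to be available as a single integer. The hard limit is left out because its exact behavior is not specified in this proposal:

```python
def should_checkpoint(
    completed_this_round: int,
    tokens_used: int,
    checkpoint_every: int = 3,
    token_soft_limit: int = 60_000,
) -> bool:
    """True when the round should write a checkpoint and stop."""
    hit_task_quota = completed_this_round >= checkpoint_every
    hit_soft_limit = tokens_used >= token_soft_limit
    return hit_task_quota or hit_soft_limit
```

Keeping the rule free of I/O makes it deterministic for the same inputs, which is what makes checkpoint placement reproducible across resumes.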

5. **Local/partial updates only**
- Mark one task done at a time.
- Do not rewrite or re-inject full plan contents.
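
A local update might touch exactly one line of the shard, as in this sketch. It assumes the same hypothetical checkbox format (`- [ ] T001: description`); `mark_done` is an illustrative name:

```python
import re


def mark_done(shard_text: str, task_id: str) -> str:
    """Flip '- [ ] <task_id>:' to '- [x] <task_id>:' and leave every other line untouched."""
    pattern = re.compile(rf"^- \[ \] {re.escape(task_id)}:", flags=re.MULTILINE)
    # A callable replacement avoids backslash-escape processing of the task ID.
    updated, count = pattern.subn(lambda _m: f"- [x] {task_id}:", shard_text, count=1)
    if count == 0:
        raise KeyError(f"pending task {task_id!r} not found")
    return updated
```

The function only operates on the shard text it is given, so nothing here requires the full plan to be loaded or re-sent to the model.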

6. **Blocker handling without deadlock**
- Record blocker metadata and skip to next task.
- Keep task with BLOCKED note instead of deleting.
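
Skip-on-blocker could look roughly like this. The `TrackedTask` shape, the `Blocked` exception, and the `run_window` helper are all hypothetical; how a real runner signals a blocker is an open design question:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrackedTask:
    id: str
    status: str = "pending"          # "pending" | "done" | "blocked"
    blocker: Optional[str] = None    # reason recorded when the task is blocked


class Blocked(Exception):
    """Raised by a task runner when the task cannot proceed (hypothetical)."""


def run_window(window: list[TrackedTask], run: Callable[[TrackedTask], None]) -> None:
    for task in window:
        try:
            run(task)
            task.status = "done"
        except Blocked as exc:
            # Keep the task in the plan with a BLOCKED note and move on,
            # so one stuck task cannot stall the rest of the window.
            task.status = "blocked"
            task.blocker = str(exc)
```

Blocked tasks stay visible in the shard, so a later round (or a human) can retry them once the blocker is resolved.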

### Why this matters

- Improves reliability for long-running work plans.
- Reduces repeated token waste from plan/history replay.
- Produces resumable, deterministic progress via checkpoints.

### Suggested config shape

```jsonc
{
  "long_run": {
    "enabled": true,
    "window_size": 3,
    "checkpoint_every": 3,
    "token_soft_limit": 60000,
    "token_hard_limit": 90000,
    "partial_plan_updates": true,
    "blocker_skip": true
  }
}
```
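
For illustration, the config above could be validated on load along these lines. The `LongRunConfig` class and `parse_long_run` helper are hypothetical, and defaulting `enabled` to `False` (opt-in) is an assumption rather than part of the request:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LongRunConfig:
    enabled: bool = False            # assumed opt-in default
    window_size: int = 3
    checkpoint_every: int = 3
    token_soft_limit: int = 60_000
    token_hard_limit: int = 90_000
    partial_plan_updates: bool = True
    blocker_skip: bool = True

    def __post_init__(self) -> None:
        if self.window_size < 1 or self.checkpoint_every < 1:
            raise ValueError("window_size and checkpoint_every must be >= 1")
        if self.token_soft_limit >= self.token_hard_limit:
            raise ValueError("token_soft_limit must be below token_hard_limit")


def parse_long_run(raw: dict) -> LongRunConfig:
    """Build the config from an already-parsed config object; unknown keys raise TypeError."""
    return LongRunConfig(**raw.get("long_run", {}))
```

Rejecting a soft limit at or above the hard limit up front avoids a configuration where the early checkpoint could never fire before the hard stop.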

### Acceptance criteria

- A 40+ task plan can run for many rounds without prompt overflow.
- Orchestrator reads only active shard + compact state each round.
- Checkpoints are generated deterministically, and a run can be resumed from any of them.
- No full-plan/full-history reinjection unless explicitly requested.

### Related issues

- #1734 (background task output distillation)
- #1742 (single todo behavior signal)

This issue is complementary: #1734 addresses tool output size; this request addresses long-run plan loop strategy.
