Description
Pre-submission checklist
- I have searched existing issues and feature requests for duplicates
- I have read the README and docs
Feature description
Add a first-class Long-Run Execution Mode for /start-work (or Sisyphus-like agents) to prevent context blowups in 40+ step tasks.
Current behavior tends to accumulate too much context by repeatedly carrying large plans, logs, and history into prompts. In practice this leads to token-limit failures and unstable long workflows.
This request is specifically about plan/state orchestration, not background tool output formatting.
Problem
In long tasks, the agent often keeps too much text in active context:
- full plan markdown
- long history replay
- long execution logs
Even if each turn succeeds, context grows monotonically and the run eventually fails with token limit errors.
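To make the growth concrete, here is a rough back-of-the-envelope model. It assumes the full plan and all prior history are re-sent every turn; the function name `first_overflow_turn` and the token figures (8k-token plan, 2k tokens added per turn) are illustrative assumptions, not measurements. Only the 90k limit is taken from the example thresholds in this request.

```python
def first_overflow_turn(plan_tokens, tokens_per_turn, hard_limit, max_turns=1000):
    """Return the first turn whose prompt exceeds hard_limit, or None."""
    for turn in range(1, max_turns + 1):
        # Full-replay model: the whole plan plus all history so far is re-sent.
        prompt_tokens = plan_tokens + tokens_per_turn * turn
        if prompt_tokens > hard_limit:
            return turn
    return None


# Illustrative numbers only (assumed, not measured).
print(first_overflow_turn(plan_tokens=8_000, tokens_per_turn=2_000, hard_limit=90_000))
# prints 42
```

Under these assumed numbers the prompt overflows at turn 42, which is in the same range as the 40+ step tasks described above. If per-turn growth were bounded, the function would return None.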
Proposed behavior (v2-style)
Implement these guardrails in core orchestration:
- Plan sharding
  - Split one large plan into multiple shard files (e.g. 4 files, 10-15 tasks each).
  - Keep stable task IDs and completion state.
- State-first loop
  - Persist minimal state (e.g. completed_count, total_count, active_plan, current_window, checkpoint_seq).
  - At turn start, load only state + active shard (not all plan files).
- Top-N execution window (default N=3)
  - Extract only first N pending tasks from active shard into current_window.
  - Execute only this window per round.
- Checkpoint policy (default every K=3 tasks)
  - After every K completed tasks, write checkpoint and stop the round.
  - Also checkpoint early on token soft threshold (e.g. 60k soft, 90k hard).
- Local/partial updates only
  - Mark one task done at a time.
  - Do not rewrite or re-inject full plan contents.
- Blocker handling without deadlock
  - Record blocker metadata and skip to next task.
  - Keep task with BLOCKED note instead of deleting.
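The guardrails above could fit together roughly as follows. This is a minimal in-memory sketch, not the project's actual orchestrator. The names `Task`, `State`, `shard_plan`, `run_round`, and the `execute` / `tokens_used` callbacks are hypothetical, and a real implementation would presumably read shards from files and write the checkpoint to disk rather than return it.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Task:
    id: str                      # stable across shards and rounds
    status: str = "pending"      # "pending" | "done" | "blocked"
    note: str = ""


@dataclass
class State:
    total_count: int
    completed_count: int = 0
    active_plan: int = 0         # index of the active shard
    current_window: list = field(default_factory=list)
    checkpoint_seq: int = 0


def shard_plan(tasks, shard_size):
    """Split one task list into consecutive shards without changing task IDs."""
    return [tasks[i:i + shard_size] for i in range(0, len(tasks), shard_size)]


def run_round(shards, state, execute, window_size=3, checkpoint_every=3,
              tokens_used=lambda: 0, token_soft_limit=60_000):
    """Execute one window from the active shard, then return a checkpoint string."""
    shard = shards[state.active_plan]            # only the active shard is read
    window = [t for t in shard if t.status == "pending"][:window_size]
    state.current_window = [t.id for t in window]
    done_this_round = 0
    for task in window:
        try:
            execute(task)
            task.status = "done"                 # mark one task at a time
            state.completed_count += 1
            done_this_round += 1
        except Exception as exc:                 # blocker: record it and move on
            task.status = "blocked"
            task.note = f"BLOCKED: {exc}"
        if done_this_round >= checkpoint_every or tokens_used() >= token_soft_limit:
            break                                # stop the round early
    # Advance to the next shard once nothing in this one is pending.
    shard_finished = all(t.status != "pending" for t in shard)
    if shard_finished and state.active_plan < len(shards) - 1:
        state.active_plan += 1
    state.current_window = []
    state.checkpoint_seq += 1
    return json.dumps(asdict(state))             # compact state, not the plan
```

Each call handles at most one window and ends with a checkpoint, so the caller can stop after any round and resume later from the returned state plus the active shard. A blocked task keeps its entry and note, so it stays visible for a later retry without stalling the tasks behind it.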
Why this matters
- Improves reliability for long-running work plans.
- Reduces repeated token waste from plan/history replay.
- Produces resumable, deterministic progress via checkpoints.
Acceptance criteria
- A 40+ task plan can run for many rounds without prompt overflow.
- Orchestrator reads only active shard + compact state each round.
- Checkpoints are generated deterministically and resumable.
- No full-plan/full-history reinjection unless explicitly requested.
Related issues
- [Feature]: Background Task Output Distillation + Non-Destructive Recovery #1734 (background task output distillation)
- [Question]: Atlas always maintains the only one todo called "Complete ALL tasks in work plan" #1742 (single todo behavior signal)
This issue is complementary: #1734 addresses tool output size; this request addresses long-run plan loop strategy.
Suggested config shape
{
  "long_run": {
    "enabled": true,
    "window_size": 3,
    "checkpoint_every": 3,
    "token_soft_limit": 60000,
    "token_hard_limit": 90000,
    "partial_plan_updates": true,
    "blocker_skip": true
  }
}
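One way the long_run block above might be loaded and sanity-checked is sketched below. `LongRunConfig` and `load_long_run_config` are hypothetical names, not an existing API. The defaults mirror the values suggested in this request, except `enabled=False`, which is an assumption so that the mode would be opt-in.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LongRunConfig:
    enabled: bool = False            # assumed default: opt-in
    window_size: int = 3             # Top-N execution window
    checkpoint_every: int = 3        # K completed tasks per checkpoint
    token_soft_limit: int = 60_000
    token_hard_limit: int = 90_000
    partial_plan_updates: bool = True
    blocker_skip: bool = True


def load_long_run_config(text: str) -> LongRunConfig:
    """Parse the "long_run" section of a JSON config, applying defaults."""
    raw = json.loads(text).get("long_run", {})
    cfg = LongRunConfig(**raw)       # unknown keys raise TypeError
    if cfg.window_size < 1 or cfg.checkpoint_every < 1:
        raise ValueError("window_size and checkpoint_every must be >= 1")
    if not 0 < cfg.token_soft_limit < cfg.token_hard_limit:
        raise ValueError("token_soft_limit must be positive and below token_hard_limit")
    return cfg
```

Requiring the soft limit to sit strictly below the hard limit keeps the early-checkpoint rule meaningful: the round can stop and persist state before the prompt reaches the point of failure.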