[Feature]: Native long-run execution mode (plan sharding + state + Top-N window + checkpoints) #1751

@tongsh6

Pre-submission checklist

  • I have searched existing issues and feature requests for duplicates
  • I have read the README and docs

Feature description

Add a first-class Long-Run Execution Mode for /start-work (or Sisyphus-like agents) to prevent context blowups in 40+ step tasks.

Current behavior tends to accumulate too much context by repeatedly carrying large plans, logs, and history into prompts. In practice this leads to token-limit failures and unstable long workflows.

This request is specifically about plan/state orchestration, not background tool output formatting.

Problem

In long tasks, the agent often keeps too much text in active context:

  • full plan markdown
  • long history replay
  • long execution logs

Even if each turn succeeds, the context grows monotonically and the run eventually fails with token-limit errors.
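To make the growth concrete, here is a rough sketch of how prompt size behaves when the full plan and all prior logs are replayed every turn. The linear model and the helper names (`replay_prompt_tokens`, `first_overflow_turn`) are illustrative assumptions, not measurements from the project.

```python
def replay_prompt_tokens(turn: int, plan_tokens: int, log_tokens_per_turn: int) -> int:
    """Prompt size at `turn` if the full plan and every prior log are re-sent.

    Assumes (hypothetically) that each turn adds a fixed amount of log text.
    """
    return plan_tokens + turn * log_tokens_per_turn


def first_overflow_turn(limit: int, plan_tokens: int, log_tokens_per_turn: int) -> int:
    """Smallest turn number whose replayed prompt exceeds `limit`."""
    if log_tokens_per_turn <= 0:
        raise ValueError("log_tokens_per_turn must be positive")
    turn = 0
    while replay_prompt_tokens(turn, plan_tokens, log_tokens_per_turn) <= limit:
        turn += 1
    return turn
```

With made-up inputs such as an 8,000-token plan and 2,500 tokens of log per turn, `first_overflow_turn(90_000, 8_000, 2_500)` gives the turn at which a 90k limit is crossed. Under this model every individual turn succeeds right up to that point, which matches the failure pattern described above.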

Proposed behavior (v2-style)

Implement these guardrails in core orchestration:

  1. Plan sharding
     • Split one large plan into multiple shard files (e.g. 4 files, 10-15 tasks each).
     • Keep stable task IDs and completion state.
  2. State-first loop
     • Persist minimal state (e.g. completed_count, total_count, active_plan, current_window, checkpoint_seq).
     • At turn start, load only state + active shard (not all plan files).
  3. Top-N execution window (default N=3)
     • Extract only the first N pending tasks from the active shard into current_window.
     • Execute only this window per round.
  4. Checkpoint policy (default every K=3 tasks)
     • After every K completed tasks, write a checkpoint and stop the round.
     • Also checkpoint early when the token soft threshold is reached (e.g. 60k soft, 90k hard).
  5. Local/partial updates only
     • Mark one task done at a time.
     • Do not rewrite or re-inject full plan contents.
  6. Blocker handling without deadlock
     • Record blocker metadata and skip to the next task.
     • Keep the task with a BLOCKED note instead of deleting it.
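The guardrails above could fit together roughly as follows. This is a minimal in-memory sketch, not the project's orchestration code: `Task`, `RunState`, `shard_plan`, `next_window`, `run_round`, and `checkpoint` are hypothetical names, `execute` is an assumed callback that returns `None` on success or a blocker reason string, and `tokens_used` is an assumed token counter. A real implementation would read only the active shard file from disk instead of holding all shards in memory, and this sketch writes one checkpoint at the end of every round as a simplification.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Task:
    id: str                    # stable ID, never renumbered
    title: str
    status: str = "pending"    # "pending" | "done" | "blocked"
    note: str = ""


@dataclass
class RunState:
    completed_count: int = 0
    total_count: int = 0
    active_plan: int = 0       # index of the active shard
    current_window: list = field(default_factory=list)
    checkpoint_seq: int = 0


def shard_plan(tasks, shard_size=10):
    """Split one task list into shards; task objects and IDs are unchanged."""
    return [tasks[i:i + shard_size] for i in range(0, len(tasks), shard_size)]


def next_window(shard, n=3):
    """Return the first n pending tasks of a shard (the Top-N window)."""
    return [t for t in shard if t.status == "pending"][:n]


def checkpoint(state):
    """Serialize the compact state; sorted keys keep the output deterministic."""
    return json.dumps(asdict(state), sort_keys=True)


def run_round(state, shards, execute, n=3, k=3,
              tokens_used=lambda: 0, soft_limit=60_000):
    """Run one round over the active shard's window, then bump the checkpoint."""
    shard = shards[state.active_plan]
    window = next_window(shard, n)
    state.current_window = [t.id for t in window]
    done_this_round = 0
    for task in window:
        blocker = execute(task)
        if blocker:
            # Keep the task, record why it is blocked, and move on.
            task.status, task.note = "blocked", f"BLOCKED: {blocker}"
            continue
        task.status = "done"   # partial update: one task at a time
        state.completed_count += 1
        done_this_round += 1
        if done_this_round >= k or tokens_used() >= soft_limit:
            break              # stop the round early and checkpoint
    # Advance to the next shard once nothing is pending in this one.
    if not next_window(shard, 1) and state.active_plan < len(shards) - 1:
        state.active_plan += 1
    state.checkpoint_seq += 1
    return state
```

Each call to `run_round` touches at most N tasks, and the only data carried between rounds is the small `RunState` record, so the per-round input stays bounded regardless of plan length. Blocked tasks stop counting as pending, which is what lets the loop move past them without deadlocking.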

Why this matters

  • Improves reliability for long-running work plans.
  • Reduces repeated token waste from plan/history replay.
  • Produces resumable, deterministic progress via checkpoints.

Suggested config shape

{
  "long_run": {
    "enabled": true,
    "window_size": 3,
    "checkpoint_every": 3,
    "token_soft_limit": 60000,
    "token_hard_limit": 90000,
    "partial_plan_updates": true,
    "blocker_skip": true
  }
}
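A loader for this shape might validate the values before a run starts. The sketch below is one possible reading: `LongRunConfig` and `load_long_run_config` are hypothetical names, the field defaults mirror the example above (with `enabled` defaulting to off as an assumption), and the rule that the soft limit must sit below the hard limit is inferred from how the two thresholds are described, not taken from existing code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LongRunConfig:
    enabled: bool = False            # assumed default: opt-in
    window_size: int = 3             # Top-N window
    checkpoint_every: int = 3        # K completed tasks per checkpoint
    token_soft_limit: int = 60_000
    token_hard_limit: int = 90_000
    partial_plan_updates: bool = True
    blocker_skip: bool = True


def load_long_run_config(raw: str) -> LongRunConfig:
    """Parse the "long_run" section of a JSON config string and validate it."""
    data = json.loads(raw).get("long_run", {})
    cfg = LongRunConfig(**data)      # unknown keys raise TypeError
    if cfg.window_size < 1 or cfg.checkpoint_every < 1:
        raise ValueError("window_size and checkpoint_every must be >= 1")
    if not 0 < cfg.token_soft_limit < cfg.token_hard_limit:
        raise ValueError("expected 0 < token_soft_limit < token_hard_limit")
    return cfg
```

Rejecting a soft limit at or above the hard limit up front avoids a configuration in which the early checkpoint could never trigger before the hard failure.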

Acceptance criteria

  • A 40+ task plan can run for many rounds without prompt overflow.
  • Orchestrator reads only active shard + compact state each round.
  • Checkpoints are generated deterministically, and a run can be resumed from the latest one.
  • No full-plan/full-history reinjection unless explicitly requested.

Related issues

This issue is complementary: #1734 addresses tool output size; this request addresses long-run plan loop strategy.
