[v0.4.0] Task Resumption #47

@dean0x

Task Resumption

Version: v0.4.0
Priority: High
Complexity: High (depends on research spike)
Dependencies: Blocked by #46 (Research Spike)

Problem Statement

Long-running tasks that fail or get interrupted must restart from scratch, losing all progress. This wastes time and makes complex refactoring tasks impractical.

Users need:

  • Automatic checkpoints during task execution
  • Ability to resume failed tasks from last checkpoint
  • Detection of external changes during task downtime

User Stories

Must-Have

  • US-3.1: As a developer, I want Claudine to automatically create checkpoints during execution, so progress is preserved if the task fails.
  • US-3.2: As a developer, I want to resume a failed task from its last checkpoint, so I don't lose progress on long-running refactors.
  • US-3.3: As a developer, I want Claudine to detect if files changed while suspended, so I can decide whether to proceed.
  • US-3.4: As a developer, I want git state tracked at checkpoint time, so resume can verify working directory state.
  • US-3.5: As a developer, I want a fallback "retry with context" option, so I have options even if full resumption isn't feasible.

Nice-to-Have (Deferred)

  • List all checkpoints for a task
  • Configure checkpoint retention policy
  • Manual checkpoint trigger
  • Resume with modified prompt

Scope

v1 MVP (In Scope)

  • Automatic checkpoints every N tool calls (default 10)
  • Checkpoint before risky operations (configurable)
  • New MCP tool: ResumeTask
  • Conflict detection (file changes since checkpoint)
  • Conflict resolution options: fail (default), override
  • Git state capture (branch, commit SHA, dirty files)
  • Context injection fallback
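
The two checkpoint triggers listed above (every N tool calls, and before risky operations) can be sketched as a small policy object. This is only an illustration: `CheckpointPolicy`, its method names, and the example set of risky tools are hypothetical and not part of Claudine's API.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Decides when a checkpoint is due (hypothetical helper)."""

    interval: int = 10  # "every N tool calls (default 10)"
    # Assumed example of a configurable risky-operation set.
    risky_tools: frozenset = frozenset({"Bash", "Write"})
    calls_since_checkpoint: int = 0

    def before_tool_call(self, tool_name: str) -> bool:
        """Return True if a checkpoint should be taken before this tool runs."""
        return tool_name in self.risky_tools

    def after_tool_call(self) -> bool:
        """Count a completed call; return True once the interval is reached."""
        self.calls_since_checkpoint += 1
        return self.calls_since_checkpoint >= self.interval

    def mark_checkpointed(self) -> None:
        """Reset the counter after a checkpoint has been written."""
        self.calls_since_checkpoint = 0
```

Keeping the counter reset separate from the check means a failed checkpoint write does not silently restart the interval.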

Deferred (v0.4.1+)

  • Checkpoint compression
  • Partial state recovery (merge conflicts)
  • Resume from arbitrary checkpoint (not just latest)
  • Cross-server resumption

Out of Scope

  • Real-time checkpoint streaming
  • Automatic conflict resolution
  • Time-travel debugging

Acceptance Criteria

Success Scenarios

  • Checkpoints are created automatically every N tool calls (default 10)
  • Resume from last checkpoint completes within 30 seconds
  • Conflict detection identifies changed files correctly
  • Context injection fallback works when full resumption unavailable
  • Checkpoint overhead < 5% of task runtime
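
One way conflict detection could work is to hash the relevant files at checkpoint time and compare against fresh hashes at resume time. The sketch below assumes content hashing with SHA-256; the function names are hypothetical, and the issue does not say which detection mechanism will be used.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 of its contents (None if it is not a file)."""
    result = {}
    for p in paths:
        path = Path(p)
        result[str(p)] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        )
    return result


def detect_conflicts(saved, current):
    """Return the sorted paths whose hash differs between two snapshots."""
    return sorted(
        p for p in saved.keys() | current.keys() if saved.get(p) != current.get(p)
    )


def resolve(conflicts, strategy="fail"):
    """Apply the two v1 options: 'fail' (default) or 'override'."""
    if strategy not in ("fail", "override"):
        raise ValueError(f"unknown strategy: {strategy}")
    if conflicts and strategy == "fail":
        raise RuntimeError(f"files changed since checkpoint: {conflicts}")
    return True  # safe to proceed with the resume
```

A deleted file shows up as a change from a hash to `None`, so it is reported as a conflict like any other modification.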

Failure Scenarios

  • Resume on non-failed task returns INVALID_OPERATION error
  • Resume without checkpoint returns INVALID_STATE error
  • Corrupted checkpoint detected and reported
  • Database failure during checkpoint handled gracefully (task continues)
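
The first three failure scenarios can be sketched as a validation step that runs before any resume. Several details here are assumptions: the `"failed"` status string, the `CHECKPOINT_CORRUPTED` code (the issue only says corruption is "detected and reported"), and the use of a stored SHA-256 digest to detect corruption, which the proposed table does not include.

```python
import hashlib
import json


class ResumeError(Exception):
    """Error carrying a machine-readable code (hypothetical type)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def seal(payload):
    """Serialize a checkpoint payload and attach a digest of the serialized form."""
    body = json.dumps(payload, sort_keys=True)
    return {"body": body, "sha256": hashlib.sha256(body.encode()).hexdigest()}


def validate_resume(task_status, checkpoint):
    """Return the checkpoint payload, or raise ResumeError with a specific code."""
    if task_status != "failed":  # assumed status name
        raise ResumeError("INVALID_OPERATION", "only failed tasks can be resumed")
    if checkpoint is None:
        raise ResumeError("INVALID_STATE", "task has no checkpoint")
    digest = hashlib.sha256(checkpoint["body"].encode()).hexdigest()
    if digest != checkpoint["sha256"]:
        # Hypothetical code; the issue does not name one for corruption.
        raise ResumeError("CHECKPOINT_CORRUPTED", "checkpoint digest mismatch")
    return json.loads(checkpoint["body"])
```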

Edge Cases

  • Server crash mid-task leaves resumable checkpoint
  • Multiple resume attempts maintain retry chain
  • Large checkpoints (500+ tool calls) handled via file storage
  • Git state mismatch logged as warning (not blocking)
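
Git state capture and the non-blocking mismatch check could look roughly like the following. The git commands are standard (`git rev-parse`, `git status --porcelain`), but `GitState`, `capture_git_state`, and `git_mismatches` are hypothetical names, and the sketch reads only the simple v1 porcelain format.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class GitState:
    branch: str
    commit_sha: str
    dirty_files: tuple


def _git(args, cwd):
    """Run a git command and return its stdout as text."""
    return subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    ).stdout


def parse_porcelain(output):
    """Extract paths from `git status --porcelain` (v1) output.

    Each line is two status characters, a space, then the path. Renames
    appear as "old -> new" and are kept verbatim in this sketch.
    """
    return tuple(line[3:] for line in output.splitlines() if line)


def capture_git_state(cwd="."):
    """Capture branch, commit SHA, and dirty files for the working directory."""
    return GitState(
        branch=_git(["rev-parse", "--abbrev-ref", "HEAD"], cwd).strip(),
        commit_sha=_git(["rev-parse", "HEAD"], cwd).strip(),
        dirty_files=parse_porcelain(_git(["status", "--porcelain"], cwd)),
    )


def git_mismatches(saved, current):
    """Return warning strings; an empty list means the states match."""
    warnings = []
    if saved.branch != current.branch:
        warnings.append(f"branch changed: {saved.branch} -> {current.branch}")
    if saved.commit_sha != current.commit_sha:
        warnings.append(f"commit changed: {saved.commit_sha} -> {current.commit_sha}")
    if set(saved.dirty_files) != set(current.dirty_files):
        warnings.append("set of uncommitted files changed")
    return warnings
```

Returning warnings instead of raising matches the edge case above, where a git state mismatch is logged but does not block the resume.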

Technical Design

New Database Table

CREATE TABLE task_checkpoints (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  checkpoint_number INTEGER NOT NULL,
  conversation_history TEXT,
  tool_calls TEXT,
  git_branch TEXT,
  git_commit_sha TEXT,
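  git_has_uncommitted INTEGER,  -- 0 = clean, 1 = uncommitted changes (SQLite has no boolean type)
  file_path TEXT,  -- Set when a large checkpoint is stored on disk instead of inline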
  created_at INTEGER NOT NULL,
  FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE,
  UNIQUE(task_id, checkpoint_number)
);
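
A minimal sketch of how a repository layer might use this table, written against Python's built-in `sqlite3` module for illustration. The one-column `tasks` table is a stand-in for the real one, the helper names are hypothetical, and storing `created_at` as Unix seconds is an assumption. Note that SQLite only enforces `ON DELETE CASCADE` when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE tasks (id TEXT PRIMARY KEY);  -- stand-in for the real tasks table
CREATE TABLE task_checkpoints (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  checkpoint_number INTEGER NOT NULL,
  conversation_history TEXT,
  tool_calls TEXT,
  git_branch TEXT,
  git_commit_sha TEXT,
  git_has_uncommitted INTEGER,
  file_path TEXT,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE,
  UNIQUE(task_id, checkpoint_number)
);
"""


def open_db(path=":memory:"):
    """Open a database with foreign keys enforced and the schema applied."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def save_checkpoint(conn, task_id, conversation, tool_calls):
    """Insert the next checkpoint for a task and return its number."""
    with conn:  # commits on success, rolls back on error
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(checkpoint_number), 0) "
            "FROM task_checkpoints WHERE task_id = ?",
            (task_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO task_checkpoints "
            "(task_id, checkpoint_number, conversation_history, tool_calls, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (task_id, last + 1, conversation, tool_calls, int(time.time())),
        )
    return last + 1


def latest_checkpoint(conn, task_id):
    """Return (checkpoint_number, conversation_history) or None."""
    return conn.execute(
        "SELECT checkpoint_number, conversation_history FROM task_checkpoints "
        "WHERE task_id = ? ORDER BY checkpoint_number DESC LIMIT 1",
        (task_id,),
    ).fetchone()
```

The `UNIQUE(task_id, checkpoint_number)` constraint makes a duplicate checkpoint number fail loudly instead of silently overwriting an earlier checkpoint.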

New Events

  • CheckpointCreated
  • TaskResumed
  • CheckpointCorruptionDetected

New Components

  • CheckpointHandler (checkpoint lifecycle)
  • CheckpointRepository (SQLite persistence)
  • ResumeTask MCP tool

Implementation Approaches

Option A: Full Session Continuation (if research validates)

  • Serialize conversation history
  • Restore tool call state
  • Resume with full context

Option B: Context Injection Fallback

  • Inject summary: "Previous attempt failed at step X..."
  • Include last N messages
  • Include error context
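
Option B can be sketched as a function that assembles the injected prompt from the three ingredients above. The message shape (`role` / `content` dictionaries), the parameter names, and the exact wording of the prompt are assumptions made for illustration.

```python
def build_resume_prompt(original_prompt, failed_step, error, messages, last_n=5):
    """Build a retry prompt: failure summary, error context, last N messages."""
    recent = messages[-last_n:] if last_n > 0 else []
    lines = [
        f"Previous attempt failed at step {failed_step}.",
        f"Error: {error}",
        "",
        "Most recent messages:",
        *[f"- {m['role']}: {m['content']}" for m in recent],
        "",
        "Original task:",
        original_prompt,
    ]
    return "\n".join(lines)
```

For example, `build_resume_prompt("Refactor auth", 4, "timeout", history, last_n=2)` would carry only the final two messages of `history` into the retry, which keeps the injected context small.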

Success Metrics

  • Resume from checkpoint within 30 seconds
  • Resumption success rate > 80% when there are no external conflicts
  • Checkpoint overhead < 5% of task runtime
  • Clear error messages for unrecoverable states

Dependencies

  • Blocked by #46 (Research Spike)


Created via /specify command
