[v0.4.0] Task Resumption #47

# Task Resumption

**Version**: v0.4.0
**Priority**: High
**Complexity**: High (depends on research spike)
**Dependencies**: Blocked by #46 (Research Spike)

## Problem Statement

Long-running tasks that fail or get interrupted must restart from scratch, losing all progress. This wastes time and makes complex refactoring tasks impractical.

Users need:
- Automatic checkpoints during task execution
- Ability to resume failed tasks from the last checkpoint
- Detection of external changes during task downtime

## User Stories

### Must-Have
- **US-3.1**: As a developer, I want Claudine to automatically create checkpoints during execution, so progress is preserved if the task fails.
- **US-3.2**: As a developer, I want to resume a failed task from its last checkpoint, so I don't lose progress on long-running refactors.
- **US-3.3**: As a developer, I want Claudine to detect whether files changed while the task was suspended, so I can decide whether to proceed.
- **US-3.4**: As a developer, I want git state tracked at checkpoint time, so resume can verify working directory state.
- **US-3.5**: As a developer, I want a fallback "retry with context" option, so I have options even if full resumption isn't feasible.

### Nice-to-Have (Deferred)
- List all checkpoints for a task
- Configure checkpoint retention policy
- Manual checkpoint trigger
- Resume with modified prompt

## Scope

### v1 MVP (In Scope)
- [x] Automatic checkpoints every N tool calls (default 10)
- [x] Checkpoint before risky operations (configurable)
- [x] New MCP tool: `ResumeTask`
- [x] Conflict detection (file changes since checkpoint)
- [x] Conflict resolution options: `fail` (default), `override`
- [x] Git state capture (branch, commit SHA, dirty files)
- [x] Context injection fallback
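
The two checkpoint triggers in this list (a fixed interval of tool calls, plus a checkpoint before risky operations) could be combined roughly as below. This is a minimal sketch, not the planned `CheckpointHandler`: the class name `CheckpointPolicy`, its methods, and the example risky tool names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointPolicy:
    """Decides when a checkpoint is due (hypothetical helper, not from the spec)."""

    interval: int = 10  # "every N tool calls (default 10)"
    # Assumed example set; the spec only says risky operations are configurable.
    risky_tools: frozenset = frozenset({"Bash", "Write"})
    calls_since_checkpoint: int = field(default=0, init=False)

    def before_call(self, tool_name: str) -> bool:
        """Return True if a checkpoint should be taken before running this tool."""
        return tool_name in self.risky_tools

    def after_call(self) -> bool:
        """Record a finished tool call; return True once the interval is reached."""
        self.calls_since_checkpoint += 1
        return self.calls_since_checkpoint >= self.interval

    def mark_checkpointed(self) -> None:
        """Reset the counter after a checkpoint has been written."""
        self.calls_since_checkpoint = 0
```

Keeping the decision separate from the write means a failed checkpoint write does not have to reset the counter, so the next tool call can trigger another attempt.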

### Deferred (v0.4.1+)
- Checkpoint compression
- Partial state recovery (merge conflicts)
- Resume from arbitrary checkpoint (not just latest)
- Cross-server resumption

### Out of Scope
- Real-time checkpoint streaming
- Automatic conflict resolution
- Time-travel debugging

## Acceptance Criteria

### Success Scenarios
- [ ] Checkpoints are created automatically every N tool calls (default 10)
- [ ] Resume from last checkpoint completes within 30 seconds
- [ ] Conflict detection identifies changed files correctly
- [ ] Context injection fallback works when full resumption is unavailable
- [ ] Checkpoint overhead < 5% of task runtime
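
One way to make "identifies changed files correctly" testable is to compare content digests taken at checkpoint time with digests taken at resume time. The sketch below assumes a per-file SHA-256 snapshot is stored with the checkpoint, which the schema in this issue does not (yet) include; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 digest of its contents (None if missing)."""
    result = {}
    for p in map(Path, paths):
        result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None
    return result


def changed_files(saved, current):
    """Return paths whose digest differs between checkpoint time and now."""
    return sorted(p for p in saved if saved[p] != current.get(p))


def check_conflicts(saved, current, on_conflict="fail"):
    """Apply the `fail` (default) or `override` resolution option."""
    conflicts = changed_files(saved, current)
    if conflicts and on_conflict == "fail":
        raise RuntimeError(f"files changed since checkpoint: {conflicts}")
    return conflicts  # with `override`, the caller proceeds but can still report these
```

A file deleted during downtime shows up as a conflict too, because its saved digest no longer matches anything in the current snapshot.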

### Failure Scenarios
- [ ] Resuming a task that has not failed returns an `INVALID_OPERATION` error
- [ ] Resuming a task that has no checkpoint returns an `INVALID_STATE` error
- [ ] Corrupted checkpoint detected and reported
- [ ] Database failure during checkpoint handled gracefully (task continues)
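
The first two failure scenarios amount to a precondition check at the top of `ResumeTask`. A minimal sketch follows; the `Task` shape, the status strings, and `ResumeError` are assumptions for illustration, and only the two error codes come from this issue.

```python
from dataclasses import dataclass
from typing import Optional


class ResumeError(Exception):
    """Carries one of the error codes named in the failure scenarios."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


@dataclass
class Task:
    id: str
    status: str  # assumed values: "running", "completed", "failed"
    latest_checkpoint: Optional[int] = None


def validate_resume(task: Task) -> int:
    """Return the checkpoint number to resume from, or raise ResumeError."""
    if task.status != "failed":
        raise ResumeError("INVALID_OPERATION", f"task {task.id} is {task.status}, not failed")
    if task.latest_checkpoint is None:
        raise ResumeError("INVALID_STATE", f"task {task.id} has no checkpoint")
    return task.latest_checkpoint
```

Checking the task status before the checkpoint lookup means a task that is still running always reports `INVALID_OPERATION`, whether or not it has checkpoints.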

### Edge Cases
- [ ] Server crash mid-task leaves resumable checkpoint
- [ ] Multiple resume attempts maintain retry chain
- [ ] Large checkpoints (500+ tool calls) handled via file storage
- [ ] Git state mismatch logged as warning (not blocking)
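
Git state capture and the non-blocking mismatch check could look roughly like the following. It is a sketch that shells out to the `git` CLI; `GitState`, `capture_git_state`, and `git_mismatches` are hypothetical names, and paths that git quotes (for example, those containing spaces or non-ASCII characters) are not unquoted here.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class GitState:
    branch: str
    commit_sha: str
    dirty_files: tuple


def _git(*args, cwd="."):
    """Run a git command and return its stdout; raises if git exits non-zero."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def parse_porcelain(output):
    """Extract paths from `git status --porcelain` (v1 format) output."""
    files = []
    for line in output.splitlines():
        path = line[3:]  # two status characters, one space, then the path
        if " -> " in path:  # renames are printed as "old -> new"
            path = path.split(" -> ", 1)[1]
        files.append(path)
    return tuple(files)


def capture_git_state(cwd="."):
    """Collect branch, commit SHA, and dirty files for the working directory."""
    return GitState(
        branch=_git("rev-parse", "--abbrev-ref", "HEAD", cwd=cwd).strip(),
        commit_sha=_git("rev-parse", "HEAD", cwd=cwd).strip(),
        dirty_files=parse_porcelain(_git("status", "--porcelain", cwd=cwd)),
    )


def git_mismatches(saved, current):
    """Describe differences as warning strings; callers log them and continue."""
    warnings = []
    if saved.branch != current.branch:
        warnings.append(f"branch changed: {saved.branch} -> {current.branch}")
    if saved.commit_sha != current.commit_sha:
        warnings.append(f"HEAD moved: {saved.commit_sha[:7]} -> {current.commit_sha[:7]}")
    return warnings
```

Returning a list of strings instead of raising keeps the mismatch a warning, as the edge case requires.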

## Technical Design

### New Database Table
```sql
CREATE TABLE task_checkpoints (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  checkpoint_number INTEGER NOT NULL,
  conversation_history TEXT,
  tool_calls TEXT,
  git_branch TEXT,
  git_commit_sha TEXT,
  git_has_uncommitted INTEGER,  -- 0 or 1 (SQLite has no native boolean type)
  file_path TEXT,  -- Set when a large checkpoint is stored on disk instead of inline
  created_at INTEGER NOT NULL,
  FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE,
  UNIQUE(task_id, checkpoint_number)
);
```
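
To show how this table might be used, here is a minimal sketch of the write and read paths using Python's standard `sqlite3` module. It is an illustration only, not the planned `CheckpointRepository`: the one-column `tasks` table is a stand-in for the real one, JSON serialization and a millisecond `created_at` are assumptions, and the `git` dict keys are hypothetical.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE tasks (id TEXT PRIMARY KEY);  -- stand-in for the existing tasks table
CREATE TABLE task_checkpoints (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  checkpoint_number INTEGER NOT NULL,
  conversation_history TEXT,
  tool_calls TEXT,
  git_branch TEXT,
  git_commit_sha TEXT,
  git_has_uncommitted INTEGER,
  file_path TEXT,
  created_at INTEGER NOT NULL,
  FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE,
  UNIQUE(task_id, checkpoint_number)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def save_checkpoint(conn, task_id, history, tool_calls, git):
    """Insert the next checkpoint for a task and return its number."""
    with conn:  # commits on success, rolls back on error
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(checkpoint_number), 0) FROM task_checkpoints"
            " WHERE task_id = ?",
            (task_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO task_checkpoints (task_id, checkpoint_number,"
            " conversation_history, tool_calls, git_branch, git_commit_sha,"
            " git_has_uncommitted, created_at) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (task_id, last + 1, json.dumps(history), json.dumps(tool_calls),
             git["branch"], git["sha"], int(git["dirty"]),
             int(time.time() * 1000)),  # assumed unit: milliseconds since epoch
        )
    return last + 1


def latest_checkpoint(conn, task_id):
    """Return (checkpoint_number, conversation_history) or None."""
    row = conn.execute(
        "SELECT checkpoint_number, conversation_history FROM task_checkpoints"
        " WHERE task_id = ? ORDER BY checkpoint_number DESC LIMIT 1",
        (task_id,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))
```

The `UNIQUE(task_id, checkpoint_number)` constraint makes the latest-checkpoint lookup unambiguous, and `ON DELETE CASCADE` only takes effect once foreign keys are switched on for the connection.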

### New Events
- `CheckpointCreated`
- `TaskResumed`
- `CheckpointCorruptionDetected`

### New Components
- `CheckpointHandler` (checkpoint lifecycle)
- `CheckpointRepository` (SQLite persistence)
- `ResumeTask` MCP tool

### Implementation Approaches

**Option A: Full Session Continuation** (only if the research spike in #46 validates it)
- Serialize conversation history
- Restore tool call state
- Resume with full context

**Option B: Context Injection Fallback**
- Inject summary: "Previous attempt failed at step X..."
- Include last N messages
- Include error context
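
Option B can be illustrated with a small prompt builder. This is a sketch under assumptions: the function name, the message shape (`role`/`content` dicts), and the exact wording of the injected text are hypothetical, and only the three ingredients (failure summary, last N messages, error context) come from the list above.

```python
def build_retry_prompt(original_prompt, messages, failed_step, error, last_n=5):
    """Build a 'retry with context' prompt from a failed attempt."""
    recent = messages[-last_n:] if last_n > 0 else []
    lines = [
        f"Previous attempt failed at step {failed_step}.",
        f"Error: {error}",
        "",
        "Most recent messages from the previous attempt:",
    ]
    lines += [f"[{m['role']}] {m['content']}" for m in recent]
    lines += ["", "Original task:", original_prompt]
    return "\n".join(lines)
```

Because this path only needs the stored messages and the error, it stays available even when the full session state cannot be restored.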

## Success Metrics

- [ ] Resume from checkpoint within 30 seconds
- [ ] Resumption success rate > 80% when there are no external conflicts
- [ ] Checkpoint overhead < 5% of task runtime
- [ ] Clear error messages for unrecoverable states

## Dependencies

- **Blocked by**: #46 (Task Resumption Research Spike)
- Implementation approach depends on research findings

---

*Created via `/specify` command*
