Plan Mode and "Ralph" implementation loop Workflows for GitHub Copilot
Craftsman is a collection of 2 specialized AI agent modes that transform how you plan, implement, and verify complex software changes in VS Code.
Instead of endless manual prompting, Craftsman provides structured workflows that ensure quality through systematic planning, autonomous implementation, and continuous validation.
- Simple: 2 files, 2 agent modes (Plan Mode and Ralph Loop).
- Plan Mode: An interview-based planning agent that produces reviewable specifications and actionable task breakdowns.
- Ralph Loop Mode: An orchestration agent that autonomously implements tasks with continuous verification
- Structured Artifacts: Clear, traceable files for specifications, plans, and tasks
- Integration with external issue trackers via JIRA-ID naming convention
- Optional Human-in-the-loop (HITL) phase review, so stakeholders can inspect the work at critical points during implementation.
Craftsman is an experiment in structured agent orchestration, exploring three key questions:
Using XML-like tags and structured prompts, Craftsman tests whether LLMs can execute multi-phase processes (discovery → interview → specification → planning → implementation → verification) without losing track of their role or breaking the workflow.
Real teams use JIRA, Linear, or GitHub Issues.
Craftsman uses the JIRA-ID naming convention (.agents/changes/JIRA-123-description/) to maintain traceability between planning artifacts and external project management systems.
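As a rough illustration of the naming convention, the folder name can be built from and parsed back into a ticket ID. This is a minimal sketch, not part of Craftsman: the helper names `change_dir` and `ticket_of` and the exact pattern (uppercase project key, number, kebab-case description) are assumptions made for the example.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed shape: "<PROJECT>-<number>-<kebab-case-description>", e.g. "JIRA-123-feature-name".
CHANGE_DIR = re.compile(
    r"^(?P<ticket>[A-Z][A-Z0-9]*-\d+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)

def change_dir(ticket: str, description: str) -> PurePosixPath:
    """Build the planning folder path for a ticket and a free-text description."""
    # Lowercase the description and collapse anything non-alphanumeric into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return PurePosixPath(".agents/changes") / f"{ticket}-{slug}"

def ticket_of(path: str) -> Optional[str]:
    """Recover the ticket ID from a planning folder path, or None if it does not match."""
    match = CHANGE_DIR.match(PurePosixPath(path).name)
    return match.group("ticket") if match else None
```

With these helpers, `ticket_of(".agents/changes/IDEA-01-refresh-command/")` yields `"IDEA-01"`, which is the link back to the external tracker.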
Ralph Loop is inspired by the "Ralph Wiggum" pattern: a simple loop that repeatedly delegates to subagents until all tasks are complete.
Craftsman adapts this approach for VS Code GitHub Copilot.
Instead of:
- Writing new prompts for each implementation step
- Manually tracking which tasks are done
- Hoping the agent remembers earlier context
Ralph Loop will:
- Read the progress file
- Delegate next task to a fresh Coder subagent
- Verify the result with Inspector subagent
- Update progress
- Repeat until complete
This is linear, stateful, and autonomous — you start the loop and step away.
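The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Craftsman is a prompt-driven agent, not a Python program, and the names `ralph_loop`, `coder`, `inspector`, the status strings, and the in-memory `progress` dictionary (standing in for the progress file) are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the progress file: task name -> "todo", "done" or "rework".
Progress = Dict[str, str]

def ralph_loop(
    progress: Progress,
    coder: Callable[[Progress], str],
    inspector: Callable[[str], bool],
    max_rounds: int = 100,
) -> Progress:
    """Delegate, verify, and update progress until every task is done."""
    for _ in range(max_rounds):
        # Read the progress state and stop once nothing is left to do.
        if all(status == "done" for status in progress.values()):
            return progress
        # A fresh Coder receives a copy of the state, picks one open task,
        # implements it, and reports which task it worked on.
        task = coder(dict(progress))
        # The Inspector verifies the result, then progress is updated.
        progress[task] = "done" if inspector(task) else "rework"
    raise RuntimeError("round budget exhausted before all tasks passed inspection")
```

In this sketch the Coder chooses its own task from the progress state instead of being handed a task number, and `max_rounds` is an added safety bound so that a task which never passes inspection cannot loop forever.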
Craftsman currently provides two complementary agent modes:
Plan Mode is a research and planning agent that produces reviewable specifications and actionable task breakdowns.
Plan Mode systematically explores your change request through:
- Deep context discovery — scans your project structure, documentation, and existing patterns
- Structured interviews — asks 10-15 clarifying questions, then 5-10 technical follow-ups
- Specification generation — produces a reviewable spec with requirements, constraints, and success criteria
- Implementation planning — creates detailed architectural plan with dependencies
- Task breakdown — generates independent, actionable task files for implementation
Output artifacts (in .agents/changes/<JIRA>-<description>/):
.agents/changes/JIRA-123-feature-name/
├── 00.request.md # Initial change request
├── 01-specification.md # Reviewable design decisions and requirements
├── 02-plan.md # Technical architecture and dependencies
├── 03-tasks-00-READBEFORE.md # Critical context for all tasks
├── 03-tasks-01-models.md # Phase 1, Task 1: Data models
├── 03-tasks-02-api.md # Phase 1, Task 2: API endpoints
├── 03-tasks-03-tests.md # Phase 2, Task 3: Unit tests
└── 03-tasks-04-docs.md # Phase 2, Task 4: Documentation
Key principle: Plan Mode never writes implementation code. It focuses exclusively on thorough planning so implementation agents have clear, complete instructions.
File descriptions:
- 00.request.md: The initial human request, often a poorly written JIRA ticket.
- 01-specification.md: The main output of Plan Mode, containing reviewable design and architectural choices without technical details or code.
- 01-specification.jira.txt: A JIRA-friendly version of the specification that is easy to paste into a JIRA issue for review.
- 02-plan.md: A highly technical architecture plan that includes task dependencies and low-level details. This file is never used after task breakdown is finished.
- 03-tasks-00-READBEFORE.md: Critical context and instructions for all tasks, including applicable coding standards, testing requirements, and implementation guidelines. These guidelines may be loaded using progressive disclosure by implementation agents to ensure consistent adherence to standards.
- 03-tasks-XX-*.md: Individual task files, each containing a single independent task with just enough context for a fresh agent to implement it. Tasks are grouped into phases, but each task file is self-contained to reduce cognitive overload and token waste.
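To show how the file naming supports ordering, here is a small sketch that separates the shared-context file from the numbered task files. It assumes the `03-tasks-XX-name.md` pattern shown above; the function name `ordered_tasks` is hypothetical and not something Craftsman ships.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches "03-tasks-<number>-<label>.md"; number 00 is the READBEFORE file.
TASK_FILE = re.compile(r"^03-tasks-(\d+)-(.+)\.md$")

def ordered_tasks(filenames: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Return (readbefore, tasks): the shared-context file and task files sorted by number."""
    readbefore = None
    numbered = []
    for name in filenames:
        match = TASK_FILE.match(name)
        if not match:
            continue  # request, specification, plan, or an unrelated file
        number = int(match.group(1))
        if number == 0:
            readbefore = name
        else:
            numbered.append((number, name))
    return readbefore, [name for _, name in sorted(numbered)]
```

Sorting on the parsed number instead of the raw string keeps the order correct even if a folder ever mixes `03-tasks-9-...` and `03-tasks-10-...` style names.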
Ralph Loop is an orchestration agent that autonomously implements tasks with continuous verification.
Ralph Loop manages the complete implementation lifecycle:
- Reads planning artifacts — loads spec, plan, and task files from Plan Mode
- Delegates to Coder subagent — selects next task, triggers implementation subagent
- Runs Task Inspector — verifies each completed task meets acceptance criteria
- Manages phase transitions — validates phase completion before proceeding
- Human-in-the-Loop (HITL) — optional pause points for stakeholder review
- Progress tracking — maintains PROGRESS.md with task status and validation notes
Two operational modes:
- Auto Mode (default) — continuous implementation until all tasks complete
- HITL Mode (Human-in-the-loop) — pauses at phase boundaries for human review, which lets you "steer" the implementation
Verification system:
- Task Inspector — validates individual task completion after each Coder run
- Phase Inspector — generates comprehensive phase review reports
- Retry mechanism — marks incomplete tasks for high-priority rework by a new Coder subagent
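The interplay between retries, phases, and HITL pauses can be modelled roughly as follows. This is a sketch under assumptions: the `Status` values, the `Task` fields, and the three helper functions are invented for illustration, and the real agents track this state in PROGRESS.md, not in Python objects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Status(Enum):
    TODO = "todo"
    DONE = "done"
    REWORK = "rework"  # hypothetical label for a task the Task Inspector rejected

@dataclass
class Task:
    name: str
    phase: int
    status: Status = Status.TODO

def pick_next(tasks: List[Task]) -> Optional[Task]:
    """Return the next open task: rework items first, then lowest phase, then name."""
    open_tasks = [t for t in tasks if t.status is not Status.DONE]
    if not open_tasks:
        return None
    # False sorts before True, so tasks marked REWORK win over plain TODO tasks.
    return min(open_tasks, key=lambda t: (t.status is not Status.REWORK, t.phase, t.name))

def phase_complete(tasks: List[Task], phase: int) -> bool:
    """A phase is complete once every task in it has passed inspection."""
    return all(t.status is Status.DONE for t in tasks if t.phase == phase)

def should_pause_for_review(tasks: List[Task], phase: int, hitl: bool) -> bool:
    """In HITL mode the loop pauses at a phase boundary; in Auto mode it never does."""
    return hitl and phase_complete(tasks, phase)
```

Giving rework items priority means a failed task is retried before new work starts, so a phase cannot be considered complete while a rejected task is still waiting.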
At the end of each coding task AND after each review, a commit message is generated. This creates a clear, traceable commit history that links implementation decisions back to the original plan and specification.
It is highly recommended to squash all these commits into a single, self-contained commit before merging to the main branch, to avoid polluting the commit history with intermediate implementation steps.
- Orchestrator: manages the overall workflow, delegates to subagents, and tracks progress. It never codes or verifies the actual work itself; it only tracks progress and delegates to the right subagent.
- Coder: implements individual tasks based on task files
- Task Inspector: verifies task completion against acceptance criteria
- Phase Inspector: validates phase completion and generates review reports
Current method (manual):
- Add Craftsman to your workspace:
    # Clone or download Craftsman
    git clone https://github.com/your-org/Craftsman.git ~/Projects/Craftsman
Add the Craftsman folder to your VS Code workspace to make agent definitions available.
- Copy agent modes to your project:
    cd /path/to/your-project
    mkdir -p .github/agents
    cp ~/Projects/Craftsman/.github/agents/*.agent.md .github/agents/
- Start Copilot Chat and select the Agent @Craftsman: Plan Mode
- Provide your change request (or paste JIRA ticket)
- Answer the clarifying questions
- Review the generated specification in .agents/changes/<JIRA-123>-<short-description>/01-specification.md
- Approve plan and task breakdown
- By default, the implementation is autonomous without human intervention. You can choose to enable HITL (Human-in-the-Loop) mode to pause at phase boundaries for review.
- Provide path to planning artifacts (e.g., .agents/changes/IDEA-01-refresh-command/)
- Ralph Loop will:
  - Read spec, plan, and tasks
  - Delegate implementation to Coder subagents
  - Verify each task with Task Inspector
  - Track progress in PROGRESS.md
  - Continue until all tasks complete
Pausing Ralph Loop: Create PAUSE.md in the planning folder to safely pause the loop for manual task edits.
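The pause file acts as a simple sentinel checked between tasks. A minimal sketch of that check, assuming only that PAUSE.md sits directly in the planning folder (the function name `is_paused` is hypothetical, and the real check is performed by the agent following its prompt, not by Python code):

```python
from pathlib import Path

def is_paused(change_dir: str) -> bool:
    """Return True while a PAUSE.md file exists in the planning folder."""
    return (Path(change_dir) / "PAUSE.md").is_file()
```

Because the check only looks for the file's existence, deleting PAUSE.md is enough to let the loop resume on its next iteration.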
- Start with a small request in markdown (.agents/changes/JIRA-123-description/00.request.md)
- Use a mid-size model like Claude Sonnet 4.5 for the Plan Mode.
- Reserve Opus for tasks that require complex reasoning or multi-phase implementation (20+ tasks)
- Use Claude Haiku for implementation. For the moment, you can also use Sonnet for the Ralph loop: it does not consume a premium request on every request (in my experience, it consumes 1 premium request after ~15 tasks + reviewers)
graph TD
A[Change Request] --> B[Plan Mode: Discovery]
B --> C[Plan Mode: Questions]
C --> D[Plan Mode: Specification]
D --> E{Review Spec}
E -->|Approved| F[Plan Mode: Tasks]
E -->|Revise| C
F --> G[Ralph Loop: Read Artifacts]
G --> H[Ralph Loop: Coder Subagent]
H --> I[Ralph Loop: Task Inspector]
I -->|✅ Complete| J{More Tasks?}
I -->|❌ Incomplete| H
J -->|Yes| H
J -->|No| K{Phase Complete?}
K -->|Yes, HITL| L[Phase Inspector + Human Review]
K -->|Yes, Auto| M[Phase Inspector]
K -->|No| H
L --> N{All Phases Done?}
M --> N
N -->|Yes| O[Implementation Complete]
N -->|No| H
Craftsman is a production-level proof of concept. It works, and I use it daily in my workflow, but it is not perfect.
In this section, I describe the real limitations and known issues of the current implementation.
I would be so grateful if you could try it out and share your feedback, especially if you have suggestions for improvement.
- Task selection autonomy: The orchestrator sometimes chooses tasks and sends task numbers to the Coder subagent, despite instructions stating "let the subagent choose". This creates unnecessary coupling.
- Rate limit recovery failures: When hitting GitHub Copilot daily/weekly rate limits, retry behavior degrades:
  - Orchestrator "forgets" to trigger subagents
  - Implementation happens in orchestrator instead of Coder subagent
  - Workaround: Start a fresh chat session
- Feature completeness vs. accessibility gap: The most significant limitation. At completion:
  - ✅ All features are typically implemented
  - ✅ Complete preflight checks pass
  - ✅ Unit tests pass
  - ✅ Code quality is high
  - ❌ But features may not be user-accessible (especially with UI)
Despite intensive planning with Claude Opus and no visible gaps in specifications, implemented features sometimes exist in code but lack integration points, UI bindings, or entry points for users to actually use them.
I would say a human would have caught this gap during implementation, but the agent still misses it.
Inspired by the "Ralph Wiggum" loop concept and refined through experimentation with GitHub Copilot's Agent Mode system.