Human-in-the-Loop AI Coding Agent with GitLab Integration (or File-Only Mode)
Note: The badges above are for the harness codebase itself. Target projects can use any language and tooling - see Pluggable Code Quality System.
What is an Agent Harness? A harness is a coordination layer or scaffolding around AI agents that allows them to work for hours (or days) on complex tasks without overwhelming their context window. Instead of asking an agent to do everything at once, a harness connects multiple agent sessions together—each starting fresh but quickly catching up via structured artifacts, then making incremental progress before handing off to the next session.
Coding Harness is an autonomous coding agent orchestration system that combines Claude AI with a terminal user interface (TUI) for milestone-based development workflows. It takes specification files, breaks them into issues (GitLab or local JSON files), and uses AI agents to implement each issue with human oversight at critical decision points.
┌──────────┐ ┌──────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Spec │───▶│ TUI │───▶│ Claude Agents │───▶│ GitLab (default)│
│ File │ │ (human) │ │ (Init→Code→MR) │ │ - OR - │
└──────────┘ └────┬─────┘ └────────┬─────────┘ │ Local JSON files│
│ │ └─────────────────┘
└───── 8 HITL ──────┘
checkpoints
The harness provides Human-in-the-Loop (HITL) checkpoints at 8 key stages, ensuring humans maintain control over project setup, issue breakdown, implementation approval, and merge request creation.
AI engineering has evolved through three stages:
- Prompt Engineering (2020-2023) - Optimizing single interactions with LLMs
- Context Engineering (2023-2024) - Optimizing entire sessions—what context to provide, when, and how to avoid overwhelming the model
- Agent Harnesses (2024+) - Connecting multiple sessions together for long-running tasks
Early examples include Manis (2024) and LangChain's deep agents. The pattern is becoming standardized as "the next unlock for AI capability."
Why harnesses now? The raw power of LLMs isn't exploding like it was. Scaling has hit limits. The breakthrough now comes from the layer around LLMs—memory systems, handoff protocols, validation loops, and orchestration. Harnesses are this layer.
2026 prediction: Cole Medin and others predict 2026 will be "the year of agent harnesses"—shifting from experimental to production-grade, with reliable autonomous coding becoming mainstream.
Vibe coding becomes viable when you have an engineered harness with human-in-the-loop at strategic checkpoints. Without the scaffolding, delegating coding to AI fails. With it, you can trust long-running autonomous work—though the system itself requires careful engineering.
Anthropic Research:
- Effective Harnesses for Long-Running Agents - Engineering blog (Nov 2025)
- Autonomous Coding Quickstart - Reference implementation
Video Explainers:
- I Forced Claude to Code for 24 Hours NONSTOP - Cole Medin's live test (result: 54% of 200 tests passing after 54 sessions)
- Claude SDK: 24-Hour Coding Agent - Live stream with Ray, Linear integration, cloud deployment, mid-run feedback
- Are Agent Harnesses Bringing Back Vibe Coding? - Evolution from prompts to harnesses, two unsolved problems
- Unlock DEEP AGENTS with Anthropic's Agent Harness - n8n implementation with concurrency patterns
Related Projects (mentioned in live stream):
- Archon - Cole Medin's open source "command center for AI coding" with Kanban board, knowledge curation, and MCP server. Similar concepts to Linear integration but AI-optimized.
- Factory - Ray's preferred coding agent. Uses memory compression (different approach than this harness's fresh-context-per-session pattern).
- Remote Agentic Coding System - Cole's upcoming project for kicking off agents from Slack, Discord, Telegram, GitHub. Combines harness ideas with remote observability.
Concepts Drawn from Each Source:
| Source | Key Concepts Incorporated |
|---|---|
| Anthropic Blog | Initializer + coding agent pattern, feature list with passes: false→true, claude-progress.txt equivalent, git commits as checkpoints, one-shotting/premature-completion failures |
| Cole Medin (24hr) | PIV loop mental model, "priming" phase, Claude Agent SDK usage, security hooks pattern, regression testing, test-driven verification, brownfield adaptation |
| Cole Medin (Live) | Linear MCP integration, token efficiency, cloud deployment issues, mid-run feedback injection, meta issue pattern, harness simplicity ("just two prompts"), OAuth subscription token, alternative SDKs comparison, real-time task board observation (no refresh needed), mobile monitoring via Linear app, parallel MCP calls for speed, spec template + brain dump workflow, OpenCode + Ollama confirmed working, temperature variance between providers |
| Vibe Coding Video | Evolution timeline (prompts→context→harnesses), bounded attention/"dumb zone", compounding error math (95%²⁰=36%), vibe coding viability with HITL, predictive context as unsolved problem, autonomy balance |
| n8n Deep Agents | Concurrency patterns, lock mechanisms, research artifact staging, progressive summarization, Retrieval→Synthesize→Write pipeline, task dependency concepts |
Comprehensive Feature Coverage:
| Concept | Source | Status | Notes |
|---|---|---|---|
| Core Architecture | |||
| Initializer → Coding → MR agents | Anthropic | ✅ Have | Extended to 3 agents with HITL |
| Feature/task list persistence | Anthropic | ✅ Have (adapted) | GitLab issues instead of JSON |
| Progress file handoffs | Anthropic | ✅ Have (adapted) | GitLab comments + .gitlab_milestone.json |
| Git commits as checkpoints | All | ✅ Have | Structured format with metadata |
| Failure Mode Prevention | |||
| One-shotting prevention | Anthropic | ✅ Have | ONE issue per coding session |
| Premature completion prevention | Anthropic | ✅ Have | Quality gates + HITL before closure |
| Compounding error mitigation | Vibe Coding | ✅ Have | Circuit breakers, regression tests |
| Context rot prevention | Vibe Coding | ✅ Have | Fresh context per session |
| Validation & Testing | |||
| Browser automation (Puppeteer) | Anthropic | ✅ Have | Via Puppeteer MCP |
| Regression testing | All | ✅ Have | Before/after each issue |
| Unit test creation | Cole Medin | ✅ Have | Mandatory for new code |
| Test repair loop | Cole Medin | ✅ Have | Max 3 attempts then skip |
| Quality gates (lint/type/test) | All | ✅ Have | Must pass before issue closure |
| Human Oversight | |||
| Human-in-the-loop checkpoints | All | ✅ Have | 8 checkpoint types |
| Autonomy balance | Vibe Coding | ✅ Have | Strategic checkpoints, not constant |
| Auto-accept mode | - | ✅ Have | For trusted runs |
| Memory & Persistence | |||
| File system as memory | n8n | ✅ Have | .claude-agent/ directory |
| Git as external memory | Cole Medin | ✅ Have | Commit log for context |
| Progressive summarization | n8n | Handoff comments, not systematic | |
| Research artifact staging | n8n | GitLab comments serve this role | |
| Lock mechanisms | n8n | Daemon handles single-agent; no multi-worker locks | |
| Advanced Patterns | |||
| Static planning | n8n | ✅ Have | Issues created upfront |
| Rolling/adaptive planning | n8n | ❌ Don't have | No replanning based on progress |
| Goal-driven planning | n8n | ❌ Don't have | No dynamic goal reassessment |
| Parallel task execution | n8n | ❌ Don't have | Sequential only |
| Task dependency graphs | n8n | ❌ Don't have | No explicit dependencies |
| Hybrid concurrency | n8n | ❌ Don't have | No parallel retrieval → sequential synthesis |
| WorkTrees for parallel branches | Cole Medin | ❌ Don't have | Single branch per agent |
| Development Modes | |||
| Greenfield (new projects) | Anthropic | ✅ Have | Primary mode |
| Brownfield (existing codebases) | Cole Medin | ✅ Have | Supported via spec design |
| Refactoring/migrations | Cole Medin | Untested, needs custom validation | |
| Multi-Assistant Support | |||
| Claude Agent SDK | Anthropic | ✅ Have | Only supported assistant |
| Other SDKs (Codex, OpenCode, AMP) | Cole Medin | ❌ Don't have | Architecture is portable, not implemented |
| Specialized Agents | |||
| Testing agent | Anthropic (future) | ❌ Don't have | Coding agent does inline |
| QA agent | Anthropic (future) | Puppeteer in coding agent | |
| Code cleanup agent | Anthropic (future) | ❌ Don't have | Coding agent does inline |
| Research agent | n8n | Issue enrichment phase | |
| Input validation agent | n8n | ❌ Don't have | HITL serves this purpose |
| External Integrations | |||
| GitLab issues/MRs | - | ✅ Have | Core integration |
| Linear/Jira/Asana | Cole Medin (Live) | ❌ Don't have | GitLab-specific |
| Real-time task observation | Cole Medin (Live) | GitLab updates require refresh; Linear updates live | |
| Mobile monitoring | Cole Medin (Live) | GitLab mobile app works; Linear demo'd on phone | |
| Meta issue for handoffs | Cole Medin (Live) | ✅ Have | GitLab comments on milestone; Linear uses meta issue |
| Context7 (library docs) | - | ✅ Have | Optional enrichment |
| SearxNG (web search) | - | ✅ Have | Optional enrichment |
| Spec & Planning | |||
| Template + brain dump workflow | Cole Medin (Live) | ✅ Have | Give LLM template + rough idea → structured spec |
| Vibe planning (unstructured) | Cole Medin (Live) | Supported via spec writing; not explicit phase | |
| PIV loop (Plan-Implement-Verify) | Cole Medin | ✅ Have | Core harness pattern |
| Authentication & Cost | |||
| OAuth subscription token | Cole Medin (Live) | ✅ Have | Use Max plan instead of API credits |
| API key fallback | Anthropic | ✅ Have | Per-token billing option |
| Alternative Providers | |||
| OpenCode + Ollama (local) | Cole Medin (Live) | ❌ Don't have | Community-confirmed working; not implemented |
| OpenCode + Gemini 3 | Cole Medin (Live) | ❌ Don't have | Supported by OpenCode; not implemented |
| Parallel MCP calls | Cole Medin (Live) | Claude does this automatically when appropriate |
Legend: ✅ Have = Fully implemented |
Complex projects can't be completed in a single context window. When context fills up, a new session starts with no memory. This creates three core challenges:
1. Bounded Attention (Context Rot)
As you add more information to an LLM's context, it enters what's been called the "dumb zone"—overwhelmed and making poor decisions. Harnesses solve this by:
- Clearing context between sessions (fresh start)
- Using external memory (files, databases, git) instead of in-context memory
- Progressive summarization of older work
- Handoff artifacts that capture only what's needed to continue
2. Compounding Errors (Reliability Decay)
If an agent has 95% reliability per step, over 20 steps that compounds to only 36% system reliability (0.95²⁰). Harnesses address this through:
- Checkpoints with self-validation after each task
- Human-in-the-loop at critical decision points
- Automatic rollback via git when things go wrong
- Guard rails that stop progress if quality checks fail
3. Predictive Context (Unsolved)
The hardest problem: you can't predict which observation becomes critical 10 steps later. Current approaches try to preserve everything potentially relevant, but optimal summarization—knowing exactly what future sessions will need—remains an open challenge. This harness addresses it through:
- Structured handoff templates that capture known-important context
- Git history as lossless memory (always recoverable)
- GitLab comments preserving full context on each issue
Anthropic specifically identified two behavioral failure modes:
-
One-shotting - Agent tries to do everything at once, runs out of context mid-implementation, leaves codebase broken. Next session must guess what happened.
-
Premature completion - Agent sees progress has been made, declares the job done while major features remain unimplemented.
The core insight: Despite appearing complex, harnesses are fundamentally simple. As Cole Medin emphasized during the live stream: "Everything that seems fancy with AI coding assistants is just a bunch of prompts... all this harness is is two prompts—the initializer prompt and the coding prompt. That's it."
The most common harness architecture uses two specialized agents:
| Agent | Purpose |
|---|---|
| Initializer (session 1) | Create feature list/tasks, setup environment, initialize git |
| Task Agent (sessions 2+) | Implement ONE task, verify it works, commit, repeat in loop |
This pattern appears across implementations—Anthropic's coding quickstart, LangChain's deep agents, and custom harnesses. The task agent loops until all work is complete, with each iteration:
- Prime - Read progress files, git log, understand current state ("getting bearings")
- Validate - Regression test previous work before touching anything new
- Execute - Implement one task, write tests, verify
- Handoff - Update progress files, commit, prepare for next session
This maps to what Cole Medin calls the PIV loop (Plan-Implement-Verify)—a mental model for structured AI coding where each cycle produces verified, committed work.
The key insight: agents need structured artifacts to quickly understand project state when starting fresh. These include:
- A task/feature list that tracks what's done vs remaining (never edit descriptions, only mark complete)
- Progress notes summarizing recent work
- Git commits as atomic checkpoints
- Session handoff comments with specific next steps
| Aspect | Quickstart | This Harness |
|---|---|---|
| Work units | 200 tiny test cases | GitLab issues (fewer, larger) |
| Granularity | Each test ≈ one session | Each issue may span multiple sessions |
| Progress tracking | passes: false → true in JSON |
GitLab comments + "in-progress" label (or local JSON in file-only mode) |
| Completion signal | Edit JSON field | Close issue via GitLab API (or update local JSON) |
| State file | claude-progress.txt |
.gitlab_milestone.json |
| Human oversight | None (fully autonomous) | 8 HITL checkpoints |
| Interface | CLI auto-loop | TUI with log streaming |
| Issue tracking | Local JSON only | GitLab (default) or local JSON (file-only mode) |
| MR creation | N/A | Optional (can skip to keep changes on branch) |
Key difference in granularity: The quickstart decomposes specs into ~200 small test cases, each small enough to complete in a single context window. This harness uses GitLab issues—larger work units that may require multiple sessions to complete.
Multi-session issue support: When an issue can't be completed in one session:
- The "in-progress" label persists across sessions
- Structured handoff comments include:
- Last commit SHA and message
- Progress checklist (completed / in-progress / not started)
- Files changed with change types
- Specific next steps for the following session
- Gotchas and key context
- Next session reviews git log and handoff comments before continuing
- The issue stays assigned until closed
Commit conventions: All commits follow a structured format for traceability:
<type>(#<issue>): <short description>
<body - what changed and why>
Files: <count> changed
Tests: <added/updated/none>
Issue: #<iid> - <title>
Types: feat, fix, test, refactor, style, docs, chore
We extend to three agents (Initializer → Coding loop → MR Creation) and add human approval gates at critical decisions—directly addressing the premature completion problem.
Agent harnesses can vary significantly based on the use case. This section documents patterns from the research that we don't yet implement—useful context for understanding the design space and future directions.
Planning Strategies:
| Strategy | Description | This Harness |
|---|---|---|
| Static Plan | Create all tasks upfront, execute sequentially | ✅ Used |
| Rolling Plan | Plan → execute → replan based on progress | ❌ Not implemented |
| Goal-Driven | Constantly ask "what's the best next action?" | ❌ Not implemented |
| Test-Driven Loop | Iterate until criteria/tests pass | ✅ Partial (quality gates) |
Concurrency Patterns:
| Pattern | Description | This Harness |
|---|---|---|
| Sequential | One task at a time | ✅ Used |
| Parallel Retrieval | Research multiple sources simultaneously | ❌ Not implemented |
| Hybrid Concurrent | Parallel retrieval → sequential synthesis | ❌ Not implemented |
| Dependency Graph | Tasks with explicit dependencies | ❌ Not implemented |
Multi-Assistant Support:
The harness architecture (prompts + artifacts) is theoretically portable to other coding assistants with SDKs (Codex, OpenCode, AMP). However, this implementation is Claude-specific. Porting would require:
- Replacing Claude Agent SDK calls with equivalent SDK
- Adjusting prompts for model-specific behaviors
- Testing validation and tool use patterns
Confirmed working alternatives (from live stream):
- OpenCode + Ollama - Tested during the stream by community member Rasmus. OpenCode supports local models via Ollama, enabling fully local harness execution with models like Qwen 3 Coder or Kimi K2.
- OpenCode + Gemini 3 - Listed as supported model in OpenCode docs.
Temperature considerations: Ray noted that different providers may have different default temperatures, and this affects output quality. When using open models through providers like Fireworks or Cerebras, check the model card for recommended temperature settings. Models trained for agentic tool use (like Kimi K2, which handles hundreds of tool calls) may behave differently than general-purpose models.
WorkTrees:
Git worktrees enable parallel work on multiple branches. This harness uses a single branch per agent. Implementing worktrees would allow multiple issues to be worked simultaneously—useful for independent features with no code overlap.
The Anthropic quickstart is optimized for greenfield development—building new applications from scratch. This harness supports both modes:
| Mode | Description | This Harness |
|---|---|---|
| Greenfield | Build new project from scratch via spec file | ✅ Primary mode |
| Brownfield | Add features to existing codebase | ✅ Supported |
| Refactoring | Large-scale code modernization |
How brownfield works:
- Spec file describes features to ADD to existing codebase (not rebuild from scratch)
- Issues are created relative to current project state
- Agent reads existing code patterns before implementing
- Issue enrichment researches codebase context (grep, file structure)
Brownfield example (from Cole Medin's video): Sean used a modified harness for TypeScript refactoring—upgrading a codebase 12 major TypeScript versions behind. The harness handled:
- Breaking changes across versions
- Deprecated code patterns
- Validation that refactored code still compiles
Community results shared during live stream:
- WebDevCody: "I tried this last night for 24 [hours]. It's pretty amazing. 128 commits in."
- Multiple community members confirmed successful runs with custom applications
- Rasmus: "Tested the remote agent workflow manager harness today, built a Circle clone in a few hours in Django"
Refactoring considerations:
- Change spec to describe the transformation, not features
- Issues become refactoring tasks (e.g., "update all X to Y pattern")
- Quality gates verify no regressions introduced
- May need custom validation beyond test suites
Not yet supported:
- Parallel brownfield - Multiple agents on different features of same codebase
- Migration scripts - Database schema or infrastructure changes
- Cross-repo refactoring - Changes spanning multiple repositories
The Anthropic blog mentions potential specialized agents beyond the current Initializer → Coding → MR pattern:
| Agent Type | Purpose | Status |
|---|---|---|
| Testing Agent | Write comprehensive test suites | ❌ Not implemented (coding agent does inline) |
| QA Agent | End-to-end quality validation | |
| Code Cleanup Agent | Dead code removal, style fixes | ❌ Not implemented (coding agent does inline) |
| Research Agent | Deep context gathering |
Current approach: A single coding agent handles implementation, testing, and cleanup inline. Specialization could improve quality but adds coordination complexity.
Browser Automation:
- Cannot see browser-native alert/confirm/prompt modals through Puppeteer MCP
- Some dynamic UI elements may not be visible in screenshots
- Timing issues with slow-loading content
Context Prediction:
- Cannot predict which information becomes critical later
- Handoff templates are heuristic, not optimal
- Git history as fallback is comprehensive but token-expensive to read
Rate Limiting:
- Claude subscription has usage limits (varies by plan)
- Long-running agents may hit rate limits; resume from checkpoint
- OAuth token lasts 1 year; API key has no expiry but costs per-token
Observations from live stream: Cole Medin reported running Opus 4.5 for 24 hours (even running two harness instances for 12 of those hours) without hitting rate limits on the $200/month Max plan. However, other community members report hitting limits quickly. Rate limit behavior appears to vary by account and region. Recommendation: use multiple coding assistants (toggle between Claude Code, Codex, OpenCode) to distribute load if you hit limits frequently.
Incremental progress (solves: one-shotting): Each session works on ONE issue at a time. Large issues may span multiple sessions—progress is documented in GitLab comments with structured handoffs, and the next session continues where the last left off.
Clean state (solves: broken handoffs): Every session ends with code that's mergeable—no half-implementations, no broken tests, no uncommitted changes. If an issue isn't complete, the "in-progress" label stays on and a detailed handoff comment documents progress, last commit SHA, and next steps.
Test-driven verification (solves: compounding errors): Agents write unit tests for all new code, fix failing tests before proceeding, and verify features through browser automation. This combines automated test coverage with end-to-end UI verification—catching errors before they compound.
Quality gates (solves: premature completion): Before closing issues—linting, type checking, and test suite must pass. Regression checks on previously completed features. The agent cannot declare success without passing external validation.
Structured commits (solves: context rot): All commits follow a conventional format (feat(#42): description) with metadata (files changed, tests added, issue reference). Git becomes external memory—new sessions can read the commit log to understand what happened without relying on in-context memory.
File tracking (solves: accidental pollution): Agents track exactly which files they modify in session_files.tracked. Only those files get pushed—never pre-existing user changes or unrelated modifications.
The harness supports flexible operating modes to fit different workflows:
GitLab Mode (Default)
- Issues and milestones tracked in GitLab
- MR created automatically when coding completes
- Full observability through GitLab's web interface
File-Only Mode
- Issues and milestones stored in local JSON files (
.claude-agent/<spec>/) - No GitLab account required for issue tracking
- Useful for: offline work, private repos, quick prototyping
- Enable via TUI checkbox or
"file_only_mode": truein JSON specs
Skip MR Creation
- Agent stops after coding completes
- All changes remain on the feature branch
- Useful for: manual review before MR, experimental work, draft features
- Enable via TUI checkbox or
"skip_mr_creation": truein JSON specs
Combined Modes All modes can be combined. For example, file-only mode + skip MR creation gives you a fully local workflow where the agent codes but you handle all git operations manually afterward.
Each feature below addresses a specific harness challenge—context management, reliability, or coordination.
- Interactive TUI - Textual-based terminal interface for intuitive agent management
- GitLab Integration - Automatic milestone creation, issue management, and merge request generation
- File-Only Mode - Optional local JSON-based tracking instead of GitLab (no GitLab account required)
- Skip MR Creation - Option to stop after coding without creating a merge request (keep changes on branch)
- Configurable Testing - Skip Puppeteer, test suite, or regression testing via TUI options
- 8 HITL Checkpoints - Human approval gates for project verification, issue breakdown, implementation review, and more
- Spec-to-Issues Breakdown - AI-powered conversion of specifications into actionable GitLab issues
- Issue Enrichment - Optional Context7 and SearxNG integration for researching libraries and best practices
- Multi-Spec Support - Run multiple specification files in batch mode
- Real-Time Output - Log-tailing terminal widget streams agent output live
- Atomic State Persistence - Checkpoint state is safely persisted to prevent data loss
- Unit Test Creation - Agents write tests for all new code (pytest, Jest, Vitest, Go, Rust)
- Test Repair Loop - Failing tests are automatically fixed before proceeding with new work
- Regression Testing - Automated verification of existing features before and after implementation
- Quality Gates - Mandatory code quality checks (linting, formatting, type checking) before issue closure
- Structured Commits - Conventional commit format with issue references and metadata
- File Tracking - Agents track files they modify; only pushes agent-created changes (never user's uncommitted work)
- Multi-Session Handoffs - Detailed handoff comments with commit SHAs for seamless session continuity
- Verification Loops - GitLab API calls with retry logic, issue creation verification, MR existence validation
- Security Hooks - Bash command allowlist validation prevents dangerous operations
- Docker (recommended) - Install Docker
- OR Python 3.11+ - Download Python (for native mode)
- Git - For repository operations
- GitLab Account - With API access token (Create Token)
- Optional if using file-only mode for issue tracking
- Claude API Access - Either:
- Claude Code OAuth token (recommended) - Generate with
claude setup-token - Anthropic API key - Get API Key
- Claude Code OAuth token (recommended) - Generate with
- Optional:
# Clone the repository
git clone <repository-url>
cd coding-harness
# Copy environment template
cp .env.example .env
# Edit .env with your credentials (see Configuration section)
nano .env # or your preferred editor
# Start (Docker image builds automatically on first run)
./start.shDocker mode benefits:
- Daemon + agents persist when TUI exits
- Reconnect anytime with
./start.sh --connect - True SSH-like experience
# Clone the repository
git clone <repository-url>
cd coding-harness
# Create and activate virtual environment
python3.11 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your credentials (see Configuration section)
nano .env # or your preferred editor
# Start in native mode
./start.sh --nativeNative mode differences:
- Daemon is started automatically but killed when TUI exits
- No persistence - agents stop when you quit
- Good for quick testing, not long-running tasks
| Variable | Description |
|---|---|
GITLAB_PERSONAL_ACCESS_TOKEN |
GitLab token with scopes: api, read_api, read_repository, write_repository (optional in file-only mode) |
CLAUDE_CODE_OAUTH_TOKEN |
Claude Code OAuth token (preferred) |
ANTHROPIC_API_KEY |
Alternative: Anthropic API key (if not using OAuth) |
Note: You need either CLAUDE_CODE_OAUTH_TOKEN OR ANTHROPIC_API_KEY, not both.
Note: GITLAB_PERSONAL_ACCESS_TOKEN is optional if using file-only mode for issue tracking.
| Variable | Default | Description |
|---|---|---|
CONTEXT7_API_KEY |
- | For library documentation lookup during issue enrichment |
SEARXNG_URL |
http://localhost:8888 |
SearxNG instance URL for web search |
CLAUDE_MODEL |
claude-opus-4-5-20251101 |
Claude model to use |
GITLAB_API_URL |
https://gitlab.com/api/v4 |
For self-hosted GitLab instances |
The harness needs git access for both read operations (git fetch, git pull) and write operations (pushing code). Authentication is handled differently for each:
Write Operations (push/commit):
- Always use GitLab MCP with
GITLAB_PERSONAL_ACCESS_TOKEN - Token-based, works everywhere without additional setup
Read Operations (fetch/pull):
- Docker mode: Auto-configured using
GITLAB_PERSONAL_ACCESS_TOKENas git credential - Native mode: Uses your local git credentials (SSH or keychain)
Docker Mode (automatic):
The container automatically configures git credentials from GITLAB_PERSONAL_ACCESS_TOKEN. No additional setup required - git fetch and git pull just work.
If you prefer SSH:
# SSH agent is forwarded if running
ssh-add ~/.ssh/your_key # Add key to agent before starting
./start.sh # SSH_AUTH_SOCK is forwarded to containerNative Mode (manual setup):
For HTTP(S) remotes, store your GitLab token in the keychain:
# Erase old credential
git credential-osxkeychain erase <<EOF
protocol=https
host=gitlab.com
EOF
# Store new credential (use your GitLab PAT as password)
git credential-osxkeychain store <<EOF
protocol=https
host=gitlab.com
username=your_username
password=your_gitlab_personal_access_token
EOFFor SSH remotes (recommended for native mode):
# Ensure SSH key is added to GitLab
cat ~/.ssh/id_ed25519.pub # Add this to GitLab > Settings > SSH Keys
# Update remote to use SSH
git remote set-url origin git@gitlab.com:your/repo.gitThe easiest way to start - the TUI guides you through all configuration:
./start.sh # Docker mode (default)
./start.sh --native # Native Python mode./start.sh # Start new container
./start.sh --build # Rebuild Docker image
./start.sh --list # List running containers
./start.sh --connect # Connect to running container
./start.sh --connect coding-harness-2 # Connect to specific containerContainer Controls:
Enter- Start/restart TUICtrl+C- Stop containerCtrl+P, Ctrl+Q- Detach (keep running)Q- Quit TUI (container stays running)
For automation or scripting, provide specs as JSON:
./start.sh --specs '[{
"spec_file": "/path/to/feature-spec.txt",
"project_dir": "/path/to/your/project",
"target_branch": "main"
}]'With optional flags:
# File-only mode (no GitLab required)
./start.sh --specs '[{
"spec_file": "/path/to/feature-spec.txt",
"project_dir": "/path/to/your/project",
"target_branch": "main",
"file_only_mode": true
}]'
# Skip MR creation (keep changes on branch)
./start.sh --specs '[{
"spec_file": "/path/to/feature-spec.txt",
"project_dir": "/path/to/your/project",
"target_branch": "main",
"skip_mr_creation": true
}]'
# Both flags together
./start.sh --specs '[{
"spec_file": "/path/to/feature-spec.txt",
"project_dir": "/path/to/your/project",
"target_branch": "main",
"file_only_mode": true,
"skip_mr_creation": true
}]'Note: Git operations use GitLab MCP with
GITLAB_PERSONAL_ACCESS_TOKEN. Commits are attributed to the token owner's GitLab identity. In file-only mode, GitLab is not required for issue tracking but is still used for git operations.
Run without human approval prompts (use with caution):
# In TUI, press 'a' to toggle auto-accept mode for an agent
# Auto-accept setting is saved per-agent and can be toggled anytimeLevel 3 detail: Docker containers, persistence, and process management.
The harness uses a daemon architecture for robust agent management:
┌─────────────────────────── Docker Container ───────────────────────────┐
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Agent Daemon (python -m agent.daemon) │ │
│ │ Always running while container is up │ │
│ │ │ │
│ │ Responsibilities: │ │
│ │ • Spawns agent subprocesses (python -m agent.cli) │ │
│ │ • Monitors process lifecycle (running/stopped/failed) │ │
│ │ • Persists state to .data/daemon_state.json │ │
│ │ • Routes log output to per-agent log files │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Agent CLI │ │ Agent CLI │ │ Agent CLI │ │ │
│ │ │ (subprocess) │ │ (subprocess) │ │ (subprocess) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ │ │
│ │ │ │Orchestratr│ │ │ │Orchestratr│ │ │ │Orchestratr│ │ │ │
│ │ │ │ Session │ │ │ │ Session │ │ │ │ Session │ │ │ │
│ │ │ │ Phases: │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ •Init │ │ │ │ •Coding │ │ │ (idle/done) │ │ │ │
│ │ │ │ •Coding │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ •MR │ │ │ └───────────┘ │ │ └───────────┘ │ │ │
│ │ │ └───────────┘ │ │ │ │ │ │ │
│ │ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Log Files (in project's .claude-agent/<slug>/logs/) │ │ │
│ │ │ (persisted via $HOME mount) │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────┬───────────────────────────────┘ │
│ │ │
│ Unix Socket (/tmp/coding-harness-daemon.sock) │
│ │ │
│ ┌──────────────────────────────────▼───────────────────────────────┐ │
│ │ TUI (python -m tui.main) │ │
│ │ (ephemeral, can restart) │ │
│ │ │ │
│ │ Commands to daemon: Views: │ │
│ │ • list - get all agents • Agent list (status) │ │
│ │ • start - spawn new agent • Log viewer (tail log files) │ │
│ │ • stop - terminate agent • HITL checkpoint dialogs │ │
│ │ • status - get agent info • Session phase indicator │ │
│ │ • remove - delete agent • Git branch/status │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Can exit freely (Ctrl+C, q, Esc) │ │ │
│ │ │ Daemon + agents continue in background │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
├────────────────────────── Persistence Layer ───────────────────────────┤
│ │
│ Named Volume: ${container}-data → /app/.data/ │
│ • daemon_state.json (agent registry, status, config) │
│ │
│ Bind Mount: $HOME:$HOME │
│ • Project files (.claude-agent/<slug>-<hash>/) │
│ • Log files (logs/*.log) │
│ • Milestone state (.gitlab_milestone.json) │
│ • HITL checkpoints (.hitl_checkpoint.json) │
│ • Workspace info (.workspace_info.json) │
│ │
│ Bind Mount: /tmp:/tmp │
│ • Daemon socket (ephemeral, recreated on start) │
│ • Daemon PID file │
│ │
└────────────────────────────────────────────────────────────────────────┘
Benefits:
- TUI can exit/restart - Agents keep running in daemon
- Logs persist - Written to project's
.claude-agent/directory - True SSH-like experience - Reconnect to see agent output
- No state file hacks - Daemon IS the state
Workflow:
- Start agents in TUI
- Press
Qto quit TUI (agents keep running in daemon) - Press
Enterto restart TUI - TUI syncs with daemon, shows running agents
Detach vs Quit:
Q- Quit TUI, agents keep running in daemonCtrl+P, Ctrl+Q- Detach from container entirely (container + daemon + agents keep running)Ctrl+C- Stop container (kills daemon and all agents)
The harness uses a two-tier persistence model with Docker volumes:
PROJECT DIRECTORY (bind mount via $HOME:$HOME):
project/.claude-agent/{spec-slug}-{hash}/ ← LOCAL ONLY, never pushed to GitLab
├── .workspace_info.json # Spec config, branch, auto-accept, skip flags
├── .gitlab_milestone.json # Milestone ID, issue list, progress (GitLab mode)
├── .file_milestone.json # Milestone data (file-only mode)
├── .hitl_checkpoint_log.json # All checkpoint history with decisions
├── app_spec.txt # Copy of original specification
└── logs/ # Agent execution logs (project-scoped)
└── agent_1-20241223-120530.log
DAEMON STATE (named Docker volume per container):
/app/.data/ # Inside container, persisted via volume
└── daemon_state.json # Daemon's agent registry (for reconnect)
Important: The
.claude-agent/directory is local working storage for agents. It is never pushed to GitLab and should be in your.gitignore. Agents read/write these files directly via filesystem tools, not through git.
What survives container restart:
| Data | Location | Persists? | Notes |
|---|---|---|---|
| Checkpoint history | Project .claude-agent/ |
✅ Yes | Bind mount to host |
| Milestone state | Project .claude-agent/ |
✅ Yes | Bind mount to host |
| Agent logs | Project .claude-agent/logs/ |
✅ Yes | Bind mount to host |
| Daemon state | Docker volume | ✅ Yes | Volume {container}-data |
| Agent options | Project .workspace_info.json |
✅ Yes | Auto-accept, skip flags |
Docker volumes (per-container):
docker volume ls # List all volumes
# Volumes are named: coding-harness-data, coding-harness-2-data, etc.
docker volume inspect coding-harness-data # View volume details
docker volume rm coding-harness-data # Delete volume (reset daemon state)Atomic saves: All JSON files use atomic write (temp file + rename) to prevent corruption if the process crashes mid-write.
Alternative Approaches (from n8n Deep Agents):
Other harness implementations use different persistence strategies:
- Research artifacts table - Intermediate findings stored in DB, later synthesized
- Lock mechanisms - Timestamp-based locking prevents multiple workers on same task
- Progressive summarization - Older work is compressed more than recent work
- Staged processing - Retrieval → Synthesize → Write with separate artifact storage
This harness uses GitLab as its "artifact staging area"—comments on issues preserve research findings, and the issue itself tracks state. This trades some flexibility for simpler infrastructure (no separate database required).
External Task Tracking (Linear/Jira/Asana):
Cole Medin's Linear integration pattern shows an alternative: use external task management for remote monitoring. Benefits include:
- Monitor agent progress from mobile or any browser
- Edit task descriptions to inject human feedback mid-run
- Real-time task board updates (no page refresh needed)
- Multiple observers without SSH access
From the live stream, Cole demonstrated watching Linear update in real-time as agents completed tasks—no page refresh required. The agent makes parallel MCP calls to Linear for efficiency (creating multiple issues in batches rather than sequentially). Each issue gets a description that humans can edit mid-run, and agents pick up those edits when they start the next task.
The meta issue pattern: A special "project tracker" issue serves as the handoff artifact between sessions. Each agent comments on this issue with a summary of what it completed, which tasks it marked done, and context for the next session. This replaces the local claude-progress.txt file from Anthropic's quickstart.
Why external task management over local files: Local JSON files work fine for single-machine runs, but can't be easily monitored remotely. Cole's insight: "If we're going to have these agents run for a really long time, we need observability... we can't interrupt it because it's set up as a process to run in the background. So we need some way to communicate to it."
Token efficiency insight (from Coriolis via Ray): External task management via MCP is more token-efficient than reading/writing local JSON files. You hand off task management to the MCP server rather than having the agent parse and rewrite JSON. Agents are trained on human workflows—they naturally understand systems like Linear, making the integration feel native rather than forced.
This harness achieves similar observability through GitLab's native issue tracking—the trade-off is GitLab-specific vs. tool-agnostic integration.
- Select Repository - Choose a directory containing a git repository
- Select Spec File - Pick the specification file to implement
- Select Target Branch - Choose which branch to target for the merge request
- Agent Options - Configure behavior:
- File-only mode - Use local JSON files instead of GitLab for tracking
- Skip MR creation - Stop after coding without creating a merge request
- Skip Puppeteer - Disable browser automation testing
- Skip test suite - Disable test suite execution
- Skip regression testing - Disable regression spot-checks
- Advanced Options - Configure iterations and other settings
- Agent Execution - Watch the agent work with HITL checkpoints for your approval
A good specification file clearly describes what you want built. You don't have to write it from scratch—Cole Medin demonstrated a template + brain dump workflow:
- Give an LLM (Claude Opus 4.5 recommended) an existing app spec as a template
- Provide your own "brain dump"—a rough description of what you want to build
- Ask it to combine them into a structured spec following the template format
- Iterate with follow-up questions to fill in missing details
This approach reduces the mental burden of spec writing while ensuring comprehensive coverage. The spec doesn't need to be perfect—the initializer agent will break it into granular tasks regardless of spec detail level.
Example spec structure:
# Feature: User Authentication System
## Overview
Add a complete authentication system with login, registration, and session management.
## Requirements
- Email/password login and registration
- Session persistence across browser refreshes
- Password reset via email
- OAuth support for Google and GitHub
## Acceptance Criteria
- Users can register with email and password
- Users can log in and are redirected to dashboard
- Sessions persist for 7 days
- Password reset emails are sent within 30 seconds
- OAuth buttons appear on login page
## Technical Notes
- Use bcrypt for password hashing
- Store sessions in Redis
- Follow existing auth patterns in the codebase
The harness uses a two-phase approach to issue creation:
Phase 1: Spec-Faithful Issue Creation
Initial issues are created by transcribing the specification directly—no embellishment, no research:
Spec Detail Level → Issue Detail Level
─────────────────────────────────────────────
Detailed spec → Detailed issues
Vague spec → Vague issues
Spec silent on X → Issue omits X
The issue template adapts to whatever the spec provides:
- Summary - Transcribed requirement (spec's exact language)
- Requirement Details - Listed only if spec provides them
- Technical Notes - Included only if spec mentions constraints
- User-Facing Behavior - Included only if spec describes UX
- Test Criteria - From spec, or minimal generic criteria
- Open Questions - Flags ambiguities for later enrichment
Phase 2: Optional Deep Enrichment
After issues are created, the LLM judges which need enrichment. Human selects which to enrich (can override LLM). Selected issues get comprehensive enhancement:
| Step | Action | GitLab API |
|---|---|---|
| A | Deep Research | Context7 (library docs), grep (codebase), web search |
| B | Update Title | update_issue - make action-oriented |
| C | Replace Description | update_issue - full implementation guide |
| D | Add Research Comment | create_issue_note - raw findings |
| E | Add Dependencies Comment | create_issue_note - cross-issue links |
| F | Add Labels | update_issue - enriched, complexity-X, time-estimate-Xh |
Enriched description includes:
- Implementation guide (step-by-step with code patterns)
- API/interface specifications
- Codebase patterns to follow (with
file:linereferences) - Acceptance criteria (checkbox format)
- Test plan (table format)
- Time estimate and risk assessment
Non-enriched issues proceed to implementation with original spec-faithful description.
The harness pauses at 8 checkpoint types for human review. This reflects the autonomy balance principle: agents should be as autonomous as possible, but with easy injection points for human validation at critical decisions. The goal is strategic checkpoints, not constant interruption.
Checkpoint types:
| Checkpoint | Description | Quick Approve (Y) | Quick Reject (N) |
|---|---|---|---|
project_verification |
Validates project setup and configuration | Standard approve | Standard reject |
spec_to_issues |
Reviews proposed breakdown of spec into issues | Standard approve | Standard reject |
issue_enrichment |
Reviews issues flagged for additional context | Auto-select LLM-recommended issues | Skip all (empty list) |
regression_approval |
Handles detected regressions during development | Default to "fix_now" | |
issue_selection |
Confirms which issue to work on next | Use recommended issue | Standard reject |
issue_closure |
Reviews completed implementation before closing | Standard approve | Standard reject |
mr_phase_transition |
Gate before entering merge request creation | Standard approve | Standard reject |
mr_review |
Reviews final merge request before creation | Standard approve | Standard reject |
Checkpoint Output Format:
Each checkpoint displays a consistent structure in the agent logs:
================================================================
HITL CHECKPOINT: [CHECKPOINT NAME]
================================================================
WHAT HAPPENED:
✓ What LLM already completed
✓ What LLM verified
→ What human needs to decide
[CONTEXT-SPECIFIC DETAILS]
┌─────────────────────────────────────────────────────────────┐
│ IF APPROVED: │
│ → What happens next │
│ │
│ IF REJECTED: │
│ → What happens instead │
└─────────────────────────────────────────────────────────────┘
================================================================
TUI SHORTCUTS:
[Y] or [1] → APPROVE
[X] or [0] → REJECT
================================================================
Keyboard Shortcuts:
Yor1- Quick approve (with checkpoint-specific defaults)Nor0- Quick reject (blocked forregression_approval)R- Open full review screenEsc- Cancel/dismiss
When auto-accept is enabled (toggle with a key in TUI), checkpoints are automatically approved with intelligent defaults:
| Checkpoint | Auto-Accept Behavior |
|---|---|
issue_enrichment |
Uses LLM judgment to auto-select issues needing enrichment |
regression_approval |
Defaults to "fix_now" action |
issue_selection |
Uses the recommended issue from context |
| Others | Standard automatic approval |
Per-Agent Toggle: Each agent independently tracks its auto-accept preference in .workspace_info.json. Press a in the TUI to toggle for the selected agent. The status bar shows AUTO or HITL to indicate the current mode. Changes take effect immediately on the next checkpoint.
Why this matters for harnesses: With 95% per-step reliability, 20 steps = 36% system reliability. The only way to run agents for hours is aggressive self-validation at every step. These guardrails are what make long-running execution possible:
- Dead code detection - Checks for unused imports, variables, and arguments (via project's linter)
- Test suite execution & repair - Runs project's test suite; fixes any failing tests before proceeding
- Git log review - Checks previous commits for context on multi-session issues
- Feature regression testing - Puppeteer-based verification of completed features
- Unit test creation (mandatory) - Writes tests for all new functions, endpoints, and classes
- Test framework detection - Auto-detects pytest, Jest, Vitest, Go testing, or Cargo test
- Coverage requirements - 1 happy path + 1 edge case per function, 1 success + 1 error per endpoint
- Post-implementation regression check - Re-run test suite, spot-check previous features
- Build & quality gate - All linting, formatting, and type checks must pass
- Git status verification - Ensure clean working tree before checkpoint
- Full test suite execution - All tests must pass
- Comprehensive regression check - Test ALL completed features in milestone
- MR existence verification - Confirm MR was actually created on GitLab with retries
When tests fail, the agent follows a structured repair process:
- Read the test to understand what it's testing
- Diagnose: outdated test, implementation bug, or flaky test
- Fix appropriately (update expectation, fix code, or add deterministic waits)
- Re-run to verify fix
- After 3 failed attempts, skip with documented reason and create bug issue
- Retry wrapper - Exponential backoff for all GitLab API calls that modify data
- Issue creation verification - Confirm all issues were created successfully
- State file validation - Atomic writes prevent data corruption
Guardrails (hard stops that prevent compounding errors):
- Never implement new features if test suite is failing due to agent changes
- All tests must pass (or be explicitly skipped with reason) before new work
- Do NOT create issue_closure checkpoint if quality checks fail
- Stop MR creation if ANY regression is found
- Do NOT mark checkpoint complete until MR is verified on GitLab
These aren't suggestions—they're circuit breakers. Without them, a harness running for 24 hours would accumulate errors until the codebase is unusable.
Security Boundaries:
The Claude Agent SDK enables fine-grained control over what agents can do:
- Directory isolation - Agents can only operate within the project directory
- Command filtering - Dangerous bash commands are blocked via pre-tool-use hooks
- Tool allowlists - Only approved tools (read, write, bash, MCP) are available
- Sandboxed execution - Agents run in controlled subprocess environments
Bash Security Hook (agent/core/hooks/security.py):
The harness implements a comprehensive bash command security hook using an allowlist approach:
- Allowed commands - Only specific commands permitted:
ls,cat,head,tail,grep,cp,mkdir,chmod,npm,node,git,ps,lsof,sleep,pkill,cd,gh,echo - Sensitive command validation - Extra checks for
pkill(only dev processes),chmod(only +x), and script execution - Command injection prevention - Blocks command substitution (
$(...), backticks), subshells, and dangerous metacharacters - Path traversal prevention - Validates script paths resolve within the current directory
- Argument sanitization - Limits argument count/length, blocks shell metacharacters in arguments
Commands not in the allowlist are blocked with an explanatory error message.
Three levels of detail: Overview (L1) → Component (L2) → Runtime (L3)
┌─────────────────────────────────────────────────────────────────────────────┐
│ TUI (tui/) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐ │
│ │ Repo │→│ Spec │→│ Branch │→│ Agent │→│Advanced │→│ Running │ │
│ │ Screen │ │ Screen │ │ Screen │ │ Options │ │ Options │ │ (logs+ckpt│ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └───────────┘ │
└───────────────────────────────────┬─────────────────────────────────────────┘
│ daemon socket
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ Daemon (agent/daemon/) │
│ manages agent subprocesses + state │
└───────────────────────────────────┬─────────────────────────────────────────┘
│
┌───────────────────────────┼───────────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Initializer │ ────────▶ │ Coding │ ────────▶ │ MR Creation │
│ Agent │ │ Agent │ │ Agent │
│ │ │ (loop) │ │ │
│ • Verify proj │ │ • Pick issue │ │ • Sync branch │
│ • Create issue│ │ • Implement │ │ • Final tests │
│ • Enrich │ │ • Test+verify │ │ • Create MR │
│ • Make branch │ │ • Close issue │ │ • Verify MR │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└───────────────────────────┴───────────────────────────┘
│
┌───────────────────────────────────▼─────────────────────────────────────────┐
│ MCP Servers (Integration) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ GitLab │ │Context7 │ │ SearxNG │ │Puppeteer│ │ Files │ │
│ │ issues │ │lib docs │ │ search │ │ browser │ │ quality │ │
│ │ MRs │ │ │ │ │ │ tests │ │ gates │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
When each HITL checkpoint occurs. Details in HITL Checkpoints table.
INITIALIZER CODING (per issue) MR CREATION
─────────── ────────────────── ───────────
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ project_verify │──┐ │ issue_selection │──┐ │ mr_phase_trans │
└─────────────────┘ │ └─────────────────┘ │ └─────────────────┘
│ │ │ │
┌─────────────────┐ │ ▼ │ ▼
│ spec_to_issues │──┤ (implement) │ ┌─────────────────┐
└─────────────────┘ │ │ │ │ mr_review │
│ ▼ │ └─────────────────┘
┌─────────────────┐ │ ┌─────────────────┐ │ │
│ issue_enrichment│──┘ │ regression_appr │ │ ▼
└─────────────────┘ └─────────────────┘ │ (create MR)
│ │ │
▼ ▼ │
(create branch) ┌─────────────────┐ │
│ issue_closure │──┘
└─────────────────┘
│
▼
(next issue)
| Directory | Purpose |
|---|---|
agent/ |
Core agent logic, prompts, HITL system, Claude SDK client |
agent/core/ |
Orchestrator, session runner, checkpoint handlers, security hooks |
agent/core/hooks/ |
SDK hook system for bash command security validation |
agent/daemon/ |
Background daemon for agent process management |
agent/prompts/ |
Prompt templates for initializer, coding, and MR agents |
tui/ |
Textual UI screens, terminal widget, event handling |
tui/screens/ |
Modal screens: repo/spec/branch selection, agent options, checkpoints |
common/ |
Shared types, utilities, exceptions, unified state management |
.claude/ |
Claude Code configuration (settings, skills, agents, commands) |
The coding harness uses a pluggable code quality skill system. Code quality commands are defined in skill presets rather than hardcoded, allowing any language or tooling to be used.
For this project: See .claude/skills/code-quality.md for the specific commands used.
coding-harness/
├── agent/
│ ├── __init__.py # Package exports
│ ├── cli.py # CLI entry point (python -m agent.cli)
│ ├── core/ # Core agent logic
│ │ ├── __init__.py
│ │ ├── orchestrator.py # Main agent loop
│ │ ├── client.py # Claude SDK client configuration
│ │ ├── hitl.py # HITL checkpoint file operations
│ │ ├── checkpoint_handlers.py # Strategy pattern for checkpoint types
│ │ ├── session_runner.py # Individual session execution
│ │ ├── output.py # Output formatting utilities
│ │ └── hooks/ # SDK hook system
│ │ ├── __init__.py # Hook registration
│ │ └── security.py # Bash command security validation
│ ├── daemon/ # Background daemon
│ │ ├── __init__.py
│ │ ├── __main__.py # Module entry (python -m agent.daemon)
│ │ ├── server.py # Daemon process
│ │ └── client.py # TUI client
│ ├── prompts/ # Prompt templates
│ │ ├── __init__.py # Loader functions
│ │ └── templates/ # Markdown templates
│ └── skills/ # Agent skills
│ └── code-quality/
│ └── presets/ # Language-specific presets
├── tui/
│ ├── __init__.py
│ ├── app.py # Main Textual app (connects to daemon)
│ ├── events.py # Custom event types
│ ├── log_terminal.py # Log file tailing widget
│ ├── main.py # Entry point
│ └── screens/ # TUI screen components
│ ├── agent_options_screen.py # Agent behavior options
│ ├── checkpoint_screen.py # HITL checkpoint review
│ └── ... # Other screens
├── common/
│ ├── __init__.py # Package exports
│ ├── types.py # Shared type definitions
│ ├── utils.py # Utility functions
│ ├── exceptions.py # Exception hierarchy
│ └── state.py # Unified state management
├── .claude/ # Claude Code configuration
├── .env.example # Environment template
├── Dockerfile # Docker image definition
├── docker-entrypoint.sh # Container entrypoint script
├── .dockerignore # Docker build exclusions
├── CLAUDE.md # AI assistant instructions
├── requirements.txt # Python dependencies
├── ruff.toml # Linting configuration
├── pyrightconfig.json # Type checking configuration
└── start.sh # Entry point script (Docker + native)
Runtime directories (created at runtime, not in repo):
.data/- Daemon state in native mode (daemon_state.json). Docker mode uses a named volume instead.project/.claude-agent/- Agent workspace created in target projects (see Persistence Model)
| Issue | Solution |
|---|---|
| "Docker not found" | Install Docker or use ./start.sh --native |
| Container exits immediately | Rebuild with ./start.sh --build |
| Can't see files in container | Your home directory is mounted - use absolute paths |
| "GitLab token invalid" | Verify token has scopes: api, read_api, read_repository, write_repository |
| "Claude API error" | Check CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY is set correctly |
| "No .git folder" | Select a directory that is a git repository |
| Agent seems stuck | Check terminal output - likely waiting for HITL checkpoint approval |
| "Permission denied" on start.sh | Run chmod +x start.sh |
| Python version error | Ensure Python 3.11+ is active in your virtual environment |
| Agent logs not showing | Logs are in project's .claude-agent/{spec}/logs/ - check file permissions |
| Daemon not starting | Check if socket exists at /tmp/coding-harness-daemon.sock |
| TUI says "Daemon not running" | Stale socket from previous container - rebuild with ./start.sh --build |
| "could not read Password" git error | Docker: Rebuild with ./start.sh --build to get auto-credentials. Native: See Git Authentication section |
| Git fetch/pull fails in container | Ensure GITLAB_PERSONAL_ACCESS_TOKEN is set in .env - it's used for git credentials in Docker |
Q: Do I need Docker?
A: No, you can use ./start.sh --native to run with a local Python virtual environment. However, native mode doesn't persist agents - they stop when you quit the TUI. For long-running tasks, use Docker.
Q: Can I run multiple containers?
A: Yes! Each new ./start.sh creates a container named coding-harness, coding-harness-2, etc.
Q: How do I keep agents running when I disconnect?
A: In Docker mode: press Q to quit TUI - agents continue in daemon. Reconnect with ./start.sh --connect or press Enter after restarting. Use Ctrl+P, Ctrl+Q to detach entirely. In native mode: agents stop when TUI exits (no persistence).
Q: Can I use this without GitLab?
A: Yes! Enable file-only mode in the TUI's Agent Options screen or use "file_only_mode": true in JSON specs. This stores milestones and issues in local JSON files instead of GitLab. Note: Git operations still use GitLab MCP for pushing commits.
Q: Can I skip merge request creation?
A: Yes! Enable skip MR creation in the TUI's Agent Options screen or use "skip_mr_creation": true in JSON specs. The agent will stop after coding completes and keep all changes on the feature branch without creating an MR.
Q: What's the difference between file-only mode and skip MR creation? A: They're independent options:
- File-only mode - Changes where issue/milestone tracking happens (GitLab vs local JSON files)
- Skip MR creation - Changes whether an MR is created at the end (can be used with or without GitLab tracking) You can use either or both together.
Q: What Claude models are supported?
A: Default is claude-opus-4-5-20251101. You can use other models by setting CLAUDE_MODEL.
Q: Can I run multiple specs at once? A: Yes! Use JSON mode with multiple spec objects in the array.
Q: How do I skip human approval?
A: Press a in the TUI to toggle auto-accept mode for the selected agent. This removes human oversight - use carefully.
Q: Can I use a different coding assistant (not Claude)? A: The architecture is portable but not implemented. During the live stream, community members confirmed OpenCode + Ollama works for local models (Qwen 3, Kimi K2). Porting requires replacing Claude Agent SDK calls with equivalent SDK (Codex SDK, OpenCode SDK, AMP SDK). The prompts and artifacts pattern transfers directly.
Q: Will I hit rate limits running for 24 hours? A: It varies. Cole Medin ran Opus 4.5 for 24 hours on the $200/month Max plan without limits, but others report hitting limits quickly. Behavior seems to vary by account and region. Mitigation: toggle between Claude Code, Codex, and OpenCode to distribute load.
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Make your changes
- Run code quality checks (see
.claude/skills/code-quality.mdfor commands) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
[License type to be determined]