Coding Harness

Human-in-the-Loop AI Coding Agent with GitLab Integration (or File-Only Mode)

Note: The badges above are for the harness codebase itself. Target projects can use any language and tooling - see Pluggable Code Quality System.

Overview

What is an Agent Harness? A harness is a coordination layer or scaffolding around AI agents that allows them to work for hours (or days) on complex tasks without overwhelming their context window. Instead of asking an agent to do everything at once, a harness connects multiple agent sessions together—each starting fresh but quickly catching up via structured artifacts, then making incremental progress before handing off to the next session.

Coding Harness is an autonomous coding agent orchestration system that combines Claude AI with a terminal user interface (TUI) for milestone-based development workflows. It takes specification files, breaks them into issues (GitLab or local JSON files), and uses AI agents to implement each issue with human oversight at critical decision points.

┌──────────┐    ┌──────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Spec   │───▶│   TUI    │───▶│  Claude Agents   │───▶│ GitLab (default)│
│   File   │    │ (human)  │    │ (Init→Code→MR)   │    │ - OR -          │
└──────────┘    └────┬─────┘    └────────┬─────────┘    │ Local JSON files│
                     │                   │              └─────────────────┘
                     └───── 8 HITL ──────┘
                          checkpoints

The harness provides Human-in-the-Loop (HITL) checkpoints at 8 key stages, ensuring humans maintain control over project setup, issue breakdown, implementation approval, and merge request creation.

Conceptual Foundations

The Evolution: Prompts → Context → Harnesses

AI engineering has evolved through three stages:

Prompt Engineering (2020-2023) - Optimizing single interactions with LLMs
Context Engineering (2023-2024) - Optimizing entire sessions—what context to provide, when, and how to avoid overwhelming the model
Agent Harnesses (2024+) - Connecting multiple sessions together for long-running tasks

Early examples include Manus (2024) and LangChain's deep agents. The pattern is becoming standardized as "the next unlock for AI capability."

Why harnesses now? The raw power of LLMs isn't exploding like it was. Scaling has hit limits. The breakthrough now comes from the layer around LLMs—memory systems, handoff protocols, validation loops, and orchestration. Harnesses are this layer.

2026 prediction: Cole Medin and others predict 2026 will be "the year of agent harnesses"—shifting from experimental to production-grade, with reliable autonomous coding becoming mainstream.

Vibe coding becomes viable when you have an engineered harness with human-in-the-loop at strategic checkpoints. Without the scaffolding, delegating coding to AI fails. With it, you can trust long-running autonomous work—though the system itself requires careful engineering.

Resources

Anthropic Research:

Effective Harnesses for Long-Running Agents - Engineering blog (Nov 2025)
Autonomous Coding Quickstart - Reference implementation

Video Explainers:

I Forced Claude to Code for 24 Hours NONSTOP - Cole Medin's live test (result: 54% of 200 tests passing after 54 sessions)
Claude SDK: 24-Hour Coding Agent - Live stream with Ray, Linear integration, cloud deployment, mid-run feedback
Are Agent Harnesses Bringing Back Vibe Coding? - Evolution from prompts to harnesses, two unsolved problems
Unlock DEEP AGENTS with Anthropic's Agent Harness - n8n implementation with concurrency patterns

Related Projects (mentioned in live stream):

Archon - Cole Medin's open source "command center for AI coding" with Kanban board, knowledge curation, and MCP server. Similar concepts to Linear integration but AI-optimized.
Factory - Ray's preferred coding agent. Uses memory compression (different approach than this harness's fresh-context-per-session pattern).
Remote Agentic Coding System - Cole's upcoming project for kicking off agents from Slack, Discord, Telegram, GitHub. Combines harness ideas with remote observability.

Concepts Drawn from Each Source:

Source	Key Concepts Incorporated
Anthropic Blog	Initializer + coding agent pattern, feature list with `passes: false→true`, `claude-progress.txt` equivalent, git commits as checkpoints, one-shotting/premature-completion failures
Cole Medin (24hr)	PIV loop mental model, "priming" phase, Claude Agent SDK usage, security hooks pattern, regression testing, test-driven verification, brownfield adaptation
Cole Medin (Live)	Linear MCP integration, token efficiency, cloud deployment issues, mid-run feedback injection, meta issue pattern, harness simplicity ("just two prompts"), OAuth subscription token, alternative SDKs comparison, real-time task board observation (no refresh needed), mobile monitoring via Linear app, parallel MCP calls for speed, spec template + brain dump workflow, OpenCode + Ollama confirmed working, temperature variance between providers
Vibe Coding Video	Evolution timeline (prompts→context→harnesses), bounded attention/"dumb zone", compounding error math (95%²⁰=36%), vibe coding viability with HITL, predictive context as unsolved problem, autonomy balance
n8n Deep Agents	Concurrency patterns, lock mechanisms, research artifact staging, progressive summarization, Retrieval→Synthesize→Write pipeline, task dependency concepts

Comprehensive Feature Coverage:

Concept	Source	Status	Notes
Core Architecture
Initializer → Coding → MR agents	Anthropic	✅ Have	Extended to 3 agents with HITL
Feature/task list persistence	Anthropic	✅ Have (adapted)	GitLab issues instead of JSON
Progress file handoffs	Anthropic	✅ Have (adapted)	GitLab comments + `.gitlab_milestone.json`
Git commits as checkpoints	All	✅ Have	Structured format with metadata
Failure Mode Prevention
One-shotting prevention	Anthropic	✅ Have	ONE issue per coding session
Premature completion prevention	Anthropic	✅ Have	Quality gates + HITL before closure
Compounding error mitigation	Vibe Coding	✅ Have	Circuit breakers, regression tests
Context rot prevention	Vibe Coding	✅ Have	Fresh context per session
Validation & Testing
Browser automation (Puppeteer)	Anthropic	✅ Have	Via Puppeteer MCP
Regression testing	All	✅ Have	Before/after each issue
Unit test creation	Cole Medin	✅ Have	Mandatory for new code
Test repair loop	Cole Medin	✅ Have	Max 3 attempts then skip
Quality gates (lint/type/test)	All	✅ Have	Must pass before issue closure
Human Oversight
Human-in-the-loop checkpoints	All	✅ Have	8 checkpoint types
Autonomy balance	Vibe Coding	✅ Have	Strategic checkpoints, not constant
Auto-accept mode	-	✅ Have	For trusted runs
Memory & Persistence
File system as memory	n8n	✅ Have	`.claude-agent/` directory
Git as external memory	Cole Medin	✅ Have	Commit log for context
Progressive summarization	n8n	⚠️ Kinda have	Handoff comments, not systematic
Research artifact staging	n8n	⚠️ Kinda have	GitLab comments serve this role
Lock mechanisms	n8n	⚠️ Kinda have	Daemon handles single-agent; no multi-worker locks
Advanced Patterns
Static planning	n8n	✅ Have	Issues created upfront
Rolling/adaptive planning	n8n	❌ Don't have	No replanning based on progress
Goal-driven planning	n8n	❌ Don't have	No dynamic goal reassessment
Parallel task execution	n8n	❌ Don't have	Sequential only
Task dependency graphs	n8n	❌ Don't have	No explicit dependencies
Hybrid concurrency	n8n	❌ Don't have	No parallel retrieval → sequential synthesis
WorkTrees for parallel branches	Cole Medin	❌ Don't have	Single branch per agent
Development Modes
Greenfield (new projects)	Anthropic	✅ Have	Primary mode
Brownfield (existing codebases)	Cole Medin	✅ Have	Supported via spec design
Refactoring/migrations	Cole Medin	⚠️ Possible	Untested, needs custom validation
Multi-Assistant Support
Claude Agent SDK	Anthropic	✅ Have	Only supported assistant
Other SDKs (Codex, OpenCode, AMP)	Cole Medin	❌ Don't have	Architecture is portable, not implemented
Specialized Agents
Testing agent	Anthropic (future)	❌ Don't have	Coding agent does inline
QA agent	Anthropic (future)	⚠️ Partial	Puppeteer in coding agent
Code cleanup agent	Anthropic (future)	❌ Don't have	Coding agent does inline
Research agent	n8n	⚠️ Partial	Issue enrichment phase
Input validation agent	n8n	❌ Don't have	HITL serves this purpose
External Integrations
GitLab issues/MRs	-	✅ Have	Core integration
Linear/Jira/Asana	Cole Medin (Live)	❌ Don't have	GitLab-specific
Real-time task observation	Cole Medin (Live)	⚠️ Kinda have	GitLab updates require refresh; Linear updates live
Mobile monitoring	Cole Medin (Live)	⚠️ Kinda have	GitLab mobile app works; Linear demo'd on phone
Meta issue for handoffs	Cole Medin (Live)	✅ Have	GitLab comments on milestone; Linear uses meta issue
Context7 (library docs)	-	✅ Have	Optional enrichment
SearxNG (web search)	-	✅ Have	Optional enrichment
Spec & Planning
Template + brain dump workflow	Cole Medin (Live)	✅ Have	Give LLM template + rough idea → structured spec
Vibe planning (unstructured)	Cole Medin (Live)	⚠️ Kinda have	Supported via spec writing; not explicit phase
PIV loop (Plan-Implement-Verify)	Cole Medin	✅ Have	Core harness pattern
Authentication & Cost
OAuth subscription token	Cole Medin (Live)	✅ Have	Use Max plan instead of API credits
API key fallback	Anthropic	✅ Have	Per-token billing option
Alternative Providers
OpenCode + Ollama (local)	Cole Medin (Live)	❌ Don't have	Community-confirmed working; not implemented
OpenCode + Gemini 3	Cole Medin (Live)	❌ Don't have	Supported by OpenCode; not implemented
Parallel MCP calls	Cole Medin (Live)	⚠️ Kinda have	Claude does this automatically when appropriate

Legend: ✅ Have = Fully implemented | ⚠️ Kinda have = Partial/adapted | ❌ Don't have = Not implemented

The Three Problems Harnesses Solve

Complex projects can't be completed in a single context window. When context fills up, a new session starts with no memory. This creates three core challenges:

1. Bounded Attention (Context Rot)

As you add more information to an LLM's context, it enters what's been called the "dumb zone"—overwhelmed and making poor decisions. Harnesses solve this by:

Clearing context between sessions (fresh start)
Using external memory (files, databases, git) instead of in-context memory
Progressive summarization of older work
Handoff artifacts that capture only what's needed to continue

2. Compounding Errors (Reliability Decay)

If an agent is 95% reliable per step and steps fail independently, 20 steps compound to roughly 36% system reliability (0.95²⁰ ≈ 0.358). Harnesses address this through:

Checkpoints with self-validation after each task
Human-in-the-loop at critical decision points
Automatic rollback via git when things go wrong
Guard rails that stop progress if quality checks fail
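
The compounding arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes each step succeeds independently with the same probability; the function name `system_reliability` is illustrative and not part of the harness.

```python
def system_reliability(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# Show how quickly a 95%-reliable step decays as the chain gets longer.
for steps in (1, 5, 10, 20, 50):
    print(f"{steps:>3} steps -> {system_reliability(0.95, steps):.1%}")
```

Validating after each task effectively resets the chain length, which is why checkpoints matter more than small per-step reliability gains.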

3. Predictive Context (Unsolved)

The hardest problem: you can't predict which observation becomes critical 10 steps later. Current approaches try to preserve everything potentially relevant, but optimal summarization—knowing exactly what future sessions will need—remains an open challenge. This harness addresses it through:

Structured handoff templates that capture known-important context
Git history as lossless memory (always recoverable)
GitLab comments preserving full context on each issue

Anthropic specifically identified two behavioral failure modes:

One-shotting - Agent tries to do everything at once, runs out of context mid-implementation, leaves codebase broken. Next session must guess what happened.
Premature completion - Agent sees progress has been made, declares the job done while major features remain unimplemented.

The Solution: Initializer + Task Agent Pattern

The core insight: Despite appearing complex, harnesses are fundamentally simple. As Cole Medin emphasized during the live stream: "Everything that seems fancy with AI coding assistants is just a bunch of prompts... all this harness is is two prompts—the initializer prompt and the coding prompt. That's it."

The most common harness architecture uses two specialized agents:

Agent	Purpose
Initializer (session 1)	Create feature list/tasks, setup environment, initialize git
Task Agent (sessions 2+)	Implement ONE task, verify it works, commit, repeat in loop

This pattern appears across implementations—Anthropic's coding quickstart, LangChain's deep agents, and custom harnesses. The task agent loops until all work is complete, with each iteration:

Prime - Read progress files, git log, understand current state ("getting bearings")
Validate - Regression test previous work before touching anything new
Execute - Implement one task, write tests, verify
Handoff - Update progress files, commit, prepare for next session
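
The four steps above can be sketched as a plain loop. This is an illustrative skeleton, not the harness's orchestrator: the task dictionary shape and the `prime`, `validate`, `execute`, and `handoff` callables are hypothetical stand-ins for real agent sessions.

```python
from typing import Callable, Dict, List

Task = Dict[str, object]  # hypothetical shape: {"id": int, "done": bool}


def run_sessions(
    tasks: List[Task],
    prime: Callable[[List[Task]], None],
    validate: Callable[[List[Task]], bool],
    execute: Callable[[Task], bool],
    handoff: Callable[[Task, bool], None],
    max_sessions: int = 50,
) -> int:
    """Run one task per session until all tasks are done; return sessions used."""
    sessions = 0
    while sessions < max_sessions:
        pending = [t for t in tasks if not t["done"]]
        if not pending:
            break
        sessions += 1
        prime(tasks)                # Prime: rebuild state from artifacts
        if not validate(tasks):     # Validate: regression-check earlier work
            break                   # stop rather than build on a broken baseline
        task = pending[0]           # exactly one task per session
        succeeded = execute(task)   # Execute: implement and verify
        if succeeded:
            task["done"] = True
        handoff(task, succeeded)    # Handoff: record progress for the next session
    return sessions


# Demo with stubs that always succeed.
demo_tasks: List[Task] = [{"id": i, "done": False} for i in range(3)]
used = run_sessions(
    demo_tasks,
    prime=lambda ts: None,
    validate=lambda ts: True,
    execute=lambda t: True,
    handoff=lambda t, ok: None,
)
print(f"sessions used: {used}")
```

The `max_sessions` cap is one simple form of circuit breaker: a task that never succeeds cannot keep the loop running indefinitely.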

This maps to what Cole Medin calls the PIV loop (Plan-Implement-Verify)—a mental model for structured AI coding where each cycle produces verified, committed work.

The key insight: agents need structured artifacts to quickly understand project state when starting fresh. These include:

A task/feature list that tracks what's done vs remaining (never edit descriptions, only mark complete)
Progress notes summarizing recent work
Git commits as atomic checkpoints
Session handoff comments with specific next steps

How This Harness Adapts the Pattern

Aspect	Quickstart	This Harness
Work units	200 tiny test cases	GitLab issues (fewer, larger)
Granularity	Each test ≈ one session	Each issue may span multiple sessions
Progress tracking	`passes: false → true` in JSON	GitLab comments + "in-progress" label (or local JSON in file-only mode)
Completion signal	Edit JSON field	Close issue via GitLab API (or update local JSON)
State file	`claude-progress.txt`	`.gitlab_milestone.json`
Human oversight	None (fully autonomous)	8 HITL checkpoints
Interface	CLI auto-loop	TUI with log streaming
Issue tracking	Local JSON only	GitLab (default) or local JSON (file-only mode)
MR creation	N/A	Optional (can skip to keep changes on branch)

Key difference in granularity: The quickstart decomposes specs into ~200 small test cases, each small enough to complete in a single context window. This harness uses GitLab issues—larger work units that may require multiple sessions to complete.

Multi-session issue support: When an issue can't be completed in one session:

The "in-progress" label persists across sessions
Structured handoff comments include:
- Last commit SHA and message
- Progress checklist (completed / in-progress / not started)
- Files changed with change types
- Specific next steps for the following session
- Gotchas and key context
Next session reviews git log and handoff comments before continuing
The issue stays assigned until closed
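
A handoff comment with the fields listed above could be modelled as a small data structure and rendered to text. The sketch below is an assumption about shape only: the `Handoff` class, its field names, and the rendered layout are hypothetical, not the harness's actual comment template.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Handoff:
    commit_sha: str
    commit_message: str
    completed: List[str] = field(default_factory=list)
    in_progress: List[str] = field(default_factory=list)
    not_started: List[str] = field(default_factory=list)
    files_changed: List[Tuple[str, str]] = field(default_factory=list)  # (path, change type)
    next_steps: List[str] = field(default_factory=list)
    gotchas: List[str] = field(default_factory=list)


def render_handoff(h: Handoff) -> str:
    """Render the handoff as plain text suitable for an issue comment."""
    lines = [
        "Session handoff",
        f"Last commit: {h.commit_sha} {h.commit_message}",
        "",
        "Progress:",
    ]
    lines += [f"- [x] {item}" for item in h.completed]
    lines += [f"- [~] {item} (in progress)" for item in h.in_progress]
    lines += [f"- [ ] {item}" for item in h.not_started]
    lines += ["", "Files changed:"]
    lines += [f"- {path} ({change})" for path, change in h.files_changed]
    lines += ["", "Next steps:"]
    lines += [f"{i}. {step}" for i, step in enumerate(h.next_steps, start=1)]
    lines += ["", "Gotchas:"]
    lines += [f"- {g}" for g in h.gotchas]
    return "\n".join(lines)


print(render_handoff(Handoff(
    commit_sha="abc1234",
    commit_message="feat(#42): add login form",
    completed=["Form markup"],
    in_progress=["Client-side validation"],
    files_changed=[("src/login.py", "added")],
    next_steps=["Wire up validation errors"],
    gotchas=["Session cookie name is configurable"],
)))
```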

Commit conventions: All commits follow a structured format for traceability:

<type>(#<issue>): <short description>

<body - what changed and why>

Files: <count> changed
Tests: <added/updated/none>
Issue: #<iid> - <title>

Types: feat, fix, test, refactor, style, docs, chore
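
The template above is easy to generate and validate programmatically. The helper below is a hedged sketch: `format_commit` and its parameter names are hypothetical, and it assumes the `<issue>` in the subject line and the `<iid>` in the footer are the same number.

```python
COMMIT_TYPES = {"feat", "fix", "test", "refactor", "style", "docs", "chore"}
TEST_STATES = {"added", "updated", "none"}


def format_commit(
    commit_type: str,
    issue_iid: int,
    summary: str,
    body: str,
    files_changed: int,
    tests: str,
    issue_title: str,
) -> str:
    """Build a commit message that follows the structured format."""
    if commit_type not in COMMIT_TYPES:
        raise ValueError(f"unknown commit type: {commit_type!r}")
    if tests not in TEST_STATES:
        raise ValueError(f"tests must be one of {sorted(TEST_STATES)}")
    return (
        f"{commit_type}(#{issue_iid}): {summary}\n\n"
        f"{body}\n\n"
        f"Files: {files_changed} changed\n"
        f"Tests: {tests}\n"
        f"Issue: #{issue_iid} - {issue_title}"
    )


print(format_commit(
    "feat", 42, "add login form",
    "Adds the login form and basic validation so users can sign in.",
    3, "added", "User login",
))
```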

We extend the pattern to three agents (Initializer → Coding loop → MR Creation) and add human approval gates at critical decisions, which directly addresses the premature completion problem.

Advanced Patterns (Not Yet Implemented)

Agent harnesses can vary significantly based on the use case. This section documents patterns from the research that we don't yet implement—useful context for understanding the design space and future directions.

Planning Strategies:

Strategy	Description	This Harness
Static Plan	Create all tasks upfront, execute sequentially	✅ Used
Rolling Plan	Plan → execute → replan based on progress	❌ Not implemented
Goal-Driven	Constantly ask "what's the best next action?"	❌ Not implemented
Test-Driven Loop	Iterate until criteria/tests pass	⚠️ Partial (quality gates)

Concurrency Patterns:

Pattern	Description	This Harness
Sequential	One task at a time	✅ Used
Parallel Retrieval	Research multiple sources simultaneously	❌ Not implemented
Hybrid Concurrent	Parallel retrieval → sequential synthesis	❌ Not implemented
Dependency Graph	Tasks with explicit dependencies	❌ Not implemented

Multi-Assistant Support:

The harness architecture (prompts + artifacts) is theoretically portable to other coding assistants with SDKs (Codex, OpenCode, AMP). However, this implementation is Claude-specific. Porting would require:

Replacing Claude Agent SDK calls with the target SDK's equivalents
Adjusting prompts for model-specific behaviors
Testing validation and tool use patterns

Confirmed working alternatives (from live stream):

OpenCode + Ollama - Tested during the stream by community member Rasmus. OpenCode supports local models via Ollama, enabling fully local harness execution with models like Qwen 3 Coder or Kimi K2.
OpenCode + Gemini 3 - Listed as supported model in OpenCode docs.

Temperature considerations: Ray noted that different providers may have different default temperatures, and this affects output quality. When using open models through providers like Fireworks or Cerebras, check the model card for recommended temperature settings. Models trained for agentic tool use (like Kimi K2, which handles hundreds of tool calls) may behave differently than general-purpose models.

WorkTrees:

Git worktrees enable parallel work on multiple branches. This harness uses a single branch per agent. Implementing worktrees would allow multiple issues to be worked simultaneously—useful for independent features with no code overlap.

Greenfield vs Brownfield Development

The Anthropic quickstart is optimized for greenfield development—building new applications from scratch. This harness supports both modes:

Mode	Description	This Harness
Greenfield	Build new project from scratch via spec file	✅ Primary mode
Brownfield	Add features to existing codebase	✅ Supported
Refactoring	Large-scale code modernization	⚠️ Possible but untested

How brownfield works:

Spec file describes features to ADD to existing codebase (not rebuild from scratch)
Issues are created relative to current project state
Agent reads existing code patterns before implementing
Issue enrichment researches codebase context (grep, file structure)

Brownfield example (from Cole Medin's video): Sean used a modified harness for TypeScript refactoring—upgrading a codebase 12 major TypeScript versions behind. The harness handled:

Breaking changes across versions
Deprecated code patterns
Validation that refactored code still compiles

Community results shared during live stream:

WebDevCody: "I tried this last night for 24 [hours]. It's pretty amazing. 128 commits in."
Multiple community members confirmed successful runs with custom applications
Rasmus: "Tested the remote agent workflow manager harness today, built a Circle clone in a few hours in Django"

Refactoring considerations:

Change spec to describe the transformation, not features
Issues become refactoring tasks (e.g., "update all X to Y pattern")
Quality gates verify no regressions introduced
May need custom validation beyond test suites

Not yet supported:

Parallel brownfield - Multiple agents on different features of same codebase
Migration scripts - Database schema or infrastructure changes
Cross-repo refactoring - Changes spanning multiple repositories

Multi-Agent Specialization (Future Work)

The Anthropic blog mentions potential specialized agents beyond the current Initializer → Coding → MR pattern:

Agent Type	Purpose	Status
Testing Agent	Write comprehensive test suites	❌ Not implemented (coding agent does inline)
QA Agent	End-to-end quality validation	⚠️ Partial (via Puppeteer in coding agent)
Code Cleanup Agent	Dead code removal, style fixes	❌ Not implemented (coding agent does inline)
Research Agent	Deep context gathering	⚠️ Partial (issue enrichment phase)

Current approach: A single coding agent handles implementation, testing, and cleanup inline. Specialization could improve quality but adds coordination complexity.

Known Limitations

Browser Automation:

Cannot see browser-native alert/confirm/prompt modals through Puppeteer MCP
Some dynamic UI elements may not be visible in screenshots
Timing issues with slow-loading content

Context Prediction:

Cannot predict which information becomes critical later
Handoff templates are heuristic, not optimal
Git history as fallback is comprehensive but token-expensive to read

Rate Limiting:

Claude subscription has usage limits (varies by plan)
Long-running agents may hit rate limits; resume from checkpoint
OAuth token lasts 1 year; API key has no expiry but costs per-token

Observations from live stream: Cole Medin reported running Opus 4.5 for 24 hours (even running two harness instances for 12 of those hours) without hitting rate limits on the $200/month Max plan. However, other community members report hitting limits quickly. Rate limit behavior appears to vary by account and region. Recommendation: use multiple coding assistants (toggle between Claude Code, Codex, OpenCode) to distribute load if you hit limits frequently.

Core Behaviors (How We Tackle Harness Challenges)

Incremental progress (solves: one-shotting): Each session works on ONE issue at a time. Large issues may span multiple sessions—progress is documented in GitLab comments with structured handoffs, and the next session continues where the last left off.

Clean state (solves: broken handoffs): Every session ends with code that's mergeable—no half-implementations, no broken tests, no uncommitted changes. If an issue isn't complete, the "in-progress" label stays on and a detailed handoff comment documents progress, last commit SHA, and next steps.

Test-driven verification (solves: compounding errors): Agents write unit tests for all new code, fix failing tests before proceeding, and verify features through browser automation. This combines automated test coverage with end-to-end UI verification—catching errors before they compound.

Quality gates (solves: premature completion): Before an issue is closed, linting, type checking, and the test suite must pass, and regression checks run against previously completed features. The agent cannot declare success without passing external validation.
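
A gate of this kind reduces to running external commands and refusing to proceed if any exits non-zero. The sketch below illustrates the idea only: `run_quality_gates` is a hypothetical helper, and the demo commands are stand-ins for whatever linter, type checker, and test runner the target project actually uses.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_quality_gates(gates: Sequence[Tuple[str, List[str]]], cwd: str = ".") -> List[str]:
    """Run each (name, command) gate and return the names of gates that failed."""
    failed = []
    for name, command in gates:
        try:
            result = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
            ok = result.returncode == 0
        except FileNotFoundError:
            ok = False  # a missing tool counts as a failed gate
        if not ok:
            failed.append(name)
    return failed


# Stand-in commands: replace with the project's real lint/type/test commands.
demo_gates = [
    ("lint", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("types", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("tests", [sys.executable, "-c", "raise SystemExit(1)"]),
]
failures = run_quality_gates(demo_gates)
print("issue may be closed" if not failures else f"blocked by: {failures}")
```

Because the verdict comes from process exit codes rather than the agent's own summary, the agent cannot talk its way past a failing check.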

Structured commits (solves: context rot): All commits follow a conventional format (feat(#42): description) with metadata (files changed, tests added, issue reference). Git becomes external memory—new sessions can read the commit log to understand what happened without relying on in-context memory.

File tracking (solves: accidental pollution): Agents track exactly which files they modify in session_files.tracked. Only those files get pushed—never pre-existing user changes or unrelated modifications.
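
The filtering step can be sketched as a set intersection between the files git reports as changed and the files the agent recorded. The on-disk format of session_files.tracked is not documented here, so this example assumes one relative path per line; `load_tracked` and `files_to_push` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def load_tracked(path: Path) -> Set[str]:
    """Read tracked paths, assuming one relative path per line."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def files_to_push(changed: Iterable[str], tracked: Set[str]) -> List[str]:
    """Keep only changed files that the agent itself recorded as modified."""
    return sorted(f for f in changed if f in tracked)


# Demo: the user's unrelated edit to notes.md is left out of the push set.
with tempfile.TemporaryDirectory() as tmp:
    tracked_file = Path(tmp) / "session_files.tracked"
    tracked_file.write_text("src/login.py\ntests/test_login.py\n")
    changed_in_worktree = ["notes.md", "src/login.py", "tests/test_login.py"]
    print(files_to_push(changed_in_worktree, load_tracked(tracked_file)))
```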

Operating Modes

The harness supports flexible operating modes to fit different workflows:

GitLab Mode (Default)

Issues and milestones tracked in GitLab
MR created automatically when coding completes
Full observability through GitLab's web interface

File-Only Mode

Issues and milestones stored in local JSON files (.claude-agent/<spec>/)
No GitLab account required for issue tracking
Useful for: offline work, private repos, quick prototyping
Enable via TUI checkbox or "file_only_mode": true in JSON specs

Skip MR Creation

Agent stops after coding completes
All changes remain on the feature branch
Useful for: manual review before MR, experimental work, draft features
Enable via TUI checkbox or "skip_mr_creation": true in JSON specs

Combined Modes

All modes can be combined. For example, file-only mode + skip MR creation gives you a fully local workflow where the agent codes but you handle all git operations manually afterward.

Features

Each feature below addresses a specific harness challenge—context management, reliability, or coordination.

Interactive TUI - Textual-based terminal interface for intuitive agent management
GitLab Integration - Automatic milestone creation, issue management, and merge request generation
File-Only Mode - Optional local JSON-based tracking instead of GitLab (no GitLab account required)
Skip MR Creation - Option to stop after coding without creating a merge request (keep changes on branch)
Configurable Testing - Skip Puppeteer, test suite, or regression testing via TUI options
8 HITL Checkpoints - Human approval gates for project verification, issue breakdown, implementation review, and more
Spec-to-Issues Breakdown - AI-powered conversion of specifications into actionable GitLab issues
Issue Enrichment - Optional Context7 and SearxNG integration for researching libraries and best practices
Multi-Spec Support - Run multiple specification files in batch mode
Real-Time Output - Log-tailing terminal widget streams agent output live
Atomic State Persistence - Checkpoint state is safely persisted to prevent data loss
Unit Test Creation - Agents write tests for all new code (pytest, Jest, Vitest, Go, Rust)
Test Repair Loop - Failing tests are automatically fixed before proceeding with new work
Regression Testing - Automated verification of existing features before and after implementation
Quality Gates - Mandatory code quality checks (linting, formatting, type checking) before issue closure
Structured Commits - Conventional commit format with issue references and metadata
File Tracking - Agents track files they modify; only pushes agent-created changes (never user's uncommitted work)
Multi-Session Handoffs - Detailed handoff comments with commit SHAs for seamless session continuity
Verification Loops - GitLab API calls with retry logic, issue creation verification, MR existence validation
Security Hooks - Bash command allowlist validation prevents dangerous operations
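
To illustrate the Security Hooks item above, here is a deliberately conservative allowlist check. It is a sketch under stated assumptions, not the harness's hook: the `ALLOWED_COMMANDS` set is hypothetical, and the function rejects any command containing shell chaining or substitution characters, even inside quotes, which is stricter than a real hook may need to be.

```python
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "git", "pytest", "npm"}  # hypothetical allowlist
FORBIDDEN_CHARS = set(";|&$`<>\n")  # chaining, substitution, and redirection


def is_command_allowed(command: str) -> bool:
    """Return True only for a single, simple command whose program is allowlisted."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


for candidate in ("git status", "rm -rf /", "ls && rm -rf /"):
    verdict = "allow" if is_command_allowed(candidate) else "block"
    print(f"{verdict}: {candidate}")
```

Checking only the first token would let `ls && rm -rf /` through, which is why the sketch rejects chaining characters before looking at the program name.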

Prerequisites

Docker (recommended) - Install Docker
OR Python 3.11+ - Download Python (for native mode)
Git - For repository operations
GitLab Account - With API access token (Create Token)
- Optional if using file-only mode for issue tracking
Claude API Access - Either:
- Claude Code OAuth token (recommended) - Generate with claude setup-token
- Anthropic API key - Get API Key
Optional:
- Context7 API key for library documentation lookup
- SearxNG instance for web search enrichment

Installation

Docker Mode (Recommended)

# Clone the repository
git clone <repository-url>
cd coding-harness

# Copy environment template
cp .env.example .env

# Edit .env with your credentials (see Configuration section)
nano .env  # or your preferred editor

# Start (Docker image builds automatically on first run)
./start.sh

Docker mode benefits:

Daemon + agents persist when TUI exits
Reconnect anytime with ./start.sh --connect
True SSH-like experience

Native Mode (Without Docker)

# Clone the repository
git clone <repository-url>
cd coding-harness

# Create and activate virtual environment
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env

# Edit .env with your credentials (see Configuration section)
nano .env  # or your preferred editor

# Start in native mode
./start.sh --native

Native mode differences:

Daemon is started automatically but killed when TUI exits
No persistence - agents stop when you quit
Good for quick testing, not long-running tasks

Configuration

Required Environment Variables

Variable	Description
`GITLAB_PERSONAL_ACCESS_TOKEN`	GitLab token with scopes: `api`, `read_api`, `read_repository`, `write_repository` (optional in file-only mode)
`CLAUDE_CODE_OAUTH_TOKEN`	Claude Code OAuth token (preferred)
`ANTHROPIC_API_KEY`	Alternative: Anthropic API key (if not using OAuth)

Note: You need either CLAUDE_CODE_OAUTH_TOKEN OR ANTHROPIC_API_KEY, not both.

Note: GITLAB_PERSONAL_ACCESS_TOKEN is optional if using file-only mode for issue tracking.
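
The rules in the notes above can be checked before launching anything. This is a minimal sketch, assuming the variable names from the table; `check_credentials` is a hypothetical helper and the harness may validate its environment differently.

```python
import os
from typing import List, Mapping


def check_credentials(env: Mapping[str, str], file_only_mode: bool = False) -> List[str]:
    """Return a list of human-readable problems; an empty list means the env looks usable."""
    problems = []
    # One Claude credential is enough; either variable satisfies the requirement.
    if not (env.get("CLAUDE_CODE_OAUTH_TOKEN") or env.get("ANTHROPIC_API_KEY")):
        problems.append("Set CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY")
    # The GitLab token is only optional when issues are tracked in local files.
    if not file_only_mode and not env.get("GITLAB_PERSONAL_ACCESS_TOKEN"):
        problems.append("Set GITLAB_PERSONAL_ACCESS_TOKEN or enable file-only mode")
    return problems


for problem in check_credentials(os.environ):
    print(f"config problem: {problem}")
```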

Optional Environment Variables

Variable	Default	Description
`CONTEXT7_API_KEY`	-	For library documentation lookup during issue enrichment
`SEARXNG_URL`	`http://localhost:8888`	SearxNG instance URL for web search
`CLAUDE_MODEL`	`claude-opus-4-5-20251101`	Claude model to use
`GITLAB_API_URL`	`https://gitlab.com/api/v4`	For self-hosted GitLab instances

Git Authentication

The harness needs git access for both read operations (git fetch, git pull) and write operations (pushing code). Authentication is handled differently for each:

Write Operations (push/commit):

Always use GitLab MCP with GITLAB_PERSONAL_ACCESS_TOKEN
Token-based, works everywhere without additional setup

Read Operations (fetch/pull):

Docker mode: Auto-configured using GITLAB_PERSONAL_ACCESS_TOKEN as git credential
Native mode: Uses your local git credentials (SSH or keychain)

Docker Mode (automatic):

The container automatically configures git credentials from GITLAB_PERSONAL_ACCESS_TOKEN. No additional setup required - git fetch and git pull just work.

If you prefer SSH:

# SSH agent is forwarded if running
ssh-add ~/.ssh/your_key  # Add key to agent before starting
./start.sh               # SSH_AUTH_SOCK is forwarded to container

Native Mode (manual setup):

For HTTP(S) remotes on macOS, store your GitLab token in the keychain:

# Erase old credential
git credential-osxkeychain erase <<EOF
protocol=https
host=gitlab.com
EOF

# Store new credential (use your GitLab PAT as password)
git credential-osxkeychain store <<EOF
protocol=https
host=gitlab.com
username=your_username
password=your_gitlab_personal_access_token
EOF

For SSH remotes (recommended for native mode):

# Ensure SSH key is added to GitLab
cat ~/.ssh/id_ed25519.pub  # Add this to GitLab > Settings > SSH Keys

# Update remote to use SSH
git remote set-url origin git@gitlab.com:your/repo.git

Quick Start

Interactive Mode

The easiest way to start - the TUI guides you through all configuration:

./start.sh                # Docker mode (default)
./start.sh --native       # Native Python mode

Docker Container Management

./start.sh                # Start new container
./start.sh --build        # Rebuild Docker image
./start.sh --list         # List running containers
./start.sh --connect      # Connect to running container
./start.sh --connect coding-harness-2  # Connect to specific container

Container Controls:

Enter - Start/restart TUI
Ctrl+C - Stop container
Ctrl+P, Ctrl+Q - Detach (keep running)
Q - Quit TUI (container stays running)

Programmatic Mode

For automation or scripting, provide specs as JSON:

./start.sh --specs '[{
  "spec_file": "/path/to/feature-spec.txt",
  "project_dir": "/path/to/your/project",
  "target_branch": "main"
}]'

With optional flags:

# File-only mode (GitLab not required for issue tracking)
./start.sh --specs '[{
  "spec_file": "/path/to/feature-spec.txt",
  "project_dir": "/path/to/your/project",
  "target_branch": "main",
  "file_only_mode": true
}]'

# Skip MR creation (keep changes on branch)
./start.sh --specs '[{
  "spec_file": "/path/to/feature-spec.txt",
  "project_dir": "/path/to/your/project",
  "target_branch": "main",
  "skip_mr_creation": true
}]'

# Both flags together
./start.sh --specs '[{
  "spec_file": "/path/to/feature-spec.txt",
  "project_dir": "/path/to/your/project",
  "target_branch": "main",
  "file_only_mode": true,
  "skip_mr_creation": true
}]'
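
When specs are generated by a script, building the JSON with a serializer avoids hand-quoting mistakes. The sketch below only constructs and prints the command; it uses the `--specs` flag and the keys shown above, while the `build_start_command` helper and the example paths are hypothetical.

```python
import json
import shlex
from typing import Dict, List


def build_start_command(specs: List[Dict[str, object]]) -> List[str]:
    """Return the argv list for ./start.sh with specs serialized as JSON."""
    return ["./start.sh", "--specs", json.dumps(specs)]


specs = [
    {
        "spec_file": "/path/to/feature-spec.txt",
        "project_dir": "/path/to/your/project",
        "target_branch": "main",
        "file_only_mode": True,
        "skip_mr_creation": True,
    }
]
command = build_start_command(specs)
# shlex.join produces a shell-safe string for logging or copy-paste.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` rather than a joined string sidesteps shell quoting entirely.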

Note: Git operations use GitLab MCP with GITLAB_PERSONAL_ACCESS_TOKEN. Commits are attributed to the token owner's GitLab identity. In file-only mode, GitLab is not required for issue tracking but is still used for git operations.

Auto-Accept Mode

Run without human approval prompts (use with caution):

# In TUI, press 'a' to toggle auto-accept mode for an agent
# Auto-accept setting is saved per-agent and can be toggled anytime

Usage

Runtime Architecture

Level 3 detail: Docker containers, persistence, and process management.

The harness uses a daemon architecture for robust agent management:

  ┌─────────────────────────── Docker Container ───────────────────────────┐
  │                                                                        │
  │  ┌──────────────────────────────────────────────────────────────────┐  │
  │  │               Agent Daemon (python -m agent.daemon)              │  │
  │  │               Always running while container is up               │  │
  │  │                                                                  │  │
  │  │  Responsibilities:                                               │  │
  │  │    • Spawns agent subprocesses (python -m agent.cli)             │  │
  │  │    • Monitors process lifecycle (running/stopped/failed)         │  │
  │  │    • Persists state to .data/daemon_state.json                   │  │
  │  │    • Routes log output to per-agent log files                    │  │
  │  │                                                                  │  │
  │  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │  │
  │  │  │   Agent CLI     │  │   Agent CLI     │  │   Agent CLI     │   │  │
  │  │  │  (subprocess)   │  │  (subprocess)   │  │  (subprocess)   │   │  │
  │  │  │                 │  │                 │  │                 │   │  │
  │  │  │  ┌───────────┐  │  │  ┌───────────┐  │  │  ┌───────────┐  │   │  │
  │  │  │  │Orchestratr│  │  │  │Orchestratr│  │  │  │Orchestratr│  │   │  │
  │  │  │  │  Session  │  │  │  │  Session  │  │  │  │  Session  │  │   │  │
  │  │  │  │  Phases:  │  │  │  │           │  │  │  │           │  │   │  │
  │  │  │  │ •Init     │  │  │  │ •Coding   │  │  │  │(idle/done)│  │   │  │
  │  │  │  │ •Coding   │  │  │  │           │  │  │  │           │  │   │  │
  │  │  │  │ •MR       │  │  │  └───────────┘  │  │  └───────────┘  │   │  │
  │  │  │  └───────────┘  │  │                 │  │                 │   │  │
  │  │  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘   │  │
  │  │           │                    │                    │            │  │
  │  │           ▼                    ▼                    ▼            │  │
  │  │  ┌─────────────────────────────────────────────────────────────┐ │  │
  │  │  │     Log Files (in project's .claude-agent/<slug>/logs/)     │ │  │
  │  │  │                    (persisted via $HOME mount)              │ │  │
  │  │  └─────────────────────────────────────────────────────────────┘ │  │
  │  └──────────────────────────────────┬───────────────────────────────┘  │
  │                                     │                                  │
  │                      Unix Socket (/tmp/coding-harness-daemon.sock)     │
  │                                     │                                  │
  │  ┌──────────────────────────────────▼───────────────────────────────┐  │
  │  │                      TUI (python -m tui.main)                    │  │
  │  │                         (ephemeral, can restart)                 │  │
  │  │                                                                  │  │
  │  │  Commands to daemon:           Views:                            │  │
  │  │    • list - get all agents       • Agent list (status)           │  │
  │  │    • start - spawn new agent     • Log viewer (tail log files)   │  │
  │  │    • stop - terminate agent      • HITL checkpoint dialogs       │  │
  │  │    • status - get agent info     • Session phase indicator       │  │
  │  │    • remove - delete agent       • Git branch/status             │  │
  │  │                                                                  │  │
  │  │  ┌────────────────────────────────────────────────────────────┐  │  │
  │  │  │              Can exit freely (Ctrl+C, q, Esc)              │  │  │
  │  │  │           Daemon + agents continue in background           │  │  │
  │  │  └────────────────────────────────────────────────────────────┘  │  │
  │  └──────────────────────────────────────────────────────────────────┘  │
  │                                                                        │
  ├────────────────────────── Persistence Layer ───────────────────────────┤
  │                                                                        │
  │  Named Volume: ${container}-data → /app/.data/                         │
  │    • daemon_state.json (agent registry, status, config)                │
  │                                                                        │
  │  Bind Mount: $HOME:$HOME                                               │
  │    • Project files (.claude-agent/<slug>-<hash>/)                      │
  │    • Log files (logs/*.log)                                            │
  │    • Milestone state (.gitlab_milestone.json)                          │
  │    • HITL checkpoints (.hitl_checkpoint.json)                          │
  │    • Workspace info (.workspace_info.json)                             │
  │                                                                        │
  │  Bind Mount: /tmp:/tmp                                                 │
  │    • Daemon socket (ephemeral, recreated on start)                     │
  │    • Daemon PID file                                                   │
  │                                                                        │
  └────────────────────────────────────────────────────────────────────────┘

Benefits:

TUI can exit/restart - Agents keep running in daemon
Logs persist - Written to project's .claude-agent/ directory
True SSH-like experience - Reconnect to see agent output
No state file hacks - Daemon IS the state

Workflow:

Start agents in TUI
Press Q to quit TUI (agents keep running in daemon)
Press Enter to restart TUI
TUI syncs with daemon, shows running agents

Detach vs Quit:

Q - Quit TUI, agents keep running in daemon
Ctrl+P, Ctrl+Q - Detach from container entirely (container + daemon + agents keep running)
Ctrl+C - Stop container (kills daemon and all agents)

Persistence Model

The harness uses a two-tier persistence model with Docker volumes:

PROJECT DIRECTORY (bind mount via $HOME:$HOME):
project/.claude-agent/{spec-slug}-{hash}/    ← LOCAL ONLY, never pushed to GitLab
├── .workspace_info.json      # Spec config, branch, auto-accept, skip flags
├── .gitlab_milestone.json    # Milestone ID, issue list, progress (GitLab mode)
├── .file_milestone.json      # Milestone data (file-only mode)
├── .hitl_checkpoint_log.json # All checkpoint history with decisions
├── app_spec.txt              # Copy of original specification
└── logs/                     # Agent execution logs (project-scoped)
    └── agent_1-20241223-120530.log

DAEMON STATE (named Docker volume per container):
/app/.data/                   # Inside container, persisted via volume
└── daemon_state.json         # Daemon's agent registry (for reconnect)

Important: The .claude-agent/ directory is local working storage for agents. It is never pushed to GitLab and should be in your .gitignore. Agents read/write these files directly via filesystem tools, not through git.

What survives container restart:

Data	Location	Persists?	Notes
Checkpoint history	Project `.claude-agent/`	✅ Yes	Bind mount to host
Milestone state	Project `.claude-agent/`	✅ Yes	Bind mount to host
Agent logs	Project `.claude-agent/logs/`	✅ Yes	Bind mount to host
Daemon state	Docker volume	✅ Yes	Volume `{container}-data`
Agent options	Project `.workspace_info.json`	✅ Yes	Auto-accept, skip flags

Docker volumes (per-container):

docker volume ls                          # List all volumes
# Volumes are named: coding-harness-data, coding-harness-2-data, etc.
docker volume inspect coding-harness-data # View volume details
docker volume rm coding-harness-data      # Delete volume (reset daemon state)

Atomic saves: All JSON files use atomic write (temp file + rename) to prevent corruption if the process crashes mid-write.
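
The temp-file-plus-rename technique can be sketched as follows. This is an illustration of the general pattern, not the harness's actual code; the function name atomic_write_json is an assumption:

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data as JSON to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target's directory (same filesystem),
    # which is what allows the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # swaps the new file in as a single step
    except BaseException:
        # Remove the temp file; the existing target is left untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before os.replace runs, the previous version of the file is still intact; if it dies afterwards, the new version is complete.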

Alternative Approaches (from n8n Deep Agents):

Other harness implementations use different persistence strategies:

Research artifacts table - Intermediate findings stored in DB, later synthesized
Lock mechanisms - Timestamp-based locking prevents multiple workers on same task
Progressive summarization - Older work is compressed more than recent work
Staged processing - Retrieval → Synthesize → Write with separate artifact storage

This harness uses GitLab as its "artifact staging area"—comments on issues preserve research findings, and the issue itself tracks state. This trades some flexibility for simpler infrastructure (no separate database required).

External Task Tracking (Linear/Jira/Asana):

Cole Medin's Linear integration pattern shows an alternative: use external task management for remote monitoring. Benefits include:

Monitor agent progress from mobile or any browser
Edit task descriptions to inject human feedback mid-run
Real-time task board updates (no page refresh needed)
Multiple observers without SSH access

From the live stream, Cole demonstrated watching Linear update in real-time as agents completed tasks—no page refresh required. The agent makes parallel MCP calls to Linear for efficiency (creating multiple issues in batches rather than sequentially). Each issue gets a description that humans can edit mid-run, and agents pick up those edits when they start the next task.

The meta issue pattern: A special "project tracker" issue serves as the handoff artifact between sessions. Each agent comments on this issue with a summary of what it completed, which tasks it marked done, and context for the next session. This replaces the local claude-progress.txt file from Anthropic's quickstart.

Why external task management over local files: Local JSON files work fine for single-machine runs, but can't be easily monitored remotely. Cole's insight: "If we're going to have these agents run for a really long time, we need observability... we can't interrupt it because it's set up as a process to run in the background. So we need some way to communicate to it."

Token efficiency insight (from Coriolis via Ray): External task management via MCP is more token-efficient than reading/writing local JSON files. You hand off task management to the MCP server rather than having the agent parse and rewrite JSON. Agents are trained on human workflows—they naturally understand systems like Linear, making the integration feel native rather than forced.

This harness achieves similar observability through GitLab's native issue tracking—the trade-off is GitLab-specific vs. tool-agnostic integration.

TUI Workflow

Select Repository - Choose a directory containing a git repository
Select Spec File - Pick the specification file to implement
Select Target Branch - Choose which branch to target for the merge request
Agent Options - Configure behavior:
- File-only mode - Use local JSON files instead of GitLab for tracking
- Skip MR creation - Stop after coding without creating a merge request
- Skip Puppeteer - Disable browser automation testing
- Skip test suite - Disable test suite execution
- Skip regression testing - Disable regression spot-checks
Advanced Options - Configure iterations and other settings
Agent Execution - Watch the agent work with HITL checkpoints for your approval

Writing a Specification File

A good specification file clearly describes what you want built. You don't have to write it from scratch—Cole Medin demonstrated a template + brain dump workflow:

Give an LLM (Claude Opus 4.5 recommended) an existing app spec as a template
Provide your own "brain dump"—a rough description of what you want to build
Ask it to combine them into a structured spec following the template format
Iterate with follow-up questions to fill in missing details

This approach reduces the mental burden of spec writing while ensuring comprehensive coverage. The spec doesn't need to be perfect—the initializer agent will break it into granular tasks regardless of spec detail level.

Example spec structure:

# Feature: User Authentication System

## Overview
Add a complete authentication system with login, registration, and session management.

## Requirements
- Email/password login and registration
- Session persistence across browser refreshes
- Password reset via email
- OAuth support for Google and GitHub

## Acceptance Criteria
- Users can register with email and password
- Users can log in and are redirected to dashboard
- Sessions persist for 7 days
- Password reset emails are sent within 30 seconds
- OAuth buttons appear on login page

## Technical Notes
- Use bcrypt for password hashing
- Store sessions in Redis
- Follow existing auth patterns in the codebase

Issue Creation & Enrichment

The harness uses a two-phase approach to issue creation:

Phase 1: Spec-Faithful Issue Creation

Initial issues are created by transcribing the specification directly—no embellishment, no research:

Spec Detail Level    →    Issue Detail Level
─────────────────────────────────────────────
Detailed spec        →    Detailed issues
Vague spec           →    Vague issues
Spec silent on X     →    Issue omits X

The issue template adapts to whatever the spec provides:

Summary - Transcribed requirement (spec's exact language)
Requirement Details - Listed only if spec provides them
Technical Notes - Included only if spec mentions constraints
User-Facing Behavior - Included only if spec describes UX
Test Criteria - From spec, or minimal generic criteria
Open Questions - Flags ambiguities for later enrichment
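
The adaptive template above can be sketched in Python. This is a hypothetical illustration: the function name render_issue, the markdown layout, and the fallback wording for test criteria are assumptions, not the harness's actual template:

```python
# Placeholder wording for the "minimal generic criteria" fallback.
GENERIC_TEST_CRITERIA = ["Behavior matches the requirement as written in the spec"]


def render_issue(summary, requirement_details=None, technical_notes=None,
                 user_facing_behavior=None, test_criteria=None,
                 open_questions=None):
    """Build an issue body containing only the sections the spec supports."""
    sections = [
        ("Summary", [summary]),
        ("Requirement Details", requirement_details),
        ("Technical Notes", technical_notes),
        ("User-Facing Behavior", user_facing_behavior),
        # Test criteria fall back to a generic entry when the spec has none.
        ("Test Criteria", test_criteria or GENERIC_TEST_CRITERIA),
        ("Open Questions", open_questions),
    ]
    lines = []
    for title, items in sections:
        if not items:
            continue  # the spec is silent here, so the issue omits the section
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip()


print(render_issue(
    "Email/password login and registration",
    technical_notes=["Use bcrypt for password hashing"],
))
```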

Phase 2: Optional Deep Enrichment

After issues are created, the LLM judges which of them need enrichment. A human then selects which issues to enrich and can override the LLM's recommendation. Selected issues get comprehensive enhancement:

Step	Action	Tool / GitLab API
A	Deep Research	Context7 (library docs), grep (codebase), web search
B	Update Title	`update_issue` - make action-oriented
C	Replace Description	`update_issue` - full implementation guide
D	Add Research Comment	`create_issue_note` - raw findings
E	Add Dependencies Comment	`create_issue_note` - cross-issue links
F	Add Labels	`update_issue` - `enriched`, `complexity-X`, `time-estimate-Xh`

Enriched description includes:

Implementation guide (step-by-step with code patterns)
API/interface specifications
Codebase patterns to follow (with file:line references)
Acceptance criteria (checkbox format)
Test plan (table format)
Time estimate and risk assessment

Issues that are not enriched proceed to implementation with their original spec-faithful descriptions.

HITL Checkpoints

The harness pauses at 8 checkpoint types for human review. This reflects the autonomy balance principle: agents should be as autonomous as possible, but with easy injection points for human validation at critical decisions. The goal is strategic checkpoints, not constant interruption.

Checkpoint types:

Checkpoint	Description	Quick Approve (Y)	Quick Reject (N)
`project_verification`	Validates project setup and configuration	Standard approve	Standard reject
`spec_to_issues`	Reviews proposed breakdown of spec into issues	Standard approve	Standard reject
`issue_enrichment`	Reviews issues flagged for additional context	Auto-select LLM-recommended issues	Skip all (empty list)
`regression_approval`	Handles detected regressions during development	Default to "fix_now"	⚠️ Blocked - requires explicit action
`issue_selection`	Confirms which issue to work on next	Use recommended issue	Standard reject
`issue_closure`	Reviews completed implementation before closing	Standard approve	Standard reject
`mr_phase_transition`	Gate before entering merge request creation	Standard approve	Standard reject
`mr_review`	Reviews final merge request before creation	Standard approve	Standard reject

Checkpoint Output Format:

Each checkpoint displays a consistent structure in the agent logs:

================================================================
HITL CHECKPOINT: [CHECKPOINT NAME]
================================================================

WHAT HAPPENED:
  ✓ What LLM already completed
  ✓ What LLM verified
  → What human needs to decide

[CONTEXT-SPECIFIC DETAILS]

┌─────────────────────────────────────────────────────────────┐
│  IF APPROVED:                                               │
│    → What happens next                                      │
│                                                             │
│  IF REJECTED:                                               │
│    → What happens instead                                   │
└─────────────────────────────────────────────────────────────┘

================================================================
  TUI SHORTCUTS:
    [Y] or [1]  →  APPROVE
    [N] or [0]  →  REJECT
================================================================

Keyboard Shortcuts:

Y or 1 - Quick approve (with checkpoint-specific defaults)
N or 0 - Quick reject (blocked for regression_approval)
R - Open full review screen
Esc - Cancel/dismiss

Auto-Accept Mode

When auto-accept is enabled (toggle with the a key in the TUI), checkpoints are automatically approved with intelligent defaults:

Checkpoint	Auto-Accept Behavior
`issue_enrichment`	Uses LLM judgment to auto-select issues needing enrichment
`regression_approval`	Defaults to "fix_now" action
`issue_selection`	Uses the recommended issue from context
Others	Standard automatic approval

Per-Agent Toggle: Each agent independently tracks its auto-accept preference in .workspace_info.json. Press a in the TUI to toggle for the selected agent. The status bar shows AUTO or HITL to indicate the current mode. Changes take effect immediately on the next checkpoint.
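
The per-checkpoint defaults in the table can be read as a small dispatch function. The sketch below is hypothetical: the function name, the shape of the context dict (recommended_issues, recommended_issue), and the returned keys are assumptions made for illustration, not the harness's real data model:

```python
def auto_accept_decision(checkpoint_type, context):
    """Return the decision auto-accept mode would record for a checkpoint."""
    if checkpoint_type == "issue_enrichment":
        # Use the LLM's own judgment about which issues need enrichment.
        selected = list(context.get("recommended_issues", []))
        return {"approved": True, "selected_issues": selected}
    if checkpoint_type == "regression_approval":
        # Regressions default to being fixed immediately.
        return {"approved": True, "action": "fix_now"}
    if checkpoint_type == "issue_selection":
        # Work on whichever issue the agent recommended.
        return {"approved": True, "issue": context.get("recommended_issue")}
    # Every other checkpoint type gets a standard approval.
    return {"approved": True}


print(auto_accept_decision("regression_approval", {}))
```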

Quality Assurance & Guardrails

Why this matters for harnesses: With 95% per-step reliability, 20 steps = 36% system reliability. The only way to run agents for hours is aggressive self-validation at every step. These guardrails are what make long-running execution possible:
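
The 36% figure follows from multiplying per-step success probabilities, assuming steps fail independently. A short Python check (the function names are illustrative):

```python
def system_reliability(per_step, steps):
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def required_per_step(target, steps):
    """Per-step reliability needed to reach a target end-to-end reliability."""
    return target ** (1 / steps)


print(f"{system_reliability(0.95, 20):.2f}")  # prints 0.36
print(f"{required_per_step(0.90, 20):.4f}")
```

Read the other way around, reaching 90% end-to-end reliability over 20 steps needs roughly 99.5% per-step reliability, which is why each step is validated rather than trusted.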

Before Starting Work (Coding Agent)

Dead code detection - Checks for unused imports, variables, and arguments (via project's linter)
Test suite execution & repair - Runs project's test suite; fixes any failing tests before proceeding
Git log review - Checks previous commits for context on multi-session issues
Feature regression testing - Puppeteer-based verification of completed features

During Implementation

Unit test creation (mandatory) - Writes tests for all new functions, endpoints, and classes
Test framework detection - Auto-detects pytest, Jest, Vitest, Go testing, or Cargo test
Coverage requirements - 1 happy path + 1 edge case per function, 1 success + 1 error per endpoint

Before Issue Closure

Post-implementation regression check - Re-run test suite, spot-check previous features
Build & quality gate - All linting, formatting, and type checks must pass
Git status verification - Ensure clean working tree before checkpoint

Before MR Creation

Full test suite execution - All tests must pass
Comprehensive regression check - Test ALL completed features in milestone
MR existence verification - Confirm MR was actually created on GitLab with retries

Test Repair Loop

When tests fail, the agent follows a structured repair process:

Read the test to understand what it's testing
Diagnose: outdated test, implementation bug, or flaky test
Fix appropriately (update expectation, fix code, or add deterministic waits)
Re-run to verify fix
After 3 failed attempts, skip with documented reason and create bug issue
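
The control flow of the repair steps above can be sketched as a bounded loop. This is a hypothetical outline: run_test and attempt_fix stand in for the agent's real actions (running the test, then diagnosing and editing code), and RepairResult is an invented type:

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_REPAIR_ATTEMPTS = 3  # matches the "after 3 failed attempts" rule


@dataclass
class RepairResult:
    passed: bool
    attempts: int
    skipped_reason: Optional[str] = None


def repair_test(run_test: Callable[[], bool],
                attempt_fix: Callable[[int], None],
                max_attempts: int = MAX_REPAIR_ATTEMPTS) -> RepairResult:
    """Try to get a failing test green, giving up after max_attempts fixes."""
    if run_test():
        return RepairResult(passed=True, attempts=0)
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt)   # diagnose and apply one fix
        if run_test():         # re-run to verify the fix
            return RepairResult(passed=True, attempts=attempt)
    # Out of attempts: the caller skips the test and files a bug issue.
    reason = f"still failing after {max_attempts} repair attempts"
    return RepairResult(passed=False, attempts=max_attempts, skipped_reason=reason)
```

The hard cap is the important part: without it, a flaky test could consume an entire session.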

GitLab API Reliability

Retry wrapper - Exponential backoff for all GitLab API calls that modify data
Issue creation verification - Confirm all issues were created successfully
State file validation - Atomic writes prevent data corruption
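
A retry wrapper with exponential backoff might look like the sketch below. The function name, default delays, and attempt count are assumptions chosen for illustration; the harness's actual values are not stated here:

```python
import time


def retry_with_backoff(call, max_attempts=4, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

The sleep parameter is injectable so the wrapper can be tested without real waiting. In practice, retry_on would be narrowed to transient errors (timeouts, rate limits) so that permanent failures such as an invalid token fail fast.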

Guardrails (hard stops that prevent compounding errors):

Never implement new features if test suite is failing due to agent changes
All tests must pass (or be explicitly skipped with reason) before new work
Do NOT create issue_closure checkpoint if quality checks fail
Stop MR creation if ANY regression is found
Do NOT mark checkpoint complete until MR is verified on GitLab

These aren't suggestions—they're circuit breakers. Without them, a harness running for 24 hours would accumulate errors until the codebase is unusable.

Security Boundaries:

The Claude Agent SDK enables fine-grained control over what agents can do:

Directory isolation - Agents can only operate within the project directory
Command filtering - Dangerous bash commands are blocked via pre-tool-use hooks
Tool allowlists - Only approved tools (read, write, bash, MCP) are available
Sandboxed execution - Agents run in controlled subprocess environments

Bash Security Hook (agent/core/hooks/security.py):

The harness implements a comprehensive bash command security hook using an allowlist approach:

Allowed commands - Only specific commands permitted: ls, cat, head, tail, grep, cp, mkdir, chmod, npm, node, git, ps, lsof, sleep, pkill, cd, gh, echo
Sensitive command validation - Extra checks for pkill (only dev processes), chmod (only +x), and script execution
Command injection prevention - Blocks command substitution ($(...), backticks), subshells, and dangerous metacharacters
Path traversal prevention - Validates script paths resolve within the current directory
Argument sanitization - Limits argument count/length, blocks shell metacharacters in arguments

Commands not in the allowlist are blocked with an explanatory error message.
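
To make the allowlist approach concrete, here is a simplified Python sketch. It is not the contents of agent/core/hooks/security.py: the function name check_command is hypothetical, the metacharacter check is deliberately stricter than a real hook would likely be (it rejects the characters even inside quotes), and the pkill and path-traversal checks are omitted:

```python
import re
import shlex

# Command names taken from the allowlist above.
ALLOWED_COMMANDS = {
    "ls", "cat", "head", "tail", "grep", "cp", "mkdir", "chmod", "npm",
    "node", "git", "ps", "lsof", "sleep", "pkill", "cd", "gh", "echo",
}
# Characters used for substitution, chaining, piping, and redirection.
SHELL_METACHARACTERS = set(";&|<>$`(){}\n")


def check_command(command):
    """Return (allowed, reason) for a single bash command string."""
    if SHELL_METACHARACTERS & set(command):
        return False, "shell metacharacters are not allowed"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return False, f"could not parse command: {exc}"
    if not tokens:
        return False, "empty command"
    name = tokens[0]
    if name not in ALLOWED_COMMANDS:
        return False, f"'{name}' is not in the allowlist"
    # Extra validation for a sensitive command: chmod may only add +x.
    if name == "chmod":
        if len(tokens) < 3 or not re.fullmatch(r"[ugoa]*\+x", tokens[1]):
            return False, "chmod is only allowed with +x"
    return True, "ok"


print(check_command("git status"))
print(check_command("rm -rf /tmp/build"))
```

Starting from "deny everything" and listing what is permitted means a command the authors never thought of is blocked by default, which is the main advantage over a blocklist.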

Architecture

Three levels of detail: Overview (L1) → Component (L2) → Runtime (L3)

Component Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              TUI (tui/)                                     │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────┐  │
│  │  Repo   │→│  Spec   │→│ Branch  │→│ Agent   │→│Advanced │→│  Running  │  │
│  │ Screen  │ │ Screen  │ │ Screen  │ │ Options │ │ Options │ │ (logs+ckpt│  │
│  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └───────────┘  │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │ daemon socket
┌───────────────────────────────────▼─────────────────────────────────────────┐
│                           Daemon (agent/daemon/)                            │
│                     manages agent subprocesses + state                      │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
        ┌───────────────────────────┼───────────────────────────┐
        ▼                           ▼                           ▼
┌───────────────┐           ┌───────────────┐           ┌───────────────┐
│  Initializer  │ ────────▶ │    Coding     │ ────────▶ │  MR Creation  │
│    Agent      │           │    Agent      │           │    Agent      │
│               │           │    (loop)     │           │               │
│ • Verify proj │           │ • Pick issue  │           │ • Sync branch │
│ • Create issue│           │ • Implement   │           │ • Final tests │
│ • Enrich      │           │ • Test+verify │           │ • Create MR   │
│ • Make branch │           │ • Close issue │           │ • Verify MR   │
└───────┬───────┘           └───────┬───────┘           └───────┬───────┘
        │                           │                           │
        └───────────────────────────┴───────────────────────────┘
                                    │
┌───────────────────────────────────▼─────────────────────────────────────────┐
│                         MCP Servers (Integration)                           │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
│  │ GitLab  │  │Context7 │  │ SearxNG │  │Puppeteer│  │  Files  │           │
│  │ issues  │  │lib docs │  │ search  │  │ browser │  │ quality │           │
│  │   MRs   │  │         │  │         │  │  tests  │  │  gates  │           │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘  └─────────┘           │
└─────────────────────────────────────────────────────────────────────────────┘

Checkpoint Flow

This diagram shows when each HITL checkpoint occurs. See the table in HITL Checkpoints for details.

INITIALIZER                    CODING (per issue)              MR CREATION
───────────                    ──────────────────              ───────────
    │                               │                              │
    ▼                               ▼                              ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│ project_verify  │──┐       │ issue_selection │──┐       │ mr_phase_trans  │
└─────────────────┘  │       └─────────────────┘  │       └─────────────────┘
                     │              │              │              │
┌─────────────────┐  │              ▼              │              ▼
│ spec_to_issues  │──┤        (implement)         │       ┌─────────────────┐
└─────────────────┘  │              │              │       │   mr_review     │
                     │              ▼              │       └─────────────────┘
┌─────────────────┐  │       ┌─────────────────┐  │              │
│ issue_enrichment│──┘       │ regression_appr │  │              ▼
└─────────────────┘          └─────────────────┘  │         (create MR)
        │                           │              │
        ▼                           ▼              │
   (create branch)           ┌─────────────────┐  │
                             │ issue_closure   │──┘
                             └─────────────────┘
                                    │
                                    ▼
                              (next issue)

Component Overview

Directory	Purpose
`agent/`	Core agent logic, prompts, HITL system, Claude SDK client
`agent/core/`	Orchestrator, session runner, checkpoint handlers, security hooks
`agent/core/hooks/`	SDK hook system for bash command security validation
`agent/daemon/`	Background daemon for agent process management
`agent/prompts/`	Prompt templates for initializer, coding, and MR agents
`tui/`	Textual UI screens, terminal widget, event handling
`tui/screens/`	Modal screens: repo/spec/branch selection, agent options, checkpoints
`common/`	Shared types, utilities, exceptions, unified state management
`.claude/`	Claude Code configuration (settings, skills, agents, commands)

Development

Code Quality

The coding harness uses a pluggable code quality skill system. Code quality commands are defined in skill presets rather than hardcoded, allowing any language or tooling to be used.

For this project: See .claude/skills/code-quality.md for the specific commands used.

Project Structure

coding-harness/
├── agent/
│   ├── __init__.py        # Package exports
│   ├── cli.py             # CLI entry point (python -m agent.cli)
│   ├── core/              # Core agent logic
│   │   ├── __init__.py
│   │   ├── orchestrator.py       # Main agent loop
│   │   ├── client.py             # Claude SDK client configuration
│   │   ├── hitl.py               # HITL checkpoint file operations
│   │   ├── checkpoint_handlers.py # Strategy pattern for checkpoint types
│   │   ├── session_runner.py     # Individual session execution
│   │   ├── output.py             # Output formatting utilities
│   │   └── hooks/                # SDK hook system
│   │       ├── __init__.py       # Hook registration
│   │       └── security.py       # Bash command security validation
│   ├── daemon/            # Background daemon
│   │   ├── __init__.py
│   │   ├── __main__.py      # Module entry (python -m agent.daemon)
│   │   ├── server.py        # Daemon process
│   │   └── client.py        # TUI client
│   ├── prompts/           # Prompt templates
│   │   ├── __init__.py      # Loader functions
│   │   └── templates/       # Markdown templates
│   └── skills/            # Agent skills
│       └── code-quality/
│           └── presets/     # Language-specific presets
├── tui/
│   ├── __init__.py
│   ├── app.py             # Main Textual app (connects to daemon)
│   ├── events.py          # Custom event types
│   ├── log_terminal.py    # Log file tailing widget
│   ├── main.py            # Entry point
│   └── screens/           # TUI screen components
│       ├── agent_options_screen.py  # Agent behavior options
│       ├── checkpoint_screen.py     # HITL checkpoint review
│       └── ...                      # Other screens
├── common/
│   ├── __init__.py        # Package exports
│   ├── types.py           # Shared type definitions
│   ├── utils.py           # Utility functions
│   ├── exceptions.py      # Exception hierarchy
│   └── state.py           # Unified state management
├── .claude/             # Claude Code configuration
├── .env.example         # Environment template
├── Dockerfile           # Docker image definition
├── docker-entrypoint.sh # Container entrypoint script
├── .dockerignore        # Docker build exclusions
├── CLAUDE.md           # AI assistant instructions
├── requirements.txt    # Python dependencies
├── ruff.toml           # Linting configuration
├── pyrightconfig.json  # Type checking configuration
└── start.sh            # Entry point script (Docker + native)

Runtime directories (created at runtime, not in repo):

.data/ - Daemon state in native mode (daemon_state.json). Docker mode uses a named volume instead.
project/.claude-agent/ - Agent workspace created in target projects (see Persistence Model)

Troubleshooting

Issue	Solution
"Docker not found"	Install Docker or use `./start.sh --native`
Container exits immediately	Rebuild with `./start.sh --build`
Can't see files in container	Your home directory is mounted - use absolute paths
"GitLab token invalid"	Verify token has scopes: `api`, `read_api`, `read_repository`, `write_repository`
"Claude API error"	Check `CLAUDE_CODE_OAUTH_TOKEN` or `ANTHROPIC_API_KEY` is set correctly
"No .git folder"	Select a directory that is a git repository
Agent seems stuck	Check terminal output - likely waiting for HITL checkpoint approval
"Permission denied" on start.sh	Run `chmod +x start.sh`
Python version error	Ensure Python 3.11+ is active in your virtual environment
Agent logs not showing	Logs are in project's `.claude-agent/{spec}/logs/` - check file permissions
Daemon not starting	Check if socket exists at `/tmp/coding-harness-daemon.sock`
TUI says "Daemon not running"	Stale socket from previous container - rebuild with `./start.sh --build`
"could not read Password" git error	Docker: Rebuild with `./start.sh --build` to get auto-credentials. Native: See Git Authentication section
Git fetch/pull fails in container	Ensure `GITLAB_PERSONAL_ACCESS_TOKEN` is set in `.env` - it's used for git credentials in Docker

FAQ

Q: Do I need Docker? A: No, you can use ./start.sh --native to run with a local Python virtual environment. However, native mode doesn't persist agents - they stop when you quit the TUI. For long-running tasks, use Docker.

Q: Can I run multiple containers? A: Yes! Each new ./start.sh creates a container named coding-harness, coding-harness-2, etc.

Q: How do I keep agents running when I disconnect? A: In Docker mode: press Q to quit TUI - agents continue in daemon. Reconnect with ./start.sh --connect or press Enter after restarting. Use Ctrl+P, Ctrl+Q to detach entirely. In native mode: agents stop when TUI exits (no persistence).

Q: Can I use this without GitLab? A: Yes! Enable file-only mode in the TUI's Agent Options screen or use "file_only_mode": true in JSON specs. This stores milestones and issues in local JSON files instead of GitLab. Note: Git operations still use GitLab MCP for pushing commits.

Q: Can I skip merge request creation? A: Yes! Enable skip MR creation in the TUI's Agent Options screen or use "skip_mr_creation": true in JSON specs. The agent will stop after coding completes and keep all changes on the feature branch without creating an MR.

Q: What's the difference between file-only mode and skip MR creation? A: They're independent options:

File-only mode - Changes where issue/milestone tracking happens (GitLab vs local JSON files)
Skip MR creation - Changes whether an MR is created at the end (can be used with or without GitLab tracking)

You can use either or both together.

Q: What Claude models are supported? A: Default is claude-opus-4-5-20251101. You can use other models by setting CLAUDE_MODEL.

Q: Can I run multiple specs at once? A: Yes! Use JSON mode with multiple spec objects in the array.

Q: How do I skip human approval? A: Press a in the TUI to toggle auto-accept mode for the selected agent. This removes human oversight - use carefully.

Q: Can I use a different coding assistant (not Claude)? A: The architecture is portable, but support for other assistants is not implemented. During the live stream, community members reported that OpenCode + Ollama works for local models (Qwen 3, Kimi K2). Porting requires replacing the Claude Agent SDK calls with an equivalent SDK (Codex SDK, OpenCode SDK, AMP SDK). The prompts and artifacts pattern transfers directly.

Q: Will I hit rate limits running for 24 hours? A: It varies. Cole Medin ran Opus 4.5 for 24 hours on the $200/month Max plan without hitting limits, but others report hitting limits quickly. Behavior seems to vary by account and region. Mitigation: switch between Claude Code, Codex, and OpenCode to distribute load.

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run code quality checks (see .claude/skills/code-quality.md for commands)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

[License type to be determined]

GantisStorm/autonomous-coding-harness
