harness-optimizer

A Claude Code skill that diagnoses and improves your harness configuration based on 8 design principles from Anthropic's "Harness design for long-running application development".

Korean version (한국어)

Core Functionality

The skill performs four main operations:

  1. Scan — Auto-detects all harness components in your project (skills, agents, commands, hooks, CLAUDE.md, plugin.json, MCP config, settings)
  2. Diagnose — Evaluates each component against 8 harness design principles using 2-layer diagnostics
  3. Report — Outputs a PASS/FAIL/PARTIAL checklist with per-principle scores and an overall health grade (0-100)
  4. Fix — Applies tiered auto-fixes: Tier 1 modifies existing files, Tier 2 creates new files (with user confirmation)

Supports both plugin projects (with plugin.json) and non-plugin projects (CLAUDE.md + .claude/ only).
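The plugin versus non-plugin distinction can be pictured with a small check like the one below. This is only a sketch: the function name `detect_project_type`, the candidate locations for plugin.json, and the choice to accept either CLAUDE.md or .claude/ as evidence of a non-plugin harness are all assumptions for illustration. The skill's own detection lives in scripts/scan-components.mjs and may work differently.

```python
from pathlib import Path

# Hypothetical candidate locations for plugin.json; the real scanner may look elsewhere.
PLUGIN_MANIFEST_CANDIDATES = ("plugin.json", ".claude-plugin/plugin.json")


def detect_project_type(root: str) -> str:
    """Classify a project as 'plugin', 'non-plugin', or 'unknown'."""
    base = Path(root)
    if any((base / rel).is_file() for rel in PLUGIN_MANIFEST_CANDIDATES):
        return "plugin"
    # Assumption: either CLAUDE.md or a .claude/ directory marks a non-plugin harness.
    if (base / "CLAUDE.md").is_file() or (base / ".claude").is_dir():
        return "non-plugin"
    return "unknown"
```

A project with neither marker comes back as "unknown", which a caller could treat as "nothing to diagnose".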

The 8 Principles

| # | Principle | Weight | Key Question |
|---|-----------|--------|--------------|
| 1 | Evaluator Separation | 20% | Are generator and evaluator agents separated? |
| 2 | Context Management | 15% | Is there a context reset/compaction strategy? |
| 3 | Task Decomposition | 15% | Are complex tasks broken into manageable units? |
| 4 | Evaluation Criteria Design | 10% | Are subjective qualities converted to measurable criteria? |
| 5 | Structured Handoff | 10% | Is agent context transfer via files/artifacts? |
| 6 | Harness Simplification | 7% | Is unnecessary scaffolding removed? |
| 7 | Sprint Contract | 8% | Are "done" criteria defined before work starts? |
| 8 | Feedback Loop | 15% | Do evaluation results feed back to the generator? |
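To show how the weights could combine into a single 0-100 score, here is a minimal sketch. The weights come from the table above. The credit assigned to each verdict (PASS = 1.0, PARTIAL = 0.5, FAIL = 0.0) and the function name `overall_score` are assumptions; the actual formulas are defined in references/scoring-system.md.

```python
# Weights from the principles table, as percentages (they sum to 100).
WEIGHTS = {
    "Evaluator Separation": 20,
    "Context Management": 15,
    "Task Decomposition": 15,
    "Evaluation Criteria Design": 10,
    "Structured Handoff": 10,
    "Harness Simplification": 7,
    "Sprint Contract": 8,
    "Feedback Loop": 15,
}

# Assumed credit per verdict; the skill's real mapping may differ.
CREDIT = {"PASS": 1.0, "PARTIAL": 0.5, "FAIL": 0.0}


def overall_score(verdicts: dict[str, str]) -> float:
    """Weighted sum of per-principle credit on a 0-100 scale."""
    return sum(WEIGHTS[name] * CREDIT[verdicts[name]] for name in WEIGHTS)
```

Under these assumptions, a project that fails Evaluator Separation, is PARTIAL on Feedback Loop, and passes everything else scores 100 - 20 - 7.5 = 72.5.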

2-Layer Diagnostics

Unlike simple keyword matching, harness-optimizer uses a 2-layer approach for accurate diagnosis:

  • Layer 1 (Signal Collection): Grep/Glob patterns scan for relevant files and keywords — collecting evidence without making judgments
  • Layer 2 (Semantic Judgment): The LLM reads the actual file content and determines whether the principle is truly implemented, not just mentioned

This prevents false positives (e.g., a file named reviewer.md that doesn't actually serve as an evaluator) and false negatives (e.g., an evaluator with a non-standard name).
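The two layers can be sketched as a pattern scan followed by a judgment callback. Everything named here is hypothetical: the glob and keyword patterns, the `judge` callable (a stand-in for the LLM reading the file), and the rule that keyword hits without semantic confirmation yield PARTIAL. The real per-principle logic is in references/principles-checklist.md.

```python
import re
from pathlib import Path
from typing import Callable

# Hypothetical Layer 1 patterns for the "Evaluator Separation" principle.
FILE_GLOBS = ("**/*.md",)
KEYWORDS = re.compile(r"\b(evaluator|reviewer|critic|judge)\b", re.IGNORECASE)


def read(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="ignore")


def collect_signals(root: str) -> list[Path]:
    """Layer 1: gather candidate files by pattern, without judging them."""
    hits = []
    for pattern in FILE_GLOBS:
        for path in sorted(Path(root).glob(pattern)):
            if path.is_file() and (KEYWORDS.search(path.name) or KEYWORDS.search(read(path))):
                hits.append(path)
    return hits


def diagnose(root: str, judge: Callable[[str], bool]) -> str:
    """Layer 2: let a semantic judge decide whether any candidate really qualifies."""
    candidates = collect_signals(root)
    if any(judge(read(path)) for path in candidates):
        return "PASS"
    # Assumption: evidence that is mentioned but not confirmed counts as PARTIAL.
    return "PARTIAL" if candidates else "FAIL"
```

Note that this sketch only sends Layer 1 hits to the judge, so an evaluator with a non-standard name is caught only if its content matches a keyword. Catching the remaining false negatives would require widening the Layer 1 patterns or letting the judge read more files.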

Installation & Usage

npx skills add CaesiumY/harness-optimizer

Once installed, trigger with phrases like:

  • optimize harness / diagnose harness / check my harness
  • harness health / harness review / improve harness

Flags

| Flag | Effect |
|------|--------|
| `--dry-run` | Show proposed changes without modifying files |
| `--report-only` | Output diagnostic report only, skip auto-fix |
| `--path <path>` | Specify target project path |
| `--help` | Display usage information |
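The flag semantics can be modeled with a small argument parser. This is an illustration only: the skill is triggered through Claude Code rather than through a Python CLI, and the `build_parser` and `may_write_files` names, as well as the default of `.` for `--path`, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # argparse supplies --help automatically, so it is not declared here.
    parser = argparse.ArgumentParser(prog="harness-optimizer")
    parser.add_argument("--dry-run", action="store_true",
                        help="show proposed changes without modifying files")
    parser.add_argument("--report-only", action="store_true",
                        help="output the diagnostic report only, skip auto-fix")
    # Assumed default: the current directory.
    parser.add_argument("--path", default=".", help="target project path")
    return parser


def may_write_files(args: argparse.Namespace) -> bool:
    """Files are touched only when neither read-only flag is set."""
    return not (args.dry_run or args.report_only)
```

Both `--dry-run` and `--report-only` leave the project untouched; the difference is that a dry run still shows the proposed fixes, while report-only stops after the diagnosis.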

Health Grades

| Grade | Score | Description |
|-------|-------|-------------|
| Excellent | 80-100 | Harness design is principled and well-executed |
| Good | 60-79 | Core principles implemented, room for improvement |
| Fair | 40-59 | Major principles missing, improvement needed |
| Poor | 20-39 | Most principles unimplemented |
| Critical | 0-19 | Harness design principles are barely applied |
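Mapping a score to its grade is a simple band lookup. The sketch below uses the lower bound of each band from the table above. The function name `health_grade` is hypothetical, and treating fractional scores such as 79.5 as belonging to the lower band is an assumption, since the table only lists whole numbers.

```python
# Lower bound of each band, highest first, taken from the grades table.
GRADE_BANDS = [
    (80, "Excellent"),
    (60, "Good"),
    (40, "Fair"),
    (20, "Poor"),
    (0, "Critical"),
]


def health_grade(score: float) -> str:
    """Return the grade whose band contains the given 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    raise AssertionError("unreachable: the last band starts at 0")
```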

Project Structure

harness-design-skill/
├── skills/
│   └── harness-optimizer/
│       ├── SKILL.md                        # Main workflow (182 lines)
│       ├── references/
│       │   ├── principles-checklist.md     # 2-layer diagnostic logic per principle
│       │   ├── scoring-system.md           # Weights, formulas, grade definitions
│       │   ├── autofix-catalog.md          # Tier 1/2 fix catalog with before/after
│       │   └── harness-article-summary.md  # Key insights from the source article
│       └── scripts/
│           └── scan-components.mjs         # Project component auto-detection
├── docs/
│   └── Harness design for long-running application development.md
├── LICENSE
└── README.md

Based On

This skill is built on insights from Anthropic's engineering blog post "Harness design for long-running application development" by Prithvi Rajasekaran. The article describes a multi-agent architecture (Planner → Generator → Evaluator) that produced rich full-stack applications over multi-hour autonomous coding sessions.

The skill structure is modeled after agents-md-optimizer.

License

MIT
