Inspired by SkillClaw, adapted for Hermes Agent as a local-first plugin:
session evidence in, safer skill updates out.
If you use Hermes skills heavily and do not want them to silently rot, this plugin turns real usage into reviewable, reversible skill maintenance.
| 📚 Session evidence | 🧠 Memory/skill candidates | 🧪 --variants dry-run | 🛡️ Guarded automation |
|---|---|---|---|
| Learn from real Hermes work | Mine sessions into a review queue | Compare bounded deterministic variants | Bounded notes, reference spillover, rollback |
Two new reviewer-first paths are now visible up front:

- `auto-run --variants N` generates up to four deterministic, model-free bounded update variants, scores them with local safety/quality signals, and selects one winner. The default remains `--variants 1`, so existing dry runs stay stable.
- `candidates-mine` + `candidates-list` turn already-redacted session evidence into a local SQLite review queue. Findings are classified as `memory`, `skill_update`, `skill_new`, `replay_benchmark`, or `ignore`, so a human can decide what should become durable memory, a skill patch, a new skill, or an evaluation case. Neither command writes Hermes memory, edits skills, or enables auto-apply.
```bash
# Compare bounded variants without writing files
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --variants 3 --format json

# Mine session-derived evidence into a human-review queue
hermes-curator-evolver candidates-mine --input-jsonl redacted-evidence.jsonl --queue-db curator-review.sqlite
hermes-curator-evolver candidates-list --queue-db curator-review.sqlite --status pending --format json
```

- Who this is for
- Quick start: install, backfill, autorun
- At a glance
- Trust boundary
- Why this exists
- Inspired by SkillClaw, made Hermes-native
- Launch / discussion kit
- Architecture
- Model usage plan
- Safety model
- Examples and demo
- Feedback wanted
- CLI reference
- Contributing
- Uninstall
Hermes Curator Evolver is for people who treat agent skills as operational memory: debugging playbooks, deployment habits, project conventions, and lessons learned from real work. It helps answer a practical question: how can those skills improve from evidence without letting automation silently rewrite the library?
Use it when you want:
- local evidence reports before any skill update,
- dry-run proposals that can be reviewed like maintenance notes,
- explicit write approval, exact target-hash checks, backups, and rollback,
- safe unattended maintenance limited to bounded managed blocks,
- optional semantic search/rerank only when you choose to enable it.
It is not a general AutoML system, a skill marketplace, or an agent that freely rewrites every prompt it can see. The default path is local, model-free, reversible, and intentionally boring.
Copy, paste, done. `bootstrap` handles the noisy parts: it backfills old sessions and enables a daily safe autorun.

```bash
hermes plugins install pingchesu/hermes-curator-evolver --enable
uv pip install --python ~/.hermes/hermes-agent/venv/bin/python -e ~/.hermes/plugins/curator-evolver
hermes-curator-evolver bootstrap
```

That is the default, model-free path. It writes only low-risk bounded notes to local agent-created skills, spills bulky evidence into `references/` when needed, and then validates the changed `SKILL.md` before the apply is considered successful. Official/bundled, hub-installed, plugin-provided, `skills.external_dirs`, pinned, unknown-source, and already-over-hard-cap skills are skipped.
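The "bounded notes" write described above can be pictured as an upsert of one marker-delimited block: hand-written skill text is never touched, and rerunning replaces the block instead of growing the file. A minimal sketch, assuming hypothetical HTML-comment markers (the plugin's real marker strings and its post-apply validation step are not documented here):

```python
import re
from typing import List

# Hypothetical marker strings, for illustration only.
BEGIN = "<!-- curator-evolver:begin -->"
END = "<!-- curator-evolver:end -->"
_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def upsert_managed_block(skill_text: str, notes: List[str]) -> str:
    """Replace (or append) the managed notes block, leaving all other text untouched."""
    block = "\n".join([BEGIN, *(f"- {note}" for note in notes), END])
    if _BLOCK.search(skill_text):
        # Only the bytes between the markers change. A callable replacement
        # avoids backslash escapes in the notes being interpreted by re.sub.
        return _BLOCK.sub(lambda _match: block, skill_text, count=1)
    # No managed block yet: append one at the end of the file.
    return skill_text.rstrip("\n") + "\n\n" + block + "\n"
```

Because the second run swaps the existing block rather than appending a new one, repeated unattended runs stay bounded.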
Want multilingual semantic/rerank ordering? Make the opt-in explicit:

```bash
uv pip install --python ~/.hermes/hermes-agent/venv/bin/python -e "$HOME/.hermes/plugins/curator-evolver[semantic]"
hermes-curator-evolver bootstrap --semantic
```

Quick checks:

```bash
hermes-curator-evolver status

# Linux / systemd:
systemctl --user list-timers 'hermes-curator-evolver*' --all --no-pager

# macOS / launchd:
launchctl list | grep hermes-curator-evolver || true
```

`bootstrap` now installs the daily autorun with the native user scheduler: systemd user timers on Linux, and a LaunchAgent plist at `~/Library/LaunchAgents/com.pingchesu.hermes-curator-evolver.auto.plist` on macOS. Use `--schedule hourly|daily|weekly` for portable cadences; custom systemd `OnCalendar` values remain Linux-only.

If the Hermes gateway was already running, restart it once so the plugin hooks are loaded. For health checks, timer logs, model details, and uninstall steps, see `docs/after-install.md`.
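The portable-versus-Linux-only cadence rule above comes down to a small mapping. A sketch, assuming a hypothetical `scheduler_for` helper: on Linux every `--schedule` value is passed to systemd's `OnCalendar=` (which accepts `hourly`, `daily`, and `weekly` as shorthands), while on macOS only the three portable values map to a launchd `StartInterval` in seconds. Whether the real LaunchAgent uses `StartInterval` or `StartCalendarInterval` is not stated in this README.

```python
import sys
from typing import Tuple, Union

# Hypothetical mapping from --schedule to launchd StartInterval (seconds).
LAUNCHD_INTERVALS = {"hourly": 3600, "daily": 86_400, "weekly": 604_800}


def scheduler_for(schedule: str, platform: str = sys.platform) -> Tuple[str, Union[str, int]]:
    """Pick the native user scheduler and the value its config would carry."""
    if platform.startswith("linux"):
        # Shorthands and custom calendar expressions both go to OnCalendar=.
        return "systemd", schedule
    if platform == "darwin":
        if schedule not in LAUNCHD_INTERVALS:
            # Custom OnCalendar expressions have no launchd equivalent here.
            raise ValueError(f"custom schedule {schedule!r} is Linux-only")
        return "launchd", LAUNCHD_INTERVALS[schedule]
    raise RuntimeError(f"no native user scheduler for platform {platform!r}")
```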
| 1. Collect | 2. Rank | 3. Improve | 4. Protect |
|---|---|---|---|
| Tool calls + skill loads + old sessions | Evidence counts; optional Qwen + bge rerank | Daily bounded notes + reference spillover + post-apply validation | Only local agent-created skills are writable |
```mermaid
flowchart LR
  S[Hermes sessions + tool calls] --> DB[(SQLite evidence)]
  DB --> T[daily bootstrap scheduler]
  T --> A[bounded notes to local agent-created skills]
  A -. bulky evidence .-> REF[references/ spillover]
  T -. skip .-> P[official / hub / external / pinned skills]
  A --> B[backup + rollback manifest]
```
| User concern | Short answer |
|---|---|
| Will it run by itself? | Yes. bootstrap enables a daily user-level scheduler: systemd on Linux, launchd on macOS. |
| Will it rewrite my skills? | No. Autorun only updates a managed bounded block and spills bulky evidence to references/. |
| Will it touch official/team skills? | No. Provenance gate skips bundled, hub, plugin, and external_dirs skills. |
| Can I inspect first? | Yes. auto-run --format json is dry-run by default. |
The default experience is designed to be inspectable before it is writable:

- Read-only first: `status`, `report`, `analyze`, `candidates`, `candidates-mine`, `candidates-list`, `propose`, `verify`, and default `auto-run` do not mutate skills.
- No blind model dependency: the default `bootstrap` path is model-free; model-assisted proposal drafting and semantic/rerank ordering require explicit opt-in flags.
- Narrow unattended writes: low-risk autorun writes only a managed bounded notes block, and only after both `--apply-low-risk` and `--approve-auto-apply`.
- Size guardrails: `SKILL.md` updates target a 90k soft cap, spill bulky evidence into `references/`, and skip unattended writes when the target skill is already over the 100k hard cap.
- Source provenance gate: official/bundled, hub-installed, plugin-provided, `skills.external_dirs`, pinned, and unknown-source skills are excluded from unattended writes.
- Rollback is concrete: guarded apply records backups and manifests so you can restore exact prior content.
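The two write flags and the provenance gate above combine into one decision per skill. A minimal sketch, assuming a hypothetical `SkillInventory` whose sets would in practice be read from `.bundled_manifest`, `.hub/lock.json`, plugin registrations, `skills.external_dirs`, and pin settings. Protected sources are checked first, and anything not positively identified as local agent-created is treated as unknown-source:

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class SkillInventory:
    """Hypothetical per-source skill names; how the plugin builds these is not shown here."""
    bundled: Set[str] = field(default_factory=set)
    hub: Set[str] = field(default_factory=set)
    plugin: Set[str] = field(default_factory=set)
    external: Set[str] = field(default_factory=set)
    pinned: Set[str] = field(default_factory=set)
    agent_created: Set[str] = field(default_factory=set)


def may_auto_apply(name: str, inv: SkillInventory, *,
                   apply_low_risk: bool, approve_auto_apply: bool) -> Tuple[bool, str]:
    # Both flags are required; either one alone keeps the run a dry run.
    if not (apply_low_risk and approve_auto_apply):
        return False, "dry-run"
    # Protected sources win even if the name is also agent-created.
    for reason, names in (("bundled", inv.bundled), ("hub", inv.hub),
                          ("plugin", inv.plugin), ("external", inv.external),
                          ("pinned", inv.pinned)):
        if name in names:
            return False, f"skip:{reason}"
    if name not in inv.agent_created:
        return False, "skip:unknown-source"
    return True, "ok"
```

Ordering the deny checks before the allow check means a pinned agent-created skill is still skipped.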
For a quick visual walkthrough, see docs/demo-script.md. For synthetic output examples, see examples/.
candidates-mine turns already-redacted evidence packets into a local SQLite review queue. It classifies each record as memory, skill_update, skill_new, replay_benchmark, or ignore, but it never writes to Hermes memory, never edits skills, and never enables auto-apply. Every row is pending human review by default.
```bash
# The first record is a Chinese-language memory policy: "durable memory stores only
# concise declarative facts; workflows/steps/SOPs go into skills; do not store task
# progress / PRs / SHAs / short-term state".
cat > /tmp/redacted-evidence.jsonl <<'EOF'
{"text":"durable memory 只存精簡宣告事實;流程/步驟/SOP 進 skill;不存 task progress / PR / SHA / 短期狀態","evidence_ref":"session:policy"}
{"text":"Workflow: 1. First run `ingest`. 2. Then run `mine`. 3. Finally review.","evidence_ref":"session:workflow"}
{"text":"{\"exit_code\":1,\"output\":\"remote: Repository not found\"}","evidence_ref":"session:failure","tool_name":"terminal"}
EOF

hermes-curator-evolver candidates-mine \
  --input-jsonl /tmp/redacted-evidence.jsonl \
  --queue-db /tmp/curator-review.sqlite \
  --format markdown

hermes-curator-evolver candidates-list \
  --queue-db /tmp/curator-review.sqlite \
  --status pending \
  --format json
```

The miner is intentionally conservative: unknown cases become `ignore`, raw `read_file`/source dumps are suppressed as `source_dump`, near-cap `SKILL.md` evidence is marked `direct_append_allowed=false`, and the queue refuses any candidate with `auto_apply_allowed=true`, even if a caller bypasses the normal `Candidate` constructor.
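To make the queue's shape concrete, here is a minimal sketch of "classify, then enqueue as pending" using Python's `sqlite3`. The keyword rules are toy heuristics, and the table and column names are hypothetical; the real miner also produces `skill_new`, `source_dump`, and `direct_append_allowed`, which this sketch omits. The `CHECK` constraint shows one way a queue can reject `auto_apply_allowed=1` even for raw inserts that bypass the normal code path:

```python
import json
import sqlite3


def classify(record: dict) -> str:
    """Toy heuristics for illustration; not the plugin's actual rules."""
    text = record.get("text", "")
    if record.get("tool_name") == "read_file":
        return "ignore"  # raw source dumps are never promoted
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict) and payload.get("exit_code", 0) != 0:
        return "replay_benchmark"  # a failed command is a candidate eval case
    if text.lstrip().lower().startswith("workflow:"):
        return "skill_update"  # procedures belong in skills, not memory
    if "memory" in text.lower():
        return "memory"
    return "ignore"  # unknown cases default to ignore


def open_queue(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # CHECK rejects auto_apply_allowed=1 even for INSERTs that skip enqueue().
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS candidates (
            id INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            evidence_ref TEXT,
            status TEXT NOT NULL DEFAULT 'pending',
            auto_apply_allowed INTEGER NOT NULL DEFAULT 0 CHECK (auto_apply_allowed = 0)
        )
        """
    )
    return conn


def enqueue(conn: sqlite3.Connection, record: dict, auto_apply_allowed: bool = False) -> str:
    if auto_apply_allowed:
        raise ValueError("the review queue never accepts auto_apply_allowed=true")
    kind = classify(record)
    conn.execute("INSERT INTO candidates (kind, evidence_ref) VALUES (?, ?)",
                 (kind, record.get("evidence_ref")))
    return kind  # every row starts as 'pending' for human review


def list_pending(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT kind, evidence_ref FROM candidates WHERE status = 'pending' ORDER BY id"
    ).fetchall()
```

Nothing in this path writes memory or skills; the only output is a row a reviewer can accept or reject.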
Hermes skills are operational memory. They capture how an agent should debug, deploy, research, and communicate in a real environment. But memory decays: stale commands, duplicated workflows, missing caveats, weak trigger descriptions, and hard-won lessons trapped in old session logs.
Hermes Curator Evolver closes that loop: session evidence in, safer skill updates out — without patching Hermes core or silently rewriting your skill library.
SkillClaw showed the right idea: agents should evolve skills from session trajectories. Hermes Curator Evolver adapts that idea to a local-first Hermes plugin.
| SkillClaw lesson | Hermes-native adaptation |
|---|---|
| Learn from sessions. | Runtime hooks + historical backfill feed local SQLite evidence. |
| Retrieve similar skills before editing. | Lexical search by default; optional Qwen embeddings + bge reranking. |
| Verify skill changes. | Dry-run proposals, verifier gates, exact SHA match, backups, rollback. |
| Avoid uncontrolled mutation. | No Hermes core patches, pinned skills are skipped, official/hub/external/plugin skills are protected from unattended writes, autorun is bounded and can spill bulky evidence into references/. |
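The "lexical search by default" row can be illustrated with plain token overlap. A minimal sketch, assuming a hypothetical `lexical_candidates` helper and a simple term-count score (the plugin's actual lexical scoring is described in `docs/core-algorithm.md`, not here). Ties are broken by skill name so the ordering stays deterministic:

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

TOKEN = re.compile(r"\w+")


def tokens(text: str) -> List[str]:
    return [t.lower() for t in TOKEN.findall(text)]


def lexical_candidates(query: str, skills: Dict[str, str],
                       top_k: int = 3) -> List[Tuple[str, int]]:
    """Score each skill by how often query terms appear in its name + body."""
    query_terms = set(tokens(query))
    scored = []
    for name, body in skills.items():
        counts = Counter(tokens(name + " " + body))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties resolved by name for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

One limitation worth noting: `\w+` treats an unbroken run of CJK characters as a single token, so this kind of matching is weak for Chinese text. That gap is consistent with the README offering optional Qwen embeddings and bge reranking for mixed Chinese/English workflows.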
If you are evaluating or sharing the project, start with the smallest concrete claim:
A local-first Hermes Agent plugin that turns session history into evidence-backed skill maintenance, with dry-run proposals and provenance-safe bounded autorun.
Useful links for reviewers and community posts:
- docs/core-algorithm.md — exact evidence, candidate-selection, semantic/rerank, and autorun algorithm.
- docs/architecture.md — one-page architecture and safety boundary.
- docs/after-install.md — what to expect after install, health checks, scheduler logs, and uninstall.
- docs/hyperagents-design-notes.md — clean-room design notes explaining why HyperAgents is not a dependency and which concepts (multi-variant candidates, staged verifier) are adapted.
- docs/reddit-launch.md — recommended cadence and concise community-post drafts.
- docs/reddit-launch-kit.md — expanded subreddit-specific titles, replies, and disclosure notes.
See docs/architecture.md for the one-page architecture diagram, model usage plan, and safety boundary. See docs/after-install.md for the post-install autorun guide, health checks, uninstall path, and supported models.
```mermaid
flowchart LR
  H[Hermes runtime] --> P[curator-evolver plugin]
  P --> DB[(local SQLite evidence)]
  DB --> R[reports]
  R --> Proposal[dry-run proposals]
  Proposal --> Verify[verifier gate]
  Verify --> Human[human approval]
  Human --> Apply[guarded apply + rollback]
  DB --> Auto[auto-run low-risk bounded update]
  Auto --> Apply
```
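The `DB --> reports` step is a read-only aggregation over local evidence. A minimal sketch, assuming a hypothetical `events` table with one row per skill load or tool call (the plugin's real schema is not documented in this README). It counts loads and tool calls per skill over a `--days`-style window:

```python
import sqlite3
import time
from typing import List, Optional, Tuple

# Hypothetical evidence table, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    ts    REAL NOT NULL,  -- unix timestamp of the hook event
    skill TEXT NOT NULL,  -- skill the event is attributed to
    kind  TEXT NOT NULL   -- 'skill_load' or 'tool_call'
)
"""


def evidence_report(conn: sqlite3.Connection, days: int,
                    now: Optional[float] = None) -> List[Tuple[str, int, int]]:
    """Read-only: (skill, loads, tool_calls) over the last `days` days, busiest first."""
    cutoff = (time.time() if now is None else now) - days * 86_400
    return conn.execute(
        """
        SELECT skill,
               SUM(kind = 'skill_load') AS loads,
               SUM(kind = 'tool_call')  AS tool_calls
        FROM events
        WHERE ts >= ? AND kind IN ('skill_load', 'tool_call')
        GROUP BY skill
        ORDER BY COUNT(*) DESC, skill
        """,
        (cutoff,),
    ).fetchall()
```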
| Phase | Model | Purpose | Default |
|---|---|---|---|
| v0.1 | None | Evidence collection and report aggregation. | Local/read-only. |
| v0.2 | Hermes configured chat model | Draft improvement proposals from evidence + skill text. | Optional --draft-with-model; dry-run artifact; no skill writes. |
| v0.2 | Deterministic verifier + future verifier prompt | Check grounding, safety, and non-destructive behavior. | Blocks mutation by default. |
| v0.3/v0.5 | Qwen/Qwen3-Embedding-0.6B | Candidate skill/evidence/user-correction search. | Optional --execute-semantic; no default download. |
| v0.3/v0.5 | BAAI/bge-reranker-v2-m3 | Re-rank candidates, especially for mixed Chinese/English agent workflows. | Optional --rerank; no default download. |
| v0.4 | Verifier + local validation command | Guard final reviewed content before apply. | Requires approval, backup, verification, rollback. |
| v0.6 | None by default | Automatic low-risk managed skill updates from observed evidence. | Optional install-auto; no Hermes core modification. |
| v0.7 | Qwen/Qwen3-Embedding-0.6B + BAAI/bge-reranker-v2-m3 | Optional model-assisted autorun candidate ordering. | Explicit --semantic-candidates --rerank-candidates; models only reorder evidence-eligible candidates. |
| v0.9 | None | Provenance-safe unattended auto-apply. | Writes only local agent-created skills; skips bundled, hub, plugin, external, pinned, and unknown sources. |
| v0.10 | None by default | One-command setup and clearer public README. | bootstrap backfills sessions and installs/enables autorun; bootstrap --semantic is explicit model opt-in. |
| v0.11 | None | Size-bounded unattended auto-apply. | Keeps SKILL.md under the 100k tool cap by targeting a 90k soft cap, spilling bulky evidence into references/, and skipping already-over-hard-cap skills. |
| v0.12 | None by default | Clean-room multi-variant candidate selection and staged verification inspired by HyperAgents concepts. | --variants N is deterministic/model-free; --staged-verify runs local structural checks before optional user-supplied verify commands. No HyperAgents dependency or model-generated code execution. |
| v0.13 | None by default | Session-content mining for memory/skill/replay candidates. | candidates-mine classifies redacted evidence into a local SQLite review queue, then candidates-list lets reviewers decide whether an item should become memory, a skill update, a new skill, a replay benchmark, or be ignored. It is read-only and never writes memory or skills. |
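The v0.11 size bound in the table above reduces to a three-way decision. A minimal sketch, assuming the caps are measured in characters (the README does not say whether 90k/100k are characters or bytes) and using a hypothetical spill file name:

```python
from typing import Tuple

SOFT_CAP = 90_000   # target ceiling for SKILL.md after an update
HARD_CAP = 100_000  # tool cap; skills already above it are never auto-written

# Hypothetical spill target inside the skill's references/ directory.
SPILL_PATH = "references/curator-evidence.md"


def plan_size_bounded_update(skill_text: str, note: str) -> Tuple[str, str]:
    """Return ('skip' | 'inline' | 'spill', detail) for one bounded note."""
    if len(skill_text) > HARD_CAP:
        return "skip", "already over hard cap"
    if len(skill_text) + len(note) <= SOFT_CAP:
        return "inline", note
    # Too big to inline: keep SKILL.md small and move the bulk to references/.
    return "spill", SPILL_PATH
```

Checking the hard cap first means an oversized skill is skipped outright rather than receiving even a short pointer.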
The guarded path requires:
- evidence report,
- dry-run proposal,
- verifier pass,
- human-reviewed content,
- exact target SHA256 match,
- explicit `--approve`,
- backup manifest,
- optional validation command,
- rollback path.
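The guarded path's core gates (approval, exact SHA256 match, backup and manifest before writing, auto-restore on failed validation) can be sketched with the standard library. This is a hypothetical `guarded_apply` helper, not the plugin's implementation; its parameters loosely mirror the CLI's `--approve`, `--expected-sha256`, `--backup-dir`, and `--verify-command`, and the manifest fields are assumptions:

```python
import hashlib
import json
import shutil
import subprocess
import time
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def guarded_apply(target: Path, new_content: str, expected_sha256: str,
                  backup_dir: Path, approve: bool,
                  verify_cmd: Optional[List[str]] = None) -> Path:
    """Hypothetical sketch of the guarded-apply gates; returns the manifest path."""
    if not approve:
        raise PermissionError("refusing to write without explicit approval")
    actual = sha256_of(target)
    if actual != expected_sha256:
        # The file changed after review, so the reviewed content no longer applies.
        raise RuntimeError(f"target SHA256 changed: {actual} != {expected_sha256}")

    # Back up first and record what is needed to restore the exact prior bytes.
    run_dir = backup_dir / str(time.time_ns())
    run_dir.mkdir(parents=True)
    backup = run_dir / target.name
    shutil.copy2(target, backup)
    manifest = run_dir / "manifest.json"
    manifest.write_text(json.dumps(
        {"target": str(target), "backup": str(backup), "original_sha256": actual},
        indent=2), encoding="utf-8")

    target.write_text(new_content, encoding="utf-8")
    if verify_cmd is not None and subprocess.run(verify_cmd).returncode != 0:
        shutil.copy2(backup, target)  # failed validation auto-restores the backup
        raise RuntimeError("validation failed; backup restored")
    return manifest
```

Every refusal happens before the first write, so a rejected apply leaves no partial state behind.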
Hard defaults:

- ✅ Evidence/report/proposal/candidate commands do not mutate skills.
- ✅ Semantic mode does not download models by default; `--execute-semantic`/`--rerank` are explicit opt-ins.
- ✅ Apply refuses to run without `--approve`.
- ✅ Apply refuses if the target SHA256 changed.
- ✅ Apply creates a backup before writing.
- ✅ Failed validation auto-restores the backup.
- ✅ `auto-run` writes only managed bounded blocks and still requires both `--apply-low-risk` and `--approve-auto-apply` before mutation.
- ✅ Bulky autorun evidence spills into `references/` instead of growing `SKILL.md` past the tool cap; already-over-hard-cap skills are skipped.
- ✅ Even with both write flags, unattended auto-apply writes only local agent-created skills. Official/bundled skills (`.bundled_manifest`), hub-installed skills (`.hub/lock.json`), plugin-provided skills, `skills.external_dirs`, pinned skills, and unknown sources are skipped.
- ✅ `--semantic-candidates`/`--rerank-candidates` are explicit opt-ins and only reorder skills that already passed the evidence threshold.
- ✅ Optional `--variants N` (default `1`) deterministically generates up to four bounded variants and picks one winner; only the winner is applied, and variant generation never executes model-generated code. See `docs/hyperagents-design-notes.md`.
- ✅ Optional staged verifier gate: a cheap built-in structural check (managed-block + size invariants) runs before any expensive `--verify-command`, so a failing cheap stage skips the expensive stage entirely and still rolls back.
- ✅ Optional restore-drill gate: `hermes-curator-evolver restore-drill --manifest <manifest>` replays a rollback manifest into a clean temp directory and emits a pass/fail report. Pair it with `auto-run --require-restore-drill` to refuse further mutating applies when the last apply has not yet been drill-verified (default: warn only, never silent).
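The `--variants N` bullet above describes generate, score, pick one winner, with no model involved. A minimal sketch of that shape, assuming four hypothetical deterministic strategies and a toy score (the plugin's real variant set and safety/quality signals are not documented here). Variants that break the size budget can never win, and if none is admissible nothing is applied:

```python
from typing import List, Optional

MAX_VARIANTS = 4  # "up to four" bounded variants

# Hypothetical deterministic, model-free strategies.
STRATEGIES = [
    lambda notes: list(notes),               # as observed (the --variants 1 result)
    lambda notes: sorted(set(notes)),        # de-duplicated, stable sort order
    lambda notes: list(notes)[:3],           # most conservative: at most three notes
    lambda notes: [n[:120] for n in notes],  # each note trimmed to 120 characters
]


def score(variant: List[str], budget: int = 400) -> float:
    size = sum(len(note) for note in variant)
    if size > budget:
        return float("-inf")  # breaks the size invariant: never selectable
    duplicates = len(variant) - len(set(variant))
    # Reward distinct notes; penalise repetition and size.
    return len(set(variant)) - 2 * duplicates - size / budget


def pick_winner(notes: List[str], n: int = 1) -> Optional[List[str]]:
    n = max(1, min(n, MAX_VARIANTS))
    variants = [strategy(notes) for strategy in STRATEGIES[:n]]
    # max() returns the first maximal element, so ties go to the lowest index.
    best = max(variants, key=score)
    return None if score(best) == float("-inf") else best
```

Same input, same winner: determinism comes from fixed strategies and first-index tie-breaking, and only the returned winner would reach the apply step.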
| Concept | What it proves |
|---|---|
| Rollback manifest | Records original SHA, backup path, support-file snapshots, provenance, evidence DB reference, and any scheduler hooks at the moment of apply. Lets rollback restore the prior file in place. |
| Restore drill | Actually replays that manifest into a clean directory (default: temp dir; explicit --target-dir must be empty) and verifies: skill content sha256, support files (references/templates/scripts/assets), evidence DB reference is a real SQLite file, provenance metadata is recorded, scheduler/cron references exist on disk. Emits machine-readable JSON. Drill state is checked by auto-run --require-restore-drill so unattended apply can refuse to widen risk after a failed, missing, stale, or unreadable drill state. |
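A restore drill replays a manifest somewhere harmless and checks what it can prove. A minimal sketch, assuming hypothetical manifest fields (`target`, `backup`, `original_sha256`, `evidence_db`, `provenance`); it covers the skill checksum, the SQLite file header, and provenance, and omits the support-file and scheduler checks listed in the table:

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path
from typing import Optional

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite database file


def restore_drill(manifest_path: Path, target_dir: Optional[Path] = None) -> dict:
    """Replay a rollback manifest into a clean directory; never touches the live skill."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    if target_dir is None:
        target_dir = Path(tempfile.mkdtemp(prefix="drill-"))
    else:
        target_dir.mkdir(parents=True, exist_ok=True)
        if any(target_dir.iterdir()):
            raise ValueError(f"target dir must be empty: {target_dir}")

    checks = {}
    backup = Path(manifest["backup"])
    restored = target_dir / Path(manifest["target"]).name
    checks["backup_exists"] = backup.is_file()
    if checks["backup_exists"]:
        shutil.copy2(backup, restored)
        digest = hashlib.sha256(restored.read_bytes()).hexdigest()
        checks["skill_sha256"] = digest == manifest["original_sha256"]
    else:
        checks["skill_sha256"] = False

    db = manifest.get("evidence_db")
    is_sqlite = False
    if db and Path(db).is_file():
        with open(db, "rb") as fh:
            is_sqlite = fh.read(16) == SQLITE_MAGIC
    checks["evidence_db_is_sqlite"] = is_sqlite
    checks["provenance_recorded"] = bool(manifest.get("provenance"))

    return {"target_dir": str(target_dir), "checks": checks, "passed": all(checks.values())}
```

Returning a plain dict keeps the result easy to serialize as the machine-readable JSON the table describes.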
If you want to inspect the behavior before installing, start here:
- 60-second demo script — terminal walkthrough for a GIF/asciinema recording.
- Example artifacts — synthetic report, proposal, bounded managed-block diff, and rollback manifest.
- Promotion readiness plan — what changed to make the repo easier to evaluate publicly.
- Architecture notes — one-page data flow and safety boundary.
- Post-install guide — health checks, scheduler logs, model details, and uninstall steps.
This project is intentionally conservative, and feedback is most useful around the trust model:
- Is the provenance gate strict enough for unattended skill maintenance?
- Should proposals become PR-like diffs instead of bounded managed notes?
- Which evidence signals should count: tool sequences, repeated fixes, user corrections, failed commands, or something else?
- What rollback UX would make automated skill maintenance trustworthy?
- What evaluation would show that a skill update actually improves future agent behavior?
If you are sharing or reviewing this project publicly, the community launch notes and draft posts live in docs/reddit-launch.md.
```bash
# One-command bootstrap
hermes-curator-evolver bootstrap
hermes-curator-evolver bootstrap --semantic
hermes-curator-evolver bootstrap --format json

# Evidence
hermes-curator-evolver status
hermes-curator-evolver report --days 7 --format json
hermes-curator-evolver backfill-sessions --sessions-dir ~/.hermes/sessions --days 30 --format json
hermes-curator-evolver analyze --skill hermes-agent --days 30

# Proposal + verifier
hermes-curator-evolver propose --skill hermes-agent --skill-file ./SKILL.md --format json --output proposal.json
hermes-curator-evolver propose --skill hermes-agent --skill-file ./SKILL.md --draft-with-model --model-timeout 180
hermes-curator-evolver verify --proposal-file proposal.json --skill hermes-agent --format json

# Candidate generation (the mixed-language query uses 中文, "Chinese")
hermes-curator-evolver candidates --query "gateway restart plugin cli" --skills-dir ~/.hermes/skills
hermes-curator-evolver candidates --query "中文 mixed agent skill" --skills-dir ~/.hermes/skills --semantic --format json  # plan only
hermes-curator-evolver candidates --query "中文 mixed agent skill" --skills-dir ~/.hermes/skills --execute-semantic --format json
hermes-curator-evolver candidates --query "中文 mixed agent skill" --skills-dir ~/.hermes/skills --execute-semantic --rerank --format json

# Guarded apply
sha256sum ./SKILL.md  # macOS: shasum -a 256 ./SKILL.md
hermes-curator-evolver apply \
  --target ./SKILL.md \
  --content-file ./reviewed-SKILL.md \
  --expected-sha256 <current-sha256> \
  --backup-dir .curator-evolver-backups \
  --verify-command "python -m pytest -q" \
  --approve

# Rollback
hermes-curator-evolver rollback --manifest .curator-evolver-backups/<timestamp>/manifest.json

# Restore drill (non-destructive: replay manifest into a clean dir and report pass/fail)
hermes-curator-evolver restore-drill --manifest .curator-evolver-backups/<timestamp>/manifest.json --format json
hermes-curator-evolver restore-drill --manifest .curator-evolver-backups/<timestamp>/manifest.json --target-dir /tmp/drill-XYZ --format markdown

# Automatic evolution
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --format json  # dry-run
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --semantic-candidates --rerank-candidates --format json
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --apply-low-risk --approve-auto-apply
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --semantic-candidates --rerank-candidates --apply-low-risk --approve-auto-apply
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --apply-low-risk --approve-auto-apply --block-auto-apply-skill 'github-*'
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --apply-low-risk --approve-auto-apply --allow-auto-apply-skill store-playbook  # only within local agent-created source boundary
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --variants 3 --format json  # generate 3 deterministic variants, pick winner (dry-run)
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --apply-low-risk --approve-auto-apply --staged-verify  # cheap built-in check before expensive verify
hermes-curator-evolver auto-run --skills-dir ~/.hermes/skills --apply-low-risk --approve-auto-apply --require-restore-drill  # block apply unless last apply was drill-verified
hermes-curator-evolver install-auto --schedule daily --enable
hermes-curator-evolver install-auto --schedule daily --enable --semantic-candidates --rerank-candidates
hermes-curator-evolver uninstall-auto
```

Contributions are welcome. See `CONTRIBUTING.md` for local setup, TDD expectations, the PR checklist, smoke tests, and CI behavior.
Inspired by SkillClaw — especially the idea that agent skills should evolve from real session evidence, not only from hand-written maintenance. Hermes Curator Evolver keeps that inspiration, but applies it through Hermes-native plugin hooks, local SQLite evidence, explicit model opt-ins, and conservative guarded writes.
Hermes already provides plugin removal:

```bash
hermes plugins disable curator-evolver
hermes plugins uninstall curator-evolver  # alias: remove/rm
```

Before removing the plugin, remove the optional auto-evolve scheduler if you enabled it:

```bash
hermes-curator-evolver uninstall-auto
```

Plugin removal does not delete historical evidence by default. Remove it manually only if you want a clean slate:

```bash
rm -rf ~/.hermes/plugins/curator-evolver/data ~/.hermes/plugins/curator-evolver/backups
```

When enabled, Hermes can call `curator_evidence_report` to retrieve a JSON evidence report.
```bash
git clone https://github.com/pingchesu/hermes-curator-evolver.git
cd hermes-curator-evolver
python -m pip install -e .
hermes plugins enable curator-evolver
```

If your Hermes environment does not provide `pip`, use:

```bash
uv pip install -e .
```

You can also symlink this repository into the Hermes plugin directory:

```bash
mkdir -p ~/.hermes/plugins
ln -s /path/to/hermes-curator-evolver ~/.hermes/plugins/curator-evolver
hermes plugins enable curator-evolver
```

Default evidence database location:

```text
~/.hermes/plugins/curator-evolver/data/evidence.sqlite
```

Override:

```bash
export HERMES_CURATOR_EVOLVER_DB=/custom/path.sqlite
```

- ✅ v0.1 — evidence/report plugin.
- ✅ v0.2 — proposal generation + verifier gate, dry-run by default.
- ✅ v0.3 — candidate generation interface with optional embedding/reranker model plan.
- ✅ v0.4 — guarded apply with explicit approval, backup, verification, and rollback.
- ✅ v0.5 — explicit model execution paths: Hermes chat-model drafts, Qwen embedding candidate ranking, and bge reranking.
- ✅ v0.6 — plug-and-play `auto-run` + optional native user scheduler for low-risk managed skill improvements without Hermes core changes.
- ✅ v0.7 — explicit `--semantic-candidates`/`--rerank-candidates` for model-assisted autorun candidate ordering.
- ✅ v0.8 — `backfill-sessions` for existing Hermes history, `CONTRIBUTING.md`, and GitHub Actions CI.
- ✅ v0.9 — provenance-safe autorun: only local agent-created skills can be auto-applied; bundled, hub, plugin, external, pinned, and unknown sources are skipped.
- ✅ v0.10 — `bootstrap` one-command setup plus a shorter, visual quick start.
- ✅ v0.11 — size-bounded autorun: target a 90k `SKILL.md` soft cap, spill bulky evidence into `references/`, and skip already-over-hard-cap skills.
- ✅ v0.12 — deterministic `--variants N`, staged verification, and restore-drill gating for safer autorun choices.
- ✅ v0.13 — read-only session mining into a human-review queue for `memory`, `skill_update`, `skill_new`, `replay_benchmark`, or `ignore` decisions.
Built for people who want agent skills to improve — without letting automation silently rewrite the library.