
feat(dual-solve): run claude + codex on an issue, operator picks the winner#271

Merged: plind-junior merged 14 commits into test from feat/dual-solve on Jun 25, 2026

plind-junior (Collaborator) commented Jun 25, 2026

What changed

adds vouch dual-solve <issue-url>, a cli-only sibling tool in the same vein
as auto-pr. it fetches a github issue, grounds one identical prompt with
vouch context, then runs claude and codex independently, each in its own git
worktree on a fresh branch. you see both diffs and pick the winner (or neither);
the winning branch is kept and the chosen solution's rationale is proposed
into the kb. all the orchestration lives in a new src/vouch/dual_solve.py,
driven by auto_pr's injectable Runner, so every stage is unit-tested
against a fake runner; cli.py is a thin shell over it.
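a minimal sketch of the injectable-runner idea, assuming a runner shape the pr doesn't spell out: Result, Runner.run(argv, cwd) and this FakeRunner are hypothetical stand-ins, not auto_pr's actual api, and the dual-solve/<engine> branch naming is made up. the point is that worktree creation goes through the runner, so a test can assert on the exact git command without touching a real repo.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass
class Result:
    returncode: int
    stdout: str = ""
    stderr: str = ""


class Runner(Protocol):
    """Hypothetical runner shape: argv (+ optional cwd) in, Result out."""

    def run(self, argv: Sequence[str], cwd: str | None = None) -> Result: ...


@dataclass
class FakeRunner:
    """Records every call and returns canned results keyed by program name."""

    canned: dict[str, Result] = field(default_factory=dict)
    calls: list[tuple[tuple[str, ...], str | None]] = field(default_factory=list)

    def run(self, argv: Sequence[str], cwd: str | None = None) -> Result:
        self.calls.append((tuple(argv), cwd))
        # unknown programs "succeed" with empty output
        return self.canned.get(argv[0], Result(0))


def add_engine_worktree(runner: Runner, repo: str, engine: str, root: str) -> str:
    """Create a fresh branch for one engine, checked out in its own worktree."""
    path = f"{root}/{engine}"
    branch = f"dual-solve/{engine}"  # hypothetical branch naming
    # `git worktree add -b <branch> <path>` creates the branch and checks it out at <path>
    res = runner.run(["git", "worktree", "add", "-b", branch, path], cwd=repo)
    if res.returncode != 0:
        raise RuntimeError(f"git worktree add failed for {engine}: {res.stderr}")
    return path
```

swapping in a subprocess-backed runner for production changes nothing in add_engine_worktree; that separation is what lets every stage run against a fake in tests.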

Why

auto-pr already proved the sibling-tool pattern: drive claude+codex from the
cli without touching the review gate. dual-solve covers a different case: you
want two independent attempts at one issue and a human to judge them, rather
than a fixer↔verifier loop. the operator is the judge, the two engines never
cross-critique, and the decision (plus up to three approach claims) is captured
in the kb, where the rest of the team will see it.

What might break

nothing on disk should break. this is not a kb.* method; it's cli-only, so there
are no changes to server.py, jsonl_server.py, capabilities.py, or
openclaw.plugin.json, and no existing .vouch/ file moves or changes shape.

the one point worth your eyes: this is the first sibling tool that writes to
the kb. auto-pr is documented as "never writes to the kb"; dual-solve does,
but only ever as proposals. it registers the winning commit as a Source
(source intake, ungated by design) and then proposes the decision and
approach claims via propose_claim, so they land in proposed/. nothing is
auto-approved; approval still requires a human to run vouch approve. so the
review-gate invariant holds: the "never writes to the kb" rule is widened to
"only ever proposes", not broken. calling this out explicitly since it's the
one design call a reviewer might want to weigh.
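to make the "only ever proposes" invariant concrete, here's a toy model, not vouch's KBStore: ToyKB, Proposal, record_winner and MAX_APPROACH_CLAIMS are hypothetical names, and ProposalStatus here only mirrors the PENDING member the codebase queries with. source intake is stored immediately, every claim is created pending, and nothing in this path can set APPROVED.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass, field


class ProposalStatus(enum.Enum):
    PENDING = "pending"
    APPROVED = "approved"


@dataclass
class Proposal:
    text: str
    source_id: str
    # every proposal starts pending; only a separate human approval step may change it
    status: ProposalStatus = ProposalStatus.PENDING


@dataclass
class ToyKB:
    sources: dict[str, str] = field(default_factory=dict)
    proposals: list[Proposal] = field(default_factory=list)

    def register_source(self, source_id: str, content: str) -> str:
        # source intake is ungated: stored as-is, no review
        self.sources[source_id] = content
        return source_id

    def propose_claim(self, text: str, source_id: str) -> Proposal:
        # claims never bypass review: they are only ever appended as pending
        proposal = Proposal(text, source_id)
        self.proposals.append(proposal)
        return proposal

    def list_proposals(self, status: ProposalStatus) -> list[Proposal]:
        return [p for p in self.proposals if p.status is status]


MAX_APPROACH_CLAIMS = 3


def record_winner(kb: ToyKB, commit: str, decision: str, approaches: list[str]) -> list[Proposal]:
    """Register the winning commit, then propose the decision plus up to three approach claims."""
    source_id = kb.register_source(f"commit:{commit}", f"winning commit {commit}")
    claims = [decision, *approaches[:MAX_APPROACH_CLAIMS]]
    return [kb.propose_claim(text, source_id) for text in claims]
```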

known follow-up (not in this pr): when --json is used or no workdir is
supplied, prepare's tempfile.mkdtemp dir and the engine worktrees are left
on disk (cleanup only runs git worktree remove on the child worktrees, not on
the tempdir itself). a clean fix wants a small "prepare owns the tempdir"
refactor; deferred deliberately rather than rushed in here.
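one possible shape for the deferred "prepare owns the tempdir" refactor, sketched as a context manager. prepared_workdir and the remove_worktree callback are hypothetical; in real use the callback would shell out to git worktree remove, which deletes the working tree but keeps its branch, so the winner's branch survives cleanup.

```python
from __future__ import annotations

import contextlib
import os
import shutil
import tempfile
from typing import Callable, Iterator


@contextlib.contextmanager
def prepared_workdir(
    workdir: str | None,
    engines: tuple[str, ...],
    remove_worktree: Callable[[str], None],
) -> Iterator[dict[str, str]]:
    """Yield one worktree path per engine; clean up whatever this call created."""
    owned = workdir is None
    root = workdir if workdir is not None else tempfile.mkdtemp(prefix="dual-solve-")
    paths = {engine: os.path.join(root, engine) for engine in engines}
    try:
        yield paths
    finally:
        for path in paths.values():
            if os.path.exists(path):
                # real use: `git worktree remove --force <path>`; the branch is kept
                remove_worktree(path)
        if owned:
            # only delete the parent if we created it; an operator-supplied dir stays
            shutil.rmtree(root, ignore_errors=True)
```

because the cleanup lives in finally, it runs whether the caller prints diffs, emits --json, or raises partway through.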

VEP

none needed — no surface change (no new kb.* method, no object-model, on-disk,
bundle, or audit-log change). flagging the kb-write behaviour above so you can
confirm you agree it doesn't warrant one.

Tests

  • make check (lint + mypy + pytest) passes locally apart from one pre-existing
    failure, test_volunteer_context::test_pending_proposal_not_volunteered
    (a latin-1 encode bug in storage.py). it fails identically on test, and
    storage.py is byte-identical to base, so it does not come from this branch. as a
    guard, dual-solve ascii-coerces claim text at the kb boundary so an untrusted
    issue title can't trip that same encode path and leave a corrupt proposal.
  • new behaviour has tests — 29 in tests/test_dual_solve.py, fake-runner
    driven (no network, no real claude/codex/gh), covering both-succeed, single
    survivor, both-fail abort, --no-record/--dry-run, gated-claim citation,
    and a non-ascii title regression.
  • CHANGELOG.md updated under ## [Unreleased].
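the both-succeed / single-survivor / both-fail cases reduce to a small pure function, which is why they're easy to cover without real engines. a sketch under assumed names (EngineRun, DualSolveAbort and candidates are hypothetical), treating an engine as a survivor only if it exited cleanly and left a non-empty diff:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EngineRun:
    engine: str    # e.g. "claude" or "codex"
    ok: bool       # did the engine exit cleanly?
    diff: str = ""


class DualSolveAbort(Exception):
    """Raised when neither engine produced a usable diff."""


def candidates(runs: list[EngineRun]) -> list[EngineRun]:
    """Return the runs the operator may pick from; abort if none survived."""
    survivors = [r for r in runs if r.ok and r.diff.strip()]
    if not survivors:
        raise DualSolveAbort("both engines failed; nothing to pick")
    return survivors
```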

front-loading every import in task 1 left ruff red with F401 (unused import)
across tasks 1-6 and forced each task reviewer to re-flag the shrinking unused
set. each task now imports only the symbols its own code references, so the
gate stays green at every commit. consumes blocks already name the source
module for each symbol.

add three pure helpers for the dual-solve feature: repo_root detects
the git repo via the injected runner (testable against FakeRunner); ground_prompt
renders knowledge-base context packs into bulleted prompt fragments;
build_prompt fills the shared _FIX_PROMPT template with issue details
and grounding. all carry type annotations and run under the test
harness with no network or real binaries.
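a sketch of the three helpers, assuming a runner reduced to an (argv, cwd) -> (returncode, stdout) callable and context packs shaped as dicts with title/summary keys; both shapes are guesses, and the _FIX_PROMPT wording below is a placeholder, not the real template. repo_root relies on git rev-parse --show-toplevel, which prints the repo's top-level directory.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical runner shape for this sketch: (argv, cwd) -> (returncode, stdout)
RunFn = Callable[[Sequence[str], str], Tuple[int, str]]

# Placeholder wording; the real _FIX_PROMPT text is not shown in the PR.
_FIX_PROMPT = (
    "Fix GitHub issue #{number}: {title}\n\n"
    "{body}\n\n"
    "Project knowledge (from vouch):\n"
    "{grounding}\n"
)


def repo_root(run: RunFn, cwd: str) -> str | None:
    """Top-level directory of the enclosing git repo, or None outside one."""
    code, out = run(["git", "rev-parse", "--show-toplevel"], cwd)
    return out.strip() if code == 0 else None


def ground_prompt(packs: List[Dict[str, str]]) -> str:
    """Render context-pack entries as one bullet each (hypothetical keys)."""
    bullets = [f"- {p['title']}: {p['summary']}" for p in packs]
    return "\n".join(bullets) if bullets else "- (no matching knowledge)"


def build_prompt(number: int, title: str, body: str, packs: List[Dict[str, str]]) -> str:
    """Fill the shared template so both engines receive an identical prompt."""
    return _FIX_PROMPT.format(
        number=number, title=title, body=body, grounding=ground_prompt(packs)
    )
```

str.format doesn't re-parse substituted values, so braces in an untrusted issue title or body can't break the template.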

the plan assumed a KBStore.list_pending() that does not exist; the
codebase queries pending proposals via list_proposals(ProposalStatus.PENDING)
(server.py, cli.py). also fix payload dict-access and note the
ascii-only claim-text constraint forced by storage's latin-1 yaml write.

SubprocessRunner lives in auto_pr (not dual_solve); add the choice: str | None
annotation that mypy requires; use an ascii -- in the proposed-id echo to
dodge the locale latin-1 encode issue.
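a sketch of why a str | None choice fits: None stands for "neither", so the caller has to handle the no-winner path explicitly. parse_choice and the accepted spellings are hypothetical; only the str | None shape comes from the commit above.

```python
from __future__ import annotations

from typing import Sequence

# hypothetical spellings an operator might type to keep neither branch
NEITHER = ("n", "neither", "none")


def parse_choice(raw: str, available: Sequence[str]) -> str | None:
    """Map operator input to an engine name, or None when no winner is kept."""
    answer = raw.strip().lower()
    if answer in NEITHER:
        return None
    if answer in available:
        return answer
    raise ValueError(f"unknown choice {raw!r}; expected one of {list(available)} or 'neither'")
```

passing only the survivors as available means a single-survivor run can't accidentally pick the engine that failed.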
…rrupt the kb

a github issue title or engine summary with a non-latin-1 char (em dash,
smart quote) flowed into propose_claim's yaml write; storage encodes with
the locale default (latin-1 here), so the write raised UnicodeEncodeError
mid-stream and left a zero-byte proposal that poisoned list_proposals for
the whole kb. record_to_kb now coerces claim text to ascii at the boundary
(the verbatim original stays in the Source content, written as bytes), the
cli wraps finalize so any residual write error is a clean ClickException,
and a regression test exercises a non-ascii title. also fixes the
ground_prompt docstring (M3.2).
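a sketch of ascii coercion at the kb boundary, assuming nothing about the real record_to_kb beyond "text in, ascii out": to_ascii and its punctuation table are hypothetical. common typography is mapped to readable ascii first (so an em dash becomes --, matching the proposed-id echo), accented letters lose their marks, and anything left over becomes ? rather than silently disappearing.

```python
from __future__ import annotations

import unicodedata

# common typographic characters mapped to readable ASCII before the general fallback
_PUNCT = str.maketrans({
    "\u2014": "--",                 # em dash
    "\u2013": "-",                  # en dash
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2026": "...",                # ellipsis
})


def to_ascii(text: str) -> str:
    """Coerce claim text to ASCII so a latin-1 (or any ASCII-superset) write cannot fail."""
    text = text.translate(_PUNCT)
    # NFKD splits accented letters into base letter + combining mark; drop the marks
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # anything still non-ASCII (CJK, emoji, ...) becomes "?" per code point
    return stripped.encode("ascii", "replace").decode("ascii")
```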
plind-junior changed the title from "Feat/dual solve" to "feat(dual-solve): run claude + codex on an issue, operator picks the winner" on Jun 25, 2026
plind-junior merged commit a79efd9 into test on Jun 25, 2026
5 checks passed
coderabbitai (bot) commented Jun 25, 2026

Review skipped: auto reviews are disabled on base/target branches other than the default branch.

