feat(dual-solve): run claude + codex on an issue, operator picks the winner #271
Merged
front-loading every import in task 1 left ruff F401-red across tasks 1-6 and forced each task reviewer to re-flag the shrinking unused set. each task now imports only the symbols its own code references, so the gate stays green at every commit. consumes blocks already name the source module per symbol.
add three pure helpers for the dual-solve feature: repo_root detects the git repo via runner (testable against FakeRunner); ground_prompt renders knowledge base context packs into bulleted prompt fragments; build_prompt fills the shared _FIX_PROMPT template with issue details and grounding. all carry type annotations and work against the test harness with no network or real binaries.
the plan assumed a KBStore.list_pending() that does not exist; the codebase queries pending via list_proposals(ProposalStatus.PENDING) (server.py, cli.py). also fix payload dict-access and note the ascii-only claim-text constraint forced by storage's latin-1 yaml write.
SubprocessRunner lives in auto_pr (not dual_solve); add choice: str|None annotation mypy requires; ascii -- in the proposed-id echo to dodge the locale latin-1 encode issue.
…rrupt the kb
a github issue title or engine summary with a non-latin-1 char (em dash, smart quote) flowed into propose_claim's yaml write; storage encodes with the locale default (latin-1 here), so the write raised UnicodeEncodeError mid-stream and left a zero-byte proposal that poisoned list_proposals for the whole kb. record_to_kb now coerces claim text to ascii at the boundary (the verbatim original stays in the Source content, written as bytes), the cli wraps finalize so any residual write error is a clean ClickException, and a regression test exercises a non-ascii title. also fixes the ground_prompt docstring (M3.2).
What changed
adds `vouch dual-solve <issue-url>`, a cli-only sibling tool in the same vein as `auto-pr`. it fetches a github issue, grounds an identical prompt with `vouch context`, then runs claude and codex independently, each in its own git worktree on a fresh branch. you see both diffs, pick the winner (or neither), and the kept branch stays while the chosen solution's rationale is proposed into the kb. all the orchestration lives in a new `src/vouch/dual_solve.py` driven by `auto_pr`'s injectable `Runner`, so every stage is unit-tested against a fake runner; `cli.py` is a thin shell over it.
Why
`auto-pr` already proved the sibling-tool pattern: drive claude+codex from the cli without touching the review gate. dual-solve covers a different case: you want two independent attempts at one issue and a human to judge them, rather than a fixer↔verifier loop. the operator is the judge, the two engines never cross-critique, and the decision (plus up to three approach claims) is captured where the rest of the team will see it.
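
the worktree-per-engine flow from "What changed" can be sketched roughly as below. this is a minimal illustration, not the code in `src/vouch/dual_solve.py`: the `Runner` protocol shape, `FakeRunner`, `solve_with_both`, the branch and worktree naming, and the engine command line are all assumptions made up for the example. only `git worktree add -b <branch> <path>` is a real command.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, Sequence


class Runner(Protocol):
    """Anything that can run a command and return (exit code, stdout)."""

    def run(self, argv: Sequence[str], cwd: str | None = None) -> tuple[int, str]: ...


@dataclass
class FakeRunner:
    """Test double: records every call and replays canned results."""

    results: dict[str, tuple[int, str]] = field(default_factory=dict)
    calls: list[tuple[tuple[str, ...], str | None]] = field(default_factory=list)

    def run(self, argv: Sequence[str], cwd: str | None = None) -> tuple[int, str]:
        self.calls.append((tuple(argv), cwd))
        # Canned results are keyed by program name; default is success, no output.
        return self.results.get(argv[0], (0, ""))


def solve_with_both(runner: Runner, repo: str, issue: int, prompt: str) -> dict[str, bool]:
    """Give each engine its own worktree and branch, then run it on the same prompt."""
    outcome: dict[str, bool] = {}
    for engine in ("claude", "codex"):
        branch = f"dual-solve/{issue}-{engine}"  # hypothetical naming scheme
        worktree = f"{repo}/.worktrees/{engine}"  # hypothetical location
        code, _ = runner.run(["git", "worktree", "add", "-b", branch, worktree], cwd=repo)
        if code != 0:
            outcome[engine] = False
            continue
        # Hypothetical engine invocation; the real flags are not shown in this pr.
        code, _ = runner.run([engine, prompt], cwd=worktree)
        outcome[engine] = code == 0
    return outcome
```

because the runner is injected, a test can make one engine fail (`FakeRunner(results={"codex": (1, "")})`) and assert on the single-survivor path without touching git, the network, or either binary.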
What might break
nothing on disk. this is not a `kb.*` method; it's cli-only, so there are no changes to `server.py`, `jsonl_server.py`, `capabilities.py`, or `openclaw.plugin.json`, and no existing `.vouch/` file moves or changes shape.
the one point worth your eyes: this is the first sibling tool that writes to the kb. `auto-pr` is documented as "never writes to the kb"; dual-solve does, but only ever as proposals. it registers the winning commit as a `Source` (source intake, ungated by design) and then `propose_claim`s the decision and approach claims, which land in `proposed/`. nothing is auto-approved; approval still requires a human `vouch approve`. so the review-gate invariant holds: the "never writes to the kb" rule is widened to "only ever proposes", not broken. calling this out explicitly since it's the one design call a reviewer might want to weigh.
known follow-up (not in this pr): when `--json` is used or no workdir is supplied, `prepare`'s `tempfile.mkdtemp` dir and the engine worktrees are left on disk (cleanup only `git worktree remove`s the children). a clean fix wants a small "prepare owns the tempdir" refactor; deferred deliberately rather than rushed in here.
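
one possible shape for that "prepare owns the tempdir" refactor, as a sketch only: a context manager that deletes the directory on exit when, and only when, it created it. `owned_workdir` and the `dual-solve-` prefix are hypothetical names for illustration; nothing here is taken from the actual `prepare`.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional


@contextmanager
def owned_workdir(workdir: Optional[Path] = None) -> Iterator[Path]:
    """Yield a working directory; delete it on exit only if we created it."""
    if workdir is not None:
        # The caller supplied the directory, so the caller keeps ownership.
        yield workdir
        return
    created = Path(tempfile.mkdtemp(prefix="dual-solve-"))
    try:
        yield created
    finally:
        # Removes the tempdir and everything under it, on success or failure.
        shutil.rmtree(created, ignore_errors=True)
```

deleting the directory does not unregister worktrees that lived inside it; the repo still needs `git worktree remove` beforehand or `git worktree prune` afterwards to clear the stale entries.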
VEP
none needed: no surface change (no new `kb.*` method, no object-model, on-disk, bundle, or audit-log change). flagging the kb-write behaviour above so you can confirm you agree it doesn't warrant one.
Tests
`make check` passes locally (lint + mypy + pytest) apart from one pre-existing failure, `test_volunteer_context::test_pending_proposal_not_volunteered` (a latin-1 encode bug in `storage.py`); it fails identically on `test`, and `storage.py` is byte-identical to base, so it's not from this branch. as a guard, dual-solve ascii-coerces claim text at the kb boundary so an untrusted issue title can't trip that same encode path and leave a corrupt proposal.
`tests/test_dual_solve.py`, fake-runner driven (no network, no real claude/codex/gh), covering both-succeed, single survivor, both-fail abort, `--no-record`/`--dry-run`, gated-claim citation, and a non-ascii title regression.
`CHANGELOG.md` updated under `## [Unreleased]`.
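
for reference, ascii coercion at the kb boundary could look something like the sketch below. this is an assumption-laden illustration, not the code in `record_to_kb`: the function name `to_ascii`, the punctuation table, and the choice of `?` as the fallback character are all invented for the example.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Best-effort ASCII rendering of untrusted text."""
    # Map common typographic punctuation to ASCII look-alikes first.
    replacements = {
        "\u2014": "--",  # em dash
        "\u2013": "-",   # en dash
        "\u2018": "'",   # left single quote
        "\u2019": "'",   # right single quote
        "\u201c": '"',   # left double quote
        "\u201d": '"',   # right double quote
    }
    for src, dst in replacements.items():
        text = text.replace(src, dst)
    # Decompose accented letters and drop the combining marks they leave behind.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes "?" so the result always encodes.
    return stripped.encode("ascii", "replace").decode("ascii")
```

the output is pure ascii, so a later latin-1 (or any ascii-compatible) encode cannot raise; the lossy step is acceptable here only because the verbatim original is kept separately in the `Source` content.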