feat: consolidation — batch-propose merges/supersedes for near-duplicate approved claims #308

Description

@plind-junior

approved claims drift into near-duplicates over time: two agents record the same fact in slightly different words, or a later observation restates an earlier one. kb.dedup_scan already surfaces these read-only (scan_all in src/vouch/embeddings/dedup.py walks the embedding_index, compares same-kind vectors by cosine, and logs pairs to the embedding_dupes table), but it stops at reporting. nothing turns a detected cluster into an actionable cleanup. today a maintainer reading the scan output hand-runs vouch supersede for each pair. that is exactly the kind of repetitive, error-prone toil that should reach the review gate as ready-made proposals instead of being done by hand.

this issue proposes a retroactive consolidation pass: cluster near-duplicate already-approved claims (embedding cosine, with an optional text-similarity tiebreak), pick a survivor per cluster, and emit supersede/merge intents into the pending queue for a human to approve or reject. the pass never mutates durable claims itself — it only proposes.

proposed surface

new method kb.consolidate (retroactive; distinct from the read-only kb.dedup_scan):

  • reuses the same cosine machinery as scan_all to cluster same-kind claims in the durable set at or above a threshold, then within each cluster nominates a survivor (highest confidence, then most recent updated_at, ties broken deterministically by id).
  • for each non-survivor, emits a pending supersede intent: a proposal that, on approval, calls lifecycle.supersede(store, old_claim_id=member, new_claim_id=survivor, actor=...). a --mode=merge variant instead proposes a single new claim that unions the evidence/entities/tags of the cluster and supersedes every member, so the union goes through proposals.propose_claim → proposals.approve and the supersedes are recorded on approval.
  • flags: --threshold (default 0.95, matching dedup.DEFAULT_THRESHOLD), --mode {supersede|merge} (default supersede), --kind {claim} (claims only for now), --max-clusters N to bound a single pass, --dry-run to print clusters and the intents that would be proposed without writing anything.
  • cli mirror: vouch consolidate --threshold 0.95 --mode supersede --dry-run, printing each cluster as survivor <- member (cos=…) and the proposal ids it created.
  • config under .vouch/config.yaml: consolidate.threshold, consolidate.mode, consolidate.max_clusters as defaults, so a scheduled or manual pass reads policy from config rather than flags.
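the clustering and survivor rules above can be sketched as follows. this is a minimal illustration, not the code in dedup.py: the Claim shape, the cosine helper, and the single-link (union-find) grouping are assumptions made here, since the issue does not say how flagged pairs are folded into clusters.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Claim:
    # Hypothetical minimal shape; the real claim model has more fields.
    id: str
    confidence: float
    updated_at: str  # ISO-8601 UTC, so string order matches time order
    vector: tuple[float, ...]


def cosine(a, b):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(claims, threshold=0.95):
    """Group claims linked by a pairwise cosine >= threshold (single-link)."""
    parent = {c.id: c.id for c in claims}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(claims, 2):
        if cosine(a.vector, b.vector) >= threshold:
            parent[find(a.id)] = find(b.id)

    groups = {}
    for c in claims:
        groups.setdefault(find(c.id), []).append(c)
    # Singletons are not duplicates of anything, so drop them.
    return [g for g in groups.values() if len(g) > 1]


def pick_survivor(members):
    """Highest confidence, then most recent updated_at, then smallest id."""
    ordered = sorted(members, key=lambda c: c.id)
    # Python's sort is stable even with reverse=True, so id order breaks ties.
    ordered.sort(key=lambda c: (c.confidence, c.updated_at), reverse=True)
    return ordered[0]
```

one consequence of single-link grouping is that a cluster can contain two claims whose direct cosine is below the threshold, as long as a chain of near-duplicates connects them. if that is too loose, requiring every member to clear the threshold against the survivor would be a stricter alternative.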

because this adds a new kb.* method, it touches the four registration sites — @mcp.tool() in src/vouch/server.py, _h_consolidate + HANDLERS["kb.consolidate"] in src/vouch/jsonl_server.py, METHODS in src/vouch/capabilities.py, and the vouch consolidate command in src/vouch/cli.py — plus tests/test_consolidate.py.

review gate & scope

the pass proposes, never approves. clustering and survivor selection are decision logic and live in the proposal/lifecycle layer, not in storage.py (which stays pure i/o). every consolidation lands in the pending queue and requires a human kb.approve; only on approval does lifecycle.supersede run, the survivor's supersedes and the old claim's superseded_by get written, and the existing claim.supersede audit event is emitted. --dry-run writes nothing at all, and any background or scheduled invocation emits at most pending proposals: no durable artifact is ever written unattended.

lifecycle.supersede is currently a direct mutation (its module docstring is explicit that supersede is metadata about reviewed knowledge and skips the queue). the retroactive pass deliberately does not call it directly: a batch, machine-clustered decision over historical knowledge is a new assertion about which claim wins, so it goes through review. the existing single-pair vouch supersede path is unchanged. everything stays local-first — same .vouch/ files, same embedding_index, no network, no external service.
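a rough sketch of that gate, under stated assumptions: SupersedeIntent, PendingQueue, propose_supersedes, and approve are hypothetical names invented for illustration, and the supersede argument is a stand-in callable rather than the real lifecycle.supersede (which, as quoted above, also takes the store). the point is only the shape: proposing appends to a queue, and the durable mutation runs solely inside the approval path.

```python
from dataclasses import dataclass, field


@dataclass
class SupersedeIntent:
    # Hypothetical proposal payload; field names mirror the keyword
    # arguments of lifecycle.supersede as quoted in this issue.
    old_claim_id: str
    new_claim_id: str
    status: str = "pending"


@dataclass
class PendingQueue:
    # In-memory stand-in for the real pending queue.
    intents: list = field(default_factory=list)

    def propose(self, intent):
        self.intents.append(intent)
        return len(self.intents) - 1  # hypothetical proposal id


def propose_supersedes(queue, survivor_id, member_ids, dry_run=False):
    """Build one intent per non-survivor; enqueue them unless dry_run."""
    intents = [
        SupersedeIntent(old_claim_id=m, new_claim_id=survivor_id)
        for m in member_ids
        if m != survivor_id
    ]
    if dry_run:
        return intents, []  # report what would be proposed, write nothing
    return intents, [queue.propose(i) for i in intents]


def approve(queue, proposal_id, supersede, actor):
    """Only an explicit approval triggers the durable supersede call."""
    intent = queue.intents[proposal_id]
    supersede(
        old_claim_id=intent.old_claim_id,
        new_claim_id=intent.new_claim_id,
        actor=actor,
    )
    intent.status = "approved"
```

in this shape a rejected or never-reviewed intent simply stays out of the approval path, so the claims it names are left untouched.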

acceptance criteria

  • kb.consolidate clusters same-kind claims by cosine at or above --threshold, reusing the vector-comparison logic already in src/vouch/embeddings/dedup.py rather than duplicating it.
  • each non-survivor produces a pending proposal; nothing durable is written until a human approves, and --dry-run writes nothing at all.
  • --mode=merge proposes a single union claim via proposals.propose_claim that supersedes every cluster member on approval.
  • survivor selection is deterministic (confidence → updated_at → id) and covered by a test with a fixed cluster.
  • approval of a supersede intent invokes lifecycle.supersede, sets superseded_by/supersedes, and emits the claim.supersede audit event; rejection leaves all claims untouched.
  • claims already in superseded, archived, contested, or redacted status are excluded from clustering so consolidated claims aren't re-proposed on the next pass.
  • method is registered at all four surfaces and test_capabilities passes; behaviour tested in tests/test_consolidate.py.
  • --max-clusters bounds a single pass; config keys consolidate.threshold / consolidate.mode / consolidate.max_clusters supply defaults.
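a few of the criteria above reduce to small pure helpers, sketched here with claims as plain dicts. everything in this block is hypothetical: the helper names, the dict shape, the "flags beat config beat built-in defaults" precedence, and the choice of no bound when max_clusters is unset are all assumptions, not behaviour the issue specifies.

```python
# Statuses the issue lists as excluded from clustering.
EXCLUDED_STATUSES = frozenset({"superseded", "archived", "contested", "redacted"})

# Built-in defaults; threshold and mode come from the issue text,
# max_clusters=None (unbounded) is an assumption.
DEFAULTS = {"threshold": 0.95, "mode": "supersede", "max_clusters": None}


def eligible(claims):
    """Drop claims whose status takes them out of consolidation."""
    return [c for c in claims if c["status"] not in EXCLUDED_STATUSES]


def bound(clusters, max_clusters=None):
    """Keep at most max_clusters clusters for a single pass."""
    return clusters if max_clusters is None else clusters[:max_clusters]


def union_fields(members):
    """Merge mode: union evidence/entities/tags across a cluster, sorted."""
    return {
        key: sorted({v for m in members for v in m.get(key, [])})
        for key in ("evidence", "entities", "tags")
    }


def resolve(config, flags):
    """Assumed precedence: explicit flags > consolidate.* config > defaults."""
    merged = dict(DEFAULTS)
    merged.update(config.get("consolidate", {}))
    merged.update({k: v for k, v in flags.items() if v is not None})
    return merged
```

sorting the unioned values keeps the proposed merge claim stable across runs, which makes a repeated --dry-run easy to diff.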

distinction from adjacent issues: #147 (propose-time similarity warnings) fires when a new claim is being proposed and warns before it enters the queue — it prevents duplicates at ingest. #135 (vouch expire garbage-collect) reaps stale pending proposals that were never reviewed. #110 / #93 (batch approval) drain the pending queue but do not populate it. this issue is none of those: it is a retroactive pass over claims that are already approved and durable, proposing to collapse historical drift back through the same review gate that #110/#93 would then let a human clear in bulk.


Labels: enhancement, size: M, storage
