Skip to content

feat(embeddings): rerank, HyDE, dedup, scorer, model-identity migration#42

Merged
plind-junior merged 10 commits into
vouchdev:mainfrom
dripsmvcp:feat/semantic-search-phase-6-capabilities
May 22, 2026
Merged

feat(embeddings): rerank, HyDE, dedup, scorer, model-identity migration#42
plind-junior merged 10 commits into
vouchdev:mainfrom
dripsmvcp:feat/semantic-search-phase-6-capabilities

Conversation

@dripsmvcp

@dripsmvcp dripsmvcp commented May 20, 2026

Copy link
Copy Markdown
Contributor

Stacked on Phase 5 (feat/semantic-search-phase-5-fusion).

Summary

Phase 6 layers the "smarter" retrieval and operational capabilities on top of the basic semantic backend:

  • Cross-encoder reranking for higher-precision top-K
  • HyDE query expansion (template-only by default; LLM hook optional)
  • Ingest-time duplicate detection — flags near-duplicate artifacts on write
  • Scorer harness — recall@k / MRR / nDCG over a JSONL queries file
  • Model-identity migration — detects when embedding_model in index_meta differs from the registered default, exposes a backfill path

Tracks Tasks 18-22 in the plan.

Commits

Commit Module Purpose
4befafb embeddings/rerank.py Reranker ABC + CrossEncoderReranker (sentence-transformers ms-marco-MiniLM-L6-v2) + rerank(query, hits, reranker, top_k)
e9d2dce embeddings/hyde.py expand_query_template(query, min_chars=20) (pads short queries with a "documents about X" preamble) + expand_query_with_llm(query, llm) (optional LLM-backed expansion via caller-supplied callable)
ae73191 embeddings/dedup.py check_and_log (called from _embed_and_store since Phase 3) + list_duplicates + scan_all; logs to embedding_dupes table
e06b69d embeddings/scorer.py recall_at_k / mrr / ndcg metric fns + evaluate(queries_file, k, metrics) runner
e5d220a embeddings/migration.py detect_mismatch(kb_dir) returns mismatch info or None; backfill_embeddings(store) re-encodes every artifact under the current adapter
743e492 style: cleanup for B905, F401, I001

Design notes

Lazy heavy deps. CrossEncoderReranker.__init__ defers from sentence_transformers import CrossEncoder so the base install stays light. Same lazy-load pattern as the embedding adapters in Phase 1.

HyDE is template-only by default — no LLM dependency, just a preamble template applied to short queries. The expand_query_with_llm variant takes a caller-supplied Callable[[str], str] so vouch doesn't pin a specific LLM client.

Dedup hook was already wired in Phase 3. KBStore._embed_and_store calls embeddings.dedup.check_and_log inside a try/except ImportError. With this PR the import resolves and dedup runs on every write.

Migration backfill_embeddings iterates store.list_claims(), list_pages(), list_sources(), list_entities(), list_relations(), list_evidence() (guarded with try/except AttributeError per kind in case a future codebase variant drops a list method).

Schema / API deviations from plan body

  • Relation.typeRelation.relation (enum field; .relation.value for string)
  • Evidence.excerptEvidence.quote
  • Test fixture for dedup uses a custom _HashEmbedder (sha256 bytes scaled into [0,1]) instead of MockEmbedder from _fakes.py. Reason: under Python 3.14 + recent numpy, the existing MockEmbedder's raw-float32 unpacking can produce inf/nan values, collapsing the unit-normalized vector to zero and breaking the ≥0.95 cosine assertion. _HashEmbedder produces valid non-zero vectors. The production adapters are unaffected.

Test Plan

  • .venv/bin/pytest tests/embeddings -v50 passed, 4 deselected (+16 new tests across the 5 new modules)
  • .venv/bin/python -m ruff check src/vouch/embeddings tests/embeddingsclean

What's NOT in this PR

  • Reranker / HyDE / dedup / scorer / migration are not yet exposed via CLI or MCP/JSONL — Phase 8 adds vouch eval embedding, vouch dedup, vouch reindex --embeddings, kb_dedup_scan, etc.
  • build_context_pack still uses the old retrieval path — Phase 7.

Summary by CodeRabbit

  • New Features

    • Added embedding duplicate detection to identify and log similar vectors.
    • Added query expansion capabilities to enhance search queries.
    • Added cross-encoder based result reranking for improved search relevance.
    • Added retrieval evaluation metrics (recall, MRR, NDCG) to measure search quality.
    • Added embedding model mismatch detection and artifact re-encoding.
  • Tests

    • Added comprehensive test coverage for duplicate detection, query expansion, reranking, and evaluation features.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 35deb1b9-a432-4f97-8844-579edab71a18

📥 Commits

Reviewing files that changed from the base of the PR and between b4006a9 and 4546117.

📒 Files selected for processing (4)
  • src/vouch/embeddings/dedup.py
  • src/vouch/embeddings/migration.py
  • src/vouch/embeddings/rerank.py
  • src/vouch/embeddings/scorer.py

📝 Walkthrough

Walkthrough

Adds five new embeddings modules: HyDE query expansion, ingest-time duplicate detection with scan and logging, embedding model migration and backfill, cross-encoder reranking with hit scoring, and retrieval evaluation metrics (recall@k, MRR, NDCG). All modules include comprehensive test coverage.

Changes

Embedding Retrieval Features

Layer / File(s) Summary
HyDE query expansion
src/vouch/embeddings/hyde.py, tests/embeddings/test_hyde.py
expand_query_template conditionally formats short queries using a template; expand_query_with_llm invokes an LLM for expansion. Tests validate length-based branching and idempotency for long queries.
Embedding duplicate detection at ingest time
src/vouch/embeddings/dedup.py, tests/embeddings/test_dedup.py
check_and_log searches nearest neighbors within the same kind and logs qualifying matches; scan_all performs bulk pairwise comparisons with deduplication; list_duplicates retrieves logged pairs. Tests use deterministic hash embeddings.
Embedding model migration and backfill
src/vouch/embeddings/migration.py, tests/embeddings/test_migration.py
detect_mismatch compares stored vs. current embedding model identity; backfill_embeddings re-encodes artifacts across all kinds when models change. Tests validate mismatch reporting and re-encode counts.
Cross-encoder based result reranking
src/vouch/embeddings/rerank.py, tests/embeddings/test_rerank.py
Reranker abstract base and CrossEncoderReranker implementation score candidates via sentence_transformers; rerank rebuilds hits with cross-encoder scores, sorts by score, and truncates to top-k. Tests verify scoring order and truncation.
Retrieval evaluation metrics and reporting
src/vouch/embeddings/scorer.py, tests/embeddings/test_scorer.py
recall_at_k, mrr, and ndcg compute retrieval quality metrics from hits against ground-truth relevance sets; evaluate aggregates metrics from JSONL queries; write_report exports as JSON. Tests cover full hit, miss, and partial recall scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • vouchdev/vouch#38: The main PR's new dedup.py logic reads from embedding_index and writes duplicate findings into the embedding_dupes table, which is introduced by the retrieved PR's embedding storage-layer schema changes in index_db.py.
  • vouchdev/vouch#39: The main PR adds vouch.embeddings.dedup functions like check_and_log() that are directly used by the write-time _embed_and_store() hook (from PR #39) to populate the embedding_dupes ledger.

Poem

🐰 Five features bloom in the embedding sky,
Dedups find twins that we can't deny,
HyDE expands queries with flair,
Rerankers polish results with care,
While metrics show how good we fare! 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.02% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main changes: it lists five major features added in the embeddings module (rerank, HyDE, dedup, scorer, model-identity migration).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/vouch/jsonl_server.py (1)

74-109: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align JSONL kb.search response shape with MCP kb_search.

_h_search still returns a raw list, but MCP kb_search now returns {"backend": ..., "hits": [...]}. This breaks the “same surface” contract across transports.

Proposed fix
-def _h_search(p: dict) -> list[dict]:
+def _h_search(p: dict) -> dict[str, Any]:
@@
-    return [
-        {"kind": k, "id": i, "snippet": sn, "score": sc, "backend": used}
-        for k, i, sn, sc in hits
-    ]
+    return {
+        "backend": used,
+        "hits": [
+            {"kind": k, "id": i, "snippet": sn, "score": sc, "backend": used}
+            for k, i, sn, sc in hits
+        ],
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 74 - 109, _h_search currently returns
a raw list of hit dicts but MCP's kb_search returns an envelope {"backend": ...,
"hits": [...]}; change _h_search to return that envelope instead. Update the
return value and type hint of _h_search to return a dict with keys "backend"
(the used variable) and "hits" (the list comprehension currently produced), i.e.
wrap the existing list comprehension into {"backend": used, "hits": [...]} and
keep existing logic that sets used (references: _h_search,
index_db.search_semantic, index_db.search, s.search_substring, rrf_fuse). Also
search for and update any callers that expect a raw list to handle the new
envelope shape.
♻️ Duplicate comments (3)
src/vouch/embeddings/st_minilm.py (1)

11-11: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Resolve numpy dependency configuration (duplicate issue).

Same module-level numpy import issue as base.py line 17. See the comment on that file for details and verification steps.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/st_minilm.py` at line 11, The module-level "import numpy
as np" in src/vouch/embeddings/st_minilm.py should be removed and replaced with
a lazy/local import to avoid forcing a numpy dependency at import time; locate
the functions that use numpy (the module-level symbol "np" and any embedding
functions in st_minilm.py) and move "import numpy as np" inside those functions
(or wrap it in a try/except that raises a clear error if numpy is missing),
ensuring all references to "np" remain valid after the change.
src/vouch/embeddings/fastembed_bge.py (1)

11-11: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Resolve numpy dependency configuration (duplicate issue).

Same module-level numpy import issue as base.py line 17. See the comment on that file for details and verification steps.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/fastembed_bge.py` at line 11, Move the module-level
"import numpy as np" out of the top of src/vouch/embeddings/fastembed_bge.py and
perform a lazy/local import where numpy is actually used (e.g., inside the
function(s) that reference "np"); if importing may fail, wrap the import in
try/except and raise a clear RuntimeError instructing to install numpy (similar
to the fix applied in base.py), ensuring you replace top-level "np" usages with
the locally imported name so there are no module-level numpy imports remaining.
src/vouch/embeddings/st_mpnet.py (1)

12-12: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Resolve numpy dependency configuration (duplicate issue).

Same module-level numpy import issue as base.py line 17. See the comment on that file for details and verification steps.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/st_mpnet.py` at line 12, The module-level "import numpy
as np" in st_mpnet.py causes the same global dependency issue noted in base.py;
remove the top-level import and instead perform a local/lazy import of numpy
inside the function(s) or method(s) that actually use it (keep the symbol name
"np" when importing), or wrap the import in a try/except that raises a clear
ImportError with guidance; update any references in functions/classes in this
file to rely on the local import so no module-level numpy import remains.
🧹 Nitpick comments (2)
src/vouch/storage.py (1)

486-496: ⚡ Quick win

Uncaught exceptions from embedder.encode() will fail artifact writes.

The docstring states "Failures... are swallowed" but only ImportError and KeyError are caught. If encode() raises (e.g., model loading error, out-of-memory, or network failure for API-backed embedders), the exception propagates and fails the put_* operation, preventing the artifact from being stored.

If embeddings should be best-effort, wrap encode() in a broader exception handler. Otherwise, clarify in the docstring that only missing-dependency errors are swallowed.

Proposed fix: catch exceptions from encode()
     try:
         embedder = get_embedder()
     except KeyError:
         return
-    vec = embedder.encode(text)
+    try:
+        vec = embedder.encode(text)
+    except Exception:
+        return  # Embedding is best-effort; don't block artifact persistence
     with _index_db.open_db(self.kb_dir) as conn:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/storage.py` around lines 486 - 496, The call to
embedder.encode(text) can raise runtime errors (model load OOM, network/API
failures) which currently propagate and abort artifact writes; update the block
around get_embedder()/embedder.encode() so encode() is wrapped in a broad
try/except that logs the error and returns (i.e., swallow
non-ImportError/KeyError exceptions to keep embeddings best-effort) before
opening the DB and calling _index_db.put_embedding; reference embedder.encode,
get_embedder, and _index_db.put_embedding when locating where to add the
try/except and logging.
src/vouch/embeddings/migration.py (1)

34-34: ⚡ Quick win

force is unused and currently misleading.

Line 34 exposes force but behavior is identical regardless of value. Either implement force-gated behavior or remove the parameter before this API settles.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/migration.py` at line 34, The exposed parameter force on
backfill_embeddings is unused and misleading; remove the force parameter from
the function signature (def backfill_embeddings(store: Any) -> int) and update
all call sites that pass force to call backfill_embeddings without it, or
alternatively if you prefer gated behavior, implement the gate inside
backfill_embeddings to skip already-embedded records when force is False and
only reprocess all records when force is True—refer to the backfill_embeddings
function name and any callers invoking backfill_embeddings to make the change
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/specs/2026-05-20-semantic-search-design.md`:
- Line 33: The fenced code block that begins with the listing
"src/vouch/embeddings/" is missing a language tag; update the opening fence from
``` to ```text (i.e., add the "text" language identifier) so the block is
properly marked and resolves markdownlint MD040.

In `@src/vouch/embeddings/cache.py`:
- Around line 15-23: The cache currently hashes only raw query text (in
_query_key) which can return embeddings from a different model; update the cache
key to include the embedding model identity (and optionally dimension) so keys
are scoped by model. Concretely, change _query_key to accept a model_id (or
build the combined key inside cache_query_vec by hashing f"{model_id}:{query}")
and use that when computing h in cache_query_vec (and update any callers of
_query_key accordingly); ensure the model identifier used is stable (e.g., model
name/version) and consider invalidating or migrating the existing
query_embedding_cache if needed.
- Line 10: Wrap the unconditional numpy import in cache.py with a
TYPE_CHECKING-only import so mypy can see numpy for type hints without requiring
it in [dev] extras: add "from typing import TYPE_CHECKING" and change "import
numpy as np" to "if TYPE_CHECKING: import numpy as np" (or alternatively add
"numpy>=1.26,<3" to dev dependencies); ensure any runtime code that uses np
still imports it at runtime (e.g., local import inside functions that actually
perform numeric work) so the module remains importable during CI type-checking.

In `@src/vouch/embeddings/dedup.py`:
- Around line 76-77: The dot-product vecs[k1] @ vecs[k2] can raise a
shape-mismatch; before computing cos in the dedup scan (where vecs, k1, k2, cos,
threshold are used) check that the two vectors have the same dimensionality
(e.g., compare .shape or len) and skip or handle mismatched pairs (log a warning
or continue) instead of performing the operation; alternatively catch the
exception around the dot-product and continue so the entire scan doesn't abort
when embeddings with different dim values exist.
- Line 9: The CI failure is caused by an unconditional "import numpy as np" in
src/vouch/embeddings/dedup.py while numpy is only declared under the
"embeddings" extra; fix by either adding "numpy>=1.26,<3" to the dev extras in
setup/pyproject so pip install -e '.[dev]' provides numpy, or update the CI
install command to include the embeddings extra (pip install -e
'.[dev,embeddings]') so the import in dedup.py resolves during type checking and
mypy finds the numpy stubs.

In `@src/vouch/embeddings/fusion.py`:
- Around line 22-27: The rrf_fuse function currently divides by (k + rank) and
can raise ZeroDivisionError if a caller passes k <= -1; add an upfront
validation in rrf_fuse that ensures k is an integer >= 0 (or at least k >
-rank_min where rank_min is 1) and raise a clear ValueError if the check fails.
Locate the rrf_fuse function and validate the parameter k before the loop
(referencing the k and rank variables) so the subsequent scores calculation
using 1.0 / (k + rank) is guaranteed safe.

In `@src/vouch/embeddings/migration.py`:
- Around line 38-83: The try/except blocks around each entire loop (e.g.,
store.list_claims(), store.list_pages(), store.list_sources(),
store.list_entities(), store.list_relations(), store.list_evidence()) currently
catch AttributeError for the whole loop and can hide per-item errors while still
allowing index_db.set_embedding_meta and embedder.name/version/dim to be written
as if the backfill succeeded; instead, narrow the exception handling by calling
the list_* iterator outside the try and only rescuing AttributeError when the
list_* method itself is missing, then wrap the per-item access and the call to
store._embed_and_store(kind=..., id=..., text=...) in their own try/except that
logs or re-raises real errors (so failures incrementing touched are not silently
ignored), and only call index_db.set_embedding_meta if all required lists ran
(or if you tracked successful touched items), referencing the functions/methods
store.list_claims, store.list_pages, store.list_sources, store.list_entities,
store.list_relations, store.list_evidence, store._embed_and_store, and
index_db.set_embedding_meta along with embedder.name/version/dim to locate and
update the logic.

In `@src/vouch/embeddings/rerank.py`:
- Line 49: The slice uses top_k directly which allows negative values to
silently drop tail elements; before the line that does "return reranked[:top_k]"
validate top_k (the parameter) and either clamp it to a non-negative int (e.g.
top_k = max(0, int(top_k))) or raise ValueError for negative inputs, and also
handle None (treat as "no limit") so the function that computes "reranked"
returns predictable results.

In `@src/vouch/embeddings/scorer.py`:
- Around line 57-59: The loop that reads JSONL lines currently does
json.loads(line) for every line (the block iterating "for line in f" producing
"row" and "q"), which will raise JSONDecodeError on blank lines; modify that
loop to skip empty or whitespace-only lines before calling json.loads (e.g.,
check if line.strip() is falsy and continue) so only non-empty lines are parsed
and processed into "row" and "q".
- Around line 54-71: The code silently accepts unknown metric names because
totals = {m: 0.0 for m in metrics} blindly initializes any string; instead
validate the requested metrics against the supported set (e.g.,
{"recall@k","mrr","ndcg"}) at the start of the function and raise a ValueError
listing unknown_metric(s) if any are present; then keep the existing per-query
loop that calls recall_at_k, mrr, ndcg and accumulates into totals, so only
validated metrics are initialized and averaged before returning.

In `@src/vouch/index_db.py`:
- Around line 185-192: Mypy is failing because it can't find numpy stubs for the
lazy imports in _vec_to_blob and _blob_to_vec; update the inline imports inside
those functions to silence mypy by adding a type ignore for untyped imports
(e.g., change the import lines in functions _vec_to_blob and _blob_to_vec to use
a "# type: ignore[import-untyped]" comment), or alternatively add the
types-numpy package to dev dependencies if you prefer proper stubs—apply the
chosen fix consistently to both functions.
- Around line 294-301: The SQL uses the SELECT alias "score" in the WHERE clause
which SQLite disallows, so update the query in the conn.execute call that builds
rows to either repeat the score expression in the WHERE (replace "score >= ?"
with "1.0 - vec_distance_cosine(vec, ?) >= ?") or wrap the SELECT as a subquery
that filters on score, then ORDER/LIMIT; keep the same parameter order
(q.tobytes(), *kinds, min_score, limit) and ensure variables q, kinds,
min_score, and limit are used unchanged so rows filtering works correctly and
sqlite-vec will be able to execute the predicate.

In `@src/vouch/jsonl_server.py`:
- Around line 84-105: Validate backend_arg and raise a ValueError for unknown
values instead of silently returning empty hits: after the existing branches
that handle "auto", "embedding", "fts5", "substring", and "hybrid" (which use
index_db.search_semantic, index_db.search, s.search_substring, and rrf_fuse) add
a check that backend_arg is one of the supported options and if not, raise
ValueError("unknown backend: {backend_arg}") (or similar) so callers receive a
deterministic invalid_request error; ensure this check runs before the function
returns and does not mask legitimate empty-hit results.

In `@src/vouch/server.py`:
- Around line 112-120: The current block around index_db.search swallows all
exceptions for both "auto" and "fts5", causing explicit backend="fts5" failures
to be treated as empty results; change the control flow so that you only catch
exceptions when backend == "auto" (and then set hits = [] to allow fallback),
but for backend == "fts5" let exceptions propagate (or re-raise) so callers see
the failure; locate the branch handling backend in server.py (the if backend in
("auto", "fts5") block), the call to index_db.search, and the conversion via
_to_dicts to implement this: perform the try/except only when backend == "auto",
or re-raise the caught exception when backend == "fts5".
- Around line 126-131: The hybrid branch currently ignores min_score; update the
backend=="hybrid" block to enforce the same threshold as embedding mode by
applying min_score to the fused results: call index_db.search_semantic and
index_db.search with the same min_score parameter if supported, then run
rrf_fuse(emb, fts, limit=limit) and filter the returned hits to only include
items with score >= min_score (or pass min_score into rrf_fuse if it accepts
it), and finally pass the filtered list to _to_dicts("hybrid"); reference the
hybrid block, index_db.search_semantic, index_db.search, rrf_fuse, and _to_dicts
to locate where to make the change.

---

Outside diff comments:
In `@src/vouch/jsonl_server.py`:
- Around line 74-109: _h_search currently returns a raw list of hit dicts but
MCP's kb_search returns an envelope {"backend": ..., "hits": [...]}; change
_h_search to return that envelope instead. Update the return value and type hint
of _h_search to return a dict with keys "backend" (the used variable) and "hits"
(the list comprehension currently produced), i.e. wrap the existing list
comprehension into {"backend": used, "hits": [...]} and keep existing logic that
sets used (references: _h_search, index_db.search_semantic, index_db.search,
s.search_substring, rrf_fuse). Also search for and update any callers that
expect a raw list to handle the new envelope shape.

---

Duplicate comments:
In `@src/vouch/embeddings/fastembed_bge.py`:
- Line 11: Move the module-level "import numpy as np" out of the top of
src/vouch/embeddings/fastembed_bge.py and perform a lazy/local import where
numpy is actually used (e.g., inside the function(s) that reference "np"); if
importing may fail, wrap the import in try/except and raise a clear RuntimeError
instructing to install numpy (similar to the fix applied in base.py), ensuring
you replace top-level "np" usages with the locally imported name so there are no
module-level numpy imports remaining.

In `@src/vouch/embeddings/st_minilm.py`:
- Line 11: The module-level "import numpy as np" in
src/vouch/embeddings/st_minilm.py should be removed and replaced with a
lazy/local import to avoid forcing a numpy dependency at import time; locate the
functions that use numpy (the module-level symbol "np" and any embedding
functions in st_minilm.py) and move "import numpy as np" inside those functions
(or wrap it in a try/except that raises a clear error if numpy is missing),
ensuring all references to "np" remain valid after the change.

In `@src/vouch/embeddings/st_mpnet.py`:
- Line 12: The module-level "import numpy as np" in st_mpnet.py causes the same
global dependency issue noted in base.py; remove the top-level import and
instead perform a local/lazy import of numpy inside the function(s) or method(s)
that actually use it (keep the symbol name "np" when importing), or wrap the
import in a try/except that raises a clear ImportError with guidance; update any
references in functions/classes in this file to rely on the local import so no
module-level numpy import remains.

---

Nitpick comments:
In `@src/vouch/embeddings/migration.py`:
- Line 34: The exposed parameter force on backfill_embeddings is unused and
misleading; remove the force parameter from the function signature (def
backfill_embeddings(store: Any) -> int) and update all call sites that pass
force to call backfill_embeddings without it, or alternatively if you prefer
gated behavior, implement the gate inside backfill_embeddings to skip
already-embedded records when force is False and only reprocess all records when
force is True—refer to the backfill_embeddings function name and any callers
invoking backfill_embeddings to make the change consistently.

In `@src/vouch/storage.py`:
- Around line 486-496: The call to embedder.encode(text) can raise runtime
errors (model load OOM, network/API failures) which currently propagate and
abort artifact writes; update the block around get_embedder()/embedder.encode()
so encode() is wrapped in a broad try/except that logs the error and returns
(i.e., swallow non-ImportError/KeyError exceptions to keep embeddings
best-effort) before opening the DB and calling _index_db.put_embedding;
reference embedder.encode, get_embedder, and _index_db.put_embedding when
locating where to add the try/except and logging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 433efaa1-a846-447b-a74a-42f97d729ffe

📥 Commits

Reviewing files that changed from the base of the PR and between 056e88c and 743e492.

📒 Files selected for processing (31)
  • docs/superpowers/plans/2026-05-20-semantic-search.md
  • docs/superpowers/specs/2026-05-20-semantic-search-design.md
  • pyproject.toml
  • src/vouch/embeddings/__init__.py
  • src/vouch/embeddings/base.py
  • src/vouch/embeddings/cache.py
  • src/vouch/embeddings/dedup.py
  • src/vouch/embeddings/fastembed_bge.py
  • src/vouch/embeddings/fusion.py
  • src/vouch/embeddings/hyde.py
  • src/vouch/embeddings/migration.py
  • src/vouch/embeddings/rerank.py
  • src/vouch/embeddings/scorer.py
  • src/vouch/embeddings/st_minilm.py
  • src/vouch/embeddings/st_mpnet.py
  • src/vouch/index_db.py
  • src/vouch/jsonl_server.py
  • src/vouch/server.py
  • src/vouch/storage.py
  • tests/embeddings/__init__.py
  • tests/embeddings/_fakes.py
  • tests/embeddings/test_core.py
  • tests/embeddings/test_dedup.py
  • tests/embeddings/test_fusion.py
  • tests/embeddings/test_hyde.py
  • tests/embeddings/test_integration.py
  • tests/embeddings/test_migration.py
  • tests/embeddings/test_rerank.py
  • tests/embeddings/test_scorer.py
  • tests/embeddings/test_search.py
  • tests/embeddings/test_storage.py


## 3. New package layout

```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language identifier to the fenced code block.

The code fence at this line is missing a language tag, which triggers markdownlint MD040.

🛠️ Suggested fix
-```
+```text
 src/vouch/embeddings/
   __init__.py                # public API: encode, search, register
   base.py                    # Embedder ABC + adapter registry
@@
   migration.py               # model-identity check + backfill orchestration
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 33-33: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-20-semantic-search-design.md` at line 33, The
fenced code block that begins with the listing "src/vouch/embeddings/" is
missing a language tag; update the opening fence from ``` to ```text (i.e., add
the "text" language identifier) so the block is properly marked and resolves
markdownlint MD040.

Comment thread src/vouch/embeddings/cache.py
Comment on lines +15 to +23
def _query_key(query: str) -> str:
return hashlib.sha256(query.encode("utf-8")).hexdigest()


def cache_query_vec(
kb_dir: Path, *, query: str, vec: np.ndarray, max_entries: int = 1024,
) -> None:
h = _query_key(query)
blob = np.asarray(vec, dtype=np.float32).tobytes()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Include embedding model identity in cache keys.

Line 15 hashes only the raw query text. After embedding-model migration/backfill, the same query can return a cached vector from the previous model, causing stale scoring (and potentially dimension mismatch downstream). Scope the key by model identity (or clear query_embedding_cache during migration).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/cache.py` around lines 15 - 23, The cache currently
hashes only raw query text (in _query_key) which can return embeddings from a
different model; update the cache key to include the embedding model identity
(and optionally dimension) so keys are scoped by model. Concretely, change
_query_key to accept a model_id (or build the combined key inside
cache_query_vec by hashing f"{model_id}:{query}") and use that when computing h
in cache_query_vec (and update any callers of _query_key accordingly); ensure
the model identifier used is stable (e.g., model name/version) and consider
invalidating or migrating the existing query_embedding_cache if needed.

Comment thread src/vouch/embeddings/dedup.py
Comment thread src/vouch/embeddings/dedup.py
Comment thread src/vouch/index_db.py
Comment thread src/vouch/index_db.py
Comment thread src/vouch/jsonl_server.py
Comment on lines +84 to +105
if backend_arg in ("auto", "embedding"):
hits = index_db.search_semantic(
s.kb_dir, q, limit=limit, min_score=min_score,
)
if hits:
used = "embedding"
if not hits and backend_arg in ("auto", "fts5"):
try:
hits = index_db.search(s.kb_dir, q, limit=limit)
used = "fts5" if hits else used
except Exception:
hits = []
if not hits and backend_arg in ("auto", "substring"):
hits = s.search_substring(q, limit=limit)
backend = "substring"
used = "substring"
if backend_arg == "hybrid":
from .embeddings.fusion import rrf_fuse
emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2)
fts = index_db.search(s.kb_dir, q, limit=limit * 2)
hits = rrf_fuse(emb, fts, limit=limit)
used = "hybrid"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown backend values explicitly.

Unlike MCP kb_search, this handler does not validate backend; unsupported values silently return empty results. Return a ValueError for unknown backends so clients get deterministic invalid_request responses.

Proposed fix
     backend_arg = p.get("backend", "auto")
+    allowed = {"auto", "embedding", "fts5", "substring", "hybrid"}
+    if backend_arg not in allowed:
+        raise ValueError(f"unknown backend: {backend_arg}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 84 - 105, Validate backend_arg and
raise a ValueError for unknown values instead of silently returning empty hits:
after the existing branches that handle "auto", "embedding", "fts5",
"substring", and "hybrid" (which use index_db.search_semantic, index_db.search,
s.search_substring, and rrf_fuse) add a check that backend_arg is one of the
supported options and if not, raise ValueError("unknown backend: {backend_arg}")
(or similar) so callers receive a deterministic invalid_request error; ensure
this check runs before the function returns and does not mask legitimate
empty-hit results.

Comment thread src/vouch/server.py
Comment on lines +112 to +120
if backend in ("auto", "fts5"):
try:
hits = index_db.search(store.kb_dir, query, limit=limit)
except Exception:
hits = []
if hits:
return _to_dicts(hits, "fts5")
if backend == "fts5":
return _to_dicts([], "fts5")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t swallow FTS5 failures for explicit backend="fts5" requests.

Lines 115–116 currently hide all FTS errors and return empty hits, which turns backend failures into false negatives. Keep graceful fallback for auto, but propagate an error for explicit fts5.

Proposed fix
     if backend in ("auto", "fts5"):
         try:
             hits = index_db.search(store.kb_dir, query, limit=limit)
-        except Exception:
-            hits = []
+        except Exception as e:
+            if backend == "fts5":
+                raise ValueError(f"fts5 search failed: {e}") from e
+            hits = []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if backend in ("auto", "fts5"):
try:
hits = index_db.search(store.kb_dir, query, limit=limit)
except Exception:
hits = []
if hits:
return _to_dicts(hits, "fts5")
if backend == "fts5":
return _to_dicts([], "fts5")
if backend in ("auto", "fts5"):
try:
hits = index_db.search(store.kb_dir, query, limit=limit)
except Exception as e:
if backend == "fts5":
raise ValueError(f"fts5 search failed: {e}") from e
hits = []
if hits:
return _to_dicts(hits, "fts5")
if backend == "fts5":
return _to_dicts([], "fts5")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/server.py` around lines 112 - 120, The current block around
index_db.search swallows all exceptions for both "auto" and "fts5", causing
explicit backend="fts5" failures to be treated as empty results; change the
control flow so that you only catch exceptions when backend == "auto" (and then
set hits = [] to allow fallback), but for backend == "fts5" let exceptions
propagate (or re-raise) so callers see the failure; locate the branch handling
backend in server.py (the if backend in ("auto", "fts5") block), the call to
index_db.search, and the conversion via _to_dicts to implement this: perform the
try/except only when backend == "auto", or re-raise the caught exception when
backend == "fts5".

Comment thread src/vouch/server.py
Comment on lines +126 to +131
if backend == "hybrid":
from .embeddings.fusion import rrf_fuse
emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2)
fts = index_db.search(store.kb_dir, query, limit=limit * 2)
hits = rrf_fuse(emb, fts, limit=limit)
return _to_dicts(hits, "hybrid")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Apply min_score in hybrid semantic leg.

Line 128 ignores min_score, so backend="hybrid" does not honor the same threshold semantics as embedding mode.

Proposed fix
-        emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2)
+        emb = index_db.search_semantic(
+            store.kb_dir, query, limit=limit * 2, min_score=min_score
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if backend == "hybrid":
from .embeddings.fusion import rrf_fuse
emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2)
fts = index_db.search(store.kb_dir, query, limit=limit * 2)
hits = rrf_fuse(emb, fts, limit=limit)
return _to_dicts(hits, "hybrid")
if backend == "hybrid":
from .embeddings.fusion import rrf_fuse
emb = index_db.search_semantic(
store.kb_dir, query, limit=limit * 2, min_score=min_score
)
fts = index_db.search(store.kb_dir, query, limit=limit * 2)
hits = rrf_fuse(emb, fts, limit=limit)
return _to_dicts(hits, "hybrid")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/server.py` around lines 126 - 131, The hybrid branch currently
ignores min_score; update the backend=="hybrid" block to enforce the same
threshold as embedding mode by applying min_score to the fused results: call
index_db.search_semantic and index_db.search with the same min_score parameter
if supported, then run rrf_fuse(emb, fts, limit=limit) and filter the returned
hits to only include items with score >= min_score (or pass min_score into
rrf_fuse if it accepts it), and finally pass the filtered list to
_to_dicts("hybrid"); reference the hybrid block, index_db.search_semantic,
index_db.search, rrf_fuse, and _to_dicts to locate where to make the change.

@dripsmvcp dripsmvcp force-pushed the feat/semantic-search-phase-6-capabilities branch 3 times, most recently from 7c74231 to 30d64d7 Compare May 20, 2026 13:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (11)
src/vouch/embeddings/rerank.py (1)

49-49: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject negative top_k for predictable truncation behavior.

Negative top_k currently yields surprising output shape instead of a clear validation error.

Proposed fix
 def rerank(
     *, query: str, hits: list[Hit], reranker: Reranker, top_k: int = 50,
 ) -> list[Hit]:
     if not hits:
         return []
+    if top_k < 0:
+        raise ValueError("top_k must be >= 0")
     candidates = [h[2] or h[1] for h in hits]
     scores = reranker.score(query, candidates)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/rerank.py` at line 49, The slice return reranked[:top_k]
should reject negative top_k to avoid surprising shapes; in the function that
returns reranked (look for the function handling reranking that assigns the
variable `reranked` and returns it), validate the `top_k` parameter and raise a
ValueError if top_k is negative (e.g., `if top_k is not None and top_k < 0:
raise ValueError("top_k must be non-negative")`), leaving the existing slicing
behavior for None or non-negative values unchanged.
src/vouch/jsonl_server.py (1)

79-107: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unsupported backend values in JSONL search.

At Line 79, unknown backend values still fall through and return an empty success payload instead of an invalid_request error. This makes client behavior non-deterministic compared with MCP search handling.

Proposed minimal fix
 def _h_search(p: dict) -> list[dict]:
@@
-    backend_arg = p.get("backend", "auto")
+    backend_arg = p.get("backend", "auto")
+    allowed = {"auto", "embedding", "fts5", "substring", "hybrid"}
+    if backend_arg not in allowed:
+        raise ValueError(f"unknown backend: {backend_arg}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 79 - 107, Validate the incoming
backend_arg early and reject any value not in the supported set ("auto",
"embedding", "fts5", "substring", "hybrid") instead of falling through to an
empty result; add a check at the top of the branch handling (where backend_arg
and used are initialized) and return/raise an invalid_request error consistent
with MCP search handling if the value is unsupported. Locate the validation near
the variables backend_arg and used and ensure any caller-visible error follows
the same error shape/mechanism used elsewhere in this module (e.g., the JSONL
response error path), so search_semantic, search, search_substring and the
hybrid rrf_fuse branch remain unchanged except for pre-validation of
backend_arg.
src/vouch/server.py (2)

126-132: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Honor min_score in hybrid semantic candidate fetch.

At Line 130, hybrid mode doesn’t pass min_score to semantic search, so filtering behavior differs from embedding mode.

Proposed fix
-        emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2)
+        emb = index_db.search_semantic(
+            store.kb_dir, query, limit=limit * 2, min_score=min_score
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/server.py` around lines 126 - 132, Hybrid branch currently calls
index_db.search_semantic without passing min_score so results are not filtered
like embedding mode; update the call in the hybrid block (the
index_db.search_semantic invocation that produces emb) to include the same
min_score parameter used elsewhere (e.g., index_db.search(...,
min_score=min_score)) so emb = index_db.search_semantic(store.kb_dir, query,
limit=limit * 2, min_score=min_score), leaving the subsequent rrf_fuse(emb, fts,
limit=limit) unchanged.

112-120: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not suppress FTS failures for explicit backend="fts5".

At Line 115, explicit FTS requests still convert backend failures into empty hits, masking real errors.

Proposed fix
     if backend in ("auto", "fts5"):
         try:
             hits = index_db.search(store.kb_dir, query, limit=limit)
-        except Exception:
-            hits = []
+        except Exception as e:
+            if backend == "fts5":
+                raise ValueError(f"fts5 search failed: {e}") from e
+            hits = []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/server.py` around lines 112 - 120, The code currently swallows all
exceptions from index_db.search regardless of whether backend is "auto" or
"fts5", which masks real FTS errors for explicit backend="fts5"; change the
control flow so that index_db.search exceptions are only suppressed for backend
== "auto" (e.g., perform the try/except only in the auto branch and set hits=[]
on error), while for backend == "fts5" call index_db.search without catching
Exception so real errors propagate; adjust the subsequent logic that calls
_to_dicts(hits, "fts5") accordingly.
src/vouch/embeddings/fusion.py (1)

22-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard invalid k in rrf_fuse.

At Line 27, callers can still trigger divide-by-zero when k + rank == 0 (e.g., k=-1 at first rank).

Proposed fix
 def rrf_fuse(a: Hits, b: Hits, *, limit: int = 10, k: int = 60) -> Hits:
     """Reciprocal Rank Fusion: score = sum(1 / (k + rank)) across lists."""
+    if k <= -1:
+        raise ValueError("k must be >= 0")
     scores: dict[tuple[str, str], float] = {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/fusion.py` around lines 22 - 27, The rrf_fuse function
can divide by zero if callers pass a negative k (e.g., k = -1 makes k + rank ==
0 for rank==1); add an input guard at the top of rrf_fuse that validates k (for
example require k >= 0) and raise a clear ValueError if the check fails so the
loop using 1.0 / (k + rank) cannot encounter division-by-zero.
src/vouch/index_db.py (1)

294-301: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix sqlite-vec SQL filter: score alias is not valid in WHERE.

At Line 298, filtering on score (a SELECT alias) causes sqlite-vec query failure, so this path degrades to fallback every time.

Proposed fix
                 rows = conn.execute(
                     f"SELECT kind, id, "
-                    f"  1.0 - vec_distance_cosine(vec, ?) AS score "
+                    f"  (1.0 - vec_distance_cosine(vec, ?)) AS score "
                     f"FROM embedding_index "
-                    f"WHERE kind IN ({placeholders}) AND score >= ? "
+                    f"WHERE kind IN ({placeholders}) "
+                    f"  AND (1.0 - vec_distance_cosine(vec, ?)) >= ? "
                     f"ORDER BY score DESC LIMIT ?",
-                    (q.tobytes(), *kinds, min_score, limit),
+                    (q.tobytes(), *kinds, q.tobytes(), min_score, limit),
                 ).fetchall()
#!/bin/bash
# Verify the problematic alias usage exists in query construction
rg -n -C2 'AND score >= \?' src/vouch/index_db.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/index_db.py` around lines 294 - 301, The SQL uses the SELECT alias
"score" inside WHERE which SQLite doesn't allow; update the query in the
conn.execute call so the WHERE repeats the expression instead of referencing the
alias (e.g. use "AND 1.0 - vec_distance_cosine(vec, ?) >= ?" while still
selecting "1.0 - vec_distance_cosine(vec, ?) AS score"), and adjust the
parameter tuple passed to conn.execute so q.tobytes() appears for each ? for the
vector expression (i.e. pass q.tobytes() twice along with *kinds, min_score,
limit) to keep parameter ordering consistent for the rows assignment in
index_db.py.
src/vouch/embeddings/scorer.py (2)

54-71: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown metric names early.

Typos in metrics currently produce misleading zero outputs instead of a failure signal. Validate requested metric names before accumulation and raise on unknown entries.

Suggested patch
 def evaluate(
@@
 ) -> dict[str, float]:
     """Run a metric sweep over a JSONL queries file."""
     from .. import index_db
+    valid = {"recall@k", "mrr", "ndcg"}
+    unknown = set(metrics) - valid
+    if unknown:
+        raise ValueError(f"Unknown metrics: {sorted(unknown)}")
     totals = {m: 0.0 for m in metrics}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/scorer.py` around lines 54 - 71, Before iterating
queries validate the requested metrics list: define the allowed metric names
(e.g. "recall@k", "mrr", "ndcg"), compute unknown = set(metrics) - allowed, and
if unknown is non-empty raise a ValueError listing the unknown metric names;
keep the rest of the logic in scorer.py (the totals={} initialization and the
loop using recall_at_k, mrr, ndcg and index_db.search_semantic) unchanged so
typos no longer silently yield zeros.

56-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Skip blank lines in JSONL input.

Whitespace-only lines will raise JSONDecodeError and abort the run. Skip empty lines before parsing.

Suggested patch
     with queries_file.open() as f:
         for line in f:
+            if not line.strip():
+                continue
             row = json.loads(line)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/scorer.py` around lines 56 - 59, The loop reading JSONL
in scorer.py currently calls json.loads on every line and will raise
JSONDecodeError on blank/whitespace-only lines; update the code in the block
that opens queries_file (the for line in f: loop) to skip lines that are empty
or contain only whitespace (e.g., check if line.strip() is falsy) before calling
json.loads, so row = json.loads(line) and subsequent use of q = row["query"]
only run for non-blank lines.
src/vouch/embeddings/dedup.py (1)

76-77: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard pairwise similarity against mismatched vector shapes.

scan_all can fail on mixed embedding dimensions; skip mismatched pairs before dot product so one bad pair doesn’t abort the entire audit.

Suggested patch
-            cos = float(vecs[k1] @ vecs[k2])
+            if vecs[k1].shape != vecs[k2].shape:
+                continue
+            cos = float(vecs[k1] @ vecs[k2])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/dedup.py` around lines 76 - 77, scan_all can crash when
embeddings have mixed dimensions because the code does an unguarded dot product
(vecs[k1] @ vecs[k2]); update the pairwise loop in scan_all (or the function
performing the check) to skip any pair where vecs[k1].shape != vecs[k2].shape
(or lengths differ) before computing cos, and continue to the next pair instead
of attempting the dot product, ensuring the rest of the audit runs even if a
mismatched embedding is present; reference the variables vecs, k1, k2 and
threshold when adding the guard and skip logic.
src/vouch/embeddings/cache.py (1)

15-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope query-cache keys by embedding model identity.

The key currently depends only on query text, so cached vectors can bleed across model changes and return stale/wrong embeddings. Include a stable model identifier in key derivation (and optionally clear legacy entries once).

Suggested patch
-def _query_key(query: str) -> str:
-    return hashlib.sha256(query.encode("utf-8")).hexdigest()
+def _query_key(query: str, model_id: str) -> str:
+    return hashlib.sha256(f"{model_id}:{query}".encode("utf-8")).hexdigest()

 def cache_query_vec(
     kb_dir: Path, *, query: str, vec: np.ndarray, max_entries: int = 1024,
 ) -> None:
-    h = _query_key(query)
+    from .base import get_embedder
+    model = get_embedder().name
+    h = _query_key(query, model)
@@
 def lookup_query_vec(kb_dir: Path, *, query: str) -> np.ndarray | None:
-    h = _query_key(query)
+    from .base import get_embedder
+    model = get_embedder().name
+    h = _query_key(query, model)

Also applies to: 46-48

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/cache.py` around lines 15 - 23, Include the embedding
model identity in query cache keys to avoid cross-model cache pollution: change
_query_key to accept a model identifier (e.g., model_id or model_name) and
incorporate that identifier into the SHA256 input, update cache_query_vec to
pass the active model id into _query_key, and update the corresponding cache
read function (e.g., get_cached_query_vec or the lookup at the other location
referenced) to use the same model-aware _query_key; optionally add a
migration/cleanup step to invalidate or remove legacy keys lacking the model id.
src/vouch/embeddings/migration.py (1)

38-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow AttributeError handling to missing list methods only.

Catching AttributeError around whole loops can hide real failures inside loop bodies, then metadata is still updated as if backfill succeeded. Resolve method presence up front (e.g., getattr(..., None) + callable) and let per-item errors surface.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/migration.py` around lines 38 - 83, The current code
wraps entire loops in broad AttributeError handlers which can mask real errors;
instead, for each collection, check method existence up front (e.g., list_claims
= getattr(store, "list_claims", None); if callable(list_claims): for c in
list_claims(): store._embed_and_store(...)) and remove the surrounding
try/except blocks so per-item exceptions surface; do this for list_claims,
list_pages, list_sources, list_entities, list_relations, and list_evidence, keep
using store._embed_and_store for each item, and only call
index_db.set_embedding_meta(store.kb_dir, model=embedder.name,
version=embedder.version, dim=embedder.dim) after these conditional loops.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/vouch/embeddings/rerank.py`:
- Around line 43-47: The reranker.score return length isn't validated before
rebuilding reranked results, so mismatched counts between scores and
hits/candidates can silently drop or misalign items; before the list
comprehension that builds reranked (which references reranker.score, hits, and
candidates), check that len(scores) == len(hits) (or len(candidates)) and if
not, raise a clear exception (e.g., ValueError) containing the sizes and
identifiers (query/candidate count) so the contract violation is visible; only
proceed to build reranked when counts match.

In `@src/vouch/index_db.py`:
- Around line 311-315: The fallback path computes raw dot products (score = q @
vec) which breaks cosine semantics when stored vectors from _blob_to_vec are not
unit-normalized; update the scoring in the loop that builds scored (using
variables q, vec, min_score, scored) to compute cosine similarity by normalizing
vec (and q if not already unit) before taking the dot—i.e., compute vec_norm =
||vec|| and use (q·vec)/(||q||*vec_norm) or normalize vec in-place to unit and
then use q·vec, and only append to scored when that cosine score >= min_score.

In `@src/vouch/jsonl_server.py`:
- Around line 99-104: Hybrid mode currently obtains semantic candidates via
index_db.search_semantic without applying the min_score threshold, causing
inconsistent filtering versus backend="embedding"; update the hybrid branch
(where backend_arg == "hybrid" and emb is assigned from
index_db.search_semantic) to apply the same min_score used by the embedding
backend — either by passing min_score into index_db.search_semantic if supported
or by filtering the returned emb list to only include items with score >=
min_score before fusing with rrf_fuse (refer to variables/backend_arg, emb,
min_score, index_db.search_semantic, and rrf_fuse).

---

Duplicate comments:
In `@src/vouch/embeddings/cache.py`:
- Around line 15-23: Include the embedding model identity in query cache keys to
avoid cross-model cache pollution: change _query_key to accept a model
identifier (e.g., model_id or model_name) and incorporate that identifier into
the SHA256 input, update cache_query_vec to pass the active model id into
_query_key, and update the corresponding cache read function (e.g.,
get_cached_query_vec or the lookup at the other location referenced) to use the
same model-aware _query_key; optionally add a migration/cleanup step to
invalidate or remove legacy keys lacking the model id.

In `@src/vouch/embeddings/dedup.py`:
- Around line 76-77: scan_all can crash when embeddings have mixed dimensions
because the code does an unguarded dot product (vecs[k1] @ vecs[k2]); update the
pairwise loop in scan_all (or the function performing the check) to skip any
pair where vecs[k1].shape != vecs[k2].shape (or lengths differ) before computing
cos, and continue to the next pair instead of attempting the dot product,
ensuring the rest of the audit runs even if a mismatched embedding is present;
reference the variables vecs, k1, k2 and threshold when adding the guard and
skip logic.

In `@src/vouch/embeddings/fusion.py`:
- Around line 22-27: The rrf_fuse function can divide by zero if callers pass a
negative k (e.g., k = -1 makes k + rank == 0 for rank==1); add an input guard at
the top of rrf_fuse that validates k (for example require k >= 0) and raise a
clear ValueError if the check fails so the loop using 1.0 / (k + rank) cannot
encounter division-by-zero.

In `@src/vouch/embeddings/migration.py`:
- Around line 38-83: The current code wraps entire loops in broad AttributeError
handlers which can mask real errors; instead, for each collection, check method
existence up front (e.g., list_claims = getattr(store, "list_claims", None); if
callable(list_claims): for c in list_claims(): store._embed_and_store(...)) and
remove the surrounding try/except blocks so per-item exceptions surface; do this
for list_claims, list_pages, list_sources, list_entities, list_relations, and
list_evidence, keep using store._embed_and_store for each item, and only call
index_db.set_embedding_meta(store.kb_dir, model=embedder.name,
version=embedder.version, dim=embedder.dim) after these conditional loops.

In `@src/vouch/embeddings/rerank.py`:
- Line 49: The slice return reranked[:top_k] should reject negative top_k to
avoid surprising shapes; in the function that returns reranked (look for the
function handling reranking that assigns the variable `reranked` and returns
it), validate the `top_k` parameter and raise a ValueError if top_k is negative
(e.g., `if top_k is not None and top_k < 0: raise ValueError("top_k must be
non-negative")`), leaving the existing slicing behavior for None or non-negative
values unchanged.

In `@src/vouch/embeddings/scorer.py`:
- Around line 54-71: Before iterating queries validate the requested metrics
list: define the allowed metric names (e.g. "recall@k", "mrr", "ndcg"), compute
unknown = set(metrics) - allowed, and if unknown is non-empty raise a ValueError
listing the unknown metric names; keep the rest of the logic in scorer.py (the
totals={} initialization and the loop using recall_at_k, mrr, ndcg and
index_db.search_semantic) unchanged so typos no longer silently yield zeros.
- Around line 56-59: The loop reading JSONL in scorer.py currently calls
json.loads on every line and will raise JSONDecodeError on blank/whitespace-only
lines; update the code in the block that opens queries_file (the for line in f:
loop) to skip lines that are empty or contain only whitespace (e.g., check if
line.strip() is falsy) before calling json.loads, so row = json.loads(line) and
subsequent use of q = row["query"] only run for non-blank lines.

In `@src/vouch/index_db.py`:
- Around line 294-301: The SQL uses the SELECT alias "score" inside WHERE which
SQLite doesn't allow; update the query in the conn.execute call so the WHERE
repeats the expression instead of referencing the alias (e.g. use "AND 1.0 -
vec_distance_cosine(vec, ?) >= ?" while still selecting "1.0 -
vec_distance_cosine(vec, ?) AS score"), and adjust the parameter tuple passed to
conn.execute so q.tobytes() appears for each ? for the vector expression (i.e.
pass q.tobytes() twice along with *kinds, min_score, limit) to keep parameter
ordering consistent for the rows assignment in index_db.py.

In `@src/vouch/jsonl_server.py`:
- Around line 79-107: Validate the incoming backend_arg early and reject any
value not in the supported set ("auto", "embedding", "fts5", "substring",
"hybrid") instead of falling through to an empty result; add a check at the top
of the branch handling (where backend_arg and used are initialized) and
return/raise an invalid_request error consistent with MCP search handling if the
value is unsupported. Locate the validation near the variables backend_arg and
used and ensure any caller-visible error follows the same error shape/mechanism
used elsewhere in this module (e.g., the JSONL response error path), so
search_semantic, search, search_substring and the hybrid rrf_fuse branch remain
unchanged except for pre-validation of backend_arg.

In `@src/vouch/server.py`:
- Around line 126-132: Hybrid branch currently calls index_db.search_semantic
without passing min_score so results are not filtered like embedding mode;
update the call in the hybrid block (the index_db.search_semantic invocation
that produces emb) to include the same min_score parameter used elsewhere (e.g.,
index_db.search(..., min_score=min_score)) so emb =
index_db.search_semantic(store.kb_dir, query, limit=limit * 2,
min_score=min_score), leaving the subsequent rrf_fuse(emb, fts, limit=limit)
unchanged.
- Around line 112-120: The code currently swallows all exceptions from
index_db.search regardless of whether backend is "auto" or "fts5", which masks
real FTS errors for explicit backend="fts5"; change the control flow so that
index_db.search exceptions are only suppressed for backend == "auto" (e.g.,
perform the try/except only in the auto branch and set hits=[] on error), while
for backend == "fts5" call index_db.search without catching Exception so real
errors propagate; adjust the subsequent logic that calls _to_dicts(hits, "fts5")
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4dab7d36-73a7-4762-9709-3030db0a1c5e

📥 Commits

Reviewing files that changed from the base of the PR and between 743e492 and 7c74231.

📒 Files selected for processing (22)
  • pyproject.toml
  • src/vouch/embeddings/__init__.py
  • src/vouch/embeddings/cache.py
  • src/vouch/embeddings/dedup.py
  • src/vouch/embeddings/fusion.py
  • src/vouch/embeddings/hyde.py
  • src/vouch/embeddings/migration.py
  • src/vouch/embeddings/rerank.py
  • src/vouch/embeddings/scorer.py
  • src/vouch/index_db.py
  • src/vouch/jsonl_server.py
  • src/vouch/server.py
  • src/vouch/storage.py
  • tests/embeddings/conftest.py
  • tests/embeddings/test_dedup.py
  • tests/embeddings/test_fusion.py
  • tests/embeddings/test_hyde.py
  • tests/embeddings/test_migration.py
  • tests/embeddings/test_rerank.py
  • tests/embeddings/test_scorer.py
  • tests/embeddings/test_search.py
  • tests/embeddings/test_storage.py

Comment thread src/vouch/embeddings/rerank.py
Comment thread src/vouch/index_db.py
Comment thread src/vouch/jsonl_server.py
Comment on lines +99 to +104
if backend_arg == "hybrid":
from .embeddings.fusion import (
rrf_fuse, # type: ignore[import-not-found,import-untyped,unused-ignore]
)
emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2)
fts = index_db.search(s.kb_dir, q, limit=limit * 2)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Apply min_score in hybrid semantic leg.

At Line 103, hybrid mode ignores min_score for semantic candidates, so threshold semantics differ from backend="embedding".

Proposed fix
-        emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2)
+        emb = index_db.search_semantic(
+            s.kb_dir, q, limit=limit * 2, min_score=min_score
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if backend_arg == "hybrid":
from .embeddings.fusion import (
rrf_fuse, # type: ignore[import-not-found,import-untyped,unused-ignore]
)
emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2)
fts = index_db.search(s.kb_dir, q, limit=limit * 2)
if backend_arg == "hybrid":
from .embeddings.fusion import (
rrf_fuse, # type: ignore[import-not-found,import-untyped,unused-ignore]
)
emb = index_db.search_semantic(
s.kb_dir, q, limit=limit * 2, min_score=min_score
)
fts = index_db.search(s.kb_dir, q, limit=limit * 2)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 99 - 104, Hybrid mode currently
obtains semantic candidates via index_db.search_semantic without applying the
min_score threshold, causing inconsistent filtering versus backend="embedding";
update the hybrid branch (where backend_arg == "hybrid" and emb is assigned from
index_db.search_semantic) to apply the same min_score used by the embedding
backend — either by passing min_score into index_db.search_semantic if supported
or by filtering the returned emb list to only include items with score >=
min_score before fusing with rrf_fuse (refer to variables/backend_arg, emb,
min_score, index_db.search_semantic, and rrf_fuse).

@dripsmvcp dripsmvcp force-pushed the feat/semantic-search-phase-6-capabilities branch 3 times, most recently from 1a428c7 to 23c4fcc Compare May 21, 2026 05:34
@dripsmvcp dripsmvcp force-pushed the feat/semantic-search-phase-6-capabilities branch from 23c4fcc to b4006a9 Compare May 21, 2026 05:37

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (8)
src/vouch/jsonl_server.py (2)

99-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Apply min_score in hybrid semantic retrieval.

Hybrid mode currently bypasses the threshold on the semantic leg, so filtering differs from backend="embedding".

Suggested fix
-        emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2)
+        emb = index_db.search_semantic(
+            s.kb_dir, q, limit=limit * 2, min_score=min_score
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 99 - 103, Hybrid mode currently
ignores the semantic threshold: when backend_arg == "hybrid" filter the semantic
search results returned by index_db.search_semantic using the same min_score
used by embedding-only retrieval before calling rrf_fuse so the semantic leg
respects the threshold; specifically locate the block with backend_arg,
index_db.search_semantic, index_db.search and rrf_fuse and apply the min_score
filter to the emb results (keeping the limit logic) or pass filtered emb into
rrf_fuse so hits reflect the same min_score behavior as backend="embedding".

79-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown backend values explicitly.

Unsupported backend values currently fall through and return empty hits, which hides request errors. Validate allowed values and raise ValueError.

Suggested fix
     backend_arg = p.get("backend", "auto")
+    allowed = {"auto", "embedding", "fts5", "substring", "hybrid"}
+    if backend_arg not in allowed:
+        raise ValueError(f"unknown backend: {backend_arg}")
     min_score = float(p.get("min_score", 0.0))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 79 - 84, The code lets unknown
backend_arg values fall through silently; update the handling around backend_arg
(variable backend_arg and used) so you explicitly validate allowed values (e.g.,
"auto", "embedding", and any other supported names used elsewhere) and raise a
ValueError for unsupported values; implement this by converting the existing if
backend_arg in ("auto", "embedding"): ... elif backend_arg == "X": ... pattern
into an if/elif/else and in the else branch raise ValueError(f"Unsupported
backend: {backend_arg}"), ensuring used is set only for supported branches and
hits remains untouched otherwise.
src/vouch/embeddings/dedup.py (1)

76-77: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard mixed-dimension vectors before dot-product.

Line 76 can raise on shape mismatch and abort the full duplicate scan when embeddings with different dimensions coexist. Skip mismatched pairs before computing cosine.

Suggested fix
-            cos = float(vecs[k1] @ vecs[k2])
+            if vecs[k1].shape != vecs[k2].shape:
+                continue
+            cos = float(vecs[k1] @ vecs[k2])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/dedup.py` around lines 76 - 77, Guard the dot-product
between embeddings by skipping pairs with mismatched dimensions: before
computing cos = float(vecs[k1] @ vecs[k2]) check that vecs[k1] and vecs[k2] have
compatible shapes (e.g., same length / same .shape or both 1-D) and
continue/skip the pair if they differ; update the loop in
src/vouch/embeddings/dedup.py where vecs, k1, k2 and threshold are used to
perform this shape check so a shape-mismatch cannot raise and abort the
duplicate scan.
src/vouch/embeddings/scorer.py (2)

56-59: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Skip empty JSONL lines before parsing.

Line 58 will fail on blank lines. Ignore whitespace-only lines before json.loads.

Suggested fix
     with queries_file.open() as f:
         for line in f:
+            if not line.strip():
+                continue
             row = json.loads(line)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/scorer.py` around lines 56 - 59, The loop that reads
queries_file in scorer.py calls json.loads on every line and will crash on
blank/whitespace-only lines; update the loop inside the with block that
processes each line (the code around queries_file.open() and the for line in f)
to strip whitespace from line, skip/continue when the stripped line is empty,
then call json.loads on the non-empty line to extract row and q = row["query"]
so blank JSONL lines are ignored safely.

54-55: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject unknown metric names up front.

Line 54 currently accepts arbitrary metric strings and can return misleading zeros for typos. Validate against the supported metric set first.

Suggested fix
 def evaluate(
@@
 ) -> dict[str, float]:
     """Run a metric sweep over a JSONL queries file."""
     from .. import index_db
+    valid = {"recall@k", "mrr", "ndcg"}
+    unknown = set(metrics) - valid
+    if unknown:
+        raise ValueError(f"Unknown metrics: {sorted(unknown)}")
     totals = {m: 0.0 for m in metrics}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/scorer.py` around lines 54 - 55, The code currently
builds totals = {m: 0.0 for m in metrics} allowing arbitrary metric names and
silently returning zeros for typos; before that initialization validate the
incoming metrics list against the supported metric set (e.g., SUPPORTED_METRICS
or supported_metrics in scorer.py) and raise a clear exception (ValueError) if
any unknown metric is present so callers get immediate feedback rather than
misleading results; update the validation near where totals and n are set and
reference the metrics variable and the module's supported metric constant in the
error message.
src/vouch/embeddings/migration.py (1)

38-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow AttributeError handling in backfill loops.

Catching AttributeError around whole loops can swallow real per-item failures while Lines 81-83 still update embedding metadata. Restrict fallback to missing list_* methods only.

Suggested fix pattern
-    try:
-        for c in store.list_claims():
-            store._embed_and_store(kind="claim", id=c.id, text=c.text)
-            touched += 1
-    except AttributeError:
-        pass
+    list_claims = getattr(store, "list_claims", None)
+    if callable(list_claims):
+        for c in list_claims():
+            store._embed_and_store(kind="claim", id=c.id, text=c.text)
+            touched += 1

Apply the same pattern to list_pages, list_sources, list_entities, list_relations, and list_evidence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/migration.py` around lines 38 - 83, The AttributeError
handlers around entire backfill loops are too broad and can hide per-item
failures; change each try/except so you only skip the whole loop when the store
is missing the list_* method (e.g., check hasattr(store, "list_claims") or wrap
only the call to store.list_claims() in a try/except AttributeError), then
iterate and call store._embed_and_store normally so item-level exceptions
surface; apply this pattern to list_pages, list_sources, list_entities,
list_relations, and list_evidence, leaving index_db.set_embedding_meta and
embedder.name/version/dim updates unchanged.
src/vouch/embeddings/rerank.py (1)

43-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate reranker score count and top_k before slicing.

Line 46 currently permits silent truncation on score-count mismatch, and Line 49 allows negative top_k semantics. Enforce both contracts explicitly.

Suggested fix
     scores = reranker.score(query, candidates)
+    if len(scores) != len(candidates):
+        msg = f"Reranker returned {len(scores)} scores for {len(candidates)} candidates"
+        raise ValueError(msg)
+    if top_k < 0:
+        raise ValueError("top_k must be >= 0")
     reranked = [
         (kind, id_, snip, score)
-        for (kind, id_, snip, _orig), score in zip(hits, scores, strict=False)
+        for (kind, id_, snip, _orig), score in zip(hits, scores, strict=True)
     ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/rerank.py` around lines 43 - 49, The code currently zips
hits with scores and slices by top_k without validating inputs; update the
rerank routine to (1) call reranker.score(query, candidates) and assert the
returned scores length equals len(hits) (raise a clear ValueError referencing
reranker.score and hits if mismatch), (2) validate that top_k is an int >= 0
(raise ValueError mentioning top_k) and treat None or 0 appropriately, and (3)
when returning results, slice with min(top_k, len(reranked)) instead of allowing
negative or out-of-range slicing; reference the variables and calls
reranker.score, hits, scores, reranked, and top_k so the checks are added
immediately before building/slicing reranked.
src/vouch/embeddings/fusion.py (1)

22-27: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate k before reciprocal-rank scoring.

Line 27 can divide by zero when k <= -1. Add a parameter guard up front.

Suggested fix
 def rrf_fuse(a: Hits, b: Hits, *, limit: int = 10, k: int = 60) -> Hits:
     """Reciprocal Rank Fusion: score = sum(1 / (k + rank)) across lists."""
+    if k <= -1:
+        raise ValueError("k must be greater than -1")
     scores: dict[tuple[str, str], float] = {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/embeddings/fusion.py` around lines 22 - 27, r rf_fuse currently
computes 1.0 / (k + rank) and can hit a zero or unexpected denominator; add an
upfront guard in rrf_fuse that validates k (e.g., require k >= 0) and raise a
clear ValueError if the check fails, mentioning the parameter k and the
denominator (k + rank) so callers know why the input is invalid; locate this
check at the top of rrf_fuse before the loop that uses k.
🧹 Nitpick comments (1)
tests/embeddings/test_storage.py (1)

138-153: ⚡ Quick win

test_search_works_under_both_backends doesn’t actually cover both paths.

Right now it validates one search call only. Consider splitting into two tests (or parametrizing) that explicitly force each backend branch so regressions in either path are caught.

One way to make coverage explicit
-def test_search_works_under_both_backends(kb_dir: Path) -> None:
+@pytest.mark.parametrize("force_vec_loaded", [True, False])
+def test_search_works_under_both_backends(
+    kb_dir: Path, monkeypatch: pytest.MonkeyPatch, force_vec_loaded: bool
+) -> None:
@@
-    hits = index_db.search_embedding(
+    monkeypatch.setattr(index_db, "_load_sqlite_vec", lambda _conn: force_vec_loaded)
+    hits = index_db.search_embedding(
         kb_dir, query_vec=vecs[0], kinds=("claim",), limit=3,
     )
     assert hits[0][1] == "c0"  # exact self-match
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/embeddings/test_storage.py` around lines 138 - 153, Split or
parametrize test_search_works_under_both_backends so it explicitly runs the
search against both backend branches instead of a single call: create two
subtests (or a pytest.mark.parametrize) that set the backend selection used by
index_db.search_embedding (e.g., by patching whatever flag/function determines
the backend or by invoking the internal backend-specific call directly),
populate the DB via index_db.open_db and index_db.put_embedding as before, run
index_db.search_embedding for each backend variant, and assert the same
expectations (hits[0][1] == "c0") for both runs; reference the existing
test_search_works_under_both_backends function and
index_db.search_embedding/index_db.open_db/index_db.put_embedding when
implementing the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/vouch/index_db.py`:
- Around line 309-319: In search_embedding (the non-sqlite-vec fallback loop)
you must skip rows whose stored vector dimensionality doesn't match the query
vector q to avoid NumPy matmul errors: after converting blob -> vec via
_blob_to_vec and before computing score = float(q @ vec), compare dim (the
stored dimension) with q.shape[0] (or len(q)) and continue/skip if they differ;
optionally record or surface the mismatch using existing helpers like
embedding_meta/detect_mismatch/backfill_embeddings instead of scoring. Ensure
the check is applied to the loop that builds scored (variables: kind, id_, blob,
dim, q, min_score, _blob_to_vec) so only matching-dimension vectors are scored
and appended.

In `@src/vouch/storage.py`:
- Around line 486-509: The embedding/dedup steps after get_embedder()
(embedder.encode, _index_db.put_embedding, _index_db.set_embedding_meta, and
check_and_log) must be best-effort and must not raise back to callers like
put_source/put_claim/put_page; wrap the entire sequence (from vec =
embedder.encode(text) through the dedup check_and_log call) in a broad
try/except that catches Exception, logs the failure with context
(embedder.name/version, kind, id, and error), and otherwise continues, so
transient embedding or dedup errors don't abort the successful file write.

In `@tests/embeddings/test_search.py`:
- Line 113: The test currently asserts backend membership only in ("embedding",
"fts5", "substring") which fails when backend == "hybrid"; update the assertions
in tests/embeddings/test_search.py that check resp["result"][0]["backend"] (and
the similar assertion at the other occurrence) to include "hybrid" in the
allowed tuple so the test accepts hybrid as a valid backend value.

---

Duplicate comments:
In `@src/vouch/embeddings/dedup.py`:
- Around line 76-77: Guard the dot-product between embeddings by skipping pairs
with mismatched dimensions: before computing cos = float(vecs[k1] @ vecs[k2])
check that vecs[k1] and vecs[k2] have compatible shapes (e.g., same length /
same .shape or both 1-D) and continue/skip the pair if they differ; update the
loop in src/vouch/embeddings/dedup.py where vecs, k1, k2 and threshold are used
to perform this shape check so a shape-mismatch cannot raise and abort the
duplicate scan.

In `@src/vouch/embeddings/fusion.py`:
- Around line 22-27: r rf_fuse currently computes 1.0 / (k + rank) and can hit a
zero or unexpected denominator; add an upfront guard in rrf_fuse that validates
k (e.g., require k >= 0) and raise a clear ValueError if the check fails,
mentioning the parameter k and the denominator (k + rank) so callers know why
the input is invalid; locate this check at the top of rrf_fuse before the loop
that uses k.

In `@src/vouch/embeddings/migration.py`:
- Around line 38-83: The AttributeError handlers around entire backfill loops
are too broad and can hide per-item failures; change each try/except so you only
skip the whole loop when the store is missing the list_* method (e.g., check
hasattr(store, "list_claims") or wrap only the call to store.list_claims() in a
try/except AttributeError), then iterate and call store._embed_and_store
normally so item-level exceptions surface; apply this pattern to list_pages,
list_sources, list_entities, list_relations, and list_evidence, leaving
index_db.set_embedding_meta and embedder.name/version/dim updates unchanged.

In `@src/vouch/embeddings/rerank.py`:
- Around line 43-49: The code currently zips hits with scores and slices by
top_k without validating inputs; update the rerank routine to (1) call
reranker.score(query, candidates) and assert the returned scores length equals
len(hits) (raise a clear ValueError referencing reranker.score and hits if
mismatch), (2) validate that top_k is an int >= 0 (raise ValueError mentioning
top_k) and treat None or 0 appropriately, and (3) when returning results, slice
with min(top_k, len(reranked)) instead of allowing negative or out-of-range
slicing; reference the variables and calls reranker.score, hits, scores,
reranked, and top_k so the checks are added immediately before building/slicing
reranked.

In `@src/vouch/embeddings/scorer.py`:
- Around line 56-59: The loop that reads queries_file in scorer.py calls
json.loads on every line and will crash on blank/whitespace-only lines; update
the loop inside the with block that processes each line (the code around
queries_file.open() and the for line in f) to strip whitespace from line,
skip/continue when the stripped line is empty, then call json.loads on the
non-empty line to extract row and q = row["query"] so blank JSONL lines are
ignored safely.
- Around line 54-55: The code currently builds totals = {m: 0.0 for m in
metrics} allowing arbitrary metric names and silently returning zeros for typos;
before that initialization validate the incoming metrics list against the
supported metric set (e.g., SUPPORTED_METRICS or supported_metrics in scorer.py)
and raise a clear exception (ValueError) if any unknown metric is present so
callers get immediate feedback rather than misleading results; update the
validation near where totals and n are set and reference the metrics variable
and the module's supported metric constant in the error message.

In `@src/vouch/jsonl_server.py`:
- Around line 99-103: Hybrid mode currently ignores the semantic threshold: when
backend_arg == "hybrid" filter the semantic search results returned by
index_db.search_semantic using the same min_score used by embedding-only
retrieval before calling rrf_fuse so the semantic leg respects the threshold;
specifically locate the block with backend_arg, index_db.search_semantic,
index_db.search and rrf_fuse and apply the min_score filter to the emb results
(keeping the limit logic) or pass filtered emb into rrf_fuse so hits reflect the
same min_score behavior as backend="embedding".
- Around line 79-84: The code lets unknown backend_arg values fall through
silently; update the handling around backend_arg (variable backend_arg and used)
so you explicitly validate allowed values (e.g., "auto", "embedding", and any
other supported names used elsewhere) and raise a ValueError for unsupported
values; implement this by converting the existing if backend_arg in ("auto",
"embedding"): ... elif backend_arg == "X": ... pattern into an if/elif/else and
in the else branch raise ValueError(f"Unsupported backend: {backend_arg}"),
ensuring used is set only for supported branches and hits remains untouched
otherwise.

---

Nitpick comments:
In `@tests/embeddings/test_storage.py`:
- Around line 138-153: Split or parametrize
test_search_works_under_both_backends so it explicitly runs the search against
both backend branches instead of a single call: create two subtests (or a
pytest.mark.parametrize) that set the backend selection used by
index_db.search_embedding (e.g., by patching whatever flag/function determines
the backend or by invoking the internal backend-specific call directly),
populate the DB via index_db.open_db and index_db.put_embedding as before, run
index_db.search_embedding for each backend variant, and assert the same
expectations (hits[0][1] == "c0") for both runs; reference the existing
test_search_works_under_both_backends function and
index_db.search_embedding/index_db.open_db/index_db.put_embedding when
implementing the changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 296d22aa-86ae-4d04-88d5-ff771f17778b

📥 Commits

Reviewing files that changed from the base of the PR and between 7c74231 and 23c4fcc.

📒 Files selected for processing (20)
  • src/vouch/embeddings/__init__.py
  • src/vouch/embeddings/cache.py
  • src/vouch/embeddings/dedup.py
  • src/vouch/embeddings/fusion.py
  • src/vouch/embeddings/hyde.py
  • src/vouch/embeddings/migration.py
  • src/vouch/embeddings/rerank.py
  • src/vouch/embeddings/scorer.py
  • src/vouch/index_db.py
  • src/vouch/jsonl_server.py
  • src/vouch/server.py
  • src/vouch/storage.py
  • tests/embeddings/test_dedup.py
  • tests/embeddings/test_fusion.py
  • tests/embeddings/test_hyde.py
  • tests/embeddings/test_migration.py
  • tests/embeddings/test_rerank.py
  • tests/embeddings/test_scorer.py
  • tests/embeddings/test_search.py
  • tests/embeddings/test_storage.py
✅ Files skipped from review due to trivial changes (2)
  • src/vouch/embeddings/init.py
  • src/vouch/server.py

Comment thread src/vouch/index_db.py
Comment on lines +309 to +319
rows = conn.execute(
f"SELECT kind, id, vec, dim FROM embedding_index "
f"WHERE kind IN ({placeholders})",
kinds,
).fetchall()
scored: list[tuple[str, str, str, float]] = []
for kind, id_, blob, dim in rows:
vec = _blob_to_vec(blob, dim)
score = float(q @ vec)
if score >= min_score:
scored.append((kind, id_, "", score))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate target file
ls -la src/vouch/index_db.py

# Show the relevant region with line numbers
nl -ba src/vouch/index_db.py | sed -n '250,360p'

# Show helper(s) and any uses/definition of _blob_to_vec and dim handling
rg -n "_blob_to_vec|embedding_index|dim" src/vouch/index_db.py

# Search whole repo for embedding_index schema / migrations
rg -n "CREATE TABLE embedding_index|embedding_index\s*\(" -S .

# Identify how dim mismatch is handled elsewhere (if any)
rg -n "dim\s*!=(|!=)|q @ vec|vec_distance_cosine|tobytes\(\)" -S src/vouch

Repository: vouchdev/vouch

Length of output: 191


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="src/vouch/index_db.py"

# Print target section with line numbers (via python)
python3 - <<'PY'
import itertools
path="src/vouch/index_db.py"
start=250
end=360
with open(path,'r',encoding='utf-8') as f:
    for i,line in enumerate(f, start=1):
        if start <= i <= end:
            print(f"{i:5d}\t{line.rstrip()}")
PY

# Targeted searches
rg -n "_blob_to_vec|q\s*@\s*vec|vec_distance_cosine|embedding_index|SELECT kind, id, vec, dim" "$FILE" -S

Repository: vouchdev/vouch

Length of output: 4754


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="src/vouch/index_db.py"

python3 - <<'PY'
path="src/vouch/index_db.py"
ranges=[(1,120),(150,260)]
for start,end in ranges:
    print(f"\n# --- lines {start}-{end} ---")
    with open(path,'r',encoding='utf-8') as f:
        for i,line in enumerate(f, start=1):
            if start<=i<=end:
                print(f"{i:5d}\t{line.rstrip()}")
PY

rg -n "CREATE TABLE IF NOT EXISTS embedding_index|INSERT OR REPLACE INTO embedding_index|model|dim\s*," "$FILE" -S

Repository: vouchdev/vouch

Length of output: 9850


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for dim/embedding_dim usage across the repo
rg -n "embedding_dim|embedding_model|embedding_model_version|\bdim\b" src -S

# Specifically trace query vector generation and whether its dim is checked against DB meta
rg -n "query_embedding_cache|lookup_query_vec|cache_query_vec|get_embedder|encode\(" src/vouch -S

# Find all calls to search_embedding
rg -n "search_embedding\(" -S src/vouch

Repository: vouchdev/vouch

Length of output: 7188


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE_MIG="src/vouch/embeddings/migration.py"
FILE_STORAGE="src/vouch/storage.py"
FILE_DEDUP="src/vouch/embeddings/dedup.py"

python3 - <<'PY'
import pathlib
files=["src/vouch/embeddings/migration.py","src/vouch/storage.py","src/vouch/embeddings/dedup.py"]
for f in files:
    p=pathlib.Path(f)
    print(f"\n==== {f} ({p.stat().st_size} bytes) ====")
PY

# Print key regions
python3 - <<'PY'
def show(path, start, end):
    with open(path,'r',encoding='utf-8') as f:
        for i,line in enumerate(f, start=1):
            if start<=i<=end:
                print(f"{i:5d}\t{line.rstrip()}")
show("src/vouch/embeddings/migration.py",1,220)
PY

python3 - <<'PY'
# storage: look for where embeddings are persisted and dim/meta validated
show=lambda path,start,end: __import__("builtins").exec
PY

Repository: vouchdev/vouch

Length of output: 3248


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import pathlib
def show(path,start,end):
    with open(path,'r',encoding='utf-8') as f:
        for i,line in enumerate(f, start=1):
            if start<=i<=end:
                print(f"{i:5d}\t{line.rstrip()}")
for f in ["src/vouch/embeddings/migration.py","src/vouch/storage.py"]:
    p=pathlib.Path(f)
    print(f"\n==== {f} ({p.stat().st_size} bytes) ==== ")
PY

python3 - <<'PY'
def show(path,start,end):
    with open(path,'r',encoding='utf-8') as f:
        for i,line in enumerate(f, start=1):
            if start<=i<=end:
                print(f"{i:5d}\t{line.rstrip()}")

print("\n--- migration.py 1-160 ---")
show("src/vouch/embeddings/migration.py",1,160)

print("\n--- migration.py 160-320 ---")
show("src/vouch/embeddings/migration.py",160,320)

print("\n--- storage.py 430-560 ---")
show("src/vouch/storage.py",430,560)

print("\n--- storage.py 560-720 ---")
show("src/vouch/storage.py",560,720)
PY

# Find where mismatch detection/backfill is called
rg -n "detect_mismatch|backfill_embeddings|backfill|embedding_meta\(" src/vouch -S

Repository: vouchdev/vouch

Length of output: 11113


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "detect_mismatch\(|backfill_embeddings\(" -S src
rg -n "embedding_dim|embedding_model" -S src/vouch | head -n 50

Repository: vouchdev/vouch

Length of output: 745


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "embeddings\.migration|backfill_embeddings|detect_mismatch" -S src

Repository: vouchdev/vouch

Length of output: 260


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "backfill_embeddings|detect_mismatch|embedding_dim|embedding_model_version" -S .

Repository: vouchdev/vouch

Length of output: 4081


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE_CACHE="src/vouch/embeddings/cache.py"
FILE_DEDUP="src/vouch/embeddings/dedup.py"
FILE_MIGRATION="src/vouch/embeddings/migration.py"

python3 - <<'PY'
import pathlib
def show(path,start,end):
    with open(path,'r',encoding='utf-8') as f:
        for i,line in enumerate(f, start=1):
            if start<=i<=end:
                print(f"{i:5d}\t{line.rstrip()}")
print("\n--- cache.py 1-140 ---")
show("src/vouch/embeddings/cache.py",1,140)
print("\n--- dedup.py 1-140 ---")
show("src/vouch/embeddings/dedup.py",1,140)
print("\n--- migration.py 1-140 ---")
show("src/vouch/embeddings/migration.py",1,140)
PY

# Search for runtime use of migration/backfill (not docs/tests)
rg -n "from \.embeddings\.migration import|embeddings\.migration|detect_mismatch\(|backfill_embeddings\(" -S src tests

Repository: vouchdev/vouch

Length of output: 10353


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,120p' src/vouch/embeddings/base.py

Repository: vouchdev/vouch

Length of output: 1943


Filter by embedding dimension before scoring in search_embedding.

search_embedding’s non-sqlite-vec fallback loop computes score = float(q @ vec) for every row selected, but it never checks that the row’s dim matches q’s dimension. If the DB contains vectors from a different embedding adapter (stale/partially backfilled KB), NumPy matmul will raise and abort the entire search. There’s also no runtime guard in search_embedding that uses embedding_meta/detect_mismatch/backfill_embeddings.

Suggested fix
 def search_embedding(
     kb_dir: Path,
@@
 ) -> list[tuple[str, str, str, float]]:
     import numpy as np
     q = np.asarray(query_vec, dtype=np.float32)
+    qdim = int(q.size)
     qnorm = float(np.linalg.norm(q))
     if qnorm > 0:
         q = q / qnorm
@@
                 rows = conn.execute(
                     f"SELECT kind, id, score FROM ("
                     f"  SELECT kind, id, 1.0 - vec_distance_cosine(vec, ?) AS score "
                     f"  FROM embedding_index "
-                    f"  WHERE kind IN ({placeholders})"
+                    f"  WHERE kind IN ({placeholders}) AND dim = ?"
                     f") WHERE score >= ? "
                     f"ORDER BY score DESC LIMIT ?",
-                    (q.tobytes(), *kinds, min_score, limit),
+                    (q.tobytes(), *kinds, qdim, min_score, limit),
                 ).fetchall()
                 return [(k, i, "", float(s)) for k, i, s in rows]
             except sqlite3.OperationalError:
                 pass
         rows = conn.execute(
             f"SELECT kind, id, vec, dim FROM embedding_index "
-            f"WHERE kind IN ({placeholders})",
-            kinds,
+            f"WHERE kind IN ({placeholders}) AND dim = ?",
+            (*kinds, qdim),
         ).fetchall()
     scored: list[tuple[str, str, str, float]] = []
     for kind, id_, blob, dim in rows:
+        if dim != qdim:
+            continue
         vec = _blob_to_vec(blob, dim)
         score = float(q @ vec)
         if score >= min_score:
             scored.append((kind, id_, "", score))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/index_db.py` around lines 309 - 319, In search_embedding (the
non-sqlite-vec fallback loop) you must skip rows whose stored vector
dimensionality doesn't match the query vector q to avoid NumPy matmul errors:
after converting blob -> vec via _blob_to_vec and before computing score =
float(q @ vec), compare dim (the stored dimension) with q.shape[0] (or len(q))
and continue/skip if they differ; optionally record or surface the mismatch
using existing helpers like embedding_meta/detect_mismatch/backfill_embeddings
instead of scoring. Ensure the check is applied to the loop that builds scored
(variables: kind, id_, blob, dim, q, min_score, _blob_to_vec) so only
matching-dimension vectors are scored and appended.

Comment thread src/vouch/storage.py
Comment on lines +486 to +509
try:
embedder = get_embedder()
except KeyError:
return
vec = embedder.encode(text)
with _index_db.open_db(self.kb_dir) as conn:
_index_db.put_embedding(
conn, kind=kind, id=id, vec=vec, content_hash=h,
model=embedder.name, model_version=embedder.version,
dim=embedder.dim,
)
_index_db.set_embedding_meta(
self.kb_dir, model=embedder.name,
version=embedder.version, dim=embedder.dim,
)
try:
# NB: dedup module is added in a later phase; ignore missing-stub
# / missing-module noise in CI's [dev]-only mypy run.
from .embeddings.dedup import ( # type: ignore[import-not-found,import-untyped,unused-ignore]
check_and_log,
)
check_and_log(self.kb_dir, kind=kind, id=id, vec=vec)
except ImportError:
pass

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep the embedding hook best-effort all the way through.

The docstring says this path is optional, but Lines 490-507 still let encode(), put_embedding(), set_embedding_meta(), and check_and_log() abort the caller. That means a transient embedding/dedup failure can break put_source, put_claim, put_page, etc., even though the file write already succeeded.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/storage.py` around lines 486 - 509, The embedding/dedup steps after
get_embedder() (embedder.encode, _index_db.put_embedding,
_index_db.set_embedding_meta, and check_and_log) must be best-effort and must
not raise back to callers like put_source/put_claim/put_page; wrap the entire
sequence (from vec = embedder.encode(text) through the dedup check_and_log call)
in a broad try/except that catches Exception, logs the failure with context
(embedder.name/version, kind, id, and error), and otherwise continues, so
transient embedding or dedup errors don't abort the successful file write.

})
assert resp["ok"] is True
assert resp["result"]
assert resp["result"][0]["backend"] in ("embedding", "fts5", "substring")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Allow hybrid in backend assertions.

These assertions can reject valid behavior now that backend selection includes "hybrid".

Suggested patch
-    assert resp["result"][0]["backend"] in ("embedding", "fts5", "substring")
+    assert resp["result"][0]["backend"] in ("embedding", "fts5", "substring", "hybrid")
@@
-    assert result["backend"] in ("embedding", "fts5", "substring")
+    assert result["backend"] in ("embedding", "fts5", "substring", "hybrid")

Also applies to: 127-127

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/embeddings/test_search.py` at line 113, The test currently asserts
backend membership only in ("embedding", "fts5", "substring") which fails when
backend == "hybrid"; update the assertions in tests/embeddings/test_search.py
that check resp["result"][0]["backend"] (and the similar assertion at the other
occurrence) to include "hybrid" in the allowed tuple so the test accepts hybrid
as a valid backend value.

@plind-junior

Copy link
Copy Markdown
Collaborator

Really clean phase breakdown — modules are well-isolated and the test
coverage for rerank/scorer/dedup is solid. Left inline comments on a
few correctness/cleanup points; main blockers are the broad
except AttributeError: pass blocks in migration.py, the dead
force parameter, and the non-strict zip in rerank.

Two questions outside the diff:

Happy to re-review once the migration cleanup lands. 🙏

…eview)

a reranker returning the wrong number of scores used to silently drop or
misalign results via zip(strict=False). now raises ValueError on a count
mismatch and on negative top_k instead of producing a corrupted ranking.
skip blank lines in the queries JSONL instead of crashing on
JSONDecodeError, and reject unknown metric names up front rather than
silently returning 0.0 for a typo.
…v#42 review)

the per-kind try/except AttributeError also swallowed errors raised inside
the loop body, then set_embedding_meta still ran as if migration succeeded.
now only missing list-methods are skipped (via getattr) and any real error
propagates. also drops the unused force parameter.
…dev#42 review)

after a model migration the embedding_index can hold vectors of different
dims for the same kind; the dot product would raise and abort the whole
scan. skip incomparable shape pairs instead.
@plind-junior plind-junior merged commit f6346ad into vouchdev:main May 22, 2026
5 checks passed
plind-junior pushed a commit that referenced this pull request May 22, 2026
a reranker returning the wrong number of scores used to silently drop or
misalign results via zip(strict=False). now raises ValueError on a count
mismatch and on negative top_k instead of producing a corrupted ranking.
plind-junior pushed a commit that referenced this pull request May 22, 2026
skip blank lines in the queries JSONL instead of crashing on
JSONDecodeError, and reject unknown metric names up front rather than
silently returning 0.0 for a typo.
plind-junior pushed a commit that referenced this pull request May 22, 2026
…iew)

the per-kind try/except AttributeError also swallowed errors raised inside
the loop body, then set_embedding_meta still ran as if migration succeeded.
now only missing list-methods are skipped (via getattr) and any real error
propagates. also drops the unused force parameter.
plind-junior pushed a commit that referenced this pull request May 22, 2026
…eview)

after a model migration the embedding_index can hold vectors of different
dims for the same kind; the dot product would raise and abort the whole
scan. skip incomparable shape pairs instead.
plind-junior added a commit that referenced this pull request May 22, 2026
…pabilities

feat(embeddings): rerank, HyDE, dedup, scorer, model-identity migration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants