feat(embeddings): rerank, HyDE, dedup, scorer, model-identity migration#42
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds five new embeddings modules: HyDE query expansion, ingest-time duplicate detection with scan and logging, embedding model migration and backfill, cross-encoder reranking with hit scoring, and retrieval evaluation metrics (recall@k, MRR, NDCG). All modules include comprehensive test coverage. ChangesEmbedding Retrieval Features
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/vouch/jsonl_server.py (1)
74-109:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAlign JSONL
kb.searchresponse shape with MCPkb_search.
_h_searchstill returns a raw list, but MCPkb_searchnow returns{"backend": ..., "hits": [...]}. This breaks the “same surface” contract across transports.Proposed fix
-def _h_search(p: dict) -> list[dict]: +def _h_search(p: dict) -> dict[str, Any]: @@ - return [ - {"kind": k, "id": i, "snippet": sn, "score": sc, "backend": used} - for k, i, sn, sc in hits - ] + return { + "backend": used, + "hits": [ + {"kind": k, "id": i, "snippet": sn, "score": sc, "backend": used} + for k, i, sn, sc in hits + ], + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/jsonl_server.py` around lines 74 - 109, _h_search currently returns a raw list of hit dicts but MCP's kb_search returns an envelope {"backend": ..., "hits": [...]}; change _h_search to return that envelope instead. Update the return value and type hint of _h_search to return a dict with keys "backend" (the used variable) and "hits" (the list comprehension currently produced), i.e. wrap the existing list comprehension into {"backend": used, "hits": [...]} and keep existing logic that sets used (references: _h_search, index_db.search_semantic, index_db.search, s.search_substring, rrf_fuse). Also search for and update any callers that expect a raw list to handle the new envelope shape.
♻️ Duplicate comments (3)
src/vouch/embeddings/st_minilm.py (1)
11-11:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftResolve numpy dependency configuration (duplicate issue).
Same module-level numpy import issue as
base.pyline 17. See the comment on that file for details and verification steps.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/st_minilm.py` at line 11, The module-level "import numpy as np" in src/vouch/embeddings/st_minilm.py should be removed and replaced with a lazy/local import to avoid forcing a numpy dependency at import time; locate the functions that use numpy (the module-level symbol "np" and any embedding functions in st_minilm.py) and move "import numpy as np" inside those functions (or wrap it in a try/except that raises a clear error if numpy is missing), ensuring all references to "np" remain valid after the change.src/vouch/embeddings/fastembed_bge.py (1)
11-11:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftResolve numpy dependency configuration (duplicate issue).
Same module-level numpy import issue as
base.pyline 17. See the comment on that file for details and verification steps.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/fastembed_bge.py` at line 11, Move the module-level "import numpy as np" out of the top of src/vouch/embeddings/fastembed_bge.py and perform a lazy/local import where numpy is actually used (e.g., inside the function(s) that reference "np"); if importing may fail, wrap the import in try/except and raise a clear RuntimeError instructing to install numpy (similar to the fix applied in base.py), ensuring you replace top-level "np" usages with the locally imported name so there are no module-level numpy imports remaining.src/vouch/embeddings/st_mpnet.py (1)
12-12:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftResolve numpy dependency configuration (duplicate issue).
Same module-level numpy import issue as
base.pyline 17. See the comment on that file for details and verification steps.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/st_mpnet.py` at line 12, The module-level "import numpy as np" in st_mpnet.py causes the same global dependency issue noted in base.py; remove the top-level import and instead perform a local/lazy import of numpy inside the function(s) or method(s) that actually use it (keep the symbol name "np" when importing), or wrap the import in a try/except that raises a clear ImportError with guidance; update any references in functions/classes in this file to rely on the local import so no module-level numpy import remains.
🧹 Nitpick comments (2)
src/vouch/storage.py (1)
486-496: ⚡ Quick winUncaught exceptions from
embedder.encode()will fail artifact writes.The docstring states "Failures... are swallowed" but only
ImportErrorandKeyErrorare caught. Ifencode()raises (e.g., model loading error, out-of-memory, or network failure for API-backed embedders), the exception propagates and fails theput_*operation, preventing the artifact from being stored.If embeddings should be best-effort, wrap
encode()in a broader exception handler. Otherwise, clarify in the docstring that only missing-dependency errors are swallowed.Proposed fix: catch exceptions from encode()
try: embedder = get_embedder() except KeyError: return - vec = embedder.encode(text) + try: + vec = embedder.encode(text) + except Exception: + return # Embedding is best-effort; don't block artifact persistence with _index_db.open_db(self.kb_dir) as conn:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/storage.py` around lines 486 - 496, The call to embedder.encode(text) can raise runtime errors (model load OOM, network/API failures) which currently propagate and abort artifact writes; update the block around get_embedder()/embedder.encode() so encode() is wrapped in a broad try/except that logs the error and returns (i.e., swallow non-ImportError/KeyError exceptions to keep embeddings best-effort) before opening the DB and calling _index_db.put_embedding; reference embedder.encode, get_embedder, and _index_db.put_embedding when locating where to add the try/except and logging.src/vouch/embeddings/migration.py (1)
34-34: ⚡ Quick win
forceis unused and currently misleading.Line 34 exposes
forcebut behavior is identical regardless of value. Either implement force-gated behavior or remove the parameter before this API settles.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/migration.py` at line 34, The exposed parameter force on backfill_embeddings is unused and misleading; remove the force parameter from the function signature (def backfill_embeddings(store: Any) -> int) and update all call sites that pass force to call backfill_embeddings without it, or alternatively if you prefer gated behavior, implement the gate inside backfill_embeddings to skip already-embedded records when force is False and only reprocess all records when force is True—refer to the backfill_embeddings function name and any callers invoking backfill_embeddings to make the change consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/specs/2026-05-20-semantic-search-design.md`:
- Line 33: The fenced code block that begins with the listing
"src/vouch/embeddings/" is missing a language tag; update the opening fence from
``` to ```text (i.e., add the "text" language identifier) so the block is
properly marked and resolves markdownlint MD040.
In `@src/vouch/embeddings/cache.py`:
- Around line 15-23: The cache currently hashes only raw query text (in
_query_key) which can return embeddings from a different model; update the cache
key to include the embedding model identity (and optionally dimension) so keys
are scoped by model. Concretely, change _query_key to accept a model_id (or
build the combined key inside cache_query_vec by hashing f"{model_id}:{query}")
and use that when computing h in cache_query_vec (and update any callers of
_query_key accordingly); ensure the model identifier used is stable (e.g., model
name/version) and consider invalidating or migrating the existing
query_embedding_cache if needed.
- Line 10: Wrap the unconditional numpy import in cache.py with a
TYPE_CHECKING-only import so mypy can see numpy for type hints without requiring
it in [dev] extras: add "from typing import TYPE_CHECKING" and change "import
numpy as np" to "if TYPE_CHECKING: import numpy as np" (or alternatively add
"numpy>=1.26,<3" to dev dependencies); ensure any runtime code that uses np
still imports it at runtime (e.g., local import inside functions that actually
perform numeric work) so the module remains importable during CI type-checking.
In `@src/vouch/embeddings/dedup.py`:
- Around line 76-77: The dot-product vecs[k1] @ vecs[k2] can raise a
shape-mismatch; before computing cos in the dedup scan (where vecs, k1, k2, cos,
threshold are used) check that the two vectors have the same dimensionality
(e.g., compare .shape or len) and skip or handle mismatched pairs (log a warning
or continue) instead of performing the operation; alternatively catch the
exception around the dot-product and continue so the entire scan doesn't abort
when embeddings with different dim values exist.
- Line 9: The CI failure is caused by an unconditional "import numpy as np" in
src/vouch/embeddings/dedup.py while numpy is only declared under the
"embeddings" extra; fix by either adding "numpy>=1.26,<3" to the dev extras in
setup/pyproject so pip install -e '.[dev]' provides numpy, or update the CI
install command to include the embeddings extra (pip install -e
'.[dev,embeddings]') so the import in dedup.py resolves during type checking and
mypy finds the numpy stubs.
In `@src/vouch/embeddings/fusion.py`:
- Around line 22-27: The rrf_fuse function currently divides by (k + rank) and
can raise ZeroDivisionError if a caller passes k <= -1; add an upfront
validation in rrf_fuse that ensures k is an integer >= 0 (or at least k >
-rank_min where rank_min is 1) and raise a clear ValueError if the check fails.
Locate the rrf_fuse function and validate the parameter k before the loop
(referencing the k and rank variables) so the subsequent scores calculation
using 1.0 / (k + rank) is guaranteed safe.
In `@src/vouch/embeddings/migration.py`:
- Around line 38-83: The try/except blocks around each entire loop (e.g.,
store.list_claims(), store.list_pages(), store.list_sources(),
store.list_entities(), store.list_relations(), store.list_evidence()) currently
catch AttributeError for the whole loop and can hide per-item errors while still
allowing index_db.set_embedding_meta and embedder.name/version/dim to be written
as if the backfill succeeded; instead, narrow the exception handling by calling
the list_* iterator outside the try and only rescuing AttributeError when the
list_* method itself is missing, then wrap the per-item access and the call to
store._embed_and_store(kind=..., id=..., text=...) in their own try/except that
logs or re-raises real errors (so failures incrementing touched are not silently
ignored), and only call index_db.set_embedding_meta if all required lists ran
(or if you tracked successful touched items), referencing the functions/methods
store.list_claims, store.list_pages, store.list_sources, store.list_entities,
store.list_relations, store.list_evidence, store._embed_and_store, and
index_db.set_embedding_meta along with embedder.name/version/dim to locate and
update the logic.
In `@src/vouch/embeddings/rerank.py`:
- Line 49: The slice uses top_k directly which allows negative values to
silently drop tail elements; before the line that does "return reranked[:top_k]"
validate top_k (the parameter) and either clamp it to a non-negative int (e.g.
top_k = max(0, int(top_k))) or raise ValueError for negative inputs, and also
handle None (treat as "no limit") so the function that computes "reranked"
returns predictable results.
In `@src/vouch/embeddings/scorer.py`:
- Around line 57-59: The loop that reads JSONL lines currently does
json.loads(line) for every line (the block iterating "for line in f" producing
"row" and "q"), which will raise JSONDecodeError on blank lines; modify that
loop to skip empty or whitespace-only lines before calling json.loads (e.g.,
check if line.strip() is falsy and continue) so only non-empty lines are parsed
and processed into "row" and "q".
- Around line 54-71: The code silently accepts unknown metric names because
totals = {m: 0.0 for m in metrics} blindly initializes any string; instead
validate the requested metrics against the supported set (e.g.,
{"recall@k","mrr","ndcg"}) at the start of the function and raise a ValueError
listing unknown_metric(s) if any are present; then keep the existing per-query
loop that calls recall_at_k, mrr, ndcg and accumulates into totals, so only
validated metrics are initialized and averaged before returning.
In `@src/vouch/index_db.py`:
- Around line 185-192: Mypy is failing because it can't find numpy stubs for the
lazy imports in _vec_to_blob and _blob_to_vec; update the inline imports inside
those functions to silence mypy by adding a type ignore for untyped imports
(e.g., change the import lines in functions _vec_to_blob and _blob_to_vec to use
a "# type: ignore[import-untyped]" comment), or alternatively add the
types-numpy package to dev dependencies if you prefer proper stubs—apply the
chosen fix consistently to both functions.
- Around line 294-301: The SQL uses the SELECT alias "score" in the WHERE clause
which SQLite disallows, so update the query in the conn.execute call that builds
rows to either repeat the score expression in the WHERE (replace "score >= ?"
with "1.0 - vec_distance_cosine(vec, ?) >= ?") or wrap the SELECT as a subquery
that filters on score, then ORDER/LIMIT; keep the same parameter order
(q.tobytes(), *kinds, min_score, limit) and ensure variables q, kinds,
min_score, and limit are used unchanged so rows filtering works correctly and
sqlite-vec will be able to execute the predicate.
In `@src/vouch/jsonl_server.py`:
- Around line 84-105: Validate backend_arg and raise a ValueError for unknown
values instead of silently returning empty hits: after the existing branches
that handle "auto", "embedding", "fts5", "substring", and "hybrid" (which use
index_db.search_semantic, index_db.search, s.search_substring, and rrf_fuse) add
a check that backend_arg is one of the supported options and if not, raise
ValueError("unknown backend: {backend_arg}") (or similar) so callers receive a
deterministic invalid_request error; ensure this check runs before the function
returns and does not mask legitimate empty-hit results.
In `@src/vouch/server.py`:
- Around line 112-120: The current block around index_db.search swallows all
exceptions for both "auto" and "fts5", causing explicit backend="fts5" failures
to be treated as empty results; change the control flow so that you only catch
exceptions when backend == "auto" (and then set hits = [] to allow fallback),
but for backend == "fts5" let exceptions propagate (or re-raise) so callers see
the failure; locate the branch handling backend in server.py (the if backend in
("auto", "fts5") block), the call to index_db.search, and the conversion via
_to_dicts to implement this: perform the try/except only when backend == "auto",
or re-raise the caught exception when backend == "fts5".
- Around line 126-131: The hybrid branch currently ignores min_score; update the
backend=="hybrid" block to enforce the same threshold as embedding mode by
applying min_score to the fused results: call index_db.search_semantic and
index_db.search with the same min_score parameter if supported, then run
rrf_fuse(emb, fts, limit=limit) and filter the returned hits to only include
items with score >= min_score (or pass min_score into rrf_fuse if it accepts
it), and finally pass the filtered list to _to_dicts("hybrid"); reference the
hybrid block, index_db.search_semantic, index_db.search, rrf_fuse, and _to_dicts
to locate where to make the change.
---
Outside diff comments:
In `@src/vouch/jsonl_server.py`:
- Around line 74-109: _h_search currently returns a raw list of hit dicts but
MCP's kb_search returns an envelope {"backend": ..., "hits": [...]}; change
_h_search to return that envelope instead. Update the return value and type hint
of _h_search to return a dict with keys "backend" (the used variable) and "hits"
(the list comprehension currently produced), i.e. wrap the existing list
comprehension into {"backend": used, "hits": [...]} and keep existing logic that
sets used (references: _h_search, index_db.search_semantic, index_db.search,
s.search_substring, rrf_fuse). Also search for and update any callers that
expect a raw list to handle the new envelope shape.
---
Duplicate comments:
In `@src/vouch/embeddings/fastembed_bge.py`:
- Line 11: Move the module-level "import numpy as np" out of the top of
src/vouch/embeddings/fastembed_bge.py and perform a lazy/local import where
numpy is actually used (e.g., inside the function(s) that reference "np"); if
importing may fail, wrap the import in try/except and raise a clear RuntimeError
instructing to install numpy (similar to the fix applied in base.py), ensuring
you replace top-level "np" usages with the locally imported name so there are no
module-level numpy imports remaining.
In `@src/vouch/embeddings/st_minilm.py`:
- Line 11: The module-level "import numpy as np" in
src/vouch/embeddings/st_minilm.py should be removed and replaced with a
lazy/local import to avoid forcing a numpy dependency at import time; locate the
functions that use numpy (the module-level symbol "np" and any embedding
functions in st_minilm.py) and move "import numpy as np" inside those functions
(or wrap it in a try/except that raises a clear error if numpy is missing),
ensuring all references to "np" remain valid after the change.
In `@src/vouch/embeddings/st_mpnet.py`:
- Line 12: The module-level "import numpy as np" in st_mpnet.py causes the same
global dependency issue noted in base.py; remove the top-level import and
instead perform a local/lazy import of numpy inside the function(s) or method(s)
that actually use it (keep the symbol name "np" when importing), or wrap the
import in a try/except that raises a clear ImportError with guidance; update any
references in functions/classes in this file to rely on the local import so no
module-level numpy import remains.
---
Nitpick comments:
In `@src/vouch/embeddings/migration.py`:
- Line 34: The exposed parameter force on backfill_embeddings is unused and
misleading; remove the force parameter from the function signature (def
backfill_embeddings(store: Any) -> int) and update all call sites that pass
force to call backfill_embeddings without it, or alternatively if you prefer
gated behavior, implement the gate inside backfill_embeddings to skip
already-embedded records when force is False and only reprocess all records when
force is True—refer to the backfill_embeddings function name and any callers
invoking backfill_embeddings to make the change consistently.
In `@src/vouch/storage.py`:
- Around line 486-496: The call to embedder.encode(text) can raise runtime
errors (model load OOM, network/API failures) which currently propagate and
abort artifact writes; update the block around get_embedder()/embedder.encode()
so encode() is wrapped in a broad try/except that logs the error and returns
(i.e., swallow non-ImportError/KeyError exceptions to keep embeddings
best-effort) before opening the DB and calling _index_db.put_embedding;
reference embedder.encode, get_embedder, and _index_db.put_embedding when
locating where to add the try/except and logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 433efaa1-a846-447b-a74a-42f97d729ffe
📒 Files selected for processing (31)
docs/superpowers/plans/2026-05-20-semantic-search.mddocs/superpowers/specs/2026-05-20-semantic-search-design.mdpyproject.tomlsrc/vouch/embeddings/__init__.pysrc/vouch/embeddings/base.pysrc/vouch/embeddings/cache.pysrc/vouch/embeddings/dedup.pysrc/vouch/embeddings/fastembed_bge.pysrc/vouch/embeddings/fusion.pysrc/vouch/embeddings/hyde.pysrc/vouch/embeddings/migration.pysrc/vouch/embeddings/rerank.pysrc/vouch/embeddings/scorer.pysrc/vouch/embeddings/st_minilm.pysrc/vouch/embeddings/st_mpnet.pysrc/vouch/index_db.pysrc/vouch/jsonl_server.pysrc/vouch/server.pysrc/vouch/storage.pytests/embeddings/__init__.pytests/embeddings/_fakes.pytests/embeddings/test_core.pytests/embeddings/test_dedup.pytests/embeddings/test_fusion.pytests/embeddings/test_hyde.pytests/embeddings/test_integration.pytests/embeddings/test_migration.pytests/embeddings/test_rerank.pytests/embeddings/test_scorer.pytests/embeddings/test_search.pytests/embeddings/test_storage.py
|
|
||
| ## 3. New package layout | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the fenced code block.
The code fence at this line is missing a language tag, which triggers markdownlint MD040.
🛠️ Suggested fix
-```
+```text
src/vouch/embeddings/
__init__.py # public API: encode, search, register
base.py # Embedder ABC + adapter registry
@@
migration.py # model-identity check + backfill orchestration</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 33-33: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/superpowers/specs/2026-05-20-semantic-search-design.md` at line 33, The
fenced code block that begins with the listing "src/vouch/embeddings/" is
missing a language tag; update the opening fence from ``` to ```text (i.e., add
the "text" language identifier) so the block is properly marked and resolves
markdownlint MD040.
| def _query_key(query: str) -> str: | ||
| return hashlib.sha256(query.encode("utf-8")).hexdigest() | ||
|
|
||
|
|
||
| def cache_query_vec( | ||
| kb_dir: Path, *, query: str, vec: np.ndarray, max_entries: int = 1024, | ||
| ) -> None: | ||
| h = _query_key(query) | ||
| blob = np.asarray(vec, dtype=np.float32).tobytes() |
There was a problem hiding this comment.
Include embedding model identity in cache keys.
Line 15 hashes only the raw query text. After embedding-model migration/backfill, the same query can return a cached vector from the previous model, causing stale scoring (and potentially dimension mismatch downstream). Scope the key by model identity (or clear query_embedding_cache during migration).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/embeddings/cache.py` around lines 15 - 23, The cache currently
hashes only raw query text (in _query_key) which can return embeddings from a
different model; update the cache key to include the embedding model identity
(and optionally dimension) so keys are scoped by model. Concretely, change
_query_key to accept a model_id (or build the combined key inside
cache_query_vec by hashing f"{model_id}:{query}") and use that when computing h
in cache_query_vec (and update any callers of _query_key accordingly); ensure
the model identifier used is stable (e.g., model name/version) and consider
invalidating or migrating the existing query_embedding_cache if needed.
| if backend_arg in ("auto", "embedding"): | ||
| hits = index_db.search_semantic( | ||
| s.kb_dir, q, limit=limit, min_score=min_score, | ||
| ) | ||
| if hits: | ||
| used = "embedding" | ||
| if not hits and backend_arg in ("auto", "fts5"): | ||
| try: | ||
| hits = index_db.search(s.kb_dir, q, limit=limit) | ||
| used = "fts5" if hits else used | ||
| except Exception: | ||
| hits = [] | ||
| if not hits and backend_arg in ("auto", "substring"): | ||
| hits = s.search_substring(q, limit=limit) | ||
| backend = "substring" | ||
| used = "substring" | ||
| if backend_arg == "hybrid": | ||
| from .embeddings.fusion import rrf_fuse | ||
| emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2) | ||
| fts = index_db.search(s.kb_dir, q, limit=limit * 2) | ||
| hits = rrf_fuse(emb, fts, limit=limit) | ||
| used = "hybrid" | ||
|
|
There was a problem hiding this comment.
Reject unknown backend values explicitly.
Unlike MCP kb_search, this handler does not validate backend; unsupported values silently return empty results. Return a ValueError for unknown backends so clients get deterministic invalid_request responses.
Proposed fix
backend_arg = p.get("backend", "auto")
+ allowed = {"auto", "embedding", "fts5", "substring", "hybrid"}
+ if backend_arg not in allowed:
+ raise ValueError(f"unknown backend: {backend_arg}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/jsonl_server.py` around lines 84 - 105, Validate backend_arg and
raise a ValueError for unknown values instead of silently returning empty hits:
after the existing branches that handle "auto", "embedding", "fts5",
"substring", and "hybrid" (which use index_db.search_semantic, index_db.search,
s.search_substring, and rrf_fuse) add a check that backend_arg is one of the
supported options and if not, raise ValueError("unknown backend: {backend_arg}")
(or similar) so callers receive a deterministic invalid_request error; ensure
this check runs before the function returns and does not mask legitimate
empty-hit results.
| if backend in ("auto", "fts5"): | ||
| try: | ||
| hits = index_db.search(store.kb_dir, query, limit=limit) | ||
| except Exception: | ||
| hits = [] | ||
| if hits: | ||
| return _to_dicts(hits, "fts5") | ||
| if backend == "fts5": | ||
| return _to_dicts([], "fts5") |
There was a problem hiding this comment.
Don’t swallow FTS5 failures for explicit backend="fts5" requests.
Lines 115–116 currently hide all FTS errors and return empty hits, which turns backend failures into false negatives. Keep graceful fallback for auto, but propagate an error for explicit fts5.
Proposed fix
if backend in ("auto", "fts5"):
try:
hits = index_db.search(store.kb_dir, query, limit=limit)
- except Exception:
- hits = []
+ except Exception as e:
+ if backend == "fts5":
+ raise ValueError(f"fts5 search failed: {e}") from e
+ hits = []📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if backend in ("auto", "fts5"): | |
| try: | |
| hits = index_db.search(store.kb_dir, query, limit=limit) | |
| except Exception: | |
| hits = [] | |
| if hits: | |
| return _to_dicts(hits, "fts5") | |
| if backend == "fts5": | |
| return _to_dicts([], "fts5") | |
| if backend in ("auto", "fts5"): | |
| try: | |
| hits = index_db.search(store.kb_dir, query, limit=limit) | |
| except Exception as e: | |
| if backend == "fts5": | |
| raise ValueError(f"fts5 search failed: {e}") from e | |
| hits = [] | |
| if hits: | |
| return _to_dicts(hits, "fts5") | |
| if backend == "fts5": | |
| return _to_dicts([], "fts5") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/server.py` around lines 112 - 120, The current block around
index_db.search swallows all exceptions for both "auto" and "fts5", causing
explicit backend="fts5" failures to be treated as empty results; change the
control flow so that you only catch exceptions when backend == "auto" (and then
set hits = [] to allow fallback), but for backend == "fts5" let exceptions
propagate (or re-raise) so callers see the failure; locate the branch handling
backend in server.py (the if backend in ("auto", "fts5") block), the call to
index_db.search, and the conversion via _to_dicts to implement this: perform the
try/except only when backend == "auto", or re-raise the caught exception when
backend == "fts5".
| if backend == "hybrid": | ||
| from .embeddings.fusion import rrf_fuse | ||
| emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2) | ||
| fts = index_db.search(store.kb_dir, query, limit=limit * 2) | ||
| hits = rrf_fuse(emb, fts, limit=limit) | ||
| return _to_dicts(hits, "hybrid") |
There was a problem hiding this comment.
Apply min_score in hybrid semantic leg.
Line 128 ignores min_score, so backend="hybrid" does not honor the same threshold semantics as embedding mode.
Proposed fix
- emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2)
+ emb = index_db.search_semantic(
+ store.kb_dir, query, limit=limit * 2, min_score=min_score
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if backend == "hybrid": | |
| from .embeddings.fusion import rrf_fuse | |
| emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2) | |
| fts = index_db.search(store.kb_dir, query, limit=limit * 2) | |
| hits = rrf_fuse(emb, fts, limit=limit) | |
| return _to_dicts(hits, "hybrid") | |
| if backend == "hybrid": | |
| from .embeddings.fusion import rrf_fuse | |
| emb = index_db.search_semantic( | |
| store.kb_dir, query, limit=limit * 2, min_score=min_score | |
| ) | |
| fts = index_db.search(store.kb_dir, query, limit=limit * 2) | |
| hits = rrf_fuse(emb, fts, limit=limit) | |
| return _to_dicts(hits, "hybrid") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/server.py` around lines 126 - 131, The hybrid branch currently
ignores min_score; update the backend=="hybrid" block to enforce the same
threshold as embedding mode by applying min_score to the fused results: call
index_db.search_semantic and index_db.search with the same min_score parameter
if supported, then run rrf_fuse(emb, fts, limit=limit) and filter the returned
hits to only include items with score >= min_score (or pass min_score into
rrf_fuse if it accepts it), and finally pass the filtered list to
_to_dicts("hybrid"); reference the hybrid block, index_db.search_semantic,
index_db.search, rrf_fuse, and _to_dicts to locate where to make the change.
7c74231 to
30d64d7
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (11)
src/vouch/embeddings/rerank.py (1)
49-49:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject negative
top_kfor predictable truncation behavior.Negative
top_kcurrently yields surprising output shape instead of a clear validation error.Proposed fix
def rerank( *, query: str, hits: list[Hit], reranker: Reranker, top_k: int = 50, ) -> list[Hit]: if not hits: return [] + if top_k < 0: + raise ValueError("top_k must be >= 0") candidates = [h[2] or h[1] for h in hits] scores = reranker.score(query, candidates)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/rerank.py` at line 49, The slice return reranked[:top_k] should reject negative top_k to avoid surprising shapes; in the function that returns reranked (look for the function handling reranking that assigns the variable `reranked` and returns it), validate the `top_k` parameter and raise a ValueError if top_k is negative (e.g., `if top_k is not None and top_k < 0: raise ValueError("top_k must be non-negative")`), leaving the existing slicing behavior for None or non-negative values unchanged.src/vouch/jsonl_server.py (1)
79-107:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject unsupported backend values in JSONL search.
At Line 79, unknown
backendvalues still fall through and return an empty success payload instead of aninvalid_requesterror. This makes client behavior non-deterministic compared with MCP search handling.Proposed minimal fix
def _h_search(p: dict) -> list[dict]: @@ - backend_arg = p.get("backend", "auto") + backend_arg = p.get("backend", "auto") + allowed = {"auto", "embedding", "fts5", "substring", "hybrid"} + if backend_arg not in allowed: + raise ValueError(f"unknown backend: {backend_arg}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/jsonl_server.py` around lines 79 - 107, Validate the incoming backend_arg early and reject any value not in the supported set ("auto", "embedding", "fts5", "substring", "hybrid") instead of falling through to an empty result; add a check at the top of the branch handling (where backend_arg and used are initialized) and return/raise an invalid_request error consistent with MCP search handling if the value is unsupported. Locate the validation near the variables backend_arg and used and ensure any caller-visible error follows the same error shape/mechanism used elsewhere in this module (e.g., the JSONL response error path), so search_semantic, search, search_substring and the hybrid rrf_fuse branch remain unchanged except for pre-validation of backend_arg.src/vouch/server.py (2)
126-132:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHonor
min_scorein hybrid semantic candidate fetch.At Line 130, hybrid mode doesn’t pass
min_scoreto semantic search, so filtering behavior differs from embedding mode.Proposed fix
- emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2) + emb = index_db.search_semantic( + store.kb_dir, query, limit=limit * 2, min_score=min_score + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/server.py` around lines 126 - 132, Hybrid branch currently calls index_db.search_semantic without passing min_score so results are not filtered like embedding mode; update the call in the hybrid block (the index_db.search_semantic invocation that produces emb) to include the same min_score parameter used elsewhere (e.g., index_db.search(..., min_score=min_score)) so emb = index_db.search_semantic(store.kb_dir, query, limit=limit * 2, min_score=min_score), leaving the subsequent rrf_fuse(emb, fts, limit=limit) unchanged.
112-120:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not suppress FTS failures for explicit
backend="fts5".At Line 115, explicit FTS requests still convert backend failures into empty hits, masking real errors.
Proposed fix
if backend in ("auto", "fts5"): try: hits = index_db.search(store.kb_dir, query, limit=limit) - except Exception: - hits = [] + except Exception as e: + if backend == "fts5": + raise ValueError(f"fts5 search failed: {e}") from e + hits = []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/server.py` around lines 112 - 120, The code currently swallows all exceptions from index_db.search regardless of whether backend is "auto" or "fts5", which masks real FTS errors for explicit backend="fts5"; change the control flow so that index_db.search exceptions are only suppressed for backend == "auto" (e.g., perform the try/except only in the auto branch and set hits=[] on error), while for backend == "fts5" call index_db.search without catching Exception so real errors propagate; adjust the subsequent logic that calls _to_dicts(hits, "fts5") accordingly.src/vouch/embeddings/fusion.py (1)
22-27:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winGuard invalid
kinrrf_fuse.At Line 27, callers can still trigger divide-by-zero when
k + rank == 0(e.g.,k=-1at first rank).Proposed fix
def rrf_fuse(a: Hits, b: Hits, *, limit: int = 10, k: int = 60) -> Hits: """Reciprocal Rank Fusion: score = sum(1 / (k + rank)) across lists.""" + if k <= -1: + raise ValueError("k must be >= 0") scores: dict[tuple[str, str], float] = {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/fusion.py` around lines 22 - 27, The rrf_fuse function can divide by zero if callers pass a negative k (e.g., k = -1 makes k + rank == 0 for rank==1); add an input guard at the top of rrf_fuse that validates k (for example require k >= 0) and raise a clear ValueError if the check fails so the loop using 1.0 / (k + rank) cannot encounter division-by-zero.src/vouch/index_db.py (1)
294-301:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix sqlite-vec SQL filter:
scorealias is not valid inWHERE.At Line 298, filtering on
score(a SELECT alias) causes sqlite-vec query failure, so this path degrades to fallback every time.Proposed fix
rows = conn.execute( f"SELECT kind, id, " - f" 1.0 - vec_distance_cosine(vec, ?) AS score " + f" (1.0 - vec_distance_cosine(vec, ?)) AS score " f"FROM embedding_index " - f"WHERE kind IN ({placeholders}) AND score >= ? " + f"WHERE kind IN ({placeholders}) " + f" AND (1.0 - vec_distance_cosine(vec, ?)) >= ? " f"ORDER BY score DESC LIMIT ?", - (q.tobytes(), *kinds, min_score, limit), + (q.tobytes(), *kinds, q.tobytes(), min_score, limit), ).fetchall()#!/bin/bash # Verify the problematic alias usage exists in query construction rg -n -C2 'AND score >= \?' src/vouch/index_db.py🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/index_db.py` around lines 294 - 301, The SQL uses the SELECT alias "score" inside WHERE which SQLite doesn't allow; update the query in the conn.execute call so the WHERE repeats the expression instead of referencing the alias (e.g. use "AND 1.0 - vec_distance_cosine(vec, ?) >= ?" while still selecting "1.0 - vec_distance_cosine(vec, ?) AS score"), and adjust the parameter tuple passed to conn.execute so q.tobytes() appears for each ? for the vector expression (i.e. pass q.tobytes() twice along with *kinds, min_score, limit) to keep parameter ordering consistent for the rows assignment in index_db.py.src/vouch/embeddings/scorer.py (2)
54-71:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject unknown metric names early.
Typos in
metricscurrently produce misleading zero outputs instead of a failure signal. Validate requested metric names before accumulation and raise on unknown entries.Suggested patch
def evaluate( @@ ) -> dict[str, float]: """Run a metric sweep over a JSONL queries file.""" from .. import index_db + valid = {"recall@k", "mrr", "ndcg"} + unknown = set(metrics) - valid + if unknown: + raise ValueError(f"Unknown metrics: {sorted(unknown)}") totals = {m: 0.0 for m in metrics}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/scorer.py` around lines 54 - 71, Before iterating queries validate the requested metrics list: define the allowed metric names (e.g. "recall@k", "mrr", "ndcg"), compute unknown = set(metrics) - allowed, and if unknown is non-empty raise a ValueError listing the unknown metric names; keep the rest of the logic in scorer.py (the totals={} initialization and the loop using recall_at_k, mrr, ndcg and index_db.search_semantic) unchanged so typos no longer silently yield zeros.
56-59:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSkip blank lines in JSONL input.
Whitespace-only lines will raise
JSONDecodeErrorand abort the run. Skip empty lines before parsing.Suggested patch
with queries_file.open() as f: for line in f: + if not line.strip(): + continue row = json.loads(line)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/scorer.py` around lines 56 - 59, The loop reading JSONL in scorer.py currently calls json.loads on every line and will raise JSONDecodeError on blank/whitespace-only lines; update the code in the block that opens queries_file (the for line in f: loop) to skip lines that are empty or contain only whitespace (e.g., check if line.strip() is falsy) before calling json.loads, so row = json.loads(line) and subsequent use of q = row["query"] only run for non-blank lines.src/vouch/embeddings/dedup.py (1)
76-77:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard pairwise similarity against mismatched vector shapes.
scan_allcan fail on mixed embedding dimensions; skip mismatched pairs before dot product so one bad pair doesn’t abort the entire audit.Suggested patch
- cos = float(vecs[k1] @ vecs[k2]) + if vecs[k1].shape != vecs[k2].shape: + continue + cos = float(vecs[k1] @ vecs[k2])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/dedup.py` around lines 76 - 77, scan_all can crash when embeddings have mixed dimensions because the code does an unguarded dot product (vecs[k1] @ vecs[k2]); update the pairwise loop in scan_all (or the function performing the check) to skip any pair where vecs[k1].shape != vecs[k2].shape (or lengths differ) before computing cos, and continue to the next pair instead of attempting the dot product, ensuring the rest of the audit runs even if a mismatched embedding is present; reference the variables vecs, k1, k2 and threshold when adding the guard and skip logic.src/vouch/embeddings/cache.py (1)
15-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winScope query-cache keys by embedding model identity.
The key currently depends only on query text, so cached vectors can bleed across model changes and return stale/wrong embeddings. Include a stable model identifier in key derivation (and optionally clear legacy entries once).
Suggested patch
-def _query_key(query: str) -> str: - return hashlib.sha256(query.encode("utf-8")).hexdigest() +def _query_key(query: str, model_id: str) -> str: + return hashlib.sha256(f"{model_id}:{query}".encode("utf-8")).hexdigest() def cache_query_vec( kb_dir: Path, *, query: str, vec: np.ndarray, max_entries: int = 1024, ) -> None: - h = _query_key(query) + from .base import get_embedder + model = get_embedder().name + h = _query_key(query, model) @@ def lookup_query_vec(kb_dir: Path, *, query: str) -> np.ndarray | None: - h = _query_key(query) + from .base import get_embedder + model = get_embedder().name + h = _query_key(query, model)Also applies to: 46-48
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/cache.py` around lines 15 - 23, Include the embedding model identity in query cache keys to avoid cross-model cache pollution: change _query_key to accept a model identifier (e.g., model_id or model_name) and incorporate that identifier into the SHA256 input, update cache_query_vec to pass the active model id into _query_key, and update the corresponding cache read function (e.g., get_cached_query_vec or the lookup at the other location referenced) to use the same model-aware _query_key; optionally add a migration/cleanup step to invalidate or remove legacy keys lacking the model id.src/vouch/embeddings/migration.py (1)
38-83:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNarrow
AttributeErrorhandling to missing list methods only.Catching
AttributeErroraround whole loops can hide real failures inside loop bodies, then metadata is still updated as if backfill succeeded. Resolve method presence up front (e.g.,getattr(..., None)+callable) and let per-item errors surface.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/migration.py` around lines 38 - 83, The current code wraps entire loops in broad AttributeError handlers which can mask real errors; instead, for each collection, check method existence up front (e.g., list_claims = getattr(store, "list_claims", None); if callable(list_claims): for c in list_claims(): store._embed_and_store(...)) and remove the surrounding try/except blocks so per-item exceptions surface; do this for list_claims, list_pages, list_sources, list_entities, list_relations, and list_evidence, keep using store._embed_and_store for each item, and only call index_db.set_embedding_meta(store.kb_dir, model=embedder.name, version=embedder.version, dim=embedder.dim) after these conditional loops.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/vouch/embeddings/rerank.py`:
- Around line 43-47: The reranker.score return length isn't validated before
rebuilding reranked results, so mismatched counts between scores and
hits/candidates can silently drop or misalign items; before the list
comprehension that builds reranked (which references reranker.score, hits, and
candidates), check that len(scores) == len(hits) (or len(candidates)) and if
not, raise a clear exception (e.g., ValueError) containing the sizes and
identifiers (query/candidate count) so the contract violation is visible; only
proceed to build reranked when counts match.
In `@src/vouch/index_db.py`:
- Around line 311-315: The fallback path computes raw dot products (score = q @
vec) which breaks cosine semantics when stored vectors from _blob_to_vec are not
unit-normalized; update the scoring in the loop that builds scored (using
variables q, vec, min_score, scored) to compute cosine similarity by normalizing
vec (and q if not already unit) before taking the dot—i.e., compute vec_norm =
||vec|| and use (q·vec)/(||q||*vec_norm) or normalize vec in-place to unit and
then use q·vec, and only append to scored when that cosine score >= min_score.
In `@src/vouch/jsonl_server.py`:
- Around line 99-104: Hybrid mode currently obtains semantic candidates via
index_db.search_semantic without applying the min_score threshold, causing
inconsistent filtering versus backend="embedding"; update the hybrid branch
(where backend_arg == "hybrid" and emb is assigned from
index_db.search_semantic) to apply the same min_score used by the embedding
backend — either by passing min_score into index_db.search_semantic if supported
or by filtering the returned emb list to only include items with score >=
min_score before fusing with rrf_fuse (refer to variables/backend_arg, emb,
min_score, index_db.search_semantic, and rrf_fuse).
---
Duplicate comments:
In `@src/vouch/embeddings/cache.py`:
- Around line 15-23: Include the embedding model identity in query cache keys to
avoid cross-model cache pollution: change _query_key to accept a model
identifier (e.g., model_id or model_name) and incorporate that identifier into
the SHA256 input, update cache_query_vec to pass the active model id into
_query_key, and update the corresponding cache read function (e.g.,
get_cached_query_vec or the lookup at the other location referenced) to use the
same model-aware _query_key; optionally add a migration/cleanup step to
invalidate or remove legacy keys lacking the model id.
In `@src/vouch/embeddings/dedup.py`:
- Around line 76-77: scan_all can crash when embeddings have mixed dimensions
because the code does an unguarded dot product (vecs[k1] @ vecs[k2]); update the
pairwise loop in scan_all (or the function performing the check) to skip any
pair where vecs[k1].shape != vecs[k2].shape (or lengths differ) before computing
cos, and continue to the next pair instead of attempting the dot product,
ensuring the rest of the audit runs even if a mismatched embedding is present;
reference the variables vecs, k1, k2 and threshold when adding the guard and
skip logic.
In `@src/vouch/embeddings/fusion.py`:
- Around line 22-27: The rrf_fuse function can divide by zero if callers pass a
negative k (e.g., k = -1 makes k + rank == 0 for rank==1); add an input guard at
the top of rrf_fuse that validates k (for example require k >= 0) and raise a
clear ValueError if the check fails so the loop using 1.0 / (k + rank) cannot
encounter division-by-zero.
In `@src/vouch/embeddings/migration.py`:
- Around line 38-83: The current code wraps entire loops in broad AttributeError
handlers which can mask real errors; instead, for each collection, check method
existence up front (e.g., list_claims = getattr(store, "list_claims", None); if
callable(list_claims): for c in list_claims(): store._embed_and_store(...)) and
remove the surrounding try/except blocks so per-item exceptions surface; do this
for list_claims, list_pages, list_sources, list_entities, list_relations, and
list_evidence, keep using store._embed_and_store for each item, and only call
index_db.set_embedding_meta(store.kb_dir, model=embedder.name,
version=embedder.version, dim=embedder.dim) after these conditional loops.
In `@src/vouch/embeddings/rerank.py`:
- Line 49: The slice return reranked[:top_k] should reject negative top_k to
avoid surprising shapes; in the function that returns reranked (look for the
function handling reranking that assigns the variable `reranked` and returns
it), validate the `top_k` parameter and raise a ValueError if top_k is negative
(e.g., `if top_k is not None and top_k < 0: raise ValueError("top_k must be
non-negative")`), leaving the existing slicing behavior for None or non-negative
values unchanged.
In `@src/vouch/embeddings/scorer.py`:
- Around line 54-71: Before iterating queries validate the requested metrics
list: define the allowed metric names (e.g. "recall@k", "mrr", "ndcg"), compute
unknown = set(metrics) - allowed, and if unknown is non-empty raise a ValueError
listing the unknown metric names; keep the rest of the logic in scorer.py (the
totals={} initialization and the loop using recall_at_k, mrr, ndcg and
index_db.search_semantic) unchanged so typos no longer silently yield zeros.
- Around line 56-59: The loop reading JSONL in scorer.py currently calls
json.loads on every line and will raise JSONDecodeError on blank/whitespace-only
lines; update the code in the block that opens queries_file (the for line in f:
loop) to skip lines that are empty or contain only whitespace (e.g., check if
line.strip() is falsy) before calling json.loads, so row = json.loads(line) and
subsequent use of q = row["query"] only run for non-blank lines.
In `@src/vouch/index_db.py`:
- Around line 294-301: The SQL uses the SELECT alias "score" inside WHERE which
SQLite doesn't allow; update the query in the conn.execute call so the WHERE
repeats the expression instead of referencing the alias (e.g. use "AND 1.0 -
vec_distance_cosine(vec, ?) >= ?" while still selecting "1.0 -
vec_distance_cosine(vec, ?) AS score"), and adjust the parameter tuple passed to
conn.execute so q.tobytes() appears for each ? for the vector expression (i.e.
pass q.tobytes() twice along with *kinds, min_score, limit) to keep parameter
ordering consistent for the rows assignment in index_db.py.
In `@src/vouch/jsonl_server.py`:
- Around line 79-107: Validate the incoming backend_arg early and reject any
value not in the supported set ("auto", "embedding", "fts5", "substring",
"hybrid") instead of falling through to an empty result; add a check at the top
of the branch handling (where backend_arg and used are initialized) and
return/raise an invalid_request error consistent with MCP search handling if the
value is unsupported. Locate the validation near the variables backend_arg and
used and ensure any caller-visible error follows the same error shape/mechanism
used elsewhere in this module (e.g., the JSONL response error path), so
search_semantic, search, search_substring and the hybrid rrf_fuse branch remain
unchanged except for pre-validation of backend_arg.
In `@src/vouch/server.py`:
- Around line 126-132: Hybrid branch currently calls index_db.search_semantic
without passing min_score so results are not filtered like embedding mode;
update the call in the hybrid block (the index_db.search_semantic invocation
that produces emb) to include the same min_score parameter used elsewhere (e.g.,
index_db.search(..., min_score=min_score)) so emb =
index_db.search_semantic(store.kb_dir, query, limit=limit * 2,
min_score=min_score), leaving the subsequent rrf_fuse(emb, fts, limit=limit)
unchanged.
- Around line 112-120: The code currently swallows all exceptions from
index_db.search regardless of whether backend is "auto" or "fts5", which masks
real FTS errors for explicit backend="fts5"; change the control flow so that
index_db.search exceptions are only suppressed for backend == "auto" (e.g.,
perform the try/except only in the auto branch and set hits=[] on error), while
for backend == "fts5" call index_db.search without catching Exception so real
errors propagate; adjust the subsequent logic that calls _to_dicts(hits, "fts5")
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4dab7d36-73a7-4762-9709-3030db0a1c5e
📒 Files selected for processing (22)
pyproject.tomlsrc/vouch/embeddings/__init__.pysrc/vouch/embeddings/cache.pysrc/vouch/embeddings/dedup.pysrc/vouch/embeddings/fusion.pysrc/vouch/embeddings/hyde.pysrc/vouch/embeddings/migration.pysrc/vouch/embeddings/rerank.pysrc/vouch/embeddings/scorer.pysrc/vouch/index_db.pysrc/vouch/jsonl_server.pysrc/vouch/server.pysrc/vouch/storage.pytests/embeddings/conftest.pytests/embeddings/test_dedup.pytests/embeddings/test_fusion.pytests/embeddings/test_hyde.pytests/embeddings/test_migration.pytests/embeddings/test_rerank.pytests/embeddings/test_scorer.pytests/embeddings/test_search.pytests/embeddings/test_storage.py
| if backend_arg == "hybrid": | ||
| from .embeddings.fusion import ( | ||
| rrf_fuse, # type: ignore[import-not-found,import-untyped,unused-ignore] | ||
| ) | ||
| emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2) | ||
| fts = index_db.search(s.kb_dir, q, limit=limit * 2) |
There was a problem hiding this comment.
Apply min_score in hybrid semantic leg.
At Line 103, hybrid mode ignores min_score for semantic candidates, so threshold semantics differ from backend="embedding".
Proposed fix
- emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2)
+ emb = index_db.search_semantic(
+ s.kb_dir, q, limit=limit * 2, min_score=min_score
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if backend_arg == "hybrid": | |
| from .embeddings.fusion import ( | |
| rrf_fuse, # type: ignore[import-not-found,import-untyped,unused-ignore] | |
| ) | |
| emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2) | |
| fts = index_db.search(s.kb_dir, q, limit=limit * 2) | |
| if backend_arg == "hybrid": | |
| from .embeddings.fusion import ( | |
| rrf_fuse, # type: ignore[import-not-found,import-untyped,unused-ignore] | |
| ) | |
| emb = index_db.search_semantic( | |
| s.kb_dir, q, limit=limit * 2, min_score=min_score | |
| ) | |
| fts = index_db.search(s.kb_dir, q, limit=limit * 2) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/jsonl_server.py` around lines 99 - 104, Hybrid mode currently
obtains semantic candidates via index_db.search_semantic without applying the
min_score threshold, causing inconsistent filtering versus backend="embedding";
update the hybrid branch (where backend_arg == "hybrid" and emb is assigned from
index_db.search_semantic) to apply the same min_score used by the embedding
backend — either by passing min_score into index_db.search_semantic if supported
or by filtering the returned emb list to only include items with score >=
min_score before fusing with rrf_fuse (refer to variables/backend_arg, emb,
min_score, index_db.search_semantic, and rrf_fuse).
1a428c7 to
23c4fcc
Compare
- r.relation.value used instead of plan's r.type.value (actual field name) - ev.quote used instead of plan's ev.excerpt (actual field name) - All list_* loops wrapped in try/except AttributeError per instructions
23c4fcc to
b4006a9
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (8)
src/vouch/jsonl_server.py (2)
99-103:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winApply
min_scorein hybrid semantic retrieval.Hybrid mode currently bypasses the threshold on the semantic leg, so filtering differs from
backend="embedding".Suggested fix
- emb = index_db.search_semantic(s.kb_dir, q, limit=limit * 2) + emb = index_db.search_semantic( + s.kb_dir, q, limit=limit * 2, min_score=min_score + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/jsonl_server.py` around lines 99 - 103, Hybrid mode currently ignores the semantic threshold: when backend_arg == "hybrid" filter the semantic search results returned by index_db.search_semantic using the same min_score used by embedding-only retrieval before calling rrf_fuse so the semantic leg respects the threshold; specifically locate the block with backend_arg, index_db.search_semantic, index_db.search and rrf_fuse and apply the min_score filter to the emb results (keeping the limit logic) or pass filtered emb into rrf_fuse so hits reflect the same min_score behavior as backend="embedding".
79-84:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject unknown
backendvalues explicitly.Unsupported
backendvalues currently fall through and return empty hits, which hides request errors. Validate allowed values and raiseValueError.Suggested fix
backend_arg = p.get("backend", "auto") + allowed = {"auto", "embedding", "fts5", "substring", "hybrid"} + if backend_arg not in allowed: + raise ValueError(f"unknown backend: {backend_arg}") min_score = float(p.get("min_score", 0.0))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/jsonl_server.py` around lines 79 - 84, The code lets unknown backend_arg values fall through silently; update the handling around backend_arg (variable backend_arg and used) so you explicitly validate allowed values (e.g., "auto", "embedding", and any other supported names used elsewhere) and raise a ValueError for unsupported values; implement this by converting the existing if backend_arg in ("auto", "embedding"): ... elif backend_arg == "X": ... pattern into an if/elif/else and in the else branch raise ValueError(f"Unsupported backend: {backend_arg}"), ensuring used is set only for supported branches and hits remains untouched otherwise.src/vouch/embeddings/dedup.py (1)
76-77:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard mixed-dimension vectors before dot-product.
Line 76 can raise on shape mismatch and abort the full duplicate scan when embeddings with different dimensions coexist. Skip mismatched pairs before computing cosine.
Suggested fix
- cos = float(vecs[k1] @ vecs[k2]) + if vecs[k1].shape != vecs[k2].shape: + continue + cos = float(vecs[k1] @ vecs[k2])🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/dedup.py` around lines 76 - 77, Guard the dot-product between embeddings by skipping pairs with mismatched dimensions: before computing cos = float(vecs[k1] @ vecs[k2]) check that vecs[k1] and vecs[k2] have compatible shapes (e.g., same length / same .shape or both 1-D) and continue/skip the pair if they differ; update the loop in src/vouch/embeddings/dedup.py where vecs, k1, k2 and threshold are used to perform this shape check so a shape-mismatch cannot raise and abort the duplicate scan.src/vouch/embeddings/scorer.py (2)
56-59:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSkip empty JSONL lines before parsing.
Line 58 will fail on blank lines. Ignore whitespace-only lines before
json.loads.Suggested fix
with queries_file.open() as f: for line in f: + if not line.strip(): + continue row = json.loads(line)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/scorer.py` around lines 56 - 59, The loop that reads queries_file in scorer.py calls json.loads on every line and will crash on blank/whitespace-only lines; update the loop inside the with block that processes each line (the code around queries_file.open() and the for line in f) to strip whitespace from line, skip/continue when the stripped line is empty, then call json.loads on the non-empty line to extract row and q = row["query"] so blank JSONL lines are ignored safely.
54-55:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject unknown metric names up front.
Line 54 currently accepts arbitrary metric strings and can return misleading zeros for typos. Validate against the supported metric set first.
Suggested fix
def evaluate( @@ ) -> dict[str, float]: """Run a metric sweep over a JSONL queries file.""" from .. import index_db + valid = {"recall@k", "mrr", "ndcg"} + unknown = set(metrics) - valid + if unknown: + raise ValueError(f"Unknown metrics: {sorted(unknown)}") totals = {m: 0.0 for m in metrics}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/scorer.py` around lines 54 - 55, The code currently builds totals = {m: 0.0 for m in metrics} allowing arbitrary metric names and silently returning zeros for typos; before that initialization validate the incoming metrics list against the supported metric set (e.g., SUPPORTED_METRICS or supported_metrics in scorer.py) and raise a clear exception (ValueError) if any unknown metric is present so callers get immediate feedback rather than misleading results; update the validation near where totals and n are set and reference the metrics variable and the module's supported metric constant in the error message.src/vouch/embeddings/migration.py (1)
38-83:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNarrow
AttributeErrorhandling in backfill loops.Catching
AttributeErroraround whole loops can swallow real per-item failures while Lines 81-83 still update embedding metadata. Restrict fallback to missinglist_*methods only.Suggested fix pattern
- try: - for c in store.list_claims(): - store._embed_and_store(kind="claim", id=c.id, text=c.text) - touched += 1 - except AttributeError: - pass + list_claims = getattr(store, "list_claims", None) + if callable(list_claims): + for c in list_claims(): + store._embed_and_store(kind="claim", id=c.id, text=c.text) + touched += 1Apply the same pattern to
list_pages,list_sources,list_entities,list_relations, andlist_evidence.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/migration.py` around lines 38 - 83, The AttributeError handlers around entire backfill loops are too broad and can hide per-item failures; change each try/except so you only skip the whole loop when the store is missing the list_* method (e.g., check hasattr(store, "list_claims") or wrap only the call to store.list_claims() in a try/except AttributeError), then iterate and call store._embed_and_store normally so item-level exceptions surface; apply this pattern to list_pages, list_sources, list_entities, list_relations, and list_evidence, leaving index_db.set_embedding_meta and embedder.name/version/dim updates unchanged.src/vouch/embeddings/rerank.py (1)
43-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate reranker score count and
top_kbefore slicing.Line 46 currently permits silent truncation on score-count mismatch, and Line 49 allows negative
top_ksemantics. Enforce both contracts explicitly.Suggested fix
scores = reranker.score(query, candidates) + if len(scores) != len(candidates): + msg = f"Reranker returned {len(scores)} scores for {len(candidates)} candidates" + raise ValueError(msg) + if top_k < 0: + raise ValueError("top_k must be >= 0") reranked = [ (kind, id_, snip, score) - for (kind, id_, snip, _orig), score in zip(hits, scores, strict=False) + for (kind, id_, snip, _orig), score in zip(hits, scores, strict=True) ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/rerank.py` around lines 43 - 49, The code currently zips hits with scores and slices by top_k without validating inputs; update the rerank routine to (1) call reranker.score(query, candidates) and assert the returned scores length equals len(hits) (raise a clear ValueError referencing reranker.score and hits if mismatch), (2) validate that top_k is an int >= 0 (raise ValueError mentioning top_k) and treat None or 0 appropriately, and (3) when returning results, slice with min(top_k, len(reranked)) instead of allowing negative or out-of-range slicing; reference the variables and calls reranker.score, hits, scores, reranked, and top_k so the checks are added immediately before building/slicing reranked.src/vouch/embeddings/fusion.py (1)
22-27:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate
kbefore reciprocal-rank scoring.Line 27 can divide by zero when
k <= -1. Add a parameter guard up front.Suggested fix
def rrf_fuse(a: Hits, b: Hits, *, limit: int = 10, k: int = 60) -> Hits: """Reciprocal Rank Fusion: score = sum(1 / (k + rank)) across lists.""" + if k <= -1: + raise ValueError("k must be greater than -1") scores: dict[tuple[str, str], float] = {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/embeddings/fusion.py` around lines 22 - 27, r rf_fuse currently computes 1.0 / (k + rank) and can hit a zero or unexpected denominator; add an upfront guard in rrf_fuse that validates k (e.g., require k >= 0) and raise a clear ValueError if the check fails, mentioning the parameter k and the denominator (k + rank) so callers know why the input is invalid; locate this check at the top of rrf_fuse before the loop that uses k.
🧹 Nitpick comments (1)
tests/embeddings/test_storage.py (1)
138-153: ⚡ Quick win
test_search_works_under_both_backendsdoesn’t actually cover both paths.Right now it validates one search call only. Consider splitting into two tests (or parametrizing) that explicitly force each backend branch so regressions in either path are caught.
One way to make coverage explicit
-def test_search_works_under_both_backends(kb_dir: Path) -> None: +@pytest.mark.parametrize("force_vec_loaded", [True, False]) +def test_search_works_under_both_backends( + kb_dir: Path, monkeypatch: pytest.MonkeyPatch, force_vec_loaded: bool +) -> None: @@ - hits = index_db.search_embedding( + monkeypatch.setattr(index_db, "_load_sqlite_vec", lambda _conn: force_vec_loaded) + hits = index_db.search_embedding( kb_dir, query_vec=vecs[0], kinds=("claim",), limit=3, ) assert hits[0][1] == "c0" # exact self-match🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/embeddings/test_storage.py` around lines 138 - 153, Split or parametrize test_search_works_under_both_backends so it explicitly runs the search against both backend branches instead of a single call: create two subtests (or a pytest.mark.parametrize) that set the backend selection used by index_db.search_embedding (e.g., by patching whatever flag/function determines the backend or by invoking the internal backend-specific call directly), populate the DB via index_db.open_db and index_db.put_embedding as before, run index_db.search_embedding for each backend variant, and assert the same expectations (hits[0][1] == "c0") for both runs; reference the existing test_search_works_under_both_backends function and index_db.search_embedding/index_db.open_db/index_db.put_embedding when implementing the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/vouch/index_db.py`:
- Around line 309-319: In search_embedding (the non-sqlite-vec fallback loop)
you must skip rows whose stored vector dimensionality doesn't match the query
vector q to avoid NumPy matmul errors: after converting blob -> vec via
_blob_to_vec and before computing score = float(q @ vec), compare dim (the
stored dimension) with q.shape[0] (or len(q)) and continue/skip if they differ;
optionally record or surface the mismatch using existing helpers like
embedding_meta/detect_mismatch/backfill_embeddings instead of scoring. Ensure
the check is applied to the loop that builds scored (variables: kind, id_, blob,
dim, q, min_score, _blob_to_vec) so only matching-dimension vectors are scored
and appended.
In `@src/vouch/storage.py`:
- Around line 486-509: The embedding/dedup steps after get_embedder()
(embedder.encode, _index_db.put_embedding, _index_db.set_embedding_meta, and
check_and_log) must be best-effort and must not raise back to callers like
put_source/put_claim/put_page; wrap the entire sequence (from vec =
embedder.encode(text) through the dedup check_and_log call) in a broad
try/except that catches Exception, logs the failure with context
(embedder.name/version, kind, id, and error), and otherwise continues, so
transient embedding or dedup errors don't abort the successful file write.
In `@tests/embeddings/test_search.py`:
- Line 113: The test currently asserts backend membership only in ("embedding",
"fts5", "substring") which fails when backend == "hybrid"; update the assertions
in tests/embeddings/test_search.py that check resp["result"][0]["backend"] (and
the similar assertion at the other occurrence) to include "hybrid" in the
allowed tuple so the test accepts hybrid as a valid backend value.
---
Duplicate comments:
In `@src/vouch/embeddings/dedup.py`:
- Around line 76-77: Guard the dot-product between embeddings by skipping pairs
with mismatched dimensions: before computing cos = float(vecs[k1] @ vecs[k2])
check that vecs[k1] and vecs[k2] have compatible shapes (e.g., same length /
same .shape or both 1-D) and continue/skip the pair if they differ; update the
loop in src/vouch/embeddings/dedup.py where vecs, k1, k2 and threshold are used
to perform this shape check so a shape-mismatch cannot raise and abort the
duplicate scan.
In `@src/vouch/embeddings/fusion.py`:
- Around line 22-27: r rf_fuse currently computes 1.0 / (k + rank) and can hit a
zero or unexpected denominator; add an upfront guard in rrf_fuse that validates
k (e.g., require k >= 0) and raise a clear ValueError if the check fails,
mentioning the parameter k and the denominator (k + rank) so callers know why
the input is invalid; locate this check at the top of rrf_fuse before the loop
that uses k.
In `@src/vouch/embeddings/migration.py`:
- Around line 38-83: The AttributeError handlers around entire backfill loops
are too broad and can hide per-item failures; change each try/except so you only
skip the whole loop when the store is missing the list_* method (e.g., check
hasattr(store, "list_claims") or wrap only the call to store.list_claims() in a
try/except AttributeError), then iterate and call store._embed_and_store
normally so item-level exceptions surface; apply this pattern to list_pages,
list_sources, list_entities, list_relations, and list_evidence, leaving
index_db.set_embedding_meta and embedder.name/version/dim updates unchanged.
In `@src/vouch/embeddings/rerank.py`:
- Around line 43-49: The code currently zips hits with scores and slices by
top_k without validating inputs; update the rerank routine to (1) call
reranker.score(query, candidates) and assert the returned scores length equals
len(hits) (raise a clear ValueError referencing reranker.score and hits if
mismatch), (2) validate that top_k is an int >= 0 (raise ValueError mentioning
top_k) and treat None or 0 appropriately, and (3) when returning results, slice
with min(top_k, len(reranked)) instead of allowing negative or out-of-range
slicing; reference the variables and calls reranker.score, hits, scores,
reranked, and top_k so the checks are added immediately before building/slicing
reranked.
In `@src/vouch/embeddings/scorer.py`:
- Around line 56-59: The loop that reads queries_file in scorer.py calls
json.loads on every line and will crash on blank/whitespace-only lines; update
the loop inside the with block that processes each line (the code around
queries_file.open() and the for line in f) to strip whitespace from line,
skip/continue when the stripped line is empty, then call json.loads on the
non-empty line to extract row and q = row["query"] so blank JSONL lines are
ignored safely.
- Around line 54-55: The code currently builds totals = {m: 0.0 for m in
metrics} allowing arbitrary metric names and silently returning zeros for typos;
before that initialization validate the incoming metrics list against the
supported metric set (e.g., SUPPORTED_METRICS or supported_metrics in scorer.py)
and raise a clear exception (ValueError) if any unknown metric is present so
callers get immediate feedback rather than misleading results; update the
validation near where totals and n are set and reference the metrics variable
and the module's supported metric constant in the error message.
In `@src/vouch/jsonl_server.py`:
- Around line 99-103: Hybrid mode currently ignores the semantic threshold: when
backend_arg == "hybrid" filter the semantic search results returned by
index_db.search_semantic using the same min_score used by embedding-only
retrieval before calling rrf_fuse so the semantic leg respects the threshold;
specifically locate the block with backend_arg, index_db.search_semantic,
index_db.search and rrf_fuse and apply the min_score filter to the emb results
(keeping the limit logic) or pass filtered emb into rrf_fuse so hits reflect the
same min_score behavior as backend="embedding".
- Around line 79-84: The code lets unknown backend_arg values fall through
silently; update the handling around backend_arg (variable backend_arg and used)
so you explicitly validate allowed values (e.g., "auto", "embedding", and any
other supported names used elsewhere) and raise a ValueError for unsupported
values; implement this by converting the existing if backend_arg in ("auto",
"embedding"): ... elif backend_arg == "X": ... pattern into an if/elif/else and
in the else branch raise ValueError(f"Unsupported backend: {backend_arg}"),
ensuring used is set only for supported branches and hits remains untouched
otherwise.
---
Nitpick comments:
In `@tests/embeddings/test_storage.py`:
- Around line 138-153: Split or parametrize
test_search_works_under_both_backends so it explicitly runs the search against
both backend branches instead of a single call: create two subtests (or a
pytest.mark.parametrize) that set the backend selection used by
index_db.search_embedding (e.g., by patching whatever flag/function determines
the backend or by invoking the internal backend-specific call directly),
populate the DB via index_db.open_db and index_db.put_embedding as before, run
index_db.search_embedding for each backend variant, and assert the same
expectations (hits[0][1] == "c0") for both runs; reference the existing
test_search_works_under_both_backends function and
index_db.search_embedding/index_db.open_db/index_db.put_embedding when
implementing the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 296d22aa-86ae-4d04-88d5-ff771f17778b
📒 Files selected for processing (20)
src/vouch/embeddings/__init__.pysrc/vouch/embeddings/cache.pysrc/vouch/embeddings/dedup.pysrc/vouch/embeddings/fusion.pysrc/vouch/embeddings/hyde.pysrc/vouch/embeddings/migration.pysrc/vouch/embeddings/rerank.pysrc/vouch/embeddings/scorer.pysrc/vouch/index_db.pysrc/vouch/jsonl_server.pysrc/vouch/server.pysrc/vouch/storage.pytests/embeddings/test_dedup.pytests/embeddings/test_fusion.pytests/embeddings/test_hyde.pytests/embeddings/test_migration.pytests/embeddings/test_rerank.pytests/embeddings/test_scorer.pytests/embeddings/test_search.pytests/embeddings/test_storage.py
✅ Files skipped from review due to trivial changes (2)
- src/vouch/embeddings/init.py
- src/vouch/server.py
| rows = conn.execute( | ||
| f"SELECT kind, id, vec, dim FROM embedding_index " | ||
| f"WHERE kind IN ({placeholders})", | ||
| kinds, | ||
| ).fetchall() | ||
| scored: list[tuple[str, str, str, float]] = [] | ||
| for kind, id_, blob, dim in rows: | ||
| vec = _blob_to_vec(blob, dim) | ||
| score = float(q @ vec) | ||
| if score >= min_score: | ||
| scored.append((kind, id_, "", score)) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate target file
ls -la src/vouch/index_db.py
# Show the relevant region with line numbers
nl -ba src/vouch/index_db.py | sed -n '250,360p'
# Show helper(s) and any uses/definition of _blob_to_vec and dim handling
rg -n "_blob_to_vec|embedding_index|dim" src/vouch/index_db.py
# Search whole repo for embedding_index schema / migrations
rg -n "CREATE TABLE embedding_index|embedding_index\s*\(" -S .
# Identify how dim mismatch is handled elsewhere (if any)
rg -n "dim\s*!=(|!=)|q @ vec|vec_distance_cosine|tobytes\(\)" -S src/vouchRepository: vouchdev/vouch
Length of output: 191
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/vouch/index_db.py"
# Print target section with line numbers (via python)
python3 - <<'PY'
import itertools
path="src/vouch/index_db.py"
start=250
end=360
with open(path,'r',encoding='utf-8') as f:
for i,line in enumerate(f, start=1):
if start <= i <= end:
print(f"{i:5d}\t{line.rstrip()}")
PY
# Targeted searches
rg -n "_blob_to_vec|q\s*@\s*vec|vec_distance_cosine|embedding_index|SELECT kind, id, vec, dim" "$FILE" -SRepository: vouchdev/vouch
Length of output: 4754
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/vouch/index_db.py"
python3 - <<'PY'
path="src/vouch/index_db.py"
ranges=[(1,120),(150,260)]
for start,end in ranges:
print(f"\n# --- lines {start}-{end} ---")
with open(path,'r',encoding='utf-8') as f:
for i,line in enumerate(f, start=1):
if start<=i<=end:
print(f"{i:5d}\t{line.rstrip()}")
PY
rg -n "CREATE TABLE IF NOT EXISTS embedding_index|INSERT OR REPLACE INTO embedding_index|model|dim\s*," "$FILE" -SRepository: vouchdev/vouch
Length of output: 9850
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for dim/embedding_dim usage across the repo
rg -n "embedding_dim|embedding_model|embedding_model_version|\bdim\b" src -S
# Specifically trace query vector generation and whether its dim is checked against DB meta
rg -n "query_embedding_cache|lookup_query_vec|cache_query_vec|get_embedder|encode\(" src/vouch -S
# Find all calls to search_embedding
rg -n "search_embedding\(" -S src/vouchRepository: vouchdev/vouch
Length of output: 7188
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE_MIG="src/vouch/embeddings/migration.py"
FILE_STORAGE="src/vouch/storage.py"
FILE_DEDUP="src/vouch/embeddings/dedup.py"
python3 - <<'PY'
import pathlib
files=["src/vouch/embeddings/migration.py","src/vouch/storage.py","src/vouch/embeddings/dedup.py"]
for f in files:
p=pathlib.Path(f)
print(f"\n==== {f} ({p.stat().st_size} bytes) ====")
PY
# Print key regions
python3 - <<'PY'
def show(path, start, end):
with open(path,'r',encoding='utf-8') as f:
for i,line in enumerate(f, start=1):
if start<=i<=end:
print(f"{i:5d}\t{line.rstrip()}")
show("src/vouch/embeddings/migration.py",1,220)
PY
python3 - <<'PY'
# storage: look for where embeddings are persisted and dim/meta validated
show=lambda path,start,end: __import__("builtins").exec
PYRepository: vouchdev/vouch
Length of output: 3248
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
import pathlib
def show(path,start,end):
with open(path,'r',encoding='utf-8') as f:
for i,line in enumerate(f, start=1):
if start<=i<=end:
print(f"{i:5d}\t{line.rstrip()}")
for f in ["src/vouch/embeddings/migration.py","src/vouch/storage.py"]:
p=pathlib.Path(f)
print(f"\n==== {f} ({p.stat().st_size} bytes) ==== ")
PY
python3 - <<'PY'
def show(path,start,end):
with open(path,'r',encoding='utf-8') as f:
for i,line in enumerate(f, start=1):
if start<=i<=end:
print(f"{i:5d}\t{line.rstrip()}")
print("\n--- migration.py 1-160 ---")
show("src/vouch/embeddings/migration.py",1,160)
print("\n--- migration.py 160-320 ---")
show("src/vouch/embeddings/migration.py",160,320)
print("\n--- storage.py 430-560 ---")
show("src/vouch/storage.py",430,560)
print("\n--- storage.py 560-720 ---")
show("src/vouch/storage.py",560,720)
PY
# Find where mismatch detection/backfill is called
rg -n "detect_mismatch|backfill_embeddings|backfill|embedding_meta\(" src/vouch -SRepository: vouchdev/vouch
Length of output: 11113
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "detect_mismatch\(|backfill_embeddings\(" -S src
rg -n "embedding_dim|embedding_model" -S src/vouch | head -n 50Repository: vouchdev/vouch
Length of output: 745
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "embeddings\.migration|backfill_embeddings|detect_mismatch" -S srcRepository: vouchdev/vouch
Length of output: 260
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "backfill_embeddings|detect_mismatch|embedding_dim|embedding_model_version" -S .Repository: vouchdev/vouch
Length of output: 4081
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE_CACHE="src/vouch/embeddings/cache.py"
FILE_DEDUP="src/vouch/embeddings/dedup.py"
FILE_MIGRATION="src/vouch/embeddings/migration.py"
python3 - <<'PY'
import pathlib
def show(path,start,end):
with open(path,'r',encoding='utf-8') as f:
for i,line in enumerate(f, start=1):
if start<=i<=end:
print(f"{i:5d}\t{line.rstrip()}")
print("\n--- cache.py 1-140 ---")
show("src/vouch/embeddings/cache.py",1,140)
print("\n--- dedup.py 1-140 ---")
show("src/vouch/embeddings/dedup.py",1,140)
print("\n--- migration.py 1-140 ---")
show("src/vouch/embeddings/migration.py",1,140)
PY
# Search for runtime use of migration/backfill (not docs/tests)
rg -n "from \.embeddings\.migration import|embeddings\.migration|detect_mismatch\(|backfill_embeddings\(" -S src testsRepository: vouchdev/vouch
Length of output: 10353
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,120p' src/vouch/embeddings/base.pyRepository: vouchdev/vouch
Length of output: 1943
Filter by embedding dimension before scoring in search_embedding.
search_embedding’s non-sqlite-vec fallback loop computes score = float(q @ vec) for every row selected, but it never checks that the row’s dim matches q’s dimension. If the DB contains vectors from a different embedding adapter (stale/partially backfilled KB), NumPy matmul will raise and abort the entire search. There’s also no runtime guard in search_embedding that uses embedding_meta/detect_mismatch/backfill_embeddings.
Suggested fix
def search_embedding(
kb_dir: Path,
@@
) -> list[tuple[str, str, str, float]]:
import numpy as np
q = np.asarray(query_vec, dtype=np.float32)
+ qdim = int(q.size)
qnorm = float(np.linalg.norm(q))
if qnorm > 0:
q = q / qnorm
@@
rows = conn.execute(
f"SELECT kind, id, score FROM ("
f" SELECT kind, id, 1.0 - vec_distance_cosine(vec, ?) AS score "
f" FROM embedding_index "
- f" WHERE kind IN ({placeholders})"
+ f" WHERE kind IN ({placeholders}) AND dim = ?"
f") WHERE score >= ? "
f"ORDER BY score DESC LIMIT ?",
- (q.tobytes(), *kinds, min_score, limit),
+ (q.tobytes(), *kinds, qdim, min_score, limit),
).fetchall()
return [(k, i, "", float(s)) for k, i, s in rows]
except sqlite3.OperationalError:
pass
rows = conn.execute(
f"SELECT kind, id, vec, dim FROM embedding_index "
- f"WHERE kind IN ({placeholders})",
- kinds,
+ f"WHERE kind IN ({placeholders}) AND dim = ?",
+ (*kinds, qdim),
).fetchall()
scored: list[tuple[str, str, str, float]] = []
for kind, id_, blob, dim in rows:
+ if dim != qdim:
+ continue
vec = _blob_to_vec(blob, dim)
score = float(q @ vec)
if score >= min_score:
scored.append((kind, id_, "", score))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/index_db.py` around lines 309 - 319, In search_embedding (the
non-sqlite-vec fallback loop) you must skip rows whose stored vector
dimensionality doesn't match the query vector q to avoid NumPy matmul errors:
after converting blob -> vec via _blob_to_vec and before computing score =
float(q @ vec), compare dim (the stored dimension) with q.shape[0] (or len(q))
and continue/skip if they differ; optionally record or surface the mismatch
using existing helpers like embedding_meta/detect_mismatch/backfill_embeddings
instead of scoring. Ensure the check is applied to the loop that builds scored
(variables: kind, id_, blob, dim, q, min_score, _blob_to_vec) so only
matching-dimension vectors are scored and appended.
| try: | ||
| embedder = get_embedder() | ||
| except KeyError: | ||
| return | ||
| vec = embedder.encode(text) | ||
| with _index_db.open_db(self.kb_dir) as conn: | ||
| _index_db.put_embedding( | ||
| conn, kind=kind, id=id, vec=vec, content_hash=h, | ||
| model=embedder.name, model_version=embedder.version, | ||
| dim=embedder.dim, | ||
| ) | ||
| _index_db.set_embedding_meta( | ||
| self.kb_dir, model=embedder.name, | ||
| version=embedder.version, dim=embedder.dim, | ||
| ) | ||
| try: | ||
| # NB: dedup module is added in a later phase; ignore missing-stub | ||
| # / missing-module noise in CI's [dev]-only mypy run. | ||
| from .embeddings.dedup import ( # type: ignore[import-not-found,import-untyped,unused-ignore] | ||
| check_and_log, | ||
| ) | ||
| check_and_log(self.kb_dir, kind=kind, id=id, vec=vec) | ||
| except ImportError: | ||
| pass |
There was a problem hiding this comment.
Keep the embedding hook best-effort all the way through.
The docstring says this path is optional, but Lines 490-507 still let encode(), put_embedding(), set_embedding_meta(), and check_and_log() abort the caller. That means a transient embedding/dedup failure can break put_source, put_claim, put_page, etc., even though the file write already succeeded.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/storage.py` around lines 486 - 509, The embedding/dedup steps after
get_embedder() (embedder.encode, _index_db.put_embedding,
_index_db.set_embedding_meta, and check_and_log) must be best-effort and must
not raise back to callers like put_source/put_claim/put_page; wrap the entire
sequence (from vec = embedder.encode(text) through the dedup check_and_log call)
in a broad try/except that catches Exception, logs the failure with context
(embedder.name/version, kind, id, and error), and otherwise continues, so
transient embedding or dedup errors don't abort the successful file write.
| }) | ||
| assert resp["ok"] is True | ||
| assert resp["result"] | ||
| assert resp["result"][0]["backend"] in ("embedding", "fts5", "substring") |
There was a problem hiding this comment.
Allow hybrid in backend assertions.
These assertions can reject valid behavior now that backend selection includes "hybrid".
Suggested patch
- assert resp["result"][0]["backend"] in ("embedding", "fts5", "substring")
+ assert resp["result"][0]["backend"] in ("embedding", "fts5", "substring", "hybrid")
@@
- assert result["backend"] in ("embedding", "fts5", "substring")
+ assert result["backend"] in ("embedding", "fts5", "substring", "hybrid")Also applies to: 127-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/embeddings/test_search.py` at line 113, The test currently asserts
backend membership only in ("embedding", "fts5", "substring") which fails when
backend == "hybrid"; update the assertions in tests/embeddings/test_search.py
that check resp["result"][0]["backend"] (and the similar assertion at the other
occurrence) to include "hybrid" in the allowed tuple so the test accepts hybrid
as a valid backend value.
|
Really clean phase breakdown — modules are well-isolated and the test Two questions outside the diff:
Happy to re-review once the migration cleanup lands. 🙏 |
…eview) a reranker returning the wrong number of scores used to silently drop or misalign results via zip(strict=False). now raises ValueError on a count mismatch and on negative top_k instead of producing a corrupted ranking.
skip blank lines in the queries JSONL instead of crashing on JSONDecodeError, and reject unknown metric names up front rather than silently returning 0.0 for a typo.
…v#42 review) the per-kind try/except AttributeError also swallowed errors raised inside the loop body, then set_embedding_meta still ran as if migration succeeded. now only missing list-methods are skipped (via getattr) and any real error propagates. also drops the unused force parameter.
…dev#42 review) after a model migration the embedding_index can hold vectors of different dims for the same kind; the dot product would raise and abort the whole scan. skip incomparable shape pairs instead.
a reranker returning the wrong number of scores used to silently drop or misalign results via zip(strict=False). now raises ValueError on a count mismatch and on negative top_k instead of producing a corrupted ranking.
skip blank lines in the queries JSONL instead of crashing on JSONDecodeError, and reject unknown metric names up front rather than silently returning 0.0 for a typo.
…iew) the per-kind try/except AttributeError also swallowed errors raised inside the loop body, then set_embedding_meta still ran as if migration succeeded. now only missing list-methods are skipped (via getattr) and any real error propagates. also drops the unused force parameter.
…eview) after a model migration the embedding_index can hold vectors of different dims for the same kind; the dot product would raise and abort the whole scan. skip incomparable shape pairs instead.
…pabilities feat(embeddings): rerank, HyDE, dedup, scorer, model-identity migration
Summary
Phase 6 layers the "smarter" retrieval and operational capabilities on top of the basic semantic backend:
embedding_modelinindex_metadiffers from the registered default, exposes a backfill pathTracks Tasks 18-22 in the plan.
Commits
4befafbembeddings/rerank.pyRerankerABC +CrossEncoderReranker(sentence-transformersms-marco-MiniLM-L6-v2) +rerank(query, hits, reranker, top_k)e9d2dceembeddings/hyde.pyexpand_query_template(query, min_chars=20)(pads short queries with a "documents about X" preamble) +expand_query_with_llm(query, llm)(optional LLM-backed expansion via caller-supplied callable)ae73191embeddings/dedup.pycheck_and_log(called from_embed_and_storesince Phase 3) +list_duplicates+scan_all; logs toembedding_dupestablee06b69dembeddings/scorer.pyrecall_at_k/mrr/ndcgmetric fns +evaluate(queries_file, k, metrics)runnere5d220aembeddings/migration.pydetect_mismatch(kb_dir)returns mismatch info or None;backfill_embeddings(store)re-encodes every artifact under the current adapter743e492style:cleanup for B905, F401, I001Design notes
Lazy heavy deps.
CrossEncoderReranker.__init__defersfrom sentence_transformers import CrossEncoderso the base install stays light. Same lazy-load pattern as the embedding adapters in Phase 1.HyDE is template-only by default — no LLM dependency, just a preamble template applied to short queries. The
expand_query_with_llmvariant takes a caller-suppliedCallable[[str], str]so vouch doesn't pin a specific LLM client.Dedup hook was already wired in Phase 3.
KBStore._embed_and_storecallsembeddings.dedup.check_and_loginside atry/except ImportError. With this PR the import resolves and dedup runs on every write.Migration
backfill_embeddingsiteratesstore.list_claims(),list_pages(),list_sources(),list_entities(),list_relations(),list_evidence()(guarded withtry/except AttributeErrorper kind in case a future codebase variant drops a list method).Schema / API deviations from plan body
Relation.type→Relation.relation(enum field;.relation.valuefor string)Evidence.excerpt→Evidence.quote_HashEmbedder(sha256 bytes scaled into[0,1]) instead ofMockEmbedderfrom_fakes.py. Reason: under Python 3.14 + recent numpy, the existingMockEmbedder's raw-float32 unpacking can produceinf/nanvalues, collapsing the unit-normalized vector to zero and breaking the ≥0.95 cosine assertion._HashEmbedderproduces valid non-zero vectors. The production adapters are unaffected.Test Plan
.venv/bin/pytest tests/embeddings -v→ 50 passed, 4 deselected (+16 new tests across the 5 new modules).venv/bin/python -m ruff check src/vouch/embeddings tests/embeddings→ cleanWhat's NOT in this PR
vouch eval embedding,vouch dedup,vouch reindex --embeddings,kb_dedup_scan, etc.build_context_packstill uses the old retrieval path — Phase 7.Summary by CodeRabbit
New Features
Tests