Motivation
kb.search (and build_context_pack, which fans out from it) currently has two backends:
- fts5: BM25 lexical match via src/vouch/index_db.py
- substring: fallback scanner in KBStore.search_substring
Both are token-overlap-based. An agent searching for "how do we authenticate users?" gets nothing if the claim text reads "login flow uses session cookies signed by the API" — the concepts overlap, the tokens don't. This is the most common retrieval miss in agent workflows that compose claims into context packs.
Adding an embedding backend would close that gap while leaving the existing FTS5/substring chain intact as a precision-mode complement.
Proposed approach
Introduce a third backend slot, parallel to FTS5/substring:
- src/vouch/embeddings.py (new): Lazy-loaded local model (sentence-transformers/all-MiniLM-L6-v2 or fastembed's BAAI/bge-small-en-v1.5). One encode(texts: list[str]) -> np.ndarray entrypoint, batched.
- src/vouch/index_db.py: Add an embeddings(kind, id, vec BLOB, dim INT) table. Two implementation options:
  - MVP: pure NumPy cosine over rows loaded into memory. Simple, no extra deps, fine for KBs under ~10k claims.
  - Scale-up: sqlite-vec extension for ANN. Defer until someone hits the NumPy ceiling.
- Indexing hooks: KBStore.put_claim / put_source / put_page compute and store the embedding on write. Backfill via vouch index --rebuild.
- Search dispatch: kb.search accepts backend: "fts5" | "substring" | "embedding" | "hybrid". Hybrid = reciprocal rank fusion of FTS5 + embedding results. Default stays fts5 so existing callers don't shift.
- Optional dep: Ship under pip install vouch[embeddings] so the base install stays lean. CI matrix runs both with and without the extra.
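The MVP option above (vectors stored as BLOBs, brute-force cosine in NumPy) could look roughly like the following. This is a minimal sketch, not the project's code: the primary key, the float32 encoding, and the helper names put_embedding and search_embeddings are all assumptions made for illustration.

```python
import sqlite3

import numpy as np

# Hypothetical schema following the embeddings(kind, id, vec BLOB, dim INT)
# shape proposed in this issue; the composite primary key is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS embeddings (
    kind TEXT NOT NULL,
    id   TEXT NOT NULL,
    vec  BLOB NOT NULL,
    dim  INTEGER NOT NULL,
    PRIMARY KEY (kind, id)
)
"""


def put_embedding(conn: sqlite3.Connection, kind: str, id_: str, vec: np.ndarray) -> None:
    """Store one vector as raw float32 bytes, replacing any existing row."""
    v = np.asarray(vec, dtype=np.float32)
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (kind, id, vec, dim) VALUES (?, ?, ?, ?)",
        (kind, id_, v.tobytes(), int(v.shape[0])),
    )


def search_embeddings(
    conn: sqlite3.Connection,
    query_vec: np.ndarray,
    kind: str = "claim",
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Brute-force cosine similarity over every stored row of one kind."""
    rows = conn.execute(
        "SELECT id, vec FROM embeddings WHERE kind = ?", (kind,)
    ).fetchall()
    if not rows:
        return []
    ids = [row[0] for row in rows]
    matrix = np.stack([np.frombuffer(row[1], dtype=np.float32) for row in rows])
    q = np.asarray(query_vec, dtype=np.float32)
    # Normalise both sides so the dot product equals the cosine similarity.
    matrix = matrix / np.maximum(np.linalg.norm(matrix, axis=1, keepdims=True), 1e-12)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    scores = matrix @ q
    order = np.argsort(-scores)[:top_k]
    return [(ids[i], float(scores[i])) for i in order]
```

Loading every row per query is the simplification that makes this an MVP; caching the normalised matrix in memory between queries would be the obvious next step before reaching for an ANN index.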
Scope
In scope (this issue):
- embeddings.py module
- embeddings table + reads/writes in index_db.py
- put_claim / put_source / put_page indexing hooks
- backend="embedding" and backend="hybrid" paths in MCP + JSONL search handlers
- vouch index --rebuild regenerates embeddings
- Regression test: a claim that's semantically related but lexically disjoint from a query is retrievable
- Optional dep wiring in pyproject.toml
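For the backend="hybrid" path, reciprocal rank fusion only needs the ranked ID lists from each backend, not their raw scores. A minimal sketch follows, assuming each backend returns IDs in best-first order; the function name and the constant k=60 (a conventional default for RRF, not something this issue specifies) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first ID rankings into one scored ranking.

    Each ID scores sum(1 / (k + rank)) over the rankings it appears in,
    with rank starting at 1. k=60 is an assumed default, not a project setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical claim IDs from the FTS5 and embedding backends.
fts5_hits = ["c1", "c2", "c3"]
embedding_hits = ["c4", "c2", "c1"]
fused = reciprocal_rank_fusion([fts5_hits, embedding_hits])
```

Because the fusion is rank-based, BM25 scores and cosine similarities never have to be put on a common scale, which keeps the two backends independent.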
Out of scope (follow-ups):
- Multi-model support / pluggable backends: single hardcoded model for now
- ANN index (sqlite-vec, FAISS, etc.): NumPy brute force is enough until proven otherwise
- Embedding cache invalidation on claim update: start with insert-only; address when update_claim lands
- Cross-lingual or domain-finetuned models
Open questions
- Default backend for build_context_pack: leave as FTS5, or default to hybrid once embeddings are available? Argument for hybrid: that's where agents live. Argument for FTS5: it's a behavior change for everyone who pulls.
- Model identity in state.db: should the embeddings table record the model name + version so a mismatch on next read triggers a re-index? Yes, almost certainly; it saves a footgun later.
- Where does the model cache live? ~/.cache/vouch/models/ vs .vouch/models/. The former is cross-project (good for multiple KBs), the latter keeps the KB self-contained (matches the "files are source of truth" principle).
- Embedding-as-citation: if two claims have ≥0.95 cosine similarity at ingest time, do we surface a "possible duplicate" warning at put_claim? Defer, but worth noting it's cheap to add later.
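On the model-identity question, one possible shape is a small key/value table that is compared against the configured model before any embedding read. This is only a sketch under assumptions: the embedding_meta table, the two helper names, and the version string are hypothetical, and the issue leaves open whether this data should live on the embeddings table itself.

```python
import sqlite3

# Hypothetical side table; recording the model per row on the embeddings
# table would be an alternative design.
META_SCHEMA = """
CREATE TABLE IF NOT EXISTS embedding_meta (
    key   TEXT PRIMARY KEY,
    value TEXT NOT NULL
)
"""


def needs_reindex(conn: sqlite3.Connection, model_name: str, model_version: str) -> bool:
    """Return True when stored embeddings were built by a different model
    (or when no model has been recorded yet)."""
    conn.execute(META_SCHEMA)
    stored = dict(conn.execute("SELECT key, value FROM embedding_meta").fetchall())
    current = {"model_name": model_name, "model_version": model_version}
    return stored != current


def record_model(conn: sqlite3.Connection, model_name: str, model_version: str) -> None:
    """Record the model identity after a successful (re)index."""
    conn.execute(META_SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO embedding_meta (key, value) VALUES (?, ?)",
        [("model_name", model_name), ("model_version", model_version)],
    )
```

A caller would check needs_reindex before serving an embedding or hybrid query and trigger the same code path as vouch index --rebuild when it returns True.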
Acceptance criteria
- vouch search --semantic "how do we authenticate users" returns a claim containing "login flow uses session cookies signed by the API" in a KB that has no lexical overlap between the two.
- kb.search over MCP and JSONL both accept the new backend values and route through the shared handler.
- pip install vouch (no extras) still works and uses FTS5/substring; pip install vouch[embeddings] enables the new backend with no other code changes required from callers.
- vouch index --rebuild regenerates the embeddings table from disk; running twice is idempotent.
- tests/test_embeddings.py proves the lexical-disjoint case is now retrievable.