feat(search): add embedding-based semantic retrieval as a third backend

## Motivation

`kb.search` (and `build_context_pack`, which fans out from it) currently has two backends:

- `fts5` — BM25 lexical match via `src/vouch/index_db.py`
- `substring` — fallback scanner in `KBStore.search_substring`

Both are token-overlap-based. An agent searching for *"how do we authenticate users?"* gets nothing if the claim text reads *"login flow uses session cookies signed by the API"* — the concepts overlap, the tokens don't. This is the most common retrieval miss in agent workflows that compose claims into context packs.

Adding an embedding backend would close that gap while leaving the existing FTS5/substring chain intact as a precision-mode complement.

## Proposed approach

Introduce a third backend slot, parallel to FTS5/substring:

1. **`src/vouch/embeddings.py` (new)** — Lazy-loaded local model (`sentence-transformers/all-MiniLM-L6-v2` or `fastembed`'s `BAAI/bge-small-en-v1.5`). One `encode(texts: list[str]) -> np.ndarray` entrypoint, batched.
2. **`src/vouch/index_db.py`** — Add an `embeddings(kind, id, vec BLOB, dim INT)` table. Two implementation options:
   - **MVP**: brute-force cosine similarity in NumPy over rows loaded into memory. Simple, no dependencies beyond NumPy, fine for KBs under ~10k claims.
   - **Scale-up**: `sqlite-vec` extension for ANN. Defer until someone hits the NumPy ceiling.
3. **Indexing hooks** — `KBStore.put_claim` / `put_source` / `put_page` compute and store the embedding on write. Backfill via `vouch index --rebuild`.
4. **Search dispatch** — `kb.search` accepts `backend: "fts5" | "substring" | "embedding" | "hybrid"`. Hybrid = reciprocal rank fusion of FTS5 + embedding results. Default stays `fts5` so existing callers don't shift.
5. **Optional dep** — Ship under `pip install vouch[embeddings]` so the base install stays lean. CI matrix runs both with and without the extra.
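
To make the MVP option in step 2 concrete, here is a minimal sketch of brute-force cosine search over the proposed `embeddings(kind, id, vec BLOB, dim INT)` table. The little-endian float32 BLOB layout, the primary key, and the helper names (`to_blob`, `from_blob`, `cosine_top_k`) are assumptions for illustration, not something the issue specifies. The toy 3-dimensional vectors stand in for real model output.

```python
import sqlite3

import numpy as np


def to_blob(vec) -> bytes:
    """Serialize a 1-D vector as little-endian float32 bytes (assumed layout)."""
    return np.asarray(vec, dtype="<f4").tobytes()


def from_blob(blob: bytes, dim: int) -> np.ndarray:
    """Decode a BLOB back into a vector and check it against the stored dim."""
    vec = np.frombuffer(blob, dtype="<f4")
    if vec.shape[0] != dim:
        raise ValueError(f"expected {dim} dims, got {vec.shape[0]}")
    return vec


def cosine_top_k(conn: sqlite3.Connection, query_vec, k: int = 5):
    """Return up to k (kind, id, score) tuples, best cosine match first."""
    rows = conn.execute("SELECT kind, id, vec, dim FROM embeddings").fetchall()
    if not rows:
        return []
    matrix = np.stack([from_blob(blob, dim) for _, _, blob, dim in rows])
    # L2-normalize rows and query so a dot product equals cosine similarity.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    matrix = matrix / np.maximum(norms, 1e-12)
    query = np.asarray(query_vec, dtype=np.float32)
    query = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = matrix @ query
    order = np.argsort(-scores)[:k]
    return [(rows[i][0], rows[i][1], float(scores[i])) for i in order]


# Toy data: hypothetical claim ids with hand-written 3-D vectors.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings (kind TEXT, id TEXT, vec BLOB, dim INT, "
    "PRIMARY KEY (kind, id))"
)
toy_vectors = {"c1": [1.0, 0.0, 0.0], "c2": [0.0, 1.0, 0.0], "c3": [0.9, 0.1, 0.0]}
for claim_id, vec in toy_vectors.items():
    conn.execute(
        "INSERT INTO embeddings (kind, id, vec, dim) VALUES (?, ?, ?, ?)",
        ("claim", claim_id, to_blob(vec), len(vec)),
    )

results = cosine_top_k(conn, [1.0, 0.0, 0.0], k=2)
```

Normalizing at query time keeps the stored vectors model-faithful; a real implementation might instead normalize once at write time so search reduces to a single matrix-vector product.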

## Scope

**In scope (this issue):**
- `embeddings.py` module
- `embeddings` table + reads/writes in `index_db.py`
- `put_claim` / `put_source` / `put_page` indexing hooks
- `backend="embedding"` and `backend="hybrid"` paths in MCP + JSONL search handlers
- `vouch index --rebuild` regenerates embeddings
- Regression test: a claim that's semantically related but lexically disjoint from a query is retrievable
- Optional dep wiring in `pyproject.toml`
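
The `backend="hybrid"` path relies on reciprocal rank fusion. A minimal sketch of that fusion step follows, assuming each backend returns an ordered list of ids. The function name and the smoothing constant `k = 60` (a commonly used default for RRF) are assumptions; the issue does not fix either.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two backends.
fts5_hits = ["c1", "c2", "c3"]
embedding_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([fts5_hits, embedding_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which is why it suits combining FTS5 with an embedding backend.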

**Out of scope (follow-ups):**
- Multi-model support / pluggable backends — single hardcoded model for now
- ANN index (`sqlite-vec`, FAISS, etc.) — NumPy brute force is enough until proven otherwise
- Embedding cache invalidation on claim *update* — start with insert-only; address when `update_claim` lands
- Cross-lingual or domain-finetuned models

## Open questions

1. **Default backend for `build_context_pack`** — leave as FTS5, or default to hybrid once embeddings are available? Argument for hybrid: context packs are the main agent-facing path, which is where semantic misses hurt. Argument for FTS5: switching is a behavior change for everyone who upgrades.
2. **Model identity in `state.db`** — should the embeddings table record the model name + version so a mismatch on next read triggers a re-index? Yes, almost certainly — saves a footgun later.
3. **Where does the model cache live?** — `~/.cache/vouch/models/` vs `.vouch/models/`. The former is cross-project (good for multiple KBs), the latter keeps the KB self-contained (matches the "files are source of truth" principle).
4. **Embedding-as-citation** — if two claims have ≥0.95 cosine similarity at ingest time, do we surface a "possible duplicate" warning at `put_claim`? Defer, but worth noting it's cheap to add later.
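
For open question 2, one possible shape is a small key/value table next to the embeddings that records which model produced them. This is only a sketch: the `embedding_meta` table, both function names, and the version string are hypothetical, and the issue leaves the actual schema open.

```python
import sqlite3

META_DDL = "CREATE TABLE IF NOT EXISTS embedding_meta (key TEXT PRIMARY KEY, value TEXT)"


def needs_reindex(conn: sqlite3.Connection, model_name: str, model_version: str) -> bool:
    """True when stored model identity is missing or differs from the current one."""
    conn.execute(META_DDL)
    stored = dict(conn.execute("SELECT key, value FROM embedding_meta").fetchall())
    return (
        stored.get("model_name") != model_name
        or stored.get("model_version") != model_version
    )


def record_model(conn: sqlite3.Connection, model_name: str, model_version: str) -> None:
    """Persist the identity of the model that produced the stored embeddings."""
    conn.execute(META_DDL)
    conn.executemany(
        "INSERT OR REPLACE INTO embedding_meta (key, value) VALUES (?, ?)",
        [("model_name", model_name), ("model_version", model_version)],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
fresh_db_stale = needs_reindex(conn, "all-MiniLM-L6-v2", "1")  # nothing recorded yet
record_model(conn, "all-MiniLM-L6-v2", "1")
same_model_stale = needs_reindex(conn, "all-MiniLM-L6-v2", "1")
other_model_stale = needs_reindex(conn, "bge-small-en-v1.5", "1")
```

A caller could run this check before the first embedding search and trigger `vouch index --rebuild` when it returns true.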

## Acceptance criteria

- [ ] `vouch search --semantic "how do we authenticate users"` returns the claim *"login flow uses session cookies signed by the API"*, even though the query and the claim text have no lexical overlap.
- [ ] `kb.search` over MCP and JSONL both accept the new `backend` values and route through the shared handler.
- [ ] `pip install vouch` (no extras) still works and uses FTS5/substring; `pip install vouch[embeddings]` enables the new backend with no other code changes required from callers.
- [ ] `vouch index --rebuild` regenerates the embeddings table from disk; running twice is idempotent.
- [ ] Regression test in `tests/test_embeddings.py` proves the lexical-disjoint case is now retrievable.
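
The optional-dependency criterion could be met with a guard along these lines, which detects the extra without importing it. This is a sketch under assumptions: `available_backends`, `resolve_backend`, and the probed module name `sentence_transformers` are illustrative, and the real module depends on which model library is chosen.

```python
import importlib.util

BASE_BACKENDS = ["fts5", "substring"]
EMBEDDING_BACKENDS = ["embedding", "hybrid"]


def available_backends(extra_module: str = "sentence_transformers") -> list:
    """List usable backends; embedding ones only if the extra is importable."""
    backends = list(BASE_BACKENDS)
    # find_spec locates a top-level module without importing it.
    if importlib.util.find_spec(extra_module) is not None:
        backends += EMBEDDING_BACKENDS
    return backends


def resolve_backend(requested: str, extra_module: str = "sentence_transformers") -> str:
    """Validate a requested backend and give an actionable error if the extra is missing."""
    if requested in available_backends(extra_module):
        return requested
    if requested in EMBEDDING_BACKENDS:
        raise RuntimeError(
            f"backend {requested!r} requires the optional extra: "
            "pip install vouch[embeddings]"
        )
    raise ValueError(f"unknown backend: {requested!r}")


# Simulate a base install by probing for a module that does not exist.
base_only = available_backends("vouch_no_such_module_for_demo")
# Simulate the extra being present by probing a stdlib module.
with_extra = available_backends("json")
```

Checking availability lazily keeps `import vouch` cheap on a base install and defers loading the model library until an embedding backend is actually requested.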

