Skip to content

fix(sessions): index crystallize summary page into FTS5 (#60)#61

Merged
plind-junior merged 3 commits into
vouchdev:mainfrom
YB0y:fix/60-crystallize-fts5-index-summary-page
May 25, 2026
Merged

fix(sessions): index crystallize summary page into FTS5 (#60)#61
plind-junior merged 3 commits into
vouchdev:mainfrom
YB0y:fix/60-crystallize-fts5-index-summary-page

Conversation

@YB0y

@YB0y YB0y commented May 22, 2026

Copy link
Copy Markdown
Contributor

What changed

sessions.crystallize() now indexes the session-summary Page into FTS5
the same way proposals.approve() does for an approved PAGE proposal.

src/vouch/sessions.py:109 previously wrote the summary via
store.put_page(page) only, which writes the markdown to disk and
populates the embedding index but does not touch pages_fts. Mirror
the existing PAGE branch of proposals.approve() by opening an
index_db connection and calling index_db.index_page(...) immediately
after the page write.

store.put_page(page)
with index_db.open_db(store.kb_dir) as conn:
    index_db.index_page(
        conn, id=page.id, title=page.title, body=page.body,
        type=page.type.value, tags=page.tags,
    )

Why

vouch search, kb.search, and the FTS5 branch of kb.context all
short-circuit on the first non-empty FTS5 result (see
src/vouch/context.py:29-40). The substring fallback only runs when FTS5
returns nothing — so on any KB with a populated state.db, every
session-<id> summary page was silently absent from search results, even
though it lives on disk and is embedded. That contradicts the README's
positioning of crystallize as the way to make a session's durable parts
permanent and retrievable.

rg "index_page" src/vouch/ before this change showed only
proposals.approve() (PAGE branch) and health.py's rebuilder
(used by vouch index) — confirming the FTS5 path had no auto-hook from
storage. After this change, sessions.py joins them.

Closes #60.

What would break for someone with an existing .vouch/ directory?

  • No on-disk layout, schema, or audit-log change. Existing .vouch/
    directories work unchanged.
  • Summary pages created before this fix remain absent from FTS5
    until the next vouch index rebuild. This is the documented
    reconciliation path for a stale state.db and matches behaviour after
    any prior FTS5-only fix.
  • No change to the kb.* MCP/JSONL surface, the bundle format, or the
    object model — no VEP required.

Tests

  • New regression test test_crystallize_summary_page_is_fts5_indexed
    in tests/test_sessions.py — proposes a claim in a session,
    crystallizes, asserts index_db.search(...) returns the summary
    page id. Fails on main (AssertionError), passes on this branch.
  • make check clean: ruff + mypy + pytest (74 passed, 1 skipped).

Notes

PR #43 (feat/semantic-search-phase-7-context-pack) currently touches
sessions.py for an unrelated session_end refactor, but does not
modify the crystallize summary-page write block — trivial rebase if
both land.

Summary by CodeRabbit

  • Bug Fixes

    • Session-summary pages are now automatically indexed when a session is finalized, so summaries appear in search and context results without needing a manual index rebuild
  • Tests

    • Added a test ensuring session-summary pages are indexed and discoverable via search
  • Documentation

    • Changelog updated to document the indexing fix and prior behavior

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4c5bf9a5-4da5-4e55-be24-af718b46ccfe

📥 Commits

Reviewing files that changed from the base of the PR and between fa11255 and 7e01df3.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • tests/test_sessions.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

📝 Walkthrough

Walkthrough

Crystallize now indexes the session-summary page into FTS5 immediately after storing it; the PR adds a changelog entry and a test that verifies the summary page is discoverable via FTS5 search.

Changes

Session crystallize FTS5 indexing

Layer / File(s) Summary
Session summary page FTS5 indexing
CHANGELOG.md, src/vouch/sessions.py, tests/test_sessions.py
Changelog documents that vouch crystallize now indexes its session-summary page into FTS5 immediately after storage. sessions.py imports index_db and calls index_db.index_page() after store.put_page() in crystallize(). New test test_crystallize_summary_page_is_fts5_indexed verifies the summary page is discoverable via FTS5 search.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A session closed, a summary penned so neat,
Now FTS5 hums and makes the finding sweet.
No more hidden pages in the DB's deep night,
Search hops, finds, and brings the page to light. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(sessions): index crystallize summary page into FTS5 (#60)' accurately and concisely describes the primary change: ensuring session-summary pages are indexed into FTS5 during crystallize, directly matching the changeset and PR objectives.
Linked Issues check ✅ Passed The PR successfully implements the primary objective from issue #60: adding FTS5 indexing to session-summary pages in crystallize by calling index_db.index_page() after store.put_page(), mirroring the approach in proposals.approve(). The change is validated by a new regression test.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing issue #60: CHANGELOG.md documents the fix, sessions.py implements FTS5 indexing during crystallize, and test_sessions.py adds a regression test. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

`crystallize()` wrote the session-summary Page via `store.put_page()` only,
which embeds but does not populate `pages_fts`. On KBs with a populated
`state.db`, FTS5 returns hits for other artifacts and short-circuits the
substring fallback, so `session-<id>` pages were silently absent from
`vouch search` / `kb.search` / `kb.context`. Mirror the PAGE branch of
`proposals.approve()` by opening an `index_db` connection and calling
`index_db.index_page(...)` immediately after the page write.

Closes vouchdev#60.
@YB0y YB0y force-pushed the fix/60-crystallize-fts5-index-summary-page branch from 70b9c69 to 29635de Compare May 22, 2026 21:28

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/vouch/sessions.py`:
- Around line 110-114: The index write in crystallize() currently calls
index_db.open_db(...) and index_db.index_page(...) after approvals and
store.put_page(page) and any exception there will abort the whole crystallize
flow; wrap the index write in a safe, non-fatal try/except so index errors are
logged but do not raise (e.g., try: with index_db.open_db(store.kb_dir) as conn:
index_db.index_page(conn, id=page.id, ...) except Exception as e:
logger.error("index_page failed for page=%s: %s", page.id, e) ), ensuring
approvals and store.put_page remain successful even if index_db.open_db or
index_db.index_page fails and use the existing logger to record the failure for
later investigation.

In `@tests/test_sessions.py`:
- Around line 62-63: The FTS assertion uses an unindexed field (summary_id) so
change the search query to use the indexed session identifier used in the
summary text (sess.id) to make the test stable: call index_db.search with
sess.id (instead of summary_id) against store.kb_dir and keep the assertion that
checks for kind == "page" and hid == summary_id (or update the expected hid if
necessary) so the test relies on an indexed query term rather than unindexed
pages_fts.id.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b225021e-33db-4050-b56b-72ee161feb29

📥 Commits

Reviewing files that changed from the base of the PR and between 70b9c69 and 29635de.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src/vouch/sessions.py
  • tests/test_sessions.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Comment thread src/vouch/sessions.py
Comment on lines +110 to +114
with index_db.open_db(store.kb_dir) as conn:
index_db.index_page(
conn, id=page.id, title=page.title, body=page.body,
type=page.type.value, tags=page.tags,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle index-write failures without aborting crystallization.

On Line 110, an indexing DB error will raise out of crystallize() after approvals and store.put_page(page) have already succeeded. That makes this secondary index write a hard failure for the whole operation.

Suggested hardening
+import sqlite3
...
-        with index_db.open_db(store.kb_dir) as conn:
-            index_db.index_page(
-                conn, id=page.id, title=page.title, body=page.body,
-                type=page.type.value, tags=page.tags,
-            )
+        try:
+            with index_db.open_db(store.kb_dir) as conn:
+                index_db.index_page(
+                    conn, id=page.id, title=page.title, body=page.body,
+                    type=page.type.value, tags=page.tags,
+                )
+        except sqlite3.Error:
+            logger.exception(
+                "crystallize: failed to index summary page %s", page.id
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with index_db.open_db(store.kb_dir) as conn:
index_db.index_page(
conn, id=page.id, title=page.title, body=page.body,
type=page.type.value, tags=page.tags,
)
try:
with index_db.open_db(store.kb_dir) as conn:
index_db.index_page(
conn, id=page.id, title=page.title, body=page.body,
type=page.type.value, tags=page.tags,
)
except sqlite3.Error:
logger.exception(
"crystallize: failed to index summary page %s", page.id
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/sessions.py` around lines 110 - 114, The index write in
crystallize() currently calls index_db.open_db(...) and index_db.index_page(...)
after approvals and store.put_page(page) and any exception there will abort the
whole crystallize flow; wrap the index write in a safe, non-fatal try/except so
index errors are logged but do not raise (e.g., try: with
index_db.open_db(store.kb_dir) as conn: index_db.index_page(conn, id=page.id,
...) except Exception as e: logger.error("index_page failed for page=%s: %s",
page.id, e) ), ensuring approvals and store.put_page remain successful even if
index_db.open_db or index_db.index_page fails and use the existing logger to
record the failure for later investigation.

Comment thread tests/test_sessions.py Outdated
…vouchdev#61 review)

`pages_fts.id` is declared UNINDEXED, so the previous query relied on
tokenization side effects of summary_id matching the title. Switch the
search term to sess.id, which is explicitly present in the indexed title
and body, so the test contract is stable against future title-format
changes.

Also drop the test-body comment that restated what the test name and
assertions already convey.
@plind-junior

Copy link
Copy Markdown
Collaborator

Fix MR conflict

…crystallize-fts5-index-summary-page

# Conflicts:
#	CHANGELOG.md
#	tests/test_sessions.py
@YB0y

YB0y commented May 25, 2026

Copy link
Copy Markdown
Contributor Author

@plind-junior Thanks for your review.
Can you review my PR again please?

@plind-junior

Copy link
Copy Markdown
Collaborator

LGTM!

@plind-junior plind-junior merged commit cf6f5f7 into vouchdev:main May 25, 2026
1 check passed
@YB0y YB0y deleted the fix/60-crystallize-fts5-index-summary-page branch May 25, 2026 13:38
@YB0y

YB0y commented May 25, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for your work.

galuis116 added a commit to galuis116/vouch that referenced this pull request May 25, 2026
Resolves conflicts in CHANGELOG.md, tests/test_sessions.py, and
auto-merges src/vouch/sessions.py with the recently-merged:

  - vouchdev#61 (FTS5 indexing of crystallize summary page) — adds
    index_db.index_page(...) right after store.put_page(...) inside
    crystallize. Composes cleanly with this PR's strict-derivation
    _build_summary_body and the summary_page_id audit fix.
  - vouchdev#62 (test_crystallize_collects_approval_failures) — preserved as
    a sibling test.
  - vouchdev#75 (bundle import sha256 verification) — CHANGELOG entry only.
  - vouchdev#73 (bundle POSIX separators) — CHANGELOG entry only.

Both regression tests added by this PR
(test_crystallize_summary_page_does_not_leak_agent_controlled_fields
and test_crystallize_audit_event_records_summary_page_id) still pass
against the merged sessions.crystallize.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: crystallize writes session-summary page without FTS5 indexing — pages invisible to kb.search

2 participants