feat: add metadata filters (project, session_id, git_branch) to search#63
The exchanges table already has indexed columns for project, session_id, and git_branch, but the search API only exposes time-based filters (after/before). This adds metadata filtering to enable project-specific and branch-specific searches.
Changes:
- SearchOptions: add project, session_id, git_branch fields
- search.ts: extend WHERE clause for both vector and text search
- search.ts: validate metadata inputs (regex + length check)
- search.ts: over-fetch 3x for vector search with metadata filters (vec0 applies KNN before WHERE post-filter)
- search.ts: include session_id, git_branch in SELECT and result mapping
- mcp-server.ts: add Zod schema + JSON schema for new parameters
- search-cli.ts: add --project, --session-id, --git-branch flags
- integration.test.ts: 9 new tests for metadata filtering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
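The "regex + length check" validation listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the allowed-character pattern, the 256-character limit, and the exact signature of `validateMetadataFilter` are all assumptions made for the example.

```typescript
// Hypothetical values: the PR's actual pattern and length limit are not shown in this page.
const METADATA_PATTERN = /^[A-Za-z0-9._\/-]+$/; // letters, digits, dot, underscore, slash, hyphen
const MAX_METADATA_LENGTH = 256;

// Returns the value unchanged when it passes both checks, otherwise throws.
function validateMetadataFilter(name: string, value: string): string {
  if (value.length === 0 || value.length > MAX_METADATA_LENGTH) {
    throw new Error(`Invalid ${name}: length must be 1-${MAX_METADATA_LENGTH} characters`);
  }
  if (!METADATA_PATTERN.test(value)) {
    throw new Error(`Invalid ${name}: contains unsupported characters`);
  }
  return value;
}
```

Under these assumptions, a branch name such as `feature/metadata-filters` passes, while a value containing a quote character is rejected before it can reach the SQL string.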
📝 Walkthrough
This pull request adds optional metadata filtering capabilities (project, session_id, git_branch) across the system stack. The filters are introduced in the MCP server schemas, CLI arguments, and search implementation, with corresponding validation and database query integration.
Sequence Diagram
sequenceDiagram
actor CLI
participant MCP as MCP Server
participant Search as Search Engine
participant DB as Database
CLI->>MCP: search with metadata filters<br/>(project, session_id, git_branch)
MCP->>MCP: validate filter fields<br/>against schema
MCP->>Search: searchConversations(queries,<br/>options with filters)
Search->>Search: validateMetadataFilter<br/>for each filter
Search->>Search: construct combined<br/>filterClause with<br/>metadata + time constraints
Search->>DB: SELECT with filterClause<br/>+ effectiveK (limit × 3)
DB-->>Search: rows with session_id,<br/>git_branch metadata
Search->>Search: post-filter results<br/>to requested limit
Search->>Search: map session_id/git_branch<br/>to sessionId/gitBranch
Search-->>MCP: filtered results with<br/>metadata fields
MCP-->>CLI: formatted results
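The over-fetch step in the diagram (effectiveK = limit × 3, then trimming back to the requested limit) can be illustrated with an in-memory stand-in. This is only a sketch: `Candidate`, `nearest`, `searchByProject`, and the sample rows are hypothetical, and the real implementation runs the KNN and the WHERE filter inside the database rather than in application code.

```typescript
// Hypothetical in-memory model of "KNN first, metadata filter second".
interface Candidate {
  id: number;
  project: string;
  distance: number;
}

const OVERFETCH_FACTOR = 3; // matches the 3x factor named in the PR

// Stand-in for the vector index: return the k rows with the smallest distance.
function nearest(rows: Candidate[], k: number): Candidate[] {
  return [...rows].sort((a, b) => a.distance - b.distance).slice(0, k);
}

// Fetch effectiveK neighbours, apply the metadata filter, then trim to limit.
function searchByProject(
  rows: Candidate[],
  project: string,
  limit: number,
  overFetch: boolean
): Candidate[] {
  const effectiveK = overFetch ? limit * OVERFETCH_FACTOR : limit;
  return nearest(rows, effectiveK)
    .filter((row) => row.project === project)
    .slice(0, limit);
}

const sampleRows: Candidate[] = [
  { id: 1, project: 'a', distance: 0.1 },
  { id: 2, project: 'a', distance: 0.2 },
  { id: 3, project: 'b', distance: 0.3 },
  { id: 4, project: 'a', distance: 0.4 },
  { id: 5, project: 'b', distance: 0.5 },
  { id: 6, project: 'a', distance: 0.6 },
];

const withoutOverFetch = searchByProject(sampleRows, 'b', 2, false);
const withOverFetch = searchByProject(sampleRows, 'b', 2, true);
```

With these sample rows and a limit of 2, the two nearest neighbours both belong to project `a`, so filtering for project `b` without over-fetching yields nothing, while the over-fetched query returns rows 3 and 5. The factor is a heuristic: when matching rows are sparser than one in three, the result can still come back short.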
Estimated Code Review Effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
🧹 Nitpick comments (2)
src/search-cli.ts (1)
69-74: Guard against missing values for new flags.
If a user forgets the value, the parser will swallow the next flag (or set undefined) silently. Consider validating the next token and exiting with a helpful error (same guard can optionally apply to other flags too).
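One way to apply the same guard to every flag is a small shared helper. The sketch below is an assumption-laden illustration: `takeFlagValue` and `parseFilters` are hypothetical names, and the helper throws instead of calling `process.exit(1)` so that the caller decides how to report the error.

```typescript
// Hypothetical helper: returns the token after a flag, or throws if it is missing
// or looks like another flag.
function takeFlagValue(args: string[], index: number, flag: string): string {
  const value = args[index + 1];
  if (!value || value.startsWith('--')) {
    throw new Error(`Missing value for ${flag}`);
  }
  return value;
}

interface ParsedFilters {
  project?: string;
  sessionId?: string;
  gitBranch?: string;
}

// Minimal parser covering only the three new flags; other arguments are ignored here.
function parseFilters(args: string[]): ParsedFilters {
  const parsed: ParsedFilters = {};
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--project') {
      parsed.project = takeFlagValue(args, i, arg);
      i++; // skip the consumed value
    } else if (arg === '--session-id') {
      parsed.sessionId = takeFlagValue(args, i, arg);
      i++;
    } else if (arg === '--git-branch') {
      parsed.gitBranch = takeFlagValue(args, i, arg);
      i++;
    }
  }
  return parsed;
}
```

A CLI entry point could wrap `parseFilters(process.argv.slice(2))` in a `try`/`catch`, print the message, and exit with a non-zero status.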
Suggested tweak
  } else if (arg === '--project') {
-   project = args[++i];
+   const value = args[++i];
+   if (!value || value.startsWith('--')) {
+     console.error('Missing value for --project');
+     process.exit(1);
+   }
+   project = value;
  } else if (arg === '--session-id') {
-   sessionId = args[++i];
+   const value = args[++i];
+   if (!value || value.startsWith('--')) {
+     console.error('Missing value for --session-id');
+     process.exit(1);
+   }
+   sessionId = value;
  } else if (arg === '--git-branch') {
-   gitBranch = args[++i];
+   const value = args[++i];
+   if (!value || value.startsWith('--')) {
+     console.error('Missing value for --git-branch');
+     process.exit(1);
+   }
+   gitBranch = value;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/search-cli.ts` around lines 69 - 74, The flag parsing for --project, --session-id, and --git-branch (variables project, sessionId, gitBranch) can consume a missing value or the next flag; update the argument-parsing logic to validate the next token after each flag exists and is not another flag (e.g., not starting with '-') before assigning args[++i]; if validation fails, print a clear error like "Missing value for --project/--session-id/--git-branch" and exit (or throw) so the CLI fails fast; apply the same guard pattern used elsewhere in the parser loop so other flags are protected too.
src/search.ts (1)
58-65: Prefer parameterized filters over string interpolation.
Even with regex validation, binding parameters is more robust and future-proof (e.g., if validation rules ever loosen).
Possible refactor (parameterized filters)
- const filters: string[] = [];
- if (after) filters.push(`e.timestamp >= '${after}'`);
- if (before) filters.push(`e.timestamp <= '${before}'`);
- if (project) filters.push(`e.project = '${project}'`);
- if (session_id) filters.push(`e.session_id = '${session_id}'`);
- if (git_branch) filters.push(`e.git_branch = '${git_branch}'`);
- const filterClause = filters.length > 0 ? `AND ${filters.join(' AND ')}` : '';
+ const filters: string[] = [];
+ const filterParams: string[] = [];
+ if (after) { filters.push('e.timestamp >= ?'); filterParams.push(after); }
+ if (before) { filters.push('e.timestamp <= ?'); filterParams.push(before); }
+ if (project) { filters.push('e.project = ?'); filterParams.push(project); }
+ if (session_id) { filters.push('e.session_id = ?'); filterParams.push(session_id); }
+ if (git_branch) { filters.push('e.git_branch = ?'); filterParams.push(git_branch); }
+ const filterClause = filters.length > 0 ? `AND ${filters.join(' AND ')}` : '';

- results = stmt.all(
-   Buffer.from(new Float32Array(queryEmbedding).buffer),
-   effectiveK
- );
+ results = stmt.all(
+   Buffer.from(new Float32Array(queryEmbedding).buffer),
+   effectiveK,
+   ...filterParams
+ );

- const textResults = textStmt.all(`%${query}%`, `%${query}%`, limit);
+ const textResults = textStmt.all(`%${query}%`, `%${query}%`, ...filterParams, limit);
Also applies to: 93-100, 124-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/search.ts` around lines 58 - 65, The code builds SQL filterClause via string interpolation using variables (after, before, project, session_id, git_branch) stored in the filters array and filterClause — change this to use parameterized bindings: instead of embedding values into filters, push condition templates like "e.timestamp >= $1" (or "?" depending on your DB client) and collect corresponding values into a params array, then join conditions into filterClause and pass params to the query execution; apply the same refactor to the other similar blocks mentioned (the filters usage around the sections that build filters at 93-100 and 124-130) so all dynamic values are bound rather than interpolated.
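The parameterized approach described in this comment can also be factored into a standalone builder, which makes the empty-filter case easy to check in isolation. This is a sketch under assumptions: `FilterOptions` and `buildFilters` are hypothetical names, and `?` placeholders are assumed to be what the database client expects.

```typescript
// Hypothetical option shape mirroring the filter names used in the review comment.
interface FilterOptions {
  after?: string;
  before?: string;
  project?: string;
  session_id?: string;
  git_branch?: string;
}

// Builds a SQL fragment with `?` placeholders plus the values to bind, in matching order.
function buildFilters(options: FilterOptions): { clause: string; params: string[] } {
  const conditions: string[] = [];
  const params: string[] = [];
  const add = (condition: string, value: string | undefined): void => {
    if (value) {
      conditions.push(condition);
      params.push(value);
    }
  };
  add('e.timestamp >= ?', options.after);
  add('e.timestamp <= ?', options.before);
  add('e.project = ?', options.project);
  add('e.session_id = ?', options.session_id);
  add('e.git_branch = ?', options.git_branch);
  const clause = conditions.length > 0 ? `AND ${conditions.join(' AND ')}` : '';
  return { clause, params };
}
```

When no options are set, the clause is an empty string and there is nothing to bind, so the generated SQL is unchanged. The caller still has to place `params` at the position in the bind list that matches where the clause appears in the statement.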
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (4)
- dist/mcp-server.js is excluded by !**/dist/**
- dist/search-cli.js is excluded by !**/dist/**
- dist/search.d.ts is excluded by !**/dist/**
- dist/search.js is excluded by !**/dist/**
📒 Files selected for processing (4)
- src/mcp-server.ts
- src/search-cli.ts
- src/search.ts
- test/integration.test.ts
Summary
The `exchanges` table already has indexed columns for `project`, `session_id`, and `git_branch`, but the search API only exposes time-based filters (`after`/`before`). This PR adds metadata filtering to both the MCP tool and the CLI, enabling project-specific and branch-specific conversation searches.
Motivation: Multi-project users frequently need to scope search results to a specific project or git branch. The data and indexes already exist in the database; this change simply exposes them through the existing search interfaces.
Changes
`src/search.ts`
- Extend the `SearchOptions` interface with `project`, `session_id`, `git_branch`
- Add `validateMetadataFilter()`: regex validation + length check, matching the existing `validateISODate()` pattern
- Generalize the `timeFilter` array → general `filters` array for WHERE clause construction
- Add `session_id`, `git_branch` to SELECT columns and result mapping
`src/mcp-server.ts`
- Zod schema for the new parameters (`min(1)` + `.optional()`)
- JSON schema for the new parameters in the `ListToolsRequestSchema` handler
`src/search-cli.ts`
- Add `--project`, `--session-id`, `--git-branch` CLI flags
`test/integration.test.ts`
- New `describe('Metadata Filtering')` block with 9 tests
Design decisions
- … `validateISODate`). A full parameterized query refactor is out of scope for this PR.
- … `=` (exact match). Prefix/fuzzy matching can be added in a follow-up if needed.
Backward compatibility
All new parameters are optional. When omitted, behavior is identical to the current version: the `filters` array remains empty and produces the same SQL as the previous `timeFilter` approach.
Test plan
- `npm run build`: compiles without errors
- `npm test`: all new tests pass; no regressions in existing tests