Labels: ai/llm (AI workflows, agents, prompts), feature (New capability), scanner (Project detection, tech stack scanning)
Summary
Add ColGREP as an optional semantic code search tool alongside an improved ripgrep-based search engine. The plan is to make the baseline ripgrep search more robust first, then layer ColGREP on top as an optional upgrade for richer codebase understanding during the scan and spec phases.
Problem / Context
Current search has several limitations:
- Two separate ripgrep wrappers (`interview-tools.ts` and `tools.ts`) with duplicated logic
- Restrictive result limits (3/file, 20 total, 200-char truncation, 0 context lines)
- Missing exclusions for common noise dirs (`.next`, `coverage`, `__pycache__`, `.turbo`, etc.)
- `hasRipgrep()` calls `which rg` synchronously on every invocation, with no caching
- Pattern-only search misses conceptual queries ("error handling middleware", "auth flow")
ColGREP is a semantic code search CLI that uses multi-vector embeddings (ColBERT) with Tree-sitter AST parsing. It wins 70% of head-to-head comparisons vs grep and reduces token usage by ~15%. It runs fully locally with a bundled 17M-param model — no API keys needed.
Proposed Solution
Architecture: Layered Search Module
src/ai/tools/search.ts ← shared engine (executeSearch, validateSearchPath, hasRipgrep cache)
src/ai/tools/colgrep.ts ← ColGREP detection, index management, semantic search wrapper
interview-tools.ts ← wraps engine as search_codebase + semantic_search tools
tools.ts ← wraps engine as searchCode + semanticSearch tools
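A rough sketch of the layering above, assuming a single engine function that both tool files wrap. The issue names `executeSearch` but does not give its signature, so `SearchRequest`, `SearchHit`, `SearchEngine`, and the two factory functions below are hypothetical, and the engine is a stand-in that records calls instead of running ripgrep.

```typescript
// Hypothetical shapes; the issue does not define the engine's real signature.
interface SearchRequest {
  pattern: string;
  path: string;
}

interface SearchHit {
  file: string;
  line: number;
  text: string;
}

type SearchEngine = (req: SearchRequest) => SearchHit[];

// Wrapper in the style of interview-tools.ts (snake_case tool name).
function makeSearchCodebaseTool(engine: SearchEngine) {
  return {
    name: "search_codebase",
    run: (pattern: string, path: string = ".") => engine({ pattern, path }),
  };
}

// Wrapper in the style of tools.ts (camelCase tool name).
function makeSearchCodeTool(engine: SearchEngine) {
  return {
    name: "searchCode",
    run: (pattern: string, path: string = ".") => engine({ pattern, path }),
  };
}

// Stand-in engine for demonstration; a real one would shell out to ripgrep.
const calls: SearchRequest[] = [];
const fakeEngine: SearchEngine = (req) => {
  calls.push(req);
  return [{ file: "src/a.ts", line: 1, text: req.pattern }];
};

const interviewTool = makeSearchCodebaseTool(fakeEngine);
const agentTool = makeSearchCodeTool(fakeEngine);
```

Because both wrappers delegate to the same function, changes to limits, exclusions, or parsing would only need to happen in one place.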
Two Distinct AI Tools
- `search_codebase` / `searchCode`: ripgrep-based, for exact patterns, regex, identifiers
- `semantic_search` / `semanticSearch`: ColGREP-based, for natural language conceptual queries
- AI chooses which to use based on query type
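One way the two tools might be exposed is sketched below, under the assumption that a tool is described by a name plus a description the model reads when choosing. `ToolSpec` and `buildInterviewTools` are illustrative names, not the project's actual tool-definition API.

```typescript
// Hypothetical tool descriptor; the real project likely uses its AI SDK's own type.
interface ToolSpec {
  name: string;
  description: string;
}

// Always offer the ripgrep tool; add the semantic tool only when ColGREP is usable.
function buildInterviewTools(colgrepAvailable: boolean): ToolSpec[] {
  const tools: ToolSpec[] = [
    {
      name: "search_codebase",
      description: "Exact patterns, regex, and identifiers (ripgrep).",
    },
  ];
  if (colgrepAvailable) {
    tools.push({
      name: "semantic_search",
      description: "Natural-language conceptual queries (ColGREP).",
    });
  }
  return tools;
}
```

Keeping the descriptions distinct is what lets the model route "find `executeSearch`" to one tool and "where is the auth flow handled" to the other.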
ColGREP Integration
- Detection: cached `hasColgrep()` binary check, stored in `.ralph/ralph.config.cjs`
- Index build: during `wiggum init`, in parallel with AI analysis (60s timeout)
- Index sync: quick incremental at `wiggum new` start (15s timeout)
- Degradation: if not installed or index fails, silently fall back to ripgrep only
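The "cached binary check" idea can be sketched as a small memoizing wrapper. This is a minimal sketch, not the project's implementation: `cachedCheck` is a hypothetical helper, and the probe here is a stand-in that just counts calls. A real probe might spawn `which colgrep`, mirroring the `which rg` check mentioned earlier.

```typescript
// Wraps a possibly slow probe so it runs at most once per process.
// A probe that throws is treated as "binary not available".
function cachedCheck(probe: () => boolean): () => boolean {
  let cached: boolean | undefined;
  return () => {
    if (cached === undefined) {
      try {
        cached = probe();
      } catch {
        cached = false;
      }
    }
    return cached;
  };
}

// Stand-in probe for demonstration; counts how often detection actually runs.
let probeRuns = 0;
const hasColgrep = cachedCheck(() => {
  probeRuns += 1;
  return false; // assumption for the demo: ColGREP is not installed
});

// A probe that fails outright degrades to "not available" instead of throwing.
const hasBroken = cachedCheck(() => {
  throw new Error("spawn failed");
});
```

The same wrapper would cover `hasRipgrep()`, so both checks share one caching policy.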
Ripgrep Improvements
- Shared `executeSearch()` engine replacing two duplicated implementations
- Comprehensive exclusion list (dirs + globs)
- Cached `hasRipgrep()`, checked once per process
- Better defaults: 5/file, 25-50 total, 500-char content, 2 context lines
- Robust result parsing with proper regex instead of first-colon split
- `.gitignore` respected (ripgrep default behavior preserved)
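The defaults, exclusions, and regex-based parsing listed above could fit together roughly as follows. This is a sketch under assumptions: `buildRipgrepArgs`, `parseMatchLine`, `DEFAULTS`, and `EXCLUDED_DIRS` are illustrative names, the exclusion list only contains the directories named in this issue, and the total limit uses the lower end of the 25-50 range. The ripgrep flags used (`--line-number`, `--no-heading`, `--color`, `--max-count`, `--context`, `--glob`) are standard ones.

```typescript
interface SearchDefaults {
  maxPerFile: number;
  maxTotal: number;
  maxContentChars: number;
  contextLines: number;
}

const DEFAULTS: SearchDefaults = {
  maxPerFile: 5,
  maxTotal: 25,
  maxContentChars: 500,
  contextLines: 2,
};

// Only the noise directories named in the issue; a real list would be longer.
const EXCLUDED_DIRS = [".next", "coverage", "__pycache__", ".turbo"];

// Builds the argument vector for `rg`; .gitignore handling is left at ripgrep's default.
function buildRipgrepArgs(
  pattern: string,
  searchPath: string,
  opts: SearchDefaults = DEFAULTS,
): string[] {
  const args = [
    "--line-number",
    "--no-heading",
    "--color", "never",
    "--max-count", String(opts.maxPerFile),
    "--context", String(opts.contextLines),
  ];
  for (const dir of EXCLUDED_DIRS) {
    args.push("--glob", `!**/${dir}/**`);
  }
  args.push("--", pattern, searchPath);
  return args;
}

interface SearchHit {
  file: string;
  line: number;
  text: string;
}

// The lazy path group stops at the first ":<number>:" segment, so colons inside
// a Windows drive prefix or inside the matched content do not break the split.
const MATCH_LINE = /^(.+?):(\d+):(.*)$/;

// Returns null for lines without a ":<number>:" segment, such as "--" separators
// and typical context lines, which ripgrep prints as "path-<number>-text".
function parseMatchLine(
  raw: string,
  maxChars: number = DEFAULTS.maxContentChars,
): SearchHit | null {
  const m = MATCH_LINE.exec(raw);
  if (!m) return null;
  const [, file = "", line = "", content = ""] = m;
  const text = content.length > maxChars ? content.slice(0, maxChars) + "..." : content;
  return { file, line: Number(line), text };
}
```

The total-result cap (`maxTotal`) is not a ripgrep flag in this sketch; it would be applied by the caller while collecting parsed hits.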
Files to Modify
New Files
| File | Purpose |
|---|---|
| `src/ai/tools/search.ts` | Core search engine |
| `src/ai/tools/colgrep.ts` | ColGREP detection, index, search |
| `src/ai/tools/__tests__/search.test.ts` | Engine tests |
| `src/ai/tools/__tests__/colgrep.test.ts` | ColGREP tests |
| `src/ai/tools/__tests__/search-integration.test.ts` | Tool wrapper tests |
Modified Files
| File | Changes |
|---|---|
| `src/ai/tools/index.ts` | Export new modules |
| `src/ai/tools.ts` | Replace inline ripgrep with `executeSearch()`, add `semanticSearch` |
| `src/ai/conversation/interview-tools.ts` | Replace inline ripgrep/grep with `executeSearch()`, add `semantic_search` |
| `src/ai/enhancer.ts` | Detect colgrep, parallel `syncIndex()`, pass availability |
| `src/ai/agents/codebase-analyzer.ts` | Pass `colgrepAvailable` to tools |
| `src/tui/orchestration/interview-orchestrator.ts` | Quick colgrep sync, pass availability |
| `src/ai/conversation/spec-generator.ts` | Same as orchestrator for CLI path |
| `src/utils/tui.ts` | Add `semantic_search` to `TOOL_ICONS` |
| `src/tui/hooks/useSpecGenerator.ts` | Add format case for `semantic_search` |
| `src/generator/config.ts` | Add `tools.colgrep` to config |
| `src/ai/prompts.ts` | Mention `semanticSearch` when available |
Acceptance Criteria
- Shared `executeSearch()` engine with comprehensive exclusions and cached binary detection
- ColGREP detection, index build/sync, and semantic search wrapper
- `semantic_search` tool registered conditionally when ColGREP is available
- Index built in parallel during `wiggum init`, synced during `wiggum new`
- Graceful degradation: ColGREP optional → ripgrep always → grep fallback
- Improved ripgrep defaults (500-char content, 2 context lines, better limits)
- TUI displays semantic search tool calls with icon
- Unit tests for engine, ColGREP module, and tool registration
- ColGREP availability persisted in `.ralph/ralph.config.cjs`
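The degradation chain in the criteria above (ColGREP optional, then ripgrep, then grep) can be read as a small selection function. The sketch below is one interpretation rather than the specified behavior: `pickBackend`, `Backend`, and `Availability` are hypothetical names, and it assumes a semantic query falls back to pattern search when ColGREP is unavailable.

```typescript
type Backend = "colgrep" | "ripgrep" | "grep";

interface Availability {
  colgrep: boolean;
  ripgrep: boolean;
}

// ColGREP is used only for semantic queries and only when present;
// everything else goes to ripgrep, with plain grep as the last resort.
function pickBackend(kind: "semantic" | "pattern", avail: Availability): Backend {
  if (kind === "semantic" && avail.colgrep) {
    return "colgrep";
  }
  return avail.ripgrep ? "ripgrep" : "grep";
}
```

For example, `pickBackend("semantic", { colgrep: false, ripgrep: true })` selects `"ripgrep"`, which matches the "silently fall back" behavior described under ColGREP Integration.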
Design Doc