
feat: ColGREP semantic search integration #111

@federiconeri

Description


Summary

Add ColGREP as an optional semantic code search tool: detect whether it is installed, manage the index lifecycle, and expose semantic_search / semanticSearch as AI tools alongside the existing ripgrep-based search.

Depends on: #110 (shared search engine must be in place first)

Problem / Context

Pattern-based search (ripgrep) is great for exact matches but misses conceptual queries: when the AI wants to find "authentication middleware" or "error handling patterns", it has to guess keywords. ColGREP combines multi-vector embeddings with Tree-sitter AST parsing to enable semantic code search, and it runs locally with no API keys needed.

Proposed Solution

1. ColGREP module: src/ai/tools/colgrep.ts

Detection & availability:

  • hasColgrep() — cached binary check (same pattern as hasRipgrep())
  • getIndexStatus(projectRoot) → 'ready' | 'stale' | 'missing'
  • canUseColgrep(projectRoot) → { installed, indexStatus }
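A minimal sketch of the detection helpers, assuming "cached binary check" means scanning PATH once per process and memoizing the answer (the existing hasRipgrep() is not shown in this issue). The issue does not say how ColGREP records index freshness, so the status here is derived from two timestamps passed in explicitly; findOnPath, resetColgrepCache, and deriveIndexStatus are hypothetical names, and canUseColgrep takes those inputs instead of projectRoot so it can run without a project on disk.

```typescript
import { accessSync, constants } from "node:fs";
import { delimiter, join } from "node:path";

export type IndexStatus = "ready" | "stale" | "missing";

// Scan each PATH directory for an executable named `binary`.
// Windows PATHEXT handling is omitted in this sketch.
export function findOnPath(binary: string, pathEnv: string = process.env.PATH ?? ""): boolean {
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue;
    try {
      accessSync(join(dir, binary), constants.X_OK);
      return true;
    } catch {
      // Missing or not executable in this directory; keep scanning.
    }
  }
  return false;
}

// Module-level cache: the probe runs at most once per process.
let colgrepCache: boolean | undefined;

export function hasColgrep(probe: () => boolean = () => findOnPath("colgrep")): boolean {
  if (colgrepCache === undefined) colgrepCache = probe();
  return colgrepCache;
}

// Hypothetical test hook to clear the memoized result.
export function resetColgrepCache(): void {
  colgrepCache = undefined;
}

// Hypothetical freshness rule: no index means "missing"; sources newer
// than the last build mean "stale"; otherwise "ready".
export function deriveIndexStatus(indexBuiltAtMs: number | null, newestSourceMtimeMs: number): IndexStatus {
  if (indexBuiltAtMs === null) return "missing";
  return newestSourceMtimeMs > indexBuiltAtMs ? "stale" : "ready";
}

// Inputs passed explicitly (assumption); the real function takes projectRoot.
export function canUseColgrep(
  indexBuiltAtMs: number | null,
  newestSourceMtimeMs: number,
  probe?: () => boolean,
): { installed: boolean; indexStatus: IndexStatus } {
  const installed = hasColgrep(probe);
  const indexStatus = installed ? deriveIndexStatus(indexBuiltAtMs, newestSourceMtimeMs) : "missing";
  return { installed, indexStatus };
}
```

Memoizing at module level mirrors the "cached" wording in the issue; the reset hook exists only so unit tests can exercise both the installed and not-installed paths.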

Index management:

  • syncIndex(projectRoot, options?) — build or incremental sync
    • wiggum init: parallel with AI analysis, 60s timeout, progress callback for TUI
    • wiggum new: 15s timeout, quick incremental sync
    • Never blocks: a timeout or failure means proceeding without semantic search
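One way to get the "never blocks" guarantee is to race the sync against a timer and resolve to an outcome value instead of throwing. This is a sketch under the assumption that the actual ColGREP indexing call is injected as a hypothetical `runSync` function (the issue does not specify the command); SyncOutcome and SyncIndexOptions are illustrative types, not the project's.

```typescript
// Resolves, never rejects, so callers can always continue.
export type SyncOutcome =
  | { ok: true }
  | { ok: false; reason: "timeout" | "error"; message?: string };

export interface SyncIndexOptions {
  timeoutMs: number; // 60000 for `wiggum init`, 15000 for `wiggum new`
  onProgress?: (message: string) => void; // hypothetical TUI progress hook
}

export async function syncIndex(
  runSync: (onProgress?: (message: string) => void) => Promise<void>,
  options: SyncIndexOptions,
): Promise<SyncOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<SyncOutcome>((resolve) => {
    timer = setTimeout(() => resolve({ ok: false, reason: "timeout" }), options.timeoutMs);
  });
  // Convert both success and failure of the sync into an outcome value.
  const work = runSync(options.onProgress).then(
    (): SyncOutcome => ({ ok: true }),
    (err: unknown): SyncOutcome => ({ ok: false, reason: "error", message: String(err) }),
  );
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after a fast sync
  }
}
```

Note that losing the race does not stop the underlying work; a real implementation would likely also cancel the ColGREP child process (for example via the `signal` option of Node's `child_process.execFile`).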

Semantic search:

  • executeSemanticSearch(options: SemanticSearchOptions) → SemanticSearchResult
  • Shells out to colgrep --json <query> with flags
  • Parses JSON output, trims heavy fields (call graph, data flow, control flow)
  • Keeps: file, line, content, signature, score, unitType
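A sketch of the parse-and-trim step. The argument list follows the `colgrep --json <query>` form given above (additional flags are left out). The shape of ColGREP's JSON is an assumption here: a top-level array of objects whose keys match the fields listed above, plus heavy fields such as call graph and data/control flow that get dropped. buildColgrepArgs and parseColgrepOutput are hypothetical names.

```typescript
// The lean fields the issue says to keep.
export interface SemanticHit {
  file: string;
  line: number;
  content: string;
  signature?: string;
  score: number;
  unitType?: string;
}

export interface SemanticSearchResult {
  hits: SemanticHit[];
  error?: string;
}

export function buildColgrepArgs(query: string): string[] {
  return ["--json", query];
}

const asString = (v: unknown): string | undefined => (typeof v === "string" ? v : undefined);
const asNumber = (v: unknown): number | undefined => (typeof v === "number" ? v : undefined);

// Assumed input: a JSON array of result objects.
export function parseColgrepOutput(stdout: string): SemanticSearchResult {
  let raw: unknown;
  try {
    raw = JSON.parse(stdout);
  } catch (err) {
    return { hits: [], error: `Could not parse ColGREP output: ${String(err)}` };
  }
  if (!Array.isArray(raw)) {
    return { hits: [], error: "Unexpected ColGREP output shape (expected an array)" };
  }
  const hits: SemanticHit[] = [];
  for (const item of raw) {
    if (typeof item !== "object" || item === null) continue;
    const r = item as Record<string, unknown>;
    const file = asString(r.file);
    const line = asNumber(r.line);
    if (file === undefined || line === undefined) continue; // skip unusable entries
    // Copy only the kept fields; call graph and flow data are discarded.
    hits.push({
      file,
      line,
      content: asString(r.content) ?? "",
      signature: asString(r.signature),
      score: asNumber(r.score) ?? 0,
      unitType: asString(r.unitType),
    });
  }
  return { hits };
}
```

Copying fields explicitly (an allowlist) rather than deleting heavy keys means any new fields ColGREP adds later stay out of the AI's context by default.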

2. Conditional tool registration

Interview tools (interview-tools.ts):

  • createInterviewTools(projectRoot, options?: { colgrepAvailable?: boolean })
  • When colgrepAvailable is true, adds the semantic_search tool (🧠 icon)

Exploration tools (tools.ts):

  • Same pattern; adds the semanticSearch tool
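The conditional registration can be sketched as building the base tool record and adding the semantic tool only when the flag is set. ToolDef is a hypothetical minimal shape (the issue does not show which AI SDK tool type the project uses), only the search tools are shown, and the execute bodies are placeholders.

```typescript
// Hypothetical minimal tool shape, not the project's real tool type.
export interface ToolDef {
  description: string;
  icon?: string;
  execute: (args: { query: string }) => Promise<string>;
}

export interface CreateToolsOptions {
  colgrepAvailable?: boolean;
}

export function createInterviewTools(
  projectRoot: string,
  options: CreateToolsOptions = {},
): Record<string, ToolDef> {
  const tools: Record<string, ToolDef> = {
    // The ripgrep-based search is registered in every case.
    search_codebase: {
      description: `Pattern search (ripgrep) in ${projectRoot}`,
      execute: async ({ query }) => `ripgrep results for ${query}`, // placeholder
    },
  };
  // Semantic search is only offered when ColGREP and its index are usable.
  if (options.colgrepAvailable) {
    tools.semantic_search = {
      description: "Semantic code search via ColGREP for conceptual queries",
      icon: "🧠",
      execute: async ({ query }) => `colgrep results for ${query}`, // placeholder
    };
  }
  return tools;
}
```

Omitting the tool entirely (rather than registering it and failing at call time) keeps the model from ever choosing a search it cannot run.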

3. Integration with init/new phases

src/ai/enhancer.ts (wiggum init):

  1. canUseColgrep() — fast check
  2. syncIndex() in parallel with runCodebaseAnalyzer()
  3. Store tools.colgrep: true/false in .ralph/ralph.config.cjs
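A sketch of the init flow, assuming the analyzer and the (already time-limited) index sync are passed in as functions; runInitPhase and renderToolsConfig are hypothetical names, and the config snippet shows only the new key, not the rest of what src/generator/config.ts emits.

```typescript
export interface InitResult<A> {
  analysis: A;
  colgrep: boolean; // persisted as tools.colgrep
}

export async function runInitPhase<A>(
  installed: boolean,
  runCodebaseAnalyzer: () => Promise<A>,
  // Should resolve true when the index is ready and already enforce the 60s timeout.
  syncIndex: () => Promise<boolean>,
): Promise<InitResult<A>> {
  // Start the sync right away; a failed sync must not reject the whole init.
  const indexReady = installed ? syncIndex().catch(() => false) : Promise.resolve(false);
  // Both run concurrently; init waits at most for the slower of the two.
  const [analysis, colgrep] = await Promise.all([runCodebaseAnalyzer(), indexReady]);
  return { analysis, colgrep };
}

// Hypothetical rendering of the flag into the CommonJS config file.
export function renderToolsConfig(colgrep: boolean): string {
  return `module.exports = {\n  tools: { colgrep: ${colgrep} },\n};\n`;
}
```

Because the sync's own timeout bounds its duration, Promise.all adds no more than that bound on top of the analyzer's runtime.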

src/tui/orchestration/interview-orchestrator.ts (wiggum new):

  1. Read tools.colgrep from config
  2. If true, syncIndex() with 15s timeout
  3. Pass colgrepAvailable to createInterviewTools()
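A sketch of the `wiggum new` decision, assuming the loaded config object is passed in and the sync is injected as a hypothetical `syncIndex` function. This sketch treats a quick sync that misses the 15s deadline as "unavailable"; whether a stale index should still be used in that case is a design choice the issue leaves open. readColgrepFlag and prepareColgrepForNew are hypothetical names.

```typescript
// Safely read tools.colgrep from an untyped config object.
export function readColgrepFlag(config: unknown): boolean {
  if (typeof config !== "object" || config === null) return false;
  const tools = (config as { tools?: unknown }).tools;
  if (typeof tools !== "object" || tools === null) return false;
  return (tools as { colgrep?: unknown }).colgrep === true;
}

// Resolves true only if the flag is set and the sync finishes in time.
export async function prepareColgrepForNew(
  config: unknown,
  syncIndex: () => Promise<void>,
  timeoutMs = 15000,
): Promise<boolean> {
  if (!readColgrepFlag(config)) return false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    // A rejected sync also maps to false, so this never throws.
    return await Promise.race([syncIndex().then(() => true, () => false), timedOut]);
  } finally {
    clearTimeout(timer);
  }
}
```

The boolean it returns is what would be handed to createInterviewTools(projectRoot, { colgrepAvailable }).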

src/ai/conversation/spec-generator.ts (CLI path):

  • Same pattern as interview-orchestrator

4. TUI display

  • src/utils/tui.ts: add semantic_search / semanticSearch → 🧠 icon
  • src/tui/hooks/useSpecGenerator.ts: format case for semantic_search display
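A sketch of the TUI side: both tool names map to the 🧠 icon, and the semantic case formats the query for display. Only the new icon entries are shown; the existing TOOL_ICONS contents, the fallback icon, and the exact display wording are hypothetical.

```typescript
// Excerpt: only the new entries; existing icons are not reproduced here.
export const TOOL_ICONS: Record<string, string> = {
  semantic_search: "🧠", // interview tools
  semanticSearch: "🧠", // exploration tools
};

const DEFAULT_ICON = "•"; // hypothetical fallback for unknown tools

// One-line display of a tool call for the activity log.
export function formatToolCall(name: string, args: { query?: string }): string {
  const icon = TOOL_ICONS[name] ?? DEFAULT_ICON;
  switch (name) {
    case "semantic_search":
    case "semanticSearch":
      return `${icon} Semantic search: "${args.query ?? ""}"`;
    default:
      return `${icon} ${name}`;
  }
}
```

Sharing one case for both names keeps the interview and exploration paths visually identical.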

5. Graceful degradation

ColGREP + index ready     →  semantic_search + search_codebase
ColGREP + index fails     →  search_codebase only (warn)
ColGREP not installed     →  search_codebase only (silent)

Runtime ColGREP errors return a helpful message directing the AI to search_codebase.
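The degradation rules can be expressed as a small pure function, plus a helper that turns a runtime error into a tool result rather than a thrown exception. The warning and error wording, and the names planSearchTools and semanticSearchErrorMessage, are hypothetical.

```typescript
export type ToolName = "semantic_search" | "search_codebase";

export interface SearchPlan {
  tools: ToolName[];
  warning?: string; // undefined means degrade silently
}

// Each branch corresponds to one row of the table above.
export function planSearchTools(installed: boolean, indexReady: boolean): SearchPlan {
  if (!installed) return { tools: ["search_codebase"] }; // not installed: silent
  if (!indexReady) {
    return {
      tools: ["search_codebase"],
      warning: "ColGREP index unavailable; falling back to ripgrep search.",
    };
  }
  return { tools: ["semantic_search", "search_codebase"] };
}

// Returned to the model as the tool result, pointing it at the ripgrep tool.
export function semanticSearchErrorMessage(err: unknown): string {
  const detail = err instanceof Error ? err.message : String(err);
  return `Semantic search failed (${detail}). Use search_codebase for pattern-based search instead.`;
}
```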

Files to Modify

| File | Changes |
|---|---|
| src/ai/tools/colgrep.ts | NEW: detection, index, search |
| src/ai/tools/__tests__/colgrep.test.ts | NEW: ColGREP module tests |
| src/ai/tools/index.ts | Export colgrep module |
| src/ai/conversation/interview-tools.ts | Accept colgrepAvailable, add semantic_search tool |
| src/ai/tools.ts | Accept colgrepAvailable, add semanticSearch tool |
| src/ai/enhancer.ts | Detect colgrep, parallel syncIndex(), pass availability |
| src/ai/agents/codebase-analyzer.ts | Pass colgrepAvailable to tools |
| src/tui/orchestration/interview-orchestrator.ts | Quick sync, pass colgrepAvailable |
| src/ai/conversation/spec-generator.ts | Same as orchestrator for CLI path |
| src/utils/tui.ts | Add semantic_search to TOOL_ICONS and format |
| src/tui/hooks/useSpecGenerator.ts | Add format case for semantic_search |
| src/generator/config.ts | Add tools.colgrep to config output |
| src/ai/prompts.ts | Mention semanticSearch when available |

Acceptance Criteria

  • ColGREP detection with cached binary check
  • Index status check (ready/stale/missing)
  • syncIndex() with timeout, progress callback, never blocks
  • executeSemanticSearch() parses ColGREP JSON, trims to lean output
  • semantic_search tool registered conditionally in interview tools
  • semanticSearch tool registered conditionally in exploration tools
  • Index built in parallel during wiggum init (60s timeout)
  • Index synced during wiggum new (15s timeout)
  • Availability persisted in .ralph/ralph.config.cjs
  • TUI shows 🧠 icon for semantic search tool calls
  • Graceful degradation: failures → ripgrep only, no crashes
  • Unit tests: detection, index lifecycle, search parsing, field trimming, errors
  • All existing tests still pass

Parent Issue

Part of #109

Metadata


Assignees

Labels

ai/llm: AI workflows, agents, prompts
feature: New capability
scanner: Project detection, tech stack scanning

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
