A high-performance code repository indexing and retrieval system using metadata-only RAG (Retrieval-Augmented Generation) for efficient semantic search.
Scythe Context Engine indexes code repositories by extracting functions, classes, and other code structures, then creates searchable embeddings based on metadata rather than full code. This approach significantly reduces embedding costs and improves retrieval speed while maintaining high-quality results.
- Metadata-Only RAG: Embeddings are created from function names, docstrings, and AI-generated summaries instead of full code
- Efficient Storage: Full code is stored separately on disk and loaded only when needed
- Multi-Language Support: Python, JavaScript, TypeScript, Java, C/C++, Go, and Rust
- Smart Reranking: LLM-based reranking of search results for improved relevance
- Semantic Caching: Caches refined context to speed up repeated queries
- Parallel Processing: Multi-threaded indexing and embedding for fast processing
- Batch Processing: Optional Groq Batch API support for cost-effective indexing (up to 50% cost reduction)
- File Collection: Scans repository for supported code files
- AST Parsing: Uses tree-sitter to extract functions and classes
- Metadata Extraction: Extracts function names, docstrings, and line numbers
- Summarization: Generates AI summaries of each function
- Chunk Storage: Saves full code to the `full_chunks/` directory
- Embedding: Creates embeddings from metadata (name + docstring + summary)
- Index Creation: Builds FAISS vector index for fast similarity search
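Concretely, the whole indexing flow can be pictured in a few dozen lines. The sketch below is an illustration only: it handles just Python files, uses the standard-library `ast` module instead of tree-sitter, and substitutes a hash-based placeholder for the real embedding model, but the shape of the pipeline (extract metadata, store full code on disk, embed metadata only, build a FAISS index) matches the steps above.

```python
"""Illustrative sketch of the indexing pipeline (not the project's actual code).

Assumptions: Python files only, the built-in `ast` module in place of
tree-sitter, and a placeholder embedding function in place of the real
embedding model. Requires `numpy` and `faiss-cpu`.
"""
import ast
import hashlib
import json
import pickle
from pathlib import Path

import faiss
import numpy as np


def extract_chunks(py_file: Path) -> list[dict]:
    """Return metadata for each function/class in one Python file."""
    try:
        tree = ast.parse(py_file.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        return []
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunk_id = hashlib.sha1(
                f"{py_file}:{node.lineno}:{node.end_lineno}".encode()
            ).hexdigest()[:16]
            chunks.append({
                "chunk_id": chunk_id,
                "function_name": node.name,
                "file_path": str(py_file),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                "docstring": ast.get_docstring(node) or "",
                "summary": "",  # the real pipeline fills this with an LLM summary
            })
    return chunks


def embed(texts: list[str], dim: int = 64) -> np.ndarray:
    """Placeholder embedding: pseudo-random vectors seeded by a hash of the text."""
    seeds = [int(hashlib.sha1(t.encode()).hexdigest()[:8], 16) for t in texts]
    return np.vstack(
        [np.random.default_rng(s).standard_normal(dim) for s in seeds]
    ).astype("float32")


def index_repo(repo: Path, out: Path) -> None:
    out.mkdir(parents=True, exist_ok=True)
    (out / "full_chunks").mkdir(exist_ok=True)
    chunks = [c for f in repo.rglob("*.py") for c in extract_chunks(f)]
    if not chunks:
        return
    for c in chunks:
        # Full code is written to disk, not embedded and not kept in the index.
        lines = Path(c["file_path"]).read_text(encoding="utf-8").splitlines()
        code = "\n".join(lines[c["start_line"] - 1 : c["end_line"]])
        (out / "full_chunks" / f"{c['chunk_id']}.txt").write_text(code, encoding="utf-8")
    # Only metadata (name + docstring + summary) feeds the embeddings.
    texts = [f"{c['function_name']}\n{c['docstring']}\n{c['summary']}" for c in chunks]
    vectors = embed(texts)
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    faiss.write_index(index, str(out / "index.faiss"))
    (out / "chunks.pkl").write_bytes(pickle.dumps(chunks))
    (out / "meta.json").write_text(json.dumps({"num_chunks": len(chunks)}))


if __name__ == "__main__":
    index_repo(Path("."), Path("repo_index_demo"))
```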
- Query Embedding: Converts search query to vector
- Initial Retrieval: Finds top-k similar chunks using FAISS
- Reranking: LLM scores chunks based on metadata relevance
- Code Loading: Loads full code from disk for top-ranked chunks
- Context Refinement: LLM extracts essential context for the query
- Caching: Stores refined context for future identical queries
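Retrieval runs the same ideas in reverse. The sketch below works against the demo index produced by the indexing sketch above; the LLM-based reranking and refinement steps are reduced to comments because they depend on the configured model, and the placeholder `embed()` is repeated so the snippet stands alone.

```python
"""Illustrative sketch of the query pipeline (not the project's actual code)."""
import hashlib
import pickle
from pathlib import Path

import faiss
import numpy as np


def embed(texts: list[str], dim: int = 64) -> np.ndarray:
    """Same placeholder embedding as in the indexing sketch."""
    seeds = [int(hashlib.sha1(t.encode()).hexdigest()[:8], 16) for t in texts]
    return np.vstack(
        [np.random.default_rng(s).standard_normal(dim) for s in seeds]
    ).astype("float32")


def query(index_dir: Path, text: str, top_k: int = 20, output_k: int = 5) -> str:
    index = faiss.read_index(str(index_dir / "index.faiss"))
    chunks = pickle.loads((index_dir / "chunks.pkl").read_bytes())

    # 1-2. Embed the query and retrieve the top-k candidates with FAISS.
    _, ids = index.search(embed([text]), top_k)
    candidates = [chunks[i] for i in ids[0] if i != -1]

    # 3. Rerank on metadata only. The real system asks an LLM to score each
    #    candidate; this sketch simply keeps the FAISS order.
    ranked = candidates[:output_k]

    # 4. Load full code from disk only for the chunks that survived reranking.
    codes = [
        (index_dir / "full_chunks" / f"{c['chunk_id']}.txt").read_text(encoding="utf-8")
        for c in ranked
    ]

    # 5-6. Context refinement (LLM extraction) and caching would happen here.
    return "\n\n".join(codes)


if __name__ == "__main__":
    print(query(Path("repo_index_demo"), "where is the FAISS index built?"))
```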
Installation:

```bash
uv pip install -e .
```

Standard (Real-time) Indexing:
```bash
uv run python index_repo.py /path/to/repo --output repo_index
```

Batch Indexing (Cost-Effective for Large Repos):
```bash
uv run python index_repo.py /path/to/repo --output repo_index --batch
```

The `--batch` flag uses Groq's Batch API for summarization, which:
- Reduces costs by up to 50%
- Takes longer (minutes to hours depending on batch completion window)
- Is ideal for initial indexing of large repositories
See Groq Batch Usage Guide for details.
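For orientation, the snippet below sketches how per-function summarization requests can be packaged into a JSONL batch file. The field layout follows the OpenAI-compatible batch request format that Groq's Batch API accepts, but the model name, prompt, and chunk record shape are placeholders rather than what this project actually emits.

```python
"""Sketch: packaging summarization requests as a JSONL batch file (illustrative)."""
import json
from pathlib import Path

# Hypothetical chunk records produced by the indexing step.
chunks = [
    {"chunk_id": "ab12cd34", "function_name": "load_config",
     "code": "def load_config(path):\n    ..."},
]

with Path("summarize_batch.jsonl").open("w", encoding="utf-8") as f:
    for chunk in chunks:
        request = {
            "custom_id": chunk["chunk_id"],       # used to match results back to chunks
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "llama-3.1-8b-instant",  # placeholder model name
                "messages": [{
                    "role": "user",
                    "content": f"Summarize this function in one sentence:\n{chunk['code']}",
                }],
            },
        }
        f.write(json.dumps(request) + "\n")

# The JSONL file is then uploaded, the batch is submitted, and results are
# collected once the batch completes (minutes to hours, per the completion window).
```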
Output Files:
- `repo_index/index.faiss` - FAISS vector index
- `repo_index/chunks.pkl` - Chunk metadata
- `repo_index/meta.json` - Index metadata
- `repo_index/full_chunks/` - Directory containing full code for each chunk
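These artifacts can also be opened directly if you want to poke at an index outside the provided scripts. A minimal sketch, assuming `chunks.pkl` holds a pickled list of metadata records:

```python
"""Sketch: inspecting the contents of a repo_index/ directory."""
import json
import pickle
from pathlib import Path

import faiss

index_dir = Path("repo_index")

index = faiss.read_index(str(index_dir / "index.faiss"))        # FAISS vector index
chunks = pickle.loads((index_dir / "chunks.pkl").read_bytes())  # chunk metadata
meta = json.loads((index_dir / "meta.json").read_text())        # index metadata

print(f"{index.ntotal} vectors for {len(chunks)} chunks")
print(json.dumps(meta, indent=2))
```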
Querying:

```bash
uv run python query_context.py "your search query" --index repo_index
```

Options:
- `--top-k N` - Number of chunks to retrieve initially (default: 20)
- `--output-k N` - Number of chunks in final output (default: 5)
- `--no-cache` - Disable semantic caching
Edit config/config.py to configure:
- Provider: Choose between `openrouter` or `ollama`
- Models: Set embedding and summarization models
- API Keys: Configure OpenRouter API key
- Ignored Paths: Customize which directories/files to skip during indexing
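As a point of reference, a provider/model configuration along these lines is what the options above describe; the variable names and model identifiers below are illustrative, not the actual contents of config/config.py.

```python
# Illustrative layout only; check config/config.py for the real option names.
PROVIDER = "openrouter"  # or "ollama"

EMBEDDING_MODEL = "text-embedding-3-small"  # placeholder model id
SUMMARIZATION_MODEL = "openai/gpt-4o-mini"  # placeholder model id

OPENROUTER_API_KEY = "sk-or-..."  # keep real keys out of version control

# Directories and files skipped during indexing.
IGNORED_PATHS = [
    ".git", "node_modules", ".venv", "__pycache__", "dist", "build",
]
```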
This version introduces a breaking change in how chunks are stored and retrieved.
- Old Behavior: Full code was embedded and stored in the vector index
- New Behavior: Only metadata (function name, docstring, summary) is embedded; full code is stored separately
If you have existing indexes, you must re-index your repositories:
```bash
# Delete old index
rm -rf repo_index/

# Re-index with new system
uv run python index_repo.py /path/to/repo --output repo_index
```

Benefits of the metadata-only approach:

- Cost Reduction: Embedding metadata is 10-100x cheaper than embedding full code
- Better Retrieval: Metadata provides clearer semantic signals for matching
- Flexibility: Full code can be loaded selectively, reducing memory usage
- Scalability: Enables indexing of much larger codebases
Each code chunk has the following metadata:
- `chunk_id`: Unique identifier (hash of file path + line numbers)
- `function_name`: Name of the function/class
- `file_path`: Relative path to source file
- `start_line`: Starting line number
- `end_line`: Ending line number
- `docstring`: Extracted docstring (if available)
- `summary`: AI-generated summary of the function
- `full_code_path`: Path to the stored full code file
- `node_type`: AST node type (e.g., function_definition, class_definition)
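Put together, a single chunk record looks roughly like this (all values are invented for illustration):

```python
chunk = {
    "chunk_id": "3f9a1c2e7b4d8a65",  # hash of file path + line numbers
    "function_name": "parse_config",
    "file_path": "src/config/loader.py",
    "start_line": 42,
    "end_line": 78,
    "docstring": "Load and validate the configuration file.",
    "summary": "Reads a config file, applies defaults, and raises on unknown keys.",
    "full_code_path": "repo_index/full_chunks/3f9a1c2e7b4d8a65.txt",
    "node_type": "function_definition",
}

# Only name + docstring + summary contribute to the embedding text;
# full_code_path is read from disk only for chunks that survive reranking.
embedding_text = f"{chunk['function_name']}\n{chunk['docstring']}\n{chunk['summary']}"
```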
- ~100-500 files/minute (depends on file size and model speed)
- Parallel processing with 8 worker threads for file processing
- Parallel embedding with 32 worker threads
- Initial retrieval: <100ms (FAISS search)
- Reranking: 1-3s (LLM scoring)
- Context refinement: 2-5s (LLM extraction)
- Cache hit: <10ms
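To check these numbers against your own setup, you can time the query CLI end to end; note that this includes interpreter startup, so it slightly overstates warm-cache latency.

```python
"""Sketch: rough end-to-end timing of the query CLI."""
import subprocess
import time

cmd = [
    "uv", "run", "python", "query_context.py",
    "how is the index built?", "--index", "repo_index",
]

start = time.perf_counter()
subprocess.run(cmd, check=True, capture_output=True)
print(f"first query: {time.perf_counter() - start:.2f}s")

# A second identical query should hit the semantic cache and return much faster.
start = time.perf_counter()
subprocess.run(cmd, check=True, capture_output=True)
print(f"repeat query: {time.perf_counter() - start:.2f}s")
```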
The summarization prompt can be customized in indexer/summarizer.py:
```python
def summarize_function(code: str, function_name: str, file_path: str) -> str:
    prompt = f"""Your custom prompt here..."""
    # ... rest of function
```

Chunk storage logic is in indexer/chunk_storage.py. You can modify:
- `generate_chunk_id()` - Change how chunk IDs are generated
- `save_full_chunk()` - Change storage format or location
- `load_full_chunk()` - Change loading logic
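For reference, minimal versions of those three hooks could look like the sketch below; it assumes chunk IDs are a SHA-1 of path plus line range and that full code is stored as plain-text files, which may not match the actual implementation.

```python
"""Sketch: possible implementations of the chunk-storage hooks (illustrative)."""
import hashlib
from pathlib import Path


def generate_chunk_id(file_path: str, start_line: int, end_line: int) -> str:
    """Stable ID derived from file path + line numbers, per the metadata schema."""
    return hashlib.sha1(f"{file_path}:{start_line}:{end_line}".encode()).hexdigest()[:16]


def save_full_chunk(chunk_id: str, code: str,
                    base_dir: Path = Path("repo_index/full_chunks")) -> Path:
    """Write the full code for one chunk to its own file and return the path."""
    base_dir.mkdir(parents=True, exist_ok=True)
    path = base_dir / f"{chunk_id}.txt"
    path.write_text(code, encoding="utf-8")
    return path


def load_full_chunk(chunk_id: str,
                    base_dir: Path = Path("repo_index/full_chunks")) -> str:
    """Read the full code back only when a chunk survives reranking."""
    return (base_dir / f"{chunk_id}.txt").read_text(encoding="utf-8")
```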
Reranking logic is in query_context/reranking.py:
- `_build_rerank_prompt()` - Customize the reranking prompt
- `_score_chunks_with_model()` - Change scoring logic
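A stripped-down version of that rerank step is sketched below for orientation; the prompt wording, the "index: score" reply format, and the `call_llm` parameter are placeholders, not the project's actual interface.

```python
"""Sketch: metadata-only reranking (illustrative; real prompts and parsing differ)."""
from typing import Callable


def _build_rerank_prompt(query: str, chunks: list[dict]) -> str:
    lines = [
        f"Query: {query}",
        "Rate each chunk 0-10 for relevance. Reply with one 'index: score' per line.",
    ]
    for i, c in enumerate(chunks):
        # Only metadata is shown to the model; full code stays on disk.
        lines.append(f"{i}. {c['function_name']}: {c.get('summary') or c.get('docstring', '')}")
    return "\n".join(lines)


def _score_chunks_with_model(query: str, chunks: list[dict],
                             call_llm: Callable[[str], str]) -> list[dict]:
    """Ask the model for scores, parse 'index: score' lines, sort high to low."""
    reply = call_llm(_build_rerank_prompt(query, chunks))
    scores: dict[int, float] = {}
    for line in reply.splitlines():
        idx, _, score = line.partition(":")
        if idx.strip().isdigit() and score.strip().replace(".", "", 1).isdigit():
            scores[int(idx.strip())] = float(score)
    order = sorted(range(len(chunks)), key=lambda i: scores.get(i, 0.0), reverse=True)
    return [chunks[i] for i in order]
```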
Solution: Ensure the index was created with the same output prefix you're querying
Solution:
- Reduce the number of worker threads in `file_processor.py`
- Use a faster summarization model
- Skip summarization for small functions
Solution:
- Increase `--top-k` to retrieve more candidates
- Adjust the similarity threshold in `query_context/query.py` (line 376)
- Use a better embedding model
This is a personal project, but suggestions and bug reports are welcome.
MIT License - See LICENSE file for details