Turn any codebase into semantically-aware, searchable knowledge for AI-powered workflows.
- AST-Powered Chunking - Extract functions, classes, and methods from 23+ programming languages
- Parent-Child Relationships - Preserve the hierarchy between chunks (for example, a method and its enclosing class) so each result carries its full context
- Semantic Search - Find relevant code using natural language queries
- Multiple Search Modes - Semantic, symbol-based, pattern matching, and hybrid search
- Smart Deduplication - Hash-based detection of duplicate code
- TOON Format Export - Token-efficient output format for LLM prompts (40-60% token savings)
- Full Pipeline Automation - One command to chunk, embed, and store
- Docker-Ready - ChromaDB server included
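
The chunking, parent-child, and deduplication features listed above can be illustrated with a minimal sketch. This is not Contextinator's implementation: it assumes Python-only input and swaps in the standard-library `ast` module for tree-sitter, and the names `Chunk`, `extract_chunks`, and `deduplicate` are hypothetical, chosen only for this example.

```python
import ast
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    name: str
    kind: str              # "function" or "class"
    source: str            # exact source text of the definition
    parent: Optional[str]  # name of the enclosing chunk, None at module level
    digest: str            # SHA-256 of the source, used for deduplication


def extract_chunks(code: str) -> list[Chunk]:
    """Walk the AST and emit one chunk per function or class definition."""
    chunks: list[Chunk] = []
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    def visit(node: ast.AST, parent: Optional[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, definitions):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                source = ast.get_source_segment(code, child) or ""
                digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
                chunks.append(Chunk(child.name, kind, source, parent, digest))
                # Nested definitions record this chunk as their parent.
                visit(child, child.name)
            else:
                visit(child, parent)

    visit(ast.parse(code), None)
    return chunks


def deduplicate(chunks: list[Chunk]) -> list[Chunk]:
    """Keep only the first chunk seen for each content hash."""
    seen: set[str] = set()
    unique: list[Chunk] = []
    for chunk in chunks:
        if chunk.digest not in seen:
            seen.add(chunk.digest)
            unique.append(chunk)
    return unique
```

Hashing the exact source text only catches byte-identical duplicates; a real tool might normalize whitespace or comments first, which this sketch does not attempt.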
| Agentic AI Systems | RAG Applications | Code Intelligence |
|---|---|---|
| Dynamic code retrieval for autonomous coding agents | High-precision code retrieval for question answering | Cross-repository code search and discovery |
| Context provision for code generation | Context injection for code explanation and documentation | Duplicate and similar code detection |
| Multi-step reasoning over large codebases | Semantic code search across repositories | Legacy codebase analysis and understanding |
| Tool integration for agent frameworks | Parent-child relationship tracking for complete context | MCP-compliant async architecture |
- Python 3.11 or higher
- Docker (for ChromaDB)
- OpenAI API key (for embeddings)
pip install contextinator
Verify the installation (requires ChromaDB and an OpenAI API key to be set up):
contextinator --help
For detailed setup and configuration, see USAGE.md
- Index a repository:
contextinator chunk-embed-store-embeddings \
--repo-url https://github.com/user/repo \
--save \
--collection-name MyRepo
- Search your codebase:
# Natural language semantic search
contextinator search "authentication logic" -c MyRepo
# Find specific functions
contextinator symbol authenticate_user -c MyRepo
# Export results in TOON format for LLM consumption
contextinator search "error handling" -c MyRepo --toon results.json
For comprehensive CLI and Python API documentation, see USAGE.md
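
For scripting, the search command shown above can be assembled from Python. The sketch below is only a thin wrapper around the CLI flags that appear in this README (`search`, `-c`, `--toon`); the helper name `build_search_command` is hypothetical, and the Python API documented in USAGE.md may well be the better choice.

```python
import shlex
from typing import Optional


def build_search_command(
    query: str, collection: str, toon_path: Optional[str] = None
) -> list[str]:
    """Build the argv list for `contextinator search`, using only flags shown in this README."""
    cmd = ["contextinator", "search", query, "-c", collection]
    if toon_path is not None:
        # Optional TOON export, as in the "error handling" example.
        cmd += ["--toon", toon_path]
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_search_command("error handling", "MyRepo", "results.json")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, so queries containing spaces or quotes need no extra escaping.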
Built with and inspired by amazing open-source projects:
- tree-sitter - Incremental parsing system for AST generation
- ChromaDB - AI-native embedding database
- OpenAI - Embedding generation API
- Serena - Code intelligence and semantic search
- Continue - AI-powered code assistant
- Tabby - Self-hosted AI coding assistant
- Semantic Code Search - Code search and retrieval
- Aider - AI pair programming in the terminal
- VS Code Copilot Chat - Conversational AI for code
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Contextinator is a code intelligence tool that uses Abstract Syntax Tree (AST) parsing to extract semantic code chunks, generates embeddings, and stores them in a vector database. This enables AI systems to understand, navigate, and reason about codebases with precision.
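
The embed, store, and query stages described above can be sketched in a few lines. Everything here is a stand-in chosen so the example runs without any services: `embed` is a toy bag-of-words counter rather than an OpenAI embedding, and `InMemoryStore` is a hypothetical class replacing ChromaDB. It shows the shape of the pipeline, not the tool's actual code, and a word-count vector only matches shared terms, whereas a real embedding model captures meaning.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Embed once at insert time and keep the vector next to the chunk.
        self._items.append((chunk, embed(chunk)))

    def query(self, text: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query vector.
        q = embed(text)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

In this sketch, adding the two chunks `def authenticate_user(name, password): ...` and `def render_page(template): ...` and then querying for "authenticate user password" ranks the first chunk highest, because only it shares terms with the query.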
