Local-first Retrieval Augmented Generation for AI agents - Privacy-focused with automatic indexing
Local models • Automatic indexing • ChromaDB vectors • 5 MCP tools
Quick Start • Installation • Tools
Equip your AI agents with powerful Retrieval Augmented Generation (RAG) capabilities using local models. This Model Context Protocol (MCP) server automatically indexes your project documents and provides relevant context to enhance LLM responses.
The Problem:
Traditional RAG solutions:
- Cloud-based (privacy concerns) ❌
- Complex setup (multiple services) ❌
- Manual indexing (time-consuming) ❌
- Expensive API costs (per query) ❌
The Solution:
RAG Server MCP:
- Local-first (Ollama + ChromaDB) ✅
- Docker Compose (one command) ✅
- Automatic indexing (on startup) ✅
- Free local models (zero API costs) ✅
Result: Privacy-focused, zero-cost RAG with automatic context retrieval for your AI agents.
| Feature | Cloud RAG | RAG Server MCP |
|---|---|---|
| Data Privacy | ❌ Sent to cloud | ✅ 100% local |
| Model Control | ❌ Fixed models | ✅ Any Ollama model |
| Vector Storage | ❌ Cloud service | ✅ Local ChromaDB |
| Cost | ❌ Pay per query | ✅ Free (local) |
| Customization | ❌ Limited | ✅ Full control |
- Automatic Indexing - Scans project on startup, no manual work
- Persistent Vectors - ChromaDB stores embeddings between sessions
- Hierarchical Chunking - Smart markdown splitting into text and code-block chunks (see the sketch after this list)
- Multiple File Types - `.txt`, `.md`, code files, `.json`, `.csv`
- Local Embeddings - Ollama `nomic-embed-text` (no API calls)
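As a rough illustration of the hierarchical chunking idea (a minimal sketch, not the server's actual implementation), prose and fenced code blocks become separate chunks so code is never split mid-fence:

```ts
// Illustrative sketch of hierarchical markdown chunking (hypothetical, simplified):
// prose and fenced code blocks become separate chunks so code stays intact.
type Chunk = { kind: "text" | "code"; content: string };

function chunkMarkdown(source: string): Chunk[] {
  // A capturing group in split() keeps the fenced blocks in the result array.
  const parts = source.split(/(```[\s\S]*?```)/g);
  const chunks: Chunk[] = [];
  for (const part of parts) {
    const trimmed = part.trim();
    if (!trimmed) continue;
    chunks.push({ kind: trimmed.startsWith("```") ? "code" : "text", content: trimmed });
  }
  return chunks;
}
```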
Run the server and all dependencies (ChromaDB, Ollama) in isolated containers.
Prerequisites:
- Docker Desktop or Docker Engine
- Ports `8000` (ChromaDB) and `11434` (Ollama) available
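A quick way to confirm nothing else holds those ports (optional; macOS/Linux):

```bash
# Prints any process bound to the ports; no output means both are free
lsof -i :8000 -i :11434
```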
Setup:
```bash
# Clone repository
git clone https://github.com/SylphxAI/rag-server-mcp.git
cd rag-server-mcp

# Start all services
docker-compose up -d --build

# Pull embedding model (first run only)
docker exec ollama ollama pull nomic-embed-text
```

If you already have ChromaDB and Ollama running:
```bash
# Set environment variables
export CHROMA_URL=http://localhost:8000
export OLLAMA_HOST=http://localhost:11434
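
# Optional: verify both services respond (endpoints assumed for default installs)
curl http://localhost:8000/api/v1/heartbeat   # ChromaDB heartbeat
curl http://localhost:11434/api/tags          # Ollama model list
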
# Run via npx
npx @sylphlab/mcp-rag-server
```

Alternatively, run from source:

```bash
# Clone and install
git clone https://github.com/SylphxAI/rag-server-mcp.git
cd rag-server-mcp
npm install
# Build
npm run build
# Start (requires ChromaDB + Ollama)
npm start
```

Add to your MCP client configuration (e.g., Claude Desktop, Cline):

```json
{
  "mcpServers": {
    "rag-server": {
      "command": "npx",
      "args": ["@sylphlab/mcp-rag-server"],
      "env": {
        "CHROMA_URL": "http://localhost:8000",
        "OLLAMA_HOST": "http://localhost:11434",
        "INDEX_PROJECT_ON_STARTUP": "true"
      }
    }
  }
}
```

Note: With Docker Compose, the server runs in a container. You may need to expose the MCP port or configure network settings for external client access.
Once configured, your AI agent can use RAG tools:
```xml
<!-- Index project documents -->
<use_mcp_tool>
<server_name>rag-server</server_name>
<tool_name>indexDocuments</tool_name>
<arguments>{"path": "./docs"}</arguments>
</use_mcp_tool>
<!-- Query for relevant context -->
<use_mcp_tool>
<server_name>rag-server</server_name>
<tool_name>queryDocuments</tool_name>
<arguments>{"query": "how to configure embeddings", "topK": 5}</arguments>
</use_mcp_tool>
<!-- List indexed documents -->
<use_mcp_tool>
<server_name>rag-server</server_name>
<tool_name>listDocuments</tool_name>
</use_mcp_tool>
```

| Tool | Description | Parameters |
|---|---|---|
| indexDocuments | Index file or directory | path, forceReindex? |
| queryDocuments | Retrieve relevant chunks | query, topK?, filter? |
| listDocuments | List all indexed sources | None |
| removeDocument | Remove document by path | sourcePath |
| removeAllDocuments | Clear entire index | None |
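Removing an indexed document follows the same pattern (the path here is just an illustrative example):

```xml
<use_mcp_tool>
  <server_name>rag-server</server_name>
  <tool_name>removeDocument</tool_name>
  <arguments>{"sourcePath": "./docs/old-guide.md"}</arguments>
</use_mcp_tool>
```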
indexDocuments

```ts
{
  path: string;           // File or directory path
  forceReindex?: boolean; // Re-index if already indexed
}
```

queryDocuments

```ts
{
  query: string;   // Search query
  topK?: number;   // Number of results (default: 5)
  filter?: object; // Metadata filters
}
```

Supported File Types:
- Text: `.txt`, `.md`
- Code: `.ts`, `.js`, `.py`, `.java`, `.go`, etc.
- Data: `.json`, `.jsonl`, `.csv`
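As an example of the optional queryDocuments parameters above, a filtered call might look like this (the `source` field is hypothetical; accepted filter keys follow ChromaDB's metadata format):

```json
{
  "query": "how to configure embeddings",
  "topK": 3,
  "filter": { "source": "docs/configuration.md" }
}
```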
Configure via environment variables (set in `docker-compose.yml` or on the command line):
| Variable | Default | Description |
|---|---|---|
| CHROMA_URL | http://chromadb:8000 | ChromaDB service URL |
| OLLAMA_HOST | http://ollama:11434 | Ollama service URL |
| INDEX_PROJECT_ON_STARTUP | true | Auto-index on server start |
| GENKIT_ENV | production | Environment mode |
| LOG_LEVEL | info | Logging level |
| Variable | Default | Description |
|---|---|---|
| INDEXING_EXCLUDE_PATTERNS | `**/node_modules/**,**/.git/**` | Glob patterns to exclude |
Example Custom Config:
```yaml
# docker-compose.yml
services:
  rag-server:
    environment:
      - INDEX_PROJECT_ON_STARTUP=true
      - INDEXING_EXCLUDE_PATTERNS=**/node_modules/**,**/.git/**,**/dist/**
      - LOG_LEVEL=debug
```

| Component | Technology | Purpose |
|---|---|---|
| Framework | Google Genkit | RAG orchestration |
| Vector Store | ChromaDB | Persistent embeddings |
| Embeddings | Ollama | Local embedding models |
| Protocol | Model Context Protocol | AI agent integration |
| Language | TypeScript | Type-safe development |
```
┌─────────────────────────────────────────────────────────┐
│ 1. Document Indexing (Startup or Manual)                │
│    • Scan project directory                             │
│    • Chunk documents hierarchically                     │
│    • Generate embeddings via Ollama                     │
│    • Store vectors in ChromaDB                          │
└─────────────────┬───────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────┐
│ 2. Query Processing (AI Agent Request)                  │
│    • Receive query from MCP client                      │
│    • Generate query embedding                           │
│    • Search ChromaDB for similar vectors                │
│    • Return top-K relevant chunks                       │
└─────────────────┬───────────────────────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────────────────────┐
│ 3. Context Enhancement (AI Agent Uses Results)          │
│    • Relevant context injected into prompt              │
│    • LLM generates informed response                    │
└─────────────────────────────────────────────────────────┘
```
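To make step 2 concrete, here is a minimal, hypothetical sketch of generating a query embedding against Ollama's REST API (assumes the default host from the setup above and Node 18+ for built-in fetch; illustrative only, not the server's actual Genkit flow):

```ts
// Hypothetical sketch: embed a query via Ollama's /api/embeddings endpoint.
// The resulting vector is what gets matched against the vectors stored in ChromaDB.
async function embedQuery(query: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: query }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}
```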
- Codebase understanding - Query project architecture
- API documentation - Find relevant API docs
- Code examples - Retrieve similar code patterns
- Dependency info - Search package documentation
- Documentation search - Find relevant docs instantly
- Technical notes - Index personal knowledge base
- Meeting notes - Search past discussions
- Research papers - Index and query papers
- Onboarding - Help new developers understand codebase
- Code review - Find related code for context
- Bug fixing - Search for similar issues
- Feature development - Discover existing patterns
1. Local-First
- All processing happens on your machine
- No data sent to cloud services
- Use your own hardware and models
2. Simplicity
- One-command Docker Compose setup
- Automatic indexing by default
- Sensible defaults for all settings
3. Modularity
- Genkit flows organize RAG logic
- Pluggable embedding models
- Extensible file type support
4. Privacy
- Your documents never leave your machine
- Local embedding generation
- Local vector storage
```bash
# Install dependencies
npm install
# Build
npm run build
# Watch mode
npm run watch
```

```bash
# Lint code
npm run lint
# Format code
npm run format
# Run tests
npm test
# Test with coverage
npm run test:cov
# Validate all (format + lint + test)
npm run validate
```

```bash
# Dev server
npm run docs:dev
# Build docs
npm run docs:build
# Preview docs
npm run docs:preview
```

✅ Completed
- MCP server implementation
- ChromaDB integration
- Ollama local embeddings
- Automatic indexing on startup
- Hierarchical markdown chunking
- Docker Compose setup
- 5 core MCP tools
🚧 Planned
- Advanced code file chunking (AST-based)
- PDF file support
- Enhanced query filtering
- Multiple embedding model support
- Performance benchmarks
- Semantic caching
- Re-ranking for better relevance
- Web UI for index management
Contributions are welcome! Please follow these guidelines:
- Open an issue - Discuss changes before implementing
- Fork the repository
- Create a feature branch - `git checkout -b feature/my-feature`
- Follow coding standards - Run `npm run validate`
- Write tests - Ensure good coverage
- Submit a pull request
- Follow TypeScript strict mode
- Use ESLint and Prettier (auto-configured)
- Add tests for new features
- Update documentation
- Follow commit conventions
- 🐛 Bug Reports
- 💬 Discussions
- 📧 Email
- 📖 MCP Documentation
Show Your Support: ⭐ Star • 👀 Watch • 🐛 Report bugs • 💡 Suggest features • 🤝 Contribute
MIT © Sylphx
Built with:
- Model Context Protocol - AI agent standard
- Google Genkit - RAG framework
- ChromaDB - Vector database
- Ollama - Local LLM runtime
- TypeScript - Type safety
Special thanks to the MCP and Genkit communities ❤️
Local. Private. Powerful.
RAG capabilities for AI agents with zero cloud dependencies
sylphx.com • @SylphxAI • hi@sylphx.com