RAG Server MCP πŸ“š

Local-first Retrieval Augmented Generation for AI agents - Privacy-focused with automatic indexing


Local models β€’ Automatic indexing β€’ ChromaDB vectors β€’ 5 MCP tools

Quick Start β€’ Installation β€’ Tools


πŸš€ Overview

Give your AI agents powerful Retrieval-Augmented Generation (RAG) capabilities using local models. This Model Context Protocol (MCP) server automatically indexes your project's documents and supplies relevant context to enhance LLM responses.

The Problem:

Traditional RAG solutions:
- Cloud-based (privacy concerns) ❌
- Complex setup (multiple services) ❌
- Manual indexing (time-consuming) ❌
- Expensive API costs (per query) ❌

The Solution:

RAG Server MCP:
- Local-first (Ollama + ChromaDB) βœ…
- Docker Compose (one command) βœ…
- Automatic indexing (on startup) βœ…
- Free local models (zero API costs) βœ…

Result: Privacy-focused, zero-cost RAG with automatic context retrieval for your AI agents.


⚑ Key Advantages

Privacy & Control

| Feature        | Cloud RAG        | RAG Server MCP      |
| -------------- | ---------------- | ------------------- |
| Data Privacy   | ❌ Sent to cloud | βœ… 100% local       |
| Model Control  | ❌ Fixed models  | βœ… Any Ollama model |
| Vector Storage | ❌ Cloud service | βœ… Local ChromaDB   |
| Cost           | ❌ Pay per query | βœ… Free (local)     |
| Customization  | ⚠️ Limited       | βœ… Full control     |

Performance & Efficiency

  • Automatic Indexing - Scans project on startup, no manual work
  • Persistent Vectors - ChromaDB stores embeddings between sessions
  • Hierarchical Chunking - Smart markdown splitting (text + code blocks)
  • Multiple File Types - .txt, .md, code files, .json, .csv
  • Local Embeddings - Ollama nomic-embed-text (no API calls)

πŸ“¦ Installation

Method 1: Docker Compose (Recommended)

Run the server and all dependencies (ChromaDB, Ollama) in isolated containers.

Prerequisites:

  • Docker Desktop or Docker Engine
  • Ports 8000 (ChromaDB) and 11434 (Ollama) available

Setup:

# Clone repository
git clone https://github.com/SylphxAI/rag-server-mcp.git
cd rag-server-mcp

# Start all services
docker-compose up -d --build

# Pull embedding model (first run only)
docker exec ollama ollama pull nomic-embed-text

Method 2: npx (Requires External Services)

If you already have ChromaDB and Ollama running:

# Set environment variables
export CHROMA_URL=http://localhost:8000
export OLLAMA_HOST=http://localhost:11434

# Run via npx
npx @sylphlab/mcp-rag-server

Method 3: Local Development

# Clone and install
git clone https://github.com/SylphxAI/rag-server-mcp.git
cd rag-server-mcp
npm install

# Build
npm run build

# Start (requires ChromaDB + Ollama)
npm start

πŸš€ Quick Start

MCP Client Configuration

Add to your MCP client configuration (e.g., Claude Desktop, Cline):

{
  "mcpServers": {
    "rag-server": {
      "command": "npx",
      "args": ["@sylphlab/mcp-rag-server"],
      "env": {
        "CHROMA_URL": "http://localhost:8000",
        "OLLAMA_HOST": "http://localhost:11434",
        "INDEX_PROJECT_ON_STARTUP": "true"
      }
    }
  }
}

Note: With Docker Compose, the server runs in a container. You may need to expose the MCP port or configure network settings for external client access.

Basic Usage

Once configured, your AI agent can use RAG tools:

<!-- Index project documents -->
<use_mcp_tool>
  <server_name>rag-server</server_name>
  <tool_name>indexDocuments</tool_name>
  <arguments>{"path": "./docs"}</arguments>
</use_mcp_tool>

<!-- Query for relevant context -->
<use_mcp_tool>
  <server_name>rag-server</server_name>
  <tool_name>queryDocuments</tool_name>
  <arguments>{"query": "how to configure embeddings", "topK": 5}</arguments>
</use_mcp_tool>

<!-- List indexed documents -->
<use_mcp_tool>
  <server_name>rag-server</server_name>
  <tool_name>listDocuments</tool_name>
</use_mcp_tool>

πŸ› οΈ MCP Tools

Document Management

| Tool               | Description               | Parameters            |
| ------------------ | ------------------------- | --------------------- |
| indexDocuments     | Index a file or directory | path, forceReindex?   |
| queryDocuments     | Retrieve relevant chunks  | query, topK?, filter? |
| listDocuments      | List all indexed sources  | None                  |
| removeDocument     | Remove a document by path | sourcePath            |
| removeAllDocuments | Clear the entire index    | None                  |

Tool Details

indexDocuments

{
  path: string;          // File or directory path
  forceReindex?: boolean; // Re-index if already indexed
}

queryDocuments

{
  query: string;    // Search query
  topK?: number;    // Number of results (default: 5)
  filter?: object;  // Metadata filters
}

Supported File Types:

  • Text: .txt, .md
  • Code: .ts, .js, .py, .java, .go, etc.
  • Data: .json, .jsonl, .csv

βš™οΈ Configuration

Configure via environment variables (set in docker-compose.yml or CLI):

Core Settings

| Variable                 | Default              | Description                |
| ------------------------ | -------------------- | -------------------------- |
| CHROMA_URL               | http://chromadb:8000 | ChromaDB service URL       |
| OLLAMA_HOST              | http://ollama:11434  | Ollama service URL         |
| INDEX_PROJECT_ON_STARTUP | true                 | Auto-index on server start |
| GENKIT_ENV               | production           | Environment mode           |
| LOG_LEVEL                | info                 | Logging level              |

Indexing Configuration

| Variable                  | Default                       | Description              |
| ------------------------- | ----------------------------- | ------------------------ |
| INDEXING_EXCLUDE_PATTERNS | **/node_modules/**,**/.git/** | Glob patterns to exclude |

Example Custom Config:

# docker-compose.yml
services:
  rag-server:
    environment:
      - INDEX_PROJECT_ON_STARTUP=true
      - INDEXING_EXCLUDE_PATTERNS=**/node_modules/**,**/.git/**,**/dist/**
      - LOG_LEVEL=debug
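INDEXING_EXCLUDE_PATTERNS takes standard glob patterns. As a rough sketch of how such patterns can be matched against file paths (the helper names here are hypothetical, and real glob libraries such as minimatch handle many more edge cases):

```typescript
// Simplified glob matching for illustration only (not the server's actual
// matcher): convert a glob like "**/node_modules/**" into a RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "Β§Β§")               // protect ** before handling *
    .replace(/\*/g, "[^/]*")              // *  matches within one segment
    .replace(/Β§Β§/g, ".*");                // ** matches across segments
  return new RegExp(`^${escaped}$`);
}

// True if a path matches any of the configured exclude patterns.
function isExcluded(path: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(path));
}
```

This sketch ignores edge cases (e.g. a pattern like **/node_modules/** matching node_modules at the repository root), which dedicated glob libraries handle correctly.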

πŸ—οΈ Architecture

Technology Stack

| Component    | Technology             | Purpose                 |
| ------------ | ---------------------- | ----------------------- |
| Framework    | Google Genkit          | RAG orchestration       |
| Vector Store | ChromaDB               | Persistent embeddings   |
| Embeddings   | Ollama                 | Local embedding models  |
| Protocol     | Model Context Protocol | AI agent integration    |
| Language     | TypeScript             | Type-safe development   |

How It Works

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. Document Indexing (Startup or Manual)               β”‚
β”‚    β€’ Scan project directory                            β”‚
β”‚    β€’ Chunk documents hierarchically                    β”‚
β”‚    β€’ Generate embeddings via Ollama                    β”‚
β”‚    β€’ Store vectors in ChromaDB                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
                  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. Query Processing (AI Agent Request)                 β”‚
β”‚    β€’ Receive query from MCP client                     β”‚
β”‚    β€’ Generate query embedding                          β”‚
β”‚    β€’ Search ChromaDB for similar vectors               β”‚
β”‚    β€’ Return top-K relevant chunks                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
                  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. Context Enhancement (AI Agent Uses Results)         β”‚
β”‚    β€’ Relevant context injected into prompt             β”‚
β”‚    β€’ LLM generates informed response                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
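Step 2 is a nearest-neighbor search over the stored embeddings. ChromaDB performs this internally; the sketch below is illustrative only (not the server's code) and shows the core idea using cosine similarity:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunk embeddings against the query embedding and
// return the ids of the k most similar chunks.
function topK(
  query: number[],
  docs: { id: string; vec: number[] }[],
  k: number,
): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}
```

In practice ChromaDB uses an approximate index rather than this brute-force scan, which is what keeps queries fast as the number of indexed chunks grows.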

🎯 Use Cases

AI Code Assistants

  • Codebase understanding - Query project architecture
  • API documentation - Find relevant API docs
  • Code examples - Retrieve similar code patterns
  • Dependency info - Search package documentation

Knowledge Management

  • Documentation search - Find relevant docs instantly
  • Technical notes - Index personal knowledge base
  • Meeting notes - Search past discussions
  • Research papers - Index and query papers

Development Workflows

  • Onboarding - Help new developers understand codebase
  • Code review - Find related code for context
  • Bug fixing - Search for similar issues
  • Feature development - Discover existing patterns

πŸ“Š Design Philosophy

Core Principles

1. Local-First

  • All processing happens on your machine
  • No data sent to cloud services
  • Use your own hardware and models

2. Simplicity

  • One-command Docker Compose setup
  • Automatic indexing by default
  • Sensible defaults for all settings

3. Modularity

  • Genkit flows organize RAG logic
  • Pluggable embedding models
  • Extensible file type support

4. Privacy

  • Your documents never leave your machine
  • Local embedding generation
  • Local vector storage

πŸ”§ Development

Setup

# Install dependencies
npm install

# Build
npm run build

# Watch mode
npm run watch

Quality Checks

# Lint code
npm run lint

# Format code
npm run format

# Run tests
npm test

# Test with coverage
npm run test:cov

# Validate all (format + lint + test)
npm run validate

Documentation

# Dev server
npm run docs:dev

# Build docs
npm run docs:build

# Preview docs
npm run docs:preview

πŸ—ΊοΈ Roadmap

βœ… Completed

  • MCP server implementation
  • ChromaDB integration
  • Ollama local embeddings
  • Automatic indexing on startup
  • Hierarchical markdown chunking
  • Docker Compose setup
  • 5 core MCP tools

πŸš€ Planned

  • Advanced code file chunking (AST-based)
  • PDF file support
  • Enhanced query filtering
  • Multiple embedding model support
  • Performance benchmarks
  • Semantic caching
  • Re-ranking for better relevance
  • Web UI for index management

🀝 Contributing

Contributions are welcome! Please follow these guidelines:

  1. Open an issue - Discuss changes before implementing
  2. Fork the repository
  3. Create a feature branch - git checkout -b feature/my-feature
  4. Follow coding standards - Run npm run validate
  5. Write tests - Ensure good coverage
  6. Submit a pull request

Development Guidelines

  • Follow TypeScript strict mode
  • Use ESLint and Prettier (auto-configured)
  • Add tests for new features
  • Update documentation
  • Follow commit conventions

🀝 Support

npm GitHub Issues

Show Your Support: ⭐ Star β€’ πŸ‘€ Watch β€’ πŸ› Report bugs β€’ πŸ’‘ Suggest features β€’ πŸ”€ Contribute


πŸ“„ License

MIT Β© Sylphx


πŸ™ Credits

Built with Google Genkit, ChromaDB, Ollama, TypeScript, and the Model Context Protocol.

Special thanks to the MCP and Genkit communities ❀️


Local. Private. Powerful.
RAG capabilities for AI agents with zero cloud dependencies

sylphx.com β€’ @SylphxAI β€’ hi@sylphx.com
