A fast RAG tool that indexes documents using your choice of embedding provider and stores them in LanceDB for efficient similarity search.
## Quick Start

```bash
# Create a config
$ quickrag init
# Index documents
$ quickrag index gutenberg/ --output gutenberg.rag
✓ Parsing documents from gutenberg/... (using recursive-token chunker)
✓ Detecting embedding dimensions
✓ Initializing database
✓ Finding files to index
✓ Removing deleted files from index
✓ Preparing for indexing
✓ Indexing files
✓ Finalizing
Indexing complete! Processed 622 chunks across 2 files. Removed 1 deleted file.
Added 619 new chunks (3 already existed). Total chunks in database: 619
# Search
$ quickrag query gutenberg.rag "Who is Sherlock Holmes?"
```

## Features

- Multiple embedding providers (VoyageAI, OpenAI, Ollama)
- Token-based recursive chunking (default) or character-based chunking
- LanceDB vector storage with persistent `.rag` files
- Idempotent indexing (tracks indexed files, skips unchanged)
- Automatic cleanup of deleted files from index
- UTF-8 sanitization for PDF conversions
- TypeScript & Bun
## Installation

Homebrew:

```bash
brew install statico/quickrag/quickrag
```

Prebuilt binaries:

```bash
# macOS (Apple Silicon)
curl -L https://github.com/statico/quickrag/releases/latest/download/quickrag-darwin-arm64 -o /usr/local/bin/quickrag
chmod +x /usr/local/bin/quickrag
# macOS (Intel)
curl -L https://github.com/statico/quickrag/releases/latest/download/quickrag-darwin-x64 -o /usr/local/bin/quickrag
chmod +x /usr/local/bin/quickrag
# Linux (ARM64)
curl -L https://github.com/statico/quickrag/releases/latest/download/quickrag-linux-arm64 -o /usr/local/bin/quickrag
chmod +x /usr/local/bin/quickrag
# Linux (x64)
curl -L https://github.com/statico/quickrag/releases/latest/download/quickrag-linux-x64 -o /usr/local/bin/quickrag
chmod +x /usr/local/bin/quickrag
```

Note: macOS binaries are not codesigned. You may need to run `xattr -d com.apple.quarantine /usr/local/bin/quickrag` to bypass Gatekeeper.
From source (requires Bun):

```bash
git clone https://github.com/statico/quickrag.git
cd quickrag
bun install
bun run dev --help
```

## Configuration

Run `quickrag init` to create `~/.config/quickrag/config.yaml`:

```yaml
provider: ollama
model: nomic-embed-text
baseUrl: http://localhost:11434
chunking:
  strategy: recursive-token
  chunkSize: 500
  chunkOverlap: 50
  minChunkSize: 50
```

Edit `~/.config/quickrag/config.yaml` to set API keys and preferences:

```yaml
provider: openai
apiKey: sk-your-key-here
model: text-embedding-3-small
chunking:
  strategy: recursive-token
  chunkSize: 500
  chunkOverlap: 50
  minChunkSize: 50
```

Then index and query:

```bash
quickrag index ./documents --output my-docs.rag
quickrag query my-docs.rag "What is the main topic?"
```

Configuration Options:
- `provider`: Embedding provider (`openai`, `voyageai`, or `ollama`)
- `apiKey`: API key (can also use environment variables)
- `model`: Model name for the embedding provider
- `baseUrl`: Base URL for Ollama (default: `http://localhost:11434`)
- `chunking.strategy`: `recursive-token` (default) or `simple`
- `chunking.chunkSize`: Tokens (for `recursive-token`, default: 500) or characters (for `simple`, default: 1000)
- `chunking.chunkOverlap`: Tokens (for `recursive-token`, default: 50) or characters (for `simple`, default: 200)
- `chunking.minChunkSize`: Minimum chunk size in tokens (default: 50). Chunks smaller than this are filtered out to prevent tiny fragments.
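For reference, here is a single config with every documented option spelled out. The key names come from the list above; the provider, key, and model values are illustrative (reusing the OpenAI example), and `baseUrl` only applies when the provider is `ollama`:

```yaml
provider: openai                 # one of: openai, voyageai, ollama
apiKey: sk-your-key-here         # or set via environment variables
model: text-embedding-3-small    # embedding model for the chosen provider
baseUrl: http://localhost:11434  # used by the ollama provider
chunking:
  strategy: recursive-token      # or: simple
  chunkSize: 500                 # tokens for recursive-token, characters for simple
  chunkOverlap: 50               # tokens for recursive-token, characters for simple
  minChunkSize: 50               # chunks below this token count are filtered out
```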
## Chunking Strategies

Recursive Token (default): Token-based splitting that respects semantic boundaries. Text is split at paragraph breaks first, then line breaks, then sentence endings, and finally word boundaries. Chunks are sized by estimated tokens (default: 500), aligning with embedding model expectations, and maintain a configurable overlap (default: 50 tokens, ~10%).
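To make the fallback order concrete, here is a minimal sketch of that kind of recursive splitter. It is illustrative, not quickrag's actual implementation: the 4-characters-per-token estimate is an assumption, and overlap handling is omitted for brevity.

```typescript
// Try coarse separators first (paragraphs), falling back to finer ones.
const SEPARATORS = ["\n\n", "\n", ". ", " "]; // paragraph, line, sentence, word

// Assumption: ~4 characters per token. quickrag's real token estimator may differ.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function recursiveSplit(text: string, maxTokens = 500, sepIndex = 0): string[] {
  // Base case: small enough, or no finer separators left to try.
  if (estimateTokens(text) <= maxTokens || sepIndex >= SEPARATORS.length) {
    return [text];
  }
  const sep = SEPARATORS[sepIndex];
  const chunks: string[] = [];
  let current = "";
  for (const part of text.split(sep)) {
    const candidate = current ? current + sep + part : part;
    if (estimateTokens(candidate) > maxTokens && current) {
      // Buffer is full: flush it, recursing with a finer separator
      // in case a single piece is still oversized.
      chunks.push(...recursiveSplit(current, maxTokens, sepIndex + 1));
      current = part;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(...recursiveSplit(current, maxTokens, sepIndex + 1));
  return chunks;
}
```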
Simple: Character-based chunking, kept for backward compatibility. Chunks are sized by characters (default: 1000) with sentence boundary detection, and overlap is character-based (default: 200).
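A comparable sketch of the character-based strategy, again illustrative rather than the actual code; the "cut at the last sentence boundary inside the window" heuristic is an assumption:

```typescript
// Fixed-size character windows with a crude sentence-boundary preference.
function simpleSplit(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    // Prefer to end on a sentence boundary, if one falls safely inside the window.
    const lastSentence = text.lastIndexOf(". ", end);
    if (lastSentence > start + overlap) end = lastSentence + 1; // keep the period
    chunks.push(text.slice(start, end).trim());
    if (end >= text.length) break;
    start = end - overlap; // step back to create character overlap
  }
  return chunks;
}
```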
Benchmarked on a test corpus (2 files: sherlock-holmes.txt, frankenstein.txt):
| Metric | Recursive Token | Simple |
|---|---|---|
| Chunks Created | 622 chunks | 2,539 chunks (4.1x more) |
| Indexing Time | ~19 seconds | ~37 seconds |
| Query Quality | Better semantic matches, more context | — |
Recommendation: Use `recursive-token` for production. The indexing time difference is negligible compared to the improved retrieval quality.
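To run a similar comparison on your own corpus, you can override the strategy per run using the documented `--chunker` flag; the paths and output names below are illustrative:

```bash
# Index the same corpus with each strategy, then compare results
quickrag index ./corpus --chunker recursive-token --output corpus-recursive.rag
quickrag index ./corpus --chunker simple --output corpus-simple.rag
quickrag query corpus-recursive.rag "Who is Sherlock Holmes?"
quickrag query corpus-simple.rag "Who is Sherlock Holmes?"
```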
Recommended settings by content type:

Most Use Cases:
- `strategy: recursive-token`
- `chunkSize: 400-512` (tokens) - research-backed sweet spot for 85-90% recall
- `chunkOverlap: 50-100` (tokens, ~10-20%)
Technical Documentation:
- `strategy: recursive-token`
- `chunkSize: 500-600` (tokens)
- `chunkOverlap: 75-100` (tokens)
Narrative Text:
- `strategy: recursive-token`
- `chunkSize: 400-500` (tokens)
- `chunkOverlap: 50-75` (tokens)
Academic Papers:
- `strategy: recursive-token`
- `chunkSize: 600-800` (tokens)
- `chunkOverlap: 100-150` (tokens)
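For example, a chunking block tuned for academic papers (picking mid-range values from the suggestions above) would look like this in `config.yaml`:

```yaml
chunking:
  strategy: recursive-token
  chunkSize: 700      # within the suggested 600-800 token range
  chunkOverlap: 125   # within the suggested 100-150 token range
  minChunkSize: 50    # default; drops tiny fragments
```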
## Usage

Indexing:

```bash
# Basic indexing
quickrag index ./documents --output my-docs.rag
# Override chunking parameters
quickrag index ./documents --chunker recursive-token --chunk-size 500 --chunk-overlap 50 --min-chunk-size 50 --output my-docs.rag
# Use different provider
quickrag index ./documents --provider openai --model text-embedding-3-small --output my-docs.rag
# Clear existing index
quickrag index ./documents --clear --output my-docs.rag
```

Note: QuickRAG automatically detects and removes deleted files from the index. If a file was previously indexed but no longer exists in the source directory, it will be removed from the database during the next indexing run.
Querying:

```bash
quickrag query my-docs.rag "What is the main topic?"
```

Interactive mode:

```bash
quickrag interactive my-docs.rag
```

## Supported Providers

VoyageAI:

```yaml
provider: voyageai
apiKey: your-voyage-api-key
model: voyage-3
```

OpenAI:

```yaml
provider: openai
apiKey: sk-your-openai-key
model: text-embedding-3-small
```

Ollama:

```yaml
provider: ollama
model: nomic-embed-text
baseUrl: http://localhost:11434
```

## Supported File Types

- `.txt` - Plain text files
- `.md` - Markdown files
- `.markdown` - Markdown files
## Development

```bash
bun install
bun run dev index ./documents --provider ollama --output test.rag
bun run build
bun run typecheck
```

## Requirements

- Bun >= 1.0.0
- TypeScript >= 5.0.0
- For Ollama: a running Ollama instance with an embedding model installed (e.g., `ollama pull nomic-embed-text`)
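Putting the Ollama pieces together end to end, using only the commands and flags documented above (the directory and output file names are illustrative):

```bash
# One-time: pull a local embedding model (Ollama must be running)
ollama pull nomic-embed-text

# Index and query entirely locally, no cloud API keys required
quickrag index ./documents --provider ollama --model nomic-embed-text --output local.rag
quickrag query local.rag "What is the main topic?"
```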
## License

This is free and unencumbered software released into the public domain.
For more information, see UNLICENSE or visit https://unlicense.org