A complete Ruby implementation of a Retrieval-Augmented Generation (RAG) pipeline using native Ruby ML/NLP gems.
Ragnar provides a production-ready RAG pipeline for Ruby applications, integrating:
- red-candle: LLM inference, embeddings, and reranking
- lancelot: Vector database with Lance columnar storage
- clusterkit: UMAP dimensionality reduction and clustering
- baran: Text chunking and splitting
graph TB
subgraph "Indexing Pipeline"
A[Documents] --> B[Chunker<br/>baran]
B --> C[Embedder<br/>red-candle]
C --> D[Vector DB<br/>lancelot]
D --> E[UMAP Training<br/>annembed]
E --> F[Reduced Embeddings]
end
subgraph "Query Pipeline"
LLMCache[LLM Manager<br/>Cached Instance]
Q[User Query] --> QR[Query Rewriter<br/>red-candle LLM]
QR --> QE[Query Embedder<br/>red-candle]
QE --> VS[Vector Search<br/>lancelot]
VS --> RRF[RRF Fusion]
RRF --> RR[Reranker<br/>red-candle]
RR --> RP[Context Repacker<br/>Deduplication & Organization]
RP --> LLM[Response Generation<br/>red-candle LLM]
LLM --> R[Answer]
LLMCache -.-> QR
LLMCache -.-> LLM
end
D -.-> VS
F -.-> VS
sequenceDiagram
participant User
participant CLI
participant Indexer
participant Chunker
participant Embedder
participant Database
User->>CLI: ragnar index ./documents
CLI->>Indexer: index_path(path)
loop For each file
Indexer->>Indexer: Read file
Indexer->>Chunker: split_text(content)
Chunker-->>Indexer: chunks[]
loop For each chunk
Indexer->>Embedder: embed(text)
Embedder-->>Indexer: embedding[768]
Indexer->>Database: add_document(chunk, embedding)
end
end
Database-->>CLI: stats
CLI-->>User: Indexed N documents
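In Ruby, the indexing loop above looks roughly like the sketch below. It uses only methods listed in the component table at the end of this README (`chunk_text`, `embed_text`, `add_document`); the constructor arguments and the assumption that chunks are plain strings are illustrative rather than exact gem signatures.

```ruby
require 'ragnar'

# Sketch of the indexing sequence: read each file, split it into chunks,
# embed each chunk, and store the chunk with its vector. Constructor
# arguments and the plain-string chunk assumption are illustrative.
chunker  = Ragnar::Chunker.new(chunk_size: 512, chunk_overlap: 50)
embedder = Ragnar::Embedder.new
database = Ragnar::Database.new("ragnar_database")

Dir.glob("./documents/**/*.txt").each do |path|
  content = File.read(path)
  chunker.chunk_text(content).each do |chunk|
    embedding = embedder.embed_text(chunk)   # 768-dimensional vector
    database.add_document(chunk, embedding)  # stored alongside its vector
  end
end
```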
flowchart LR
A[High-Dim Embeddings<br/>768D] --> B[UMAP Training]
B --> C[Model]
C --> D[Low-Dim Embeddings<br/>2-50D]
B --> E[Parameters]
E --> F[n_neighbors]
E --> G[n_components]
E --> H[min_dist]
D --> I[Benefits]
I --> J[Faster Search]
I --> K[Less Memory]
I --> L[Visualization]
flowchart TB
Q[User Query] --> QA[Query Analysis<br/>w/ Cached LLM]
QA --> CI[Clarified Intent]
QA --> SQ[Sub-queries]
QA --> KT[Key Terms]
SQ --> EMB[Embed Each Query]
EMB --> VS[Vector Search]
VS --> RRF[RRF Fusion]
RRF --> RANK[Reranking]
RANK --> TOP[Top-K Documents]
TOP --> CTX[Context Preparation]
CTX --> REPACK[Context Repacking<br/>Deduplication<br/>Summarization<br/>Organization]
REPACK --> GEN[LLM Generation<br/>w/ Same Cached LLM]
CI --> GEN
GEN --> ANS[Final Answer]
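The RRF Fusion step merges the ranked result lists produced for each sub-query. The snippet below is a generic illustration of Reciprocal Rank Fusion, where each list contributes 1/(k + rank) per document; it is not Ragnar's internal implementation, and `k = 60` is simply the commonly used constant.

```ruby
# Generic Reciprocal Rank Fusion: each result list contributes 1/(k + rank)
# for every document it contains, so documents ranked highly in several lists
# float to the top. Illustration only -- not Ragnar's internal code.
def rrf_fuse(result_lists, k: 60)
  scores = Hash.new(0.0)
  result_lists.each do |results|
    results.each_with_index do |doc_id, rank|
      scores[doc_id] += 1.0 / (k + rank + 1)
    end
  end
  scores.sort_by { |_id, score| -score }.map(&:first)
end

# Example: three sub-queries returning overlapping document ids
rrf_fuse([["d1", "d2", "d3"], ["d2", "d1"], ["d3", "d2", "d4"]])
# => ["d2", "d1", "d3", "d4"]
```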
gem install ragnar-cli
Or install from source:
git clone https://github.com/scientist-labs/ragnar-cli.git
cd ragnar-cli
bundle install
gem build ragnar.gemspec
gem install ./ragnar-*.gem
# Index a directory of text files
ragnar index ./documents
# Index with custom settings
ragnar index ./documents \
--chunk-size 1000 \
--chunk-overlap 100
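The same settings are available from the Ruby API (see the API section further down). `chunk_size:` appears there; the `chunk_overlap:` keyword below is an assumption mirroring the `--chunk-overlap` flag.

```ruby
require 'ragnar'

# Programmatic equivalent of the CLI call above. chunk_overlap: is an
# assumption mirroring --chunk-overlap; chunk_size: is shown in the API section.
indexer = Ragnar::Indexer.new(
  db_path: "ragnar_database",
  chunk_size: 1000,
  chunk_overlap: 100
)
indexer.index_path("./documents")
```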
Reduce embedding dimensions for faster search:
# Train UMAP model (auto-adjusts parameters based on data)
ragnar train-umap \
--n-components 50 \
--n-neighbors 15
# Apply to all embeddings
ragnar apply-umap
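From Ruby, the same two steps map onto `UmapProcessor#train` and `#apply` (both listed in the component table). The `train` keywords shown here also appear in the UMAP tuning example later in this README; calling `apply` with no arguments is an assumption mirroring the CLI.

```ruby
require 'ragnar'

# Train a UMAP model on the stored embeddings, then project them all.
# train's keywords appear in the tuning section below; the bare apply call
# is an assumption mirroring `ragnar apply-umap`.
processor = Ragnar::UmapProcessor.new
processor.train(n_components: 50, n_neighbors: 15)
processor.apply
```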
Perform topic modeling to discover themes in your indexed documents:
# Basic topic extraction (requires minimum 20-30 indexed documents)
ragnar topics
# Adjust clustering parameters for smaller datasets
ragnar topics --min-cluster-size 3 # Allow smaller topics
ragnar topics --min-samples 2 # Less strict density requirements
# Export visualizations
ragnar topics --export html # Interactive D3.js visualization
ragnar topics --export json # JSON data for further processing
# Verbose mode for debugging
ragnar topics --verbose
Note: Topic modeling requires sufficient documents to identify meaningful patterns. For best results:
- Index at least 20-30 documents (ideally 50+)
- Ensure documents cover diverse topics
- Documents should be substantial (50+ words each)
The HTML export includes:
- Topic Bubbles: Interactive bubble chart showing topic sizes and coherence
- Embedding Scatter Plot: Visualization of all documents in embedding space, colored by cluster
# Basic query
ragnar query "What is the main purpose of this project?"
# Verbose mode shows all intermediate processing steps
ragnar query "How does the chunking process work?" --verbose
# Or use short form
ragnar query "How does the chunking process work?" -v
# JSON output for programmatic use
ragnar query "Explain the embedding model" --json
# Adjust number of retrieved documents
ragnar query "What are the key features?" --top-k 5
# Combine options for detailed analysis
ragnar query "Compare Ruby with Python" -v --top-k 5
When using `--verbose` or `-v`, you'll see:
- Query Analysis: Original query, clarified intent, sub-queries, and key terms
- Document Retrieval: Each sub-query's embedding and search results
- RRF Fusion: How multiple search results are combined
- Reranking: Top documents after relevance scoring
- Context Repacking: How retrieved chunks are organized and compressed
- Response Generation: The final LLM prompt and response
- Final Results: Confidence score and source attribution
View index statistics:
ragnar stats
- Query Rewriting: Clarifies intent and generates sub-queries
- Multi-Query Search: Searches with multiple query variations
- RRF Fusion: Combines results using Reciprocal Rank Fusion
- Reranking: Uses cross-encoder for precise relevance scoring
- Context Repacking: Deduplicates and organizes retrieved chunks for optimal LLM consumption
- LLM Caching: Single LLM instance shared between query rewriting and response generation
- Contextual Response: Generates answers with LLM based on repacked context
- High-dimensional embeddings (768D) for semantic accuracy
- UMAP reduction to lower dimensions (2-50D) for efficiency
- Automatic parameter adjustment based on dataset size
- Batch processing for large document collections
- Lance columnar format for efficient storage
- Vector similarity search with configurable metrics (see the sketch after this list)
- Metadata tracking for source attribution
- Incremental indexing support
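For a direct search against the stored vectors, bypassing query rewriting and reranking, a minimal sketch looks like this. `embed_text` and `search_similar` are listed in the component table, but the `k:` keyword and the shape of the returned hashes are assumptions.

```ruby
require 'ragnar'

# Direct vector search, skipping the full query pipeline. The k: keyword and
# the result hash keys are assumptions made for illustration.
embedder = Ragnar::Embedder.new
database = Ragnar::Database.new("ragnar_database")

query_vector = embedder.embed_text("How does chunking work?")
results = database.search_similar(query_vector, k: 5)
results.each { |doc| puts doc[:chunk_text] }
```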
Ragnar uses a flexible YAML-based configuration system that allows you to customize all aspects of the RAG pipeline.
Ragnar looks for configuration files in the following order:
- `.ragnar.yml` in the current directory
- `.ragnarrc.yml` in the current directory
- `ragnar.yml` in the current directory
- `.ragnar.yml` in your home directory
- Built-in defaults
Generate a configuration file:
# Create local config (in current directory)
ragnar init-config
# Create global config (in home directory)
ragnar init-config --global
# Force overwrite existing config
ragnar init-config --force
Example `.ragnar.yml` file:
# Storage paths (all support ~ expansion)
storage:
database_path: "~/.cache/ragnar/database" # Vector database location
models_dir: "~/.cache/ragnar/models" # Downloaded model files
history_file: "~/.cache/ragnar/history" # Interactive mode history
# Embedding configuration
embeddings:
model: jinaai/jina-embeddings-v2-base-en # Embedding model to use
chunk_size: 512 # Tokens per chunk
chunk_overlap: 50 # Token overlap between chunks
# UMAP dimensionality reduction
umap:
reduced_dimensions: 64 # Target dimensions (2-100)
n_neighbors: 15 # UMAP neighbors parameter
min_dist: 0.1 # UMAP minimum distance
model_filename: umap_model.bin # Saved model filename
# LLM configuration
llm:
default_model: TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
default_gguf_file: tinyllama-1.1b-chat-v1.0.q4_k_m.gguf
# Query processing
query:
top_k: 3 # Number of documents to retrieve
enable_query_rewriting: true # Use LLM to improve queries
# Interactive mode
interactive:
prompt: 'ragnar> ' # Command prompt
quiet_mode: true # Suppress verbose output
# Output settings
output:
show_progress: true # Show progress bars during indexing
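If you want to inspect a config file programmatically, plain YAML loading is enough; the snippet below is a generic illustration and not Ragnar's internal configuration loader.

```ruby
require 'yaml'

# Read the local config file with Ruby's standard YAML library.
# Illustrative only -- not Ragnar's internal loader.
config = YAML.safe_load(File.read(".ragnar.yml"))
puts config.dig("embeddings", "model")        # => jinaai/jina-embeddings-v2-base-en
puts config.dig("umap", "reduced_dimensions") # => 64
```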
Check current configuration:
# Show all configuration settings
ragnar config
# Show LLM model information
ragnar model
In interactive mode:
ragnar interactive
ragnar> config # Show configuration
ragnar> model # Show model details
Configuration values can be overridden with environment variables:
- `XDG_CACHE_HOME` - Override the default cache directory (`~/.cache`)
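The snippet below shows the conventional XDG-style resolution this implies; it is illustrative only, not Ragnar's actual code.

```ruby
# Conventional XDG cache resolution -- illustrative, not Ragnar's internal code.
# XDG_CACHE_HOME wins when set; otherwise ~/.cache is used.
cache_root   = ENV.fetch("XDG_CACHE_HOME", File.expand_path("~/.cache"))
ragnar_cache = File.join(cache_root, "ragnar")
puts ragnar_cache   # e.g. /home/user/.cache/ragnar
```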
Embedding Models (via red-candle):
- `jinaai/jina-embeddings-v2-base-en` (default, 768 dimensions)
- `BAAI/bge-base-en-v1.5`
- `sentence-transformers/all-MiniLM-L6-v2`
LLM Models (via red-candle, GGUF format):
- `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` (default, fast)
- `TheBloke/Qwen2.5-1.5B-Instruct-GGUF`
- `TheBloke/phi-2-GGUF`
Reranker Models (via red-candle):
- `BAAI/bge-reranker-base`
- `cross-encoder/ms-marco-MiniLM-L-6-v2`
require 'ragnar'
# Initialize components
indexer = Ragnar::Indexer.new(
db_path: "my_database",
chunk_size: 1000
)
# Index documents
stats = indexer.index_path("./documents")
# Query the system
processor = Ragnar::QueryProcessor.new(db_path: "my_database")
result = processor.query(
"What is Ruby?",
top_k: 5,
verbose: true
)
puts result[:answer]
puts "Confidence: #{result[:confidence]}%"
Extract topics from your indexed documents:
# Example with sufficient documents for clustering (minimum ~20-30 needed)
documents = [
# Finance cluster
"Federal Reserve raises interest rates to combat inflation",
"Stock markets rally on positive earnings reports",
"Cryptocurrency markets show increased volatility",
"Corporate bonds yield higher returns this quarter",
"Central banks coordinate global monetary policy",
# Technology cluster
"AI breakthrough in natural language processing announced",
"Machine learning transforms healthcare diagnostics",
"Cloud computing adoption accelerates in enterprises",
"Quantum computing reaches new error correction milestone",
"Open source frameworks receive major updates",
# Healthcare cluster
"Clinical trials show promise for cancer immunotherapy",
"Telemedicine reshapes patient care delivery models",
"Gene editing advances treatment for rare diseases",
"Mental health awareness campaigns gain momentum",
"mRNA vaccine technology platform expands",
# Add more documents for better clustering...
# See TOPIC_MODELING_EXAMPLE.md for complete example
]
# Extract topics using Topical
database = Ragnar::Database.new("ragnar_database")
docs = database.get_all_documents_with_embeddings
embeddings = docs.map { |d| d[:embedding] }
texts = docs.map { |d| d[:chunk_text] }
topics = Topical.extract(
embeddings: embeddings,
documents: texts,
min_topic_size: 3 # Minimum docs per topic
)
topics.each do |topic|
puts "Topic: #{topic.label}"
puts "Terms: #{topic.terms.join(', ')}"
puts "Size: #{topic.size} documents\n\n"
end
For a complete working example with 40+ documents, see TOPIC_MODELING_EXAMPLE.md.
chunker = Ragnar::Chunker.new(
chunk_size: 1000,
chunk_overlap: 200,
separators: ["\n\n", "\n", ". ", " "]
)
chunks = chunker.chunk_text(document_text)
# For small datasets (<100 documents)
processor = Ragnar::UmapProcessor.new
processor.train(
n_components: 10, # Fewer components
n_neighbors: 5, # Fewer neighbors
min_dist: 0.05 # Tighter clusters
)
# For large datasets (>10,000 documents)
processor.train(
n_components: 50, # More components
n_neighbors: 30, # More neighbors
min_dist: 0.1 # Standard distance
)
- Indexing: ~100MB per 1000 documents (768D embeddings)
- UMAP Training: ~80MB for 10,000 vectors
- Query Processing: ~50MB overhead for models (reduced with LLM caching)
- LLM Caching: Single model instance (~500MB-2GB depending on model size)
- Indexing: ~10 documents/second (including embedding)
- UMAP Training: 30-60 seconds for 10,000 vectors
- Query Processing: 1-3 seconds per query (faster with cached LLM)
- Vector Search: <100ms for 100,000 vectors
- Context Repacking: <50ms for typical document sets
- LLM Loading: 2-5 seconds (only on first query with caching)
- Use UMAP for datasets >1000 documents
- Batch index large document collections
- Cache embeddings for repeated queries
- Adjust chunk size based on document type (see the sketch after this list):
- Technical docs: 500-1000 tokens
- Narrative text: 200-500 tokens
- Q&A content: 100-300 tokens
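For example, using the `Chunker` options shown in the Ruby API section above (the specific numbers are just values inside the rule-of-thumb ranges listed here):

```ruby
require 'ragnar'

# Per-document-type chunkers; sizes are points within the ranges above,
# and the file path is illustrative.
technical_chunker = Ragnar::Chunker.new(chunk_size: 800, chunk_overlap: 100)
narrative_chunker = Ragnar::Chunker.new(chunk_size: 400, chunk_overlap: 50)
qa_chunker        = Ragnar::Chunker.new(chunk_size: 200, chunk_overlap: 20)

chunks = technical_chunker.chunk_text(File.read("docs/architecture.md"))
```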
UMAP fails with "index out of bounds"
- Cause: Too few samples for the requested parameters
- Solution: The system auto-adjusts its parameters, but you can also set a lower `n_neighbors` manually, as in the sketch below
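A conservative retraining call for a small collection might look like this (the values are illustrative):

```ruby
require 'ragnar'

# Small-dataset UMAP settings -- illustrative values only. Fewer neighbors and
# components keep the algorithm within the available sample count.
processor = Ragnar::UmapProcessor.new
processor.train(n_neighbors: 5, n_components: 10, min_dist: 0.05)
```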
Slow indexing performance
- Try smaller chunk sizes
- Use batch processing
- Consider using a faster embedding model
Poor query results
- Index more documents (RAG works best with 100+ documents)
- Adjust chunk size and overlap
- Try different embedding models
# Install dependencies
bundle install
# Run tests
bundle exec rspec
# Build gem
gem build ragnar.gemspec
| Component | Purpose | Key Methods |
|---|---|---|
| Chunker | Split text into semantic chunks | `chunk_text()` |
| Embedder | Generate vector embeddings | `embed_text()`, `embed_batch()` |
| Database | Store and search vectors | `add_document()`, `search_similar()` |
| LLMManager | Cache and manage LLM instances | `get_llm()`, `default_llm()` |
| ContextRepacker | Optimize retrieved context | `repack()`, `repack_with_summary()` |
| QueryRewriter | Analyze and expand queries | `rewrite()` |
| QueryProcessor | Orchestrate query pipeline | `query()` |
| UmapProcessor | Reduce embedding dimensions | `train()`, `apply()` |
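The `LLMManager` row is what backs the single-instance LLM caching described earlier. A sketch of using it directly might look like the following; the constructor call and the assumption that repeated `default_llm` calls return the same cached object are inferred from the table, not verified API.

```ruby
require 'ragnar'

# Sketch of sharing one cached LLM between query rewriting and generation.
# The constructor and caching semantics shown here are assumptions.
manager = Ragnar::LLMManager.new
llm_a = manager.default_llm   # loaded on first use (the 2-5 second hit)
llm_b = manager.default_llm   # assumed to return the same cached instance
puts llm_a.equal?(llm_b)      # => true if the instance is cached
```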
- Documents → Chunker → Text chunks
- Text chunks → Embedder → Embeddings (768D)
- Embeddings → Database → Stored vectors
- Stored vectors → UMAP → Reduced vectors (2-50D)
- Query → Rewriter (w/ cached LLM) → Sub-queries
- Sub-queries → Embedder → Query vectors
- Query vectors → Database → Similar documents
- Documents → Reranker → Top results
- Top results → Context Repacker → Optimized context
- Optimized context → LLM (same cached instance) → Final answer
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details
This project integrates several excellent Ruby gems:
- red-candle - Ruby ML/LLM toolkit
- lancelot - Lance database bindings
- clusterkit - UMAP and clustering implementation
- parsekit - Content extraction
- baran - Text splitting utilities