Retrieval-Augmented Generation Generalized Architecture for Enterprise
A multipurpose local RAG system for processing and analyzing documents (tenders, CVs, reports) with semantic search, hybrid retrieval, and NLI-based compliance scoring.
To install RAGGAE locally or understand its principles, read and follow the sections below first.
- Overview
- Key Features
- Architecture
- Installation
- Usage
- Core Concepts
- Extension Points
- Testing
- Development
- Performance Considerations
- Troubleshooting
- Contributing
- License
- Authors
- Acknowledgments
RAGGAE is a production-ready, modular Retrieval-Augmented Generation (RAG) system designed to run entirely on local infrastructure. It combines:
- Dense embeddings (bi-encoders like E5, GTE, BGE)
- Sparse retrieval (BM25 for exact term matching)
- Hybrid fusion (linear combination of dense and sparse scores)
- Cross-encoder re-ranking (optional, for precision at the top)
- Natural Language Inference (NLI) for compliance checking via local LLMs (Ollama)
- Traceability with provenance tracking (document, page, block, bounding box)
The system is designed with a document-agnostic semantic core and pluggable adapters for different document types (PDFs, DOCX, ODT, TXT, MD), making it suitable for:
- Tender analysis (requirements extraction, compliance scoring)
- CV/Resume processing (skills matching, experience extraction)
- Technical reports (semantic search, section extraction)
- Multi-document batch processing
- Fully Local: No external APIs required; runs on CPU or GPU (8GB VRAM sufficient)
- Hybrid Retrieval: Dense (FAISS) + Sparse (BM25) with configurable fusion
- Multi-Format Support: PDF, DOCX, ODT, TXT, MD with layout-aware parsing
- NLI Compliance: Automatic requirement satisfaction checking via Ollama (Mistral, Llama3)
- Fit Scoring: Weighted requirement verdicts with exportable audit trails (JSON, CSV)
- Web UI: Modern, responsive interface for upload, index, search, and scoring
- RESTful API: FastAPI backend for integration with existing workflows
- Fully Tested: Comprehensive test suite with mocked NLI for CI/CD
- Multilingual: FR/EN support with E5 embeddings; extensible to other languages
- Extensible: Pluggable document adapters, embedding providers, and scoring strategies
graph TB
subgraph "Document Input"
DOC[Documents: PDF, DOCX, TXT, ODT, MD]
end
subgraph "Parsing Layer"
PDF[PDF Parser<br/>PyMuPDF]
TXT[Text Loaders<br/>DOCX/ODT/TXT/MD]
DOC --> PDF
DOC --> TXT
end
subgraph "Semantic Core"
EMBED[Embedding Provider<br/>STBiEncoder<br/>multilingual-e5-small]
FAISS[FAISS Index<br/>Inner Product<br/>Cosine Similarity]
BM25[BM25Okapi<br/>Sparse Retrieval]
    HYBRID["Hybrid Retriever<br/>α·dense + (1-α)·sparse"]
PDF --> EMBED
TXT --> EMBED
EMBED --> FAISS
EMBED --> BM25
FAISS --> HYBRID
BM25 --> HYBRID
end
subgraph "Intelligence Layer"
NLI[NLI Client<br/>Ollama: Mistral/Llama3]
SCORE[Fit Scorer<br/>Weighted Verdicts]
HYBRID --> NLI
NLI --> SCORE
end
subgraph "Interface Layer"
CLI[CLI Tools<br/>index_doc, search, quickscore]
API[FastAPI<br/>RESTful Endpoints]
WEB[Web UI<br/>HTML5 + Vanilla JS]
HYBRID --> CLI
SCORE --> CLI
HYBRID --> API
SCORE --> API
API --> WEB
end
subgraph "Output"
RESULTS[Search Results<br/>Scored Hits<br/>Provenance]
AUDIT[Audit Trail<br/>JSON/CSV Export]
CLI --> RESULTS
API --> RESULTS
SCORE --> AUDIT
end
style EMBED fill:#4CAF50
style HYBRID fill:#FF9802
style NLI fill:#2196F3
style SCORE fill:#9C27B0
graph LR
subgraph "Core Modules"
EM[embeddings.py<br/>EmbeddingProvider<br/>STBiEncoder]
IDX[index_faiss.py<br/>FaissIndex<br/>Record]
RET[retriever.py<br/>HybridRetriever<br/>Hit]
SCR[scoring.py<br/>FitScorer<br/>RequirementVerdict]
NLI[nli_ollama.py<br/>NLIClient<br/>NLIResult]
end
subgraph "IO Modules"
PDF[pdf.py<br/>PDFBlock<br/>extract_blocks]
TXT[textloaders.py<br/>TextBlock<br/>load_*]
end
subgraph "CLI"
IDXCLI[index_doc.py]
SEARCH[search.py]
QUICK[quickscore.py]
APP[demo_app.py<br/>FastAPI]
end
subgraph "Web"
HTML[index.html]
JS[script.js]
CSS[styles.css]
end
PDF --> IDXCLI
TXT --> IDXCLI
EM --> IDXCLI
IDX --> IDXCLI
EM --> SEARCH
IDX --> SEARCH
EM --> QUICK
IDX --> QUICK
NLI --> QUICK
SCR --> QUICK
EM --> APP
IDX --> APP
RET --> APP
NLI --> APP
SCR --> APP
PDF --> APP
TXT --> APP
APP --> HTML
HTML --> JS
HTML --> CSS
EM -.->|uses| IDX
RET -.->|uses| EM
RET -.->|uses| IDX
sequenceDiagram
participant U as User
participant W as Web UI
participant A as FastAPI
participant P as Parser
participant E as Embeddings
participant F as FAISS
participant B as BM25
participant H as Hybrid
participant N as NLI/Ollama
participant S as Scorer
Note over U,S: Phase 1: Indexing
U->>W: Upload documents
W->>A: POST /upload (files)
A->>A: Save to uploads/
A-->>W: Return key
U->>W: Submit index request
W->>A: POST /index {key, index_path}
A->>P: Parse documents
P-->>A: List[Block]
A->>E: Embed texts
E-->>A: Embeddings (384-dim)
A->>F: Build FAISS index
A->>B: Build BM25 index
F-->>A: Index saved
A-->>W: {indexed: N, files: [...]}
Note over U,S: Phase 2: Search
U->>W: Enter query
W->>A: POST /search {query, k}
A->>E: Embed query
E-->>A: Query vector
A->>F: Dense search (top-K)
F-->>A: Dense hits
A->>B: Sparse scores
B-->>A: BM25 scores
A->>H: Fuse scores (α·dense + (1-α)·sparse)
H-->>A: Ranked hits
A-->>W: {hits: [{score, page, snippet}]}
W-->>U: Display results
Note over U,S: Phase 3: Quickscore
U->>W: Enter requirements
W->>A: POST /quickscore {requirements, topk}
A->>E: Embed each requirement
E-->>A: Requirement vectors
loop For each requirement
A->>F: Search top-K clauses
F-->>A: Candidate clauses
loop For each clause
A->>N: NLI check (clause, req)
N-->>A: {label, rationale}
break If label is Yes
Note over A,N: stop checking further clauses
end
end
end
A->>S: Compute fit score
S-->>A: Weighted score (0-100)
A-->>W: {fit_score, verdicts[]}
W-->>U: Display verdicts + audit trail
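In code, Phase 3 boils down to a short loop per requirement: retrieve candidate clauses, run NLI on each, and stop early at the first "Yes". The following is a minimal sketch using the core-module APIs described later in this README; the glue code is illustrative, not the actual demo_app implementation.
# Illustrative sketch of the quickscore loop (not the real demo_app code)
from cli.core.scoring import FitScorer, RequirementVerdict

def quickscore(requirements, retriever, nli, topk=5):
    verdicts = []
    for req in requirements:
        label, evidence = "No", None
        for hit in retriever.search(req, k=topk):        # top-K candidate clauses
            result = nli.check(clause=hit.text, requirement=req)
            if result.label in ("Yes", "Partial"):
                label, evidence = result.label, hit
            if result.label == "Yes":                    # first entailment wins; stop early
                break
        verdicts.append(RequirementVerdict(req, label))
    scorer = FitScorer()
    return scorer.to_percent(scorer.fit_score(verdicts)), verdicts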
RAGGAE/
├── core/                      # Semantic core modules
│   ├── embeddings.py          # Embedding providers (E5, GTE, etc.)
│   ├── index_faiss.py         # FAISS vector index + metadata
│   ├── retriever.py           # Hybrid retrieval (dense + sparse)
│   ├── scoring.py             # Fit scoring from NLI verdicts
│   └── nli_ollama.py          # Local NLI via Ollama
├── io/                        # Document parsers
│   ├── pdf.py                 # PDF parsing (PyMuPDF)
│   ├── tables.py              # Table extraction (future)
│   └── textloaders.py         # DOCX, ODT, TXT, MD loaders
├── adapters/                  # Domain-specific adapters (future)
│   ├── tenders.py             # Tender-specific logic
│   ├── cv.py                  # CV/resume parsing
│   └── reports.py             # Technical report adapters
├── cli/                       # Command-line tools
│   ├── index_doc.py           # Index PDFs into FAISS
│   ├── search.py              # Semantic search CLI
│   ├── quickscore.py          # NLI-based scoring CLI
│   └── demo_app.py            # FastAPI web application
├── web/                       # Frontend UI
│   ├── index.html             # Single-page app
│   ├── script.js              # Vanilla JS (no framework)
│   └── styles.css             # Modern dark/light theme
├── tests/                     # Test suite
│   ├── conftest.py            # Pytest fixtures
│   ├── test_core.py           # Core module tests
│   ├── test_core_embeddings.py        # Embedding tests
│   ├── test_core_index_retriever.py
│   ├── test_scoring.py
│   └── test_nli_mock.py       # Mocked NLI tests
├── data/                      # Data files
│   └── labels/                # Few-shot seeds (future)
├── uploads/                   # Upload storage (auto-created)
├── examples/                  # Example documents (optional)
├── index.md                   # Original design document
├── README.md                  # This file
├── LICENSE                    # MIT License
└── requirements.txt           # Python dependencies (if using pip)
- Python 3.12+ (tested on 3.12)
- 8GB RAM minimum (16GB recommended)
- GPU with 8GB VRAM (optional, but recommended for faster embeddings)
- Ollama (for NLI/compliance checks): ollama.com
# Create environment
mamba env create -f env-adservio-raggae.yml
mamba activate adservio-raggae
# Or create manually
mamba create -n adservio-raggae -c conda-forge -c pytorch -c nvidia \
python=3.12 \
pytorch pytorch-cuda=12.1 \
faiss-cpu sentence-transformers \
pymupdf pypdf python-docx odfpy \
fastapi uvicorn pydantic \
numpy scipy scikit-learn tqdm rich \
pytest
# Install BM25 and Ollama client via pip
pip install rank-bm25 ollama
Environment file (env-adservio-raggae.yml):
name: adservio-raggae
channels:
- pytorch
- nvidia
- conda-forge
dependencies:
- python=3.12
# Core ML stack
- pytorch>=2.4
- pytorch-cuda=12.1
- torchvision
- torchaudio
# RAG / retrieval
- faiss-cpu
- sentence-transformers
- numpy
- scipy
- scikit-learn
- tqdm
# PDF / text parsing
- pymupdf
- pypdf
- python-docx
- odfpy
# Web API
- fastapi
- uvicorn
- pydantic
# Testing
- pytest
# Utils
- rich
- pip
- pip:
  - rank-bm25
  - ollama
Alternatively, using pip and venv:
python3.12 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install faiss-cpu sentence-transformers
pip install pymupdf pypdf python-docx odfpy
pip install fastapi uvicorn pydantic
pip install numpy scipy scikit-learn tqdm rich
pip install rank-bm25 ollama
pip install pytest
If you have a CUDA-capable GPU:
# Check CUDA availability
python -c "import torch; print('CUDA:', torch.cuda.is_available(), 'Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"
# If CUDA is False, reinstall PyTorch with CUDA support
mamba install -c pytorch -c nvidia pytorch=2.5.* pytorch-cuda=12.1 torchvision torchaudio
# For FAISS GPU acceleration (optional, requires faiss-gpu)
mamba install -c pytorch faiss-gpu
Core:
- sentence-transformers: Embedding models (E5, GTE, BGE)
- faiss-cpu / faiss-gpu: Vector similarity search
- rank-bm25: Sparse retrieval (BM25)
- ollama: Local LLM client (Mistral, Llama3)
Parsing:
- pymupdf (fitz): PDF parsing with layout
- pypdf: Fallback PDF reader
- python-docx: DOCX parsing
- odfpy: ODT parsing
Web:
- fastapi: API framework
- uvicorn: ASGI server
- pydantic: Data validation
Testing:
- pytest: Test framework
python -m cli.index_doc \
--pdf /path/to/tender.pdf \
--out ./tender.idx \
--model intfloat/multilingual-e5-small \
--e5
Output:
Indexed 342 chunks → ./tender.idx.faiss + ./tender.idx.jsonl
intfloat/multilingual-e5-small [cuda] dim=384 (e5)
Supported flags:
- --pdf: Path to PDF document
- --out: Output index prefix (creates .faiss and .jsonl files)
- --model: HuggingFace model ID (default: intfloat/multilingual-e5-small)
- --e5: Use E5-style prefixes (passage: / query:)
python -m cli.search \
--index ./tender.idx \
--model intfloat/multilingual-e5-small \
--e5 \
--query "Plateforme MLOps avec MLflow sur Kubernetes" \
--k 10
Output:
Top-10 for: 'Plateforme MLOps avec MLflow sur Kubernetes'
• 0.8423 (p.3, b12) La plateforme MLOps repose sur MLflow déployé sur un cluster Kubernetes…
• 0.7891 (p.5, b23) L'orchestration des workflows ML utilise Argo Workflows sur K8s…
• 0.7654 (p.8, b45) Monitoring des modèles via Prometheus et Grafana sur Kubernetes…
...
python -m cli.quickscore \
--index ./tender.idx \
--model intfloat/multilingual-e5-small \
--e5 \
--req "Provider must be ISO 27001 certified" \
--req "Platform uses MLflow for MLOps" \
--req "Deployments on Kubernetes with GitOps" \
--topk 5
Output:
Fit score: 83.3/100
- Provider must be ISO 27001 certified: Yes
- Platform uses MLflow for MLOps: Yes
- Deployments on Kubernetes with GitOps: Partial
Prerequisites: Ollama must be running with a model (e.g., mistral)
# Start Ollama daemon (if not running)
ollama serve
# Pull model
ollama pull mistral:latest
# Or use Llama3
ollama pull llama3:8b
uvicorn cli.demo_app:app --host 0.0.0.0 --port 8000 --reload
Open http://localhost:8000 in your browser.
Features:
- Index Tab: Upload documents (PDF, DOCX, TXT, ODT, MD, or ZIP), configure indexing parameters
- Search Tab: Semantic search with provenance (file, page, block, score)
- Quickscore Tab: NLI-based compliance checking with audit trail export (JSON/CSV)
Keyboard shortcuts:
- Cmd/Ctrl + K: Focus search input
- Esc: Clear current form
Base URL: http://localhost:8000
curl http://localhost:8000/health
Response:
{
"ok": true,
"service": "raggae",
"version": "0.1.2"
}
Single file or ZIP:
curl -F "file=@/path/to/tender.pdf" http://localhost:8000/uploadResponse:
{
"ok": true,
"type": "pdf",
"key": "20251031-143022/tender.pdf",
"size": 2458123
}
Multiple files:
curl -F "files=@tender1.pdf" -F "files=@tender2.docx" http://localhost:8000/upload-multiResponse:
{
"ok": true,
"key": "20251031-143022",
"files": ["20251031-143022/tender1.pdf", "20251031-143022/tender2.docx"]
}
curl -X POST http://localhost:8000/index \
-H "Content-Type: application/json" \
-d '{
"key": "20251031-143022",
"index_path": "./tender.idx",
"model": "intfloat/multilingual-e5-small",
"e5": true,
"min_chars": 40,
"extensions": ["pdf", "docx", "txt"]
}'
Response:
{
"indexed": 342,
"files": ["tender1.pdf", "tender2.docx"],
"index_path": "./tender.idx",
"encoder": "intfloat/multilingual-e5-small [cuda] dim=384 (e5)"
}
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{
"index_path": "./tender.idx",
"model": "intfloat/multilingual-e5-small",
"e5": true,
"query": "MLflow sur Kubernetes ISO 27001",
"k": 5
}' | jq
Response:
{
"query": "MLflow sur Kubernetes ISO 27001",
"k": 5,
"hits": [
{
"score": 0.8423,
"page": 3,
"block": 12,
"file": "tender1.pdf",
"ext": "pdf",
"snippet": "La plateforme MLOps repose sur MLflow dΓ©ployΓ© sur un cluster Kubernetes avec conformitΓ© ISO 27001β¦"
},
...
]
}
curl -X POST http://localhost:8000/quickscore \
-H "Content-Type: application/json" \
-d '{
"index_path": "./tender.idx",
"model": "intfloat/multilingual-e5-small",
"e5": true,
"requirements": [
"Provider must be ISO 27001 certified",
"Platform uses MLflow for MLOps",
"Deployments on Kubernetes with GitOps"
],
"topk": 5,
"ollama_model": "mistral",
"nli_lang": "auto"
}' | jq
Response:
{
"fit_score": 83.3,
"verdicts": [
{
"requirement": "Provider must be ISO 27001 certified",
"verdict": "Yes",
"rationale": "The document explicitly states ISO/IEC 27001:2022 certification.",
"evidence": {
"file": "tender1.pdf",
"ext": "pdf",
"page": 5,
"block": 23,
"snippet": "Le prestataire dΓ©tient la certification ISO/IEC 27001:2022 pourβ¦",
"score": 0.7654
},
"evaluated": [...]
},
...
],
"summary": [
{"requirement": "Provider must be ISO 27001 certified", "label": "Yes"},
{"requirement": "Platform uses MLflow for MLOps", "label": "Yes"},
{"requirement": "Deployments on Kubernetes with GitOps", "label": "Partial"}
]
}
# JSON export
curl -X POST http://localhost:8000/quickscore/export \
-H "Content-Type: application/json" \
-d '{
"index_path": "./tender.idx",
"requirements": ["ISO 27001 certified"],
"format": "json"
}' > quickscore.json
# CSV export
curl -X POST http://localhost:8000/quickscore/export \
-H "Content-Type: application/json" \
-d '{
"index_path": "./tender.idx",
"requirements": ["ISO 27001 certified", "MLflow on K8s"],
"format": "csv"
}' > quickscore.csv
RAGGAE combines dense (semantic) and sparse (lexical) retrieval:
- Dense: Sentence-Transformers bi-encoder (e.g., E5-small) → 384-dim vectors → FAISS inner-product search
- Sparse: BM25 on tokenized text (exact term matching)
- Fusion: score = α·dense + (1-α)·sparse (default α = 0.6)
Why hybrid?
- Dense: captures semantic similarity ("MLOps platform" ≈ "machine learning operations")
- Sparse: preserves exact matches (acronyms, IDs, legal clauses)
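The fusion itself is a weighted sum of the two score lists. A minimal sketch, assuming both lists are min-max normalized to a common scale (the exact normalization inside HybridRetriever may differ):
# Illustrative fusion step; HybridRetriever performs this internally
import numpy as np

def fuse(dense_scores, sparse_scores, alpha=0.6):
    # Normalize each score array to [0, 1], then combine: alpha*dense + (1-alpha)*sparse
    def norm(x):
        x = np.asarray(x, dtype="float32")
        span = float(x.max() - x.min())
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return alpha * norm(dense_scores) + (1 - alpha) * norm(sparse_scores)
Usage through the core API: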
from cli.core.embeddings import STBiEncoder
from cli.core.retriever import HybridRetriever
# Build index
encoder = STBiEncoder("intfloat/multilingual-e5-small", prefix_mode="e5")
texts = ["MLOps with MLflow on K8s", "ISO 27001 certification required"]
retriever = HybridRetriever.build(encoder, texts)
# Search
hits = retriever.search("MLflow on Kubernetes", k=10, alpha=0.6)
for h in hits:
    print(h.score, h.text)
Natural Language Inference (NLI) determines if a clause satisfies a requirement:
- Input: (clause, requirement) pair
- Output: {"label": "Yes|No|Partial", "rationale": "..."}
- Model: Local LLM via Ollama (Mistral, Llama3, etc.)
Example:
from cli.core.nli_ollama import NLIClient, NLIConfig
nli = NLIClient(NLIConfig(model="mistral", lang="auto"))
result = nli.check(
clause="Le prestataire est certifié ISO/IEC 27001:2022.",
requirement="Provider must be ISO 27001 certified"
)
# result.label = "Yes"
# result.rationale = "The clause explicitly states ISO/IEC 27001:2022 certification."
Robustness:
- Language auto-detection: Retries with a fallback language if the rationale is invalid
- JSON parsing: Handles malformed LLM outputs gracefully
- Label sanitization: Ensures label ∈ {"Yes", "No", "Partial"} (see the sketch below)
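A minimal sketch of the last two points, tolerant JSON parsing plus label sanitization. This is illustrative only; the actual NLIClient implementation may differ.
# Illustrative parsing/sanitization helper (not the real NLIClient code)
import json
import re

VALID_LABELS = {"Yes", "No", "Partial"}

def parse_nli_output(raw: str) -> dict:
    # Pull the first {...} block out of the LLM response, ignoring surrounding chatter
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    try:
        data = json.loads(match.group(0)) if match else {}
    except json.JSONDecodeError:
        data = {}
    label = str(data.get("label", "No")).strip(' ."').capitalize()
    if label not in VALID_LABELS:
        label = "No"                      # fall back to the safe verdict
    return {"label": label, "rationale": str(data.get("rationale", ""))}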
Aggregate compliance across multiple requirements:
from cli.core.scoring import FitScorer, RequirementVerdict
verdicts = [
RequirementVerdict("ISO 27001", "Yes", weight=1.5),
RequirementVerdict("MLflow on K8s", "Partial", weight=1.0),
RequirementVerdict("Data in EU", "No", weight=1.0),
]
scorer = FitScorer()
score = scorer.fit_score(verdicts) # 0.56
percentage = scorer.to_percent(score)  # 56.0
Weights:
- Reflect requirement importance (e.g., mandatory vs. optional)
- Default: 1.0 for all requirements
Adapters translate document-specific formats into a unified Block abstraction:
# PDF
from cli.io.pdf import extract_blocks
blocks = extract_blocks("tender.pdf", min_chars=40)
# → List[PDFBlock(text, page, block, bbox)]
# DOCX / ODT / TXT / MD
from cli.io.textloaders import load_blocks_any
blocks = load_blocks_any("report.docx", min_chars=20)
# → List[TextBlock(text, page=1, block, bbox=(0,0,0,0))]
Future adapters (in adapters/):
- TenderAdapter: Extract lots, requirements (MUST/SHALL), deadlines
- CVAdapter: Parse roles, skills, certifications, experience periods
- ReportAdapter: Section hierarchy, methods, results, annexes
from cli.core.embeddings import EmbeddingProvider, EmbeddingInfo
import numpy as np
class MyCustomEncoder(EmbeddingProvider):
    @property
    def info(self) -> EmbeddingInfo:
        return EmbeddingInfo(model_name="my-model", device="cpu", dimension=512)

    def embed_texts(self, texts) -> np.ndarray:
        # Your embedding logic
        return np.random.rand(len(texts), 512).astype("float32")

    def embed_query(self, text: str) -> np.ndarray:
        return self.embed_texts([text])[0]
from cli.core.scoring import FitScorer, RequirementVerdict
class CustomScorer(FitScorer):
    def fit_score(self, verdicts, extra_signals=None):
        # Custom weighting logic
        base = super().fit_score(verdicts, extra_signals)
        penalty = 0.1 if any(v.label == "No" for v in verdicts if v.weight > 1.0) else 0
        return max(0, base - penalty)
from dataclasses import dataclass
from typing import List, Dict
@dataclass
class TenderBlock:
    text: str
    page: int
    block: int
    section: str           # e.g., "Lot 1", "Annex A"
    requirement_type: str  # "MUST" | "SHALL" | "SHOULD"

    def as_metadata(self) -> Dict:
        return {
            "page": self.page,
            "block": self.block,
            "section": self.section,
            "req_type": self.requirement_type
        }

def parse_tender(path: str) -> List[TenderBlock]:
    # Your custom tender parsing logic
    pass
# Stage 1: Hybrid retrieval (top-100)
hits = retriever.search(query, k_dense=100, k=100)
# Stage 2: Cross-encoder re-ranking (top-20)
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, h.text) for h in hits]
scores = reranker.predict(pairs)
reranked = sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)[:20]
# Current: FAISS (embedded)
from cli.core.index_faiss import FaissIndex
# Future: Qdrant (server-based, with filters)
import qdrant_client
class QdrantIndex:
    def __init__(self, client, collection_name):
        self.client = client
        self.collection = collection_name

    def add(self, vectors, texts, metadatas):
        # Insert into Qdrant
        pass

    def search(self, query_vec, k):
        # Search with filters
        pass
# Install pytest
mamba install -c conda-forge pytest
# Run all tests
pytest -q
# Run with coverage
pytest --cov=cli --cov-report=html
# Run specific test file
pytest tests/test_core_embeddings.py -v
# Run tests in parallel (requires pytest-xdist)
mamba install -c conda-forge pytest-xdist
pytest -n auto
Test structure:
tests/
├── conftest.py                   # Fixtures (sample data, mocked NLI)
├── test_core.py                  # Core abstractions
├── test_core_embeddings.py       # Embedding providers
├── test_core_index_retriever.py  # FAISS + hybrid retrieval
├── test_scoring.py               # Fit scoring
└── test_nli_mock.py              # Mocked NLI (CI-friendly)
Mocking Ollama for CI:
# tests/conftest.py
@pytest.fixture
def mock_nli(monkeypatch):
    def fake_check(self, clause, requirement):
        if "ISO" in clause and "ISO" in requirement:
            return NLIResult(label="Yes", rationale="ISO mentioned")
        return NLIResult(label="No", rationale="No match")
    monkeypatch.setattr("cli.core.nli_ollama.NLIClient.check", fake_check)
- PEP 8 compliance (use black for formatting)
- Type hints for all public APIs
- Docstrings (NumPy style, as in the example below)
# Format code
pip install black
black cli/ tests/
# Type checking
pip install mypy
mypy cli/
# Linting
pip install flake8
flake8 cli/ --max-line-length=120
All modules, classes, and public functions include docstrings:
"""
Brief one-line summary.
Extended description with usage notes.
Parameters
----------
param1 : type
Description.
Returns
-------
type
Description.
Examples
--------
>>> from cli.core.embeddings import STBiEncoder
>>> enc = STBiEncoder("intfloat/multilingual-e5-small")
>>> enc.embed_query("test")
array([0.1, 0.2, ...], dtype=float32)
"""Semantic versioning: MAJOR.MINOR.PATCH
- MAJOR: Breaking API changes
- MINOR: New features (backward-compatible)
- PATCH: Bug fixes
| Model | Dim | CPU (docs/sec) | GPU (docs/sec) | Fits in 8GB VRAM |
|---|---|---|---|---|
| multilingual-e5-small | 384 | ~30 | ~200 | ✅ |
| multilingual-e5-base | 768 | ~15 | ~120 | ✅ |
| gte-base-en-v1.5 | 768 | ~18 | ~150 | ✅ |
Optimization:
- Use batch_size=64 for bulk encoding
- Cache embeddings on disk if re-indexing frequently (see the sketch below)
- Consider faiss-gpu for multi-million document collections
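A minimal sketch of an on-disk embedding cache keyed by a hash of the text. This is a hypothetical helper, not part of the core modules; the encoder argument is any EmbeddingProvider (e.g., STBiEncoder).
# Hypothetical helper: cache embeddings on disk to avoid re-encoding unchanged texts
import hashlib
from pathlib import Path
import numpy as np

def embed_with_cache(encoder, texts, cache_dir="emb_cache"):
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    out, missing, missing_idx = [None] * len(texts), [], []
    for i, t in enumerate(texts):
        f = cache / (hashlib.sha1(t.encode("utf-8")).hexdigest() + ".npy")
        if f.exists():
            out[i] = np.load(f)                      # cache hit
        else:
            missing.append(t)
            missing_idx.append(i)
    if missing:
        vecs = encoder.embed_texts(missing)          # bulk-encode only the cache misses
        for idx, t, v in zip(missing_idx, missing, vecs):
            np.save(cache / (hashlib.sha1(t.encode("utf-8")).hexdigest() + ".npy"), v)
            out[idx] = v
    return np.vstack(out)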
| Type | Search Speed | Memory | Accuracy |
|---|---|---|---|
| IndexFlatIP | Fast (exact) | High | 100% |
| IndexIVFFlat | Very fast | Medium | ~99% |
| IndexHNSWFlat | Fastest | Highest | ~98% |
When to upgrade:
- >100K documents: Use IndexIVFFlat with nlist = sqrt(N) (see the sketch below)
- >1M documents: Use IndexHNSWFlat or a quantized index
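For the >100K case, a minimal sketch of building an IVF index with the standard FAISS API; the FaissIndex wrapper would need a corresponding hook to use it in place of the flat index.
# Sketch: swap the flat index for an IVF index at larger scale (standard FAISS API)
import numpy as np
import faiss

def build_ivf_index(vectors: np.ndarray) -> faiss.Index:
    n, d = vectors.shape
    nlist = max(1, int(np.sqrt(n)))                  # rule of thumb: nlist ≈ sqrt(N)
    quantizer = faiss.IndexFlatIP(d)
    index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
    index.train(vectors)                             # IVF needs a training pass, unlike IndexFlatIP
    index.add(vectors)
    index.nprobe = 16                                # probe more lists for better recall
    return index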
| Model | Quantization | Latency (per check) | VRAM |
|---|---|---|---|
| mistral:7b | Q4_K_M | ~2-3s | 4-5GB |
| llama3:8b | Q4_K_M | ~3-4s | 5-6GB |
| phi-3:mini | Q4_K_M | ~1-2s | 2-3GB |
Optimization:
- Batch NLI checks in parallel (Ollama supports concurrent requests)
- Use smaller models (Phi-3 mini) for faster scoring
- Cache NLI results for repeated requirements (a sketch of caching plus parallel checks follows)
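A minimal sketch combining both ideas: a simple in-memory cache and parallel checks against Ollama. This is a hypothetical helper with error handling omitted; the nli argument is an NLIClient as shown earlier.
# Hypothetical helper: cached + parallel NLI checks
from concurrent.futures import ThreadPoolExecutor

_cache = {}

def check_cached(nli, clause, requirement):
    key = (clause, requirement)
    if key not in _cache:
        _cache[key] = nli.check(clause=clause, requirement=requirement)
    return _cache[key]

def check_many(nli, pairs, max_workers=4):
    # pairs: list of (clause, requirement) tuples; Ollama can serve concurrent requests
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: check_cached(nli, *p), pairs))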
Symptom: torch.cuda.is_available() == False
Solution:
mamba activate adservio-raggae
mamba remove -y pytorch torchvision torchaudio cpuonly
python -m pip uninstall -y torch torchvision torchaudio
mamba install -y -c pytorch -c nvidia pytorch=2.5.* pytorch-cuda=12.1 torchvision torchaudio
Verify:
python -c "import torch; print('CUDA:', torch.cuda.is_available())"Symptom: requests.exceptions.ConnectionError: Ollama not running
Solution:
# Start Ollama daemon
ollama serve
# In another terminal, pull a model
ollama pull mistral:latest
# Test
ollama run mistral "Hello"
Symptom: AttributeError: module 'numpy' has no attribute 'broadcast_to'
Solution:
mamba activate adservio-raggae
python -m pip uninstall -y numpy
mamba install -y -c conda-forge "numpy>=1.26"
Symptom: AssertionError: d == index.d
Cause: Embedding model changed between indexing and search.
Solution:
- Re-index with the correct model
- Or ensure --model matches the original indexing model (see the check below)
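A quick way to confirm the mismatch before re-indexing (illustrative; assumes the index was saved as ./tender.idx.faiss and that the provider exposes its output dimension via its info property):
# Compare the stored FAISS dimension with the encoder's output dimension
import faiss
from cli.core.embeddings import STBiEncoder

index = faiss.read_index("./tender.idx.faiss")
encoder = STBiEncoder("intfloat/multilingual-e5-small", prefix_mode="e5")
print("index dim:", index.d, "| encoder dim:", encoder.info.dimension)
# The two must match; if they differ, re-index with the model used at query time.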
Symptom: 404 Not Found or blank page
Solution:
# Ensure FastAPI is serving static files
# Check that web/ directory exists:
ls -la web/
# Restart server with --reload
uvicorn cli.demo_app:app --host 0.0.0.0 --port 8000 --reload
# Access via http://localhost:8000 (not /app)
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Add tests for new functionality
- Ensure tests pass: pytest
- Format code: black cli/ tests/
- Commit: git commit -m "Add amazing feature"
- Push: git push origin feature/amazing-feature
- Open a Pull Request
Code review checklist:
- Tests pass (pytest)
- Code formatted (black)
- Type hints added (mypy)
- Docstrings updated
- README updated (if API changed)
This project is licensed under the MIT License - see the LICENSE file for details.
Dr. Olivier Vitrac, PhD, HDR
- Email: olivier.vitrac@adservio.com
- Organization: Adservio
- Date: October 31, 2025
- Sentence-Transformers (Nils Reimers, UKP Lab): Embedding models
- FAISS (Facebook AI Research): Vector similarity search
- Ollama: Local LLM inference
- FastAPI (Sebastián Ramírez): Modern Python web framework
- PyMuPDF: Robust PDF parsing
- Hugging Face: Model hosting and ecosystem
Inspirations:
- LangChain, LlamaIndex (RAG frameworks)
- ColBERT, SPLADE (advanced retrieval)
- MS MARCO, BEIR (retrieval benchmarks)
If you use RAGGAE in your research or production systems, please cite:
@software{raggae2025,
author = {Vitrac, Olivier},
title = {RAGGAE: Retrieval-Augmented Generation Generalized Architecture for Enterprise},
year = {2025},
publisher = {GitHub},
url = {https://github.com/adservio/raggae}
}
graph TD
subgraph "External"
ST[sentence-transformers]
FAISS[faiss-cpu/gpu]
BM25[rank-bm25]
OLLAMA[ollama]
FITZ[PyMuPDF]
end
subgraph "Core"
EMB[embeddings.py]
IDX[index_faiss.py]
RET[retriever.py]
SCR[scoring.py]
NLI[nli_ollama.py]
end
subgraph "IO"
PDF[pdf.py]
TXT[textloaders.py]
end
subgraph "CLI"
IDXCLI[index_doc.py]
SEARCH[search.py]
QUICK[quickscore.py]
APP[demo_app.py]
end
ST -->|used by| EMB
FAISS -->|used by| IDX
BM25 -->|used by| RET
OLLAMA -->|used by| NLI
FITZ -->|used by| PDF
EMB -->|provides| IDX
IDX -->|provides| RET
RET -->|provides| APP
NLI -->|provides| SCR
SCR -->|provides| QUICK
PDF -->|provides| IDXCLI
TXT -->|provides| APP
EMB --> IDXCLI
EMB --> SEARCH
EMB --> QUICK
EMB --> APP
IDX --> IDXCLI
IDX --> SEARCH
IDX --> QUICK
IDX --> APP
NLI --> QUICK
NLI --> APP
SCR --> QUICK
SCR --> APP
gantt
title RAGGAE Roadmap
dateFormat YYYY-MM
section Core
Hybrid retrieval (dense + sparse) :done, 2025-10, 1M
NLI-based compliance checking :done, 2025-10, 1M
Fit scoring with weights :done, 2025-10, 1M
Cross-encoder re-ranking :active, 2025-11, 1M
Domain-tuned embeddings (fine-tune) :2025-12, 2M
section Adapters
PDF + DOCX + TXT loaders :done, 2025-10, 1M
TenderAdapter (lots, requirements) :2025-11, 1M
CVAdapter (skills, experience) :2025-12, 1M
ReportAdapter (sections, tables) :2026-01, 1M
section Infra
FAISS embedded index :done, 2025-10, 1M
Qdrant server integration :2025-12, 1M
Persistent caching (Redis) :2026-01, 1M
section UI/UX
Web UI (upload, search, score) :done, 2025-10, 1M
Export audit trails (JSON/CSV) :done, 2025-10, 1M
Bulk batch processing :2025-11, 1M
Advanced filters (date, tags) :2025-12, 1M
End of README
For questions, issues, or feature requests, please open an issue on GitHub or contact olivier.vitrac@adservio.com.
