Retrivis.AI — Multimodal RAG System

Production-deployed RAG backend supporting text, image, and document queries — powered by GPT-4o LLM with configurable retrieval strategies.

Python FastAPI Supabase Redis Celery Docker LangChain


Retrivis.AI Demo

👆 Click to explore Retrivis.AI live

For the demo video, click here 👉 Retrivis.AI Demo


📌 Overview

This is the backend service of the Retrivis.AI RAG platform — a production-grade, multimodal Retrieval-Augmented Generation (RAG) system built for intelligent querying over text, images, and complex documents. It exposes a structured API that the frontend consumes to run document ingestion, retrieval, and generation workflows.

Key capabilities:

  • 📄 Document ingestion — PDFs, images, and text files with OCR support via Poppler and Tesseract
  • 🌐 URL-based RAG — Web page ingestion via ScrapingBee, enabling RAG over live URLs without manual document upload
  • 🔍 Four retrieval modes — Vector Search, Hybrid Search, Multi-Query Vector, and Multi-Query Hybrid, selectable per request
  • 🤖 GPT-4o generation — Structured prompt construction with retrieved context, powered by gpt-4o
  • 🧮 text-embedding-3-large — OpenAI's highest-quality embedding model for dense vector indexing
  • ⚙️ Async task processing — Celery workers + Redis as the message broker for background ingestion jobs
  • 🔭 LangSmith observability — End-to-end tracing of every RAG chain execution

🚀 Live app 👇

🔗 retrivis-ai-client.vercel.app


🏗️ Architecture

System Overview

Local Architecture

The system is split into three independently running services: the FastAPI server (handles API requests), the Celery worker (processes background document ingestion tasks), and Redis (acts as the message broker between the two). Supabase (PostgreSQL + pgvector) serves as the persistent store for both document metadata and vector embeddings.


Ingestion Pipeline


When a document is uploaded, the API immediately enqueues an ingestion task via Redis. The Celery worker picks up the task and runs the full pipeline:

  1. File type detection via libmagic
  2. PDF parsing via Poppler (pdf2image, pdfplumber) or OCR via Tesseract for scanned/image-based files
  3. Text chunking via Unstructured — documents are split by title and context into semantically meaningful sections, with atomic elements preserving heading hierarchy and content boundaries, rather than into fixed-size character windows
  4. Embedding generation via OpenAI text-embedding-3-large
  5. Upsert to Supabase pgvector — both the raw chunk text and its vector are persisted
  6. Metadata indexing — document title, page number, chunk index, and source type stored for filtering

This async design ensures the API remains non-blocking even for large, multi-page documents.
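The embed-and-upsert tail of the worker pipeline (steps 4–6) can be sketched as a plain function. The embedding call and pgvector upsert are injected as callables here — in the real service these would be the OpenAI and Supabase clients — and the function and field names below are illustrative, not the project's actual code:

```python
def ingest_document(doc_id, chunks, embed, upsert):
    """Embed each chunk, attach filtering metadata, and upsert the batch.

    chunks: list of dicts like {"text": ..., "page": ...} produced by the
            parsing/chunking stages (Unstructured output in the real pipeline).
    embed:  callable mapping text -> vector (text-embedding-3-large in prod).
    upsert: callable persisting a batch of records (Supabase pgvector in prod).
    """
    records = [
        {
            "document_id": doc_id,
            "chunk_index": i,           # metadata kept for later filtering
            "content": chunk["text"],   # raw chunk text is persisted too
            "page_number": chunk.get("page"),
            "embedding": embed(chunk["text"]),
        }
        for i, chunk in enumerate(chunks)
    ]
    upsert(records)
    return len(records)
```

Wrapping a function like this in a Celery task is what keeps the API non-blocking: the FastAPI endpoint only enqueues the task and returns immediately, while the worker does the slow embedding work.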


Retrieval Pipeline

Retrieval Pipeline

The retrieval stage is user-configurable. Clients can select one of four strategies per query:

| Mode | Description |
| --- | --- |
| Vector Search | Dense cosine-similarity search over pgvector using the query embedding. Powered by the custom Postgres function `vector_search_document_chunks`. |
| Hybrid Search | Combines dense vector search with sparse BM25-style full-text keyword search (`keyword_search_document_chunks`), then merges results via Reciprocal Rank Fusion (RRF). |
| Multi-Query Vector | Uses GPT-4o to generate multiple semantically diverse reformulations of the original query, runs vector search for each, and deduplicates results. Improves recall on ambiguous queries. |
| Multi-Query Hybrid | Combines multi-query reformulation with the hybrid strategy: each reformulated query runs through both vector and keyword search. Highest recall; best for complex or multi-faceted questions. |

All strategies retrieve top-k chunks from Supabase and pass them as structured context to the generation stage.
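For the hybrid modes, the RRF merge step is a small, well-defined algorithm. A minimal sketch, assuming each retriever returns an ordered list of chunk IDs (`k=60` is the conventional smoothing constant from the RRF literature, not necessarily the value this project uses):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked result lists into one fused ranking.

    Each chunk's fused score is the sum of 1 / (k + rank) over every list
    it appears in, so chunks ranked highly by multiple retrievers rise to
    the top while the constant k damps the influence of any single list.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Because the scores depend only on ranks, RRF needs no score normalization between the dense and keyword retrievers — which is exactly why it is a popular merge strategy for hybrid search.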


Generation Pipeline

Generation Pipeline

The generation stage is agent-driven. When a user submits a query, the backend routes it to one of two agent modes based on the user's settings.

Simple RAG routes the query directly through a lightweight agent that runs the configured retrieval strategy, then passes the context through guardrails — checking for toxicity, prompt injection, and PII — before generating the final response with gpt-4o.

Agentic RAG hands off to a supervisor agent that decides at runtime whether to invoke the Web Search tool (for current or external knowledge) or the RAG tool (for retrieval over ingested documents). The result passes through the same guardrail layer before generation. This mode is suited for complex, multi-step, or open-domain queries that go beyond the ingested document corpus.

All generation calls are traced end-to-end in LangSmith for debugging and evaluation.
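The routing described above condenses into one dispatch function. This is a sketch of the control flow only — `retrieve`, `guardrails`, `generate`, and `supervisor` stand in for the real LangGraph agents and tools, and all names are illustrative:

```python
def answer_query(query, mode, retrieve, guardrails, generate, supervisor=None):
    """Route a query through Simple or Agentic RAG, gated by guardrails.

    mode:       "simple" or "agentic" (per-user setting).
    retrieve:   the configured retrieval strategy (vector / hybrid / multi-query).
    supervisor: agent that picks the Web Search tool or the RAG tool at runtime.
    guardrails: returns False if toxicity, prompt injection, or PII is detected.
    generate:   final gpt-4o call over the retrieved context.
    """
    if mode == "agentic" and supervisor is not None:
        context = supervisor(query)   # tool choice happens inside the agent
    else:
        context = retrieve(query)     # lightweight Simple RAG path
    if not guardrails(query, context):
        return "Request blocked by guardrails."
    return generate(query, context)
```

Keeping the guardrail check on the shared path means both modes are screened identically before any tokens are generated.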


🔧 Tech Stack

| Layer | Technology |
| --- | --- |
| API Server | FastAPI (Python 3.11+) |
| RAG Framework | LangChain |
| Chat LLM | GPT-4o (`gpt-4o`) |
| Embedding Model | OpenAI `text-embedding-3-large` |
| Vector DB | Supabase (PostgreSQL + pgvector) |
| Retrieval Modes | Vector, Hybrid, Multi-Query Vector, Multi-Query Hybrid |
| Document Parsing | Poppler (PDF), Tesseract (OCR), libmagic (type detection) |
| Web Scraping | ScrapingBee |
| Message Broker | Redis |
| Task Queue | Celery |
| Evaluation | RAGAS |
| Observability | LangSmith tracing |
| Containerization | Docker + Docker Compose |
| Dependency Management | Poetry |
| Database Migrations | Supabase CLI |
| Agents | LangGraph |

📊 Evaluation Metrics

Retrivis.AI is benchmarked with the RAGAS framework over a test set of ~25 real queries. Evaluation scripts live in the /evaluation directory.

| Metric | Description | Score |
| --- | --- | --- |
| Faithfulness | Are generated answers grounded in the retrieved context, with no hallucination? | 0.95 |
| Answer Relevancy | How directly does the generated answer address the original question? | 0.81 |
| Context Precision | Is the retrieved context focused — i.e. are the top-k chunks actually relevant? | 0.88 |
| Context Recall | Does the retrieved context contain all information needed to answer the question? | 0.82 |

All scores are on a 0–1 scale (higher is better).
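RAGAS produces per-question scores for each metric; the table reports their corpus-level means. A minimal sketch of that aggregation step, under the assumption that each per-question result is a plain dict of metric name to score (the real evaluation scripts in /evaluation may structure this differently):

```python
def mean_scores(rows):
    """Average per-question metric dicts into one summary dict.

    rows: list of dicts like {"faithfulness": 0.9, "answer_relevancy": 0.8},
          one per evaluated query. Returns each metric's mean, rounded to
          two decimals as in the table above.
    """
    totals = {}
    for row in rows:
        for metric, value in row.items():
            totals.setdefault(metric, []).append(value)
    return {metric: round(sum(vals) / len(vals), 2)
            for metric, vals in totals.items()}
```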


🛠️ Setup

Prerequisites
  • Python 3.11+
  • Docker & Docker Compose
  • AWS credentials configured
  • Supabase CLI (for local DB setup)
Installation

1. Clone the repo

git clone https://github.com/jawahar-singamsetty/retrivis.ai-server.git
cd retrivis.ai-server

2. Configure environment

cp .env.sample .env
# Fill in your keys (see Environment Variables below)

3. Start all services

All services are managed via the Makefile. Run each in a separate terminal:

make redis       # Terminal 1 — starts Redis (message broker)
make worker      # Terminal 2 — starts Celery worker (background ingestion)
make server      # Terminal 3 — starts FastAPI backend

Or check available commands:

make help

Server runs at http://localhost:8000 · Docs at http://localhost:8000/docs

Environment Variables
| Variable | Description | How to get it |
| --- | --- | --- |
| `OPENAI_API_KEY` | Used for GPT-4o chat and `text-embedding-3-large` | platform.openai.com/api-keys |
| `SUPABASE_API_URL` | Your Supabase project REST URL | Supabase dashboard → Project Settings → API → Project URL |
| `SUPABASE_SECRET_KEY` | Supabase service role key (full DB access) | Supabase dashboard → Project Settings → API → Secret Key |
| `CLERK_SECRET_KEY` | Backend auth key for Clerk | clerk.com → Your App → API Keys → Secret Key |
| `AWS_ACCESS_KEY_ID` | AWS IAM access key for S3 | AWS Console → IAM → Users → Security credentials |
| `AWS_SECRET_ACCESS_KEY` | AWS IAM secret key for S3 | Same as above (shown once on creation) |
| `AWS_ENDPOINT_URL_S3` | S3 endpoint URL (use default or custom) | Default: https://s3.<region>.amazonaws.com |
| `S3_BUCKET_NAME` | Name of your S3 bucket for file storage | AWS Console → S3 → your bucket name |
| `SCRAPINGBEE_API_KEY` | Used for URL-based web page ingestion | scrapingbee.com → Dashboard → API Key |
| `REDIS_URL` | Redis connection URL | Local default: `redis://localhost:6379/0` |
| `DOMAIN` | Frontend origin for CORS | Default: `http://localhost:3000` |
| `LANGSMITH_TRACING` | Enable LangSmith trace logging | Set to `true` |
| `LANGSMITH_ENDPOINT` | LangSmith ingestion endpoint | https://api.smith.langchain.com |
| `LANGSMITH_API_KEY` | LangSmith API key | smith.langchain.com → Settings → API Keys |
| `LANGSMITH_PROJECT` | Project name for grouping traces | Any string, e.g. `retrivis-ai` |
| `TAVILY_API_KEY` | Used by the Agentic RAG web search tool | tavily.com → Dashboard → API Key |

🙏 Acknowledgements

Built and maintained by Jawahar Singamsetty. Open to AI Engineer roles — feel free to reach out via LinkedIn.


Built with ❤️ by Jawahar Singamsetty
