💬 ChatWithDocs

AI-powered document Q&A — upload any document, ask anything, get cited answers.

ChatWithDocs is a full-stack RAG (Retrieval-Augmented Generation) application that lets you chat with your documents in natural language. It uses Google Gemini for embeddings and answer generation and Pinecone for vector search. Every answer cites its source document and page number, so you always know where the information came from.

🔗 Live Demo

✨ Features

📤 Multi-format upload — PDF, DOCX, XLSX, PPTX, TXT, CSV, MD, HTML, RTF, ODT
🔍 Semantic search — Pinecone vector DB with Gemini embeddings
💬 Conversational AI — warm, helpful answers with full conversation history
📌 Source citations — every answer cites exact document + page number
📊 Analytics dashboard — usage stats, activity charts, top documents
🕓 History browser — browse, search, and continue past conversations
🔒 Auth — JWT-based register/login, per-user data isolation
🐳 Docker ready — production-grade Nginx + multi-stage build

📸 Screenshots

Screenshots of the following views are in docs/screenshots: Dashboard, Chat Interface, Document Library, Analytics, Conversation History.

🏗️ Architecture

User
 │
 ▼
React Frontend (Vite + TailwindCSS)
 │  JWT auth · axios interceptor
 ▼
FastAPI Backend
 │
 ├── Auth Router      → JWT · bcrypt
 ├── Documents Router → upload · parse · chunk · embed
 ├── Chat Router      → semantic search · LLM generation · citations
 └── Analytics Router → usage stats · activity data
      │
      ├── PostgreSQL (Supabase)   → users · documents · conversations · messages
      ├── Pinecone                → vector embeddings (768-dim, cosine)
      └── Google Gemini           → embeddings + conversational generation
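
To make the "JWT auth" step in the diagram concrete, here is a minimal standard-library sketch of how an HS256 token is signed and verified. It is not the project's code: the backend is listed as using python-jose, and the function names (`encode_jwt`, `decode_jwt`) and the `sub`/`exp` claims are assumptions chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over 'header.payload' using the shared secret."""
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_jwt(payload: dict, secret: str) -> str:
    """Build a compact HS256 token (hypothetical helper, illustration only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    ]
    signing_input = ".".join(segments)
    return signing_input + "." + _b64url_encode(_sign(signing_input, secret))


def decode_jwt(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Verify signature and expiry, then return the claims.

    A real library also validates the header's `alg` field; this sketch
    always assumes HS256.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

In a setup like the one described above, the frontend would attach such a token to each request (the axios interceptor), and the backend would reject any request whose token fails verification.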

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 18 · Vite · TailwindCSS · Recharts |
| Backend | Python 3.11 · FastAPI · SQLAlchemy |
| LLM | Google Gemini (gemini-3.1-flash-lite) |
| Embeddings | Gemini Embedding 001 (768 dims) |
| Vector DB | Pinecone (serverless · cosine similarity) |
| Database | PostgreSQL via Supabase |
| Auth | JWT (python-jose) · bcrypt (passlib) |
| DevOps | Docker · Nginx · pytest (19/19 tests) |

🚀 Quick Start

Prerequisites

Python 3.11+
Node.js 18+
A Supabase project (free tier works)
A Pinecone account (free tier works)
A Google AI Studio API key

1. Clone & configure

git clone https://github.com/Saanchi-Itkelwar/ChatWithDocs
cd ChatWithDocs

cp .env.example .env
# Open .env and fill in your keys (see Environment Variables below)

2. Backend

cd backend
python -m venv venv

source venv/bin/activate        # Mac / Linux
# venv\Scripts\activate         # Windows

pip install -r requirements.txt
uvicorn main:app --reload --port 8000

API available at: http://localhost:8000
Swagger docs at: http://localhost:8000/docs

3. Frontend

cd frontend                     # from the repo root, in a second terminal
npm install
npm run dev

App available at: http://localhost:5173

🐳 Docker (Production)

docker compose up --build

Runs the full stack — FastAPI backend + Nginx-served React frontend.

⚙️ Environment Variables

Create a .env file in the backend folder:

DATABASE_URL=postgresql://postgres:[password]@db.[ref].supabase.co:5432/postgres
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=chatwithdocs
GEMINI_API_KEY=your_gemini_api_key
JWT_SECRET=your_random_secret_string_at_least_32_chars

| Variable | Where to get it |
| --- | --- |
| `DATABASE_URL` | Supabase → Project Settings → Database → Connection string |
| `PINECONE_API_KEY` | Pinecone console → API Keys |
| `PINECONE_INDEX_NAME` | Must be lowercase, e.g. `chatwithdocs` |
| `GEMINI_API_KEY` | Google AI Studio |
| `JWT_SECRET` | Any random string, e.g. the output of `openssl rand -hex 32` |
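
Before starting the backend, it can help to confirm that these variables are set. The sketch below is a hypothetical standalone check, not the project's `core/config.py` (which is described as using Pydantic settings). The function name `check_env` is an assumption, and the 32-character rule is taken from the placeholder text above.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env listing above.
REQUIRED = (
    "DATABASE_URL",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "GEMINI_API_KEY",
    "JWT_SECRET",
)


def check_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    index_name = env.get("PINECONE_INDEX_NAME", "")
    if index_name and index_name != index_name.lower():
        problems.append("PINECONE_INDEX_NAME must be lowercase")
    secret = env.get("JWT_SECRET", "")
    if secret and len(secret) < 32:
        problems.append("JWT_SECRET should be at least 32 characters")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("config problem:", problem)
```

Passing a plain dict instead of reading `os.environ` keeps the check easy to test in isolation.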

🧪 Tests

cd backend
source venv/bin/activate
pytest tests/ -v

📁 Project Structure

ChatWithDocs/
├── backend/
│   ├── core/
│   │   ├── config.py        # Pydantic settings — env vars
│   │   ├── database.py      # SQLAlchemy engine + session
│   │   ├── logger.py        # Structured logging
│   │   ├── security.py      # bcrypt + JWT
│   │   └── utils.py         # UUID helpers
│   ├── models/              # SQLAlchemy ORM models
│   │   ├── user.py
│   │   ├── document.py
│   │   ├── chunk.py
│   │   └── conversation.py
│   ├── routers/             # FastAPI route handlers
│   │   ├── auth.py
│   │   ├── documents.py
│   │   ├── chat.py
│   │   └── analytics.py
│   ├── schemas/             # Pydantic request/response schemas
│   ├── services/            # Core RAG pipeline
│   │   ├── parser.py        # Multi-format document parsing
│   │   ├── chunker.py       # Paragraph-aware overlapping chunks
│   │   ├── embedder.py      # Gemini embedding-001 (768 dims)
│   │   ├── retriever.py     # Pinecone store + semantic search
│   │   ├── ingestion.py     # Full pipeline orchestrator
│   │   └── llm.py           # Gemini conversational generation
│   ├── tests/               # Pytest suite (19 tests)
│   ├── uploads/             # Uploaded files (gitignored)
│   ├── main.py              # App entry point
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── components/      # AppLayout, Sidebar, ErrorBoundary, PrivateRoute
│   │   ├── context/         # AuthContext (JWT + user state)
│   │   ├── pages/           # Dashboard, Upload, Library, Chat, History, Analytics
│   │   └── services/        # Axios API client with JWT interceptor
│   ├── Dockerfile           # Multi-stage Nginx production build
│   └── nginx.conf
├── .env.example
├── docker-compose.yml
└── README.md

🔄 RAG Pipeline

Document Upload
      │
      ▼
   Parser          → extracts text from PDF/DOCX/XLSX/etc.
      │
      ▼
   Chunker         → paragraph-aware overlapping chunks (800 tokens, 120 overlap)
      │
      ▼
   Embedder        → Gemini embedding-001 → 768-dim vectors
      │
      ▼
   Pinecone        → stores vectors with document/user metadata
      │
      ▼  (on user question)
Semantic Search    → embed query → top-5 cosine similar chunks
      │
      ▼
  Gemini LLM       → conversational answer grounded in retrieved chunks
      │
      ▼
  Citations        → exact source document + page number returned
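
The chunking and retrieval stages above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `services/chunker.py` or `services/retriever.py`: tokens are approximated by whitespace-separated words, an in-memory list stands in for Pinecone, and the names `chunk_text` and `top_k` are hypothetical.

```python
import math
import re
from typing import Dict, List, Sequence


def chunk_text(text: str, max_tokens: int = 800, overlap: int = 120) -> List[str]:
    """Pack paragraphs into chunks of at most max_tokens words.

    Each new chunk starts with the last `overlap` words of the previous one.
    Words stand in for tokens here (an assumption of this sketch).
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    step = max_tokens - overlap
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        words = paragraph.split()
        # Oversized paragraphs are cut into pieces that still leave room
        # for the carried-over overlap.
        for start in range(0, len(words), step):
            piece = words[start:start + step]
            if current and len(current) + len(piece) > max_tokens:
                chunks.append(" ".join(current))
                current = current[-overlap:] if overlap else []
            current.extend(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], records: List[Dict], user_id: str, k: int = 5) -> List[Dict]:
    """Return the k most similar records that belong to user_id.

    Each record is a dict with 'vector', 'user_id' and 'text' keys
    (a hypothetical shape standing in for vectors plus metadata).
    """
    own = [r for r in records if r["user_id"] == user_id]
    return sorted(own, key=lambda r: cosine(query, r["vector"]), reverse=True)[:k]
```

Filtering by `user_id` before ranking mirrors the per-user isolation listed under Features. In the real pipeline the vectors would be 768-dimensional Gemini embeddings rather than hand-written lists.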

📄 Supported File Formats

| Format | Extension |
| --- | --- |
| PDF | `.pdf` |
| Word | `.docx` |
| Excel | `.xlsx` |
| PowerPoint | `.pptx` |
| Plain Text | `.txt` |
| Markdown | `.md` |
| CSV | `.csv` |
| HTML | `.html`, `.htm` |
| Rich Text | `.rtf` |
| OpenDocument | `.odt` |
Max file size: 50MB
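
A pre-upload check against the formats and size limit above might look like the following sketch. The helper name `validate_upload` is hypothetical, and reading "50MB" as 50 × 1024 × 1024 bytes is an assumption. The actual backend may apply different or additional checks.

```python
from pathlib import Path
from typing import Tuple

# Extensions from the table above.
ALLOWED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".txt",
    ".md", ".csv", ".html", ".htm", ".rtf", ".odt",
}
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB


def validate_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (accepted, reason) for a candidate upload."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {extension or '(none)'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds the 50MB limit"
    return True, "ok"
```

Lowercasing the suffix means `Report.PDF` is treated the same as `report.pdf`.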
