Saanchi-Itkelwar/ChatWithDocs


💬 ChatWithDocs

AI-powered document Q&A: upload any document, ask anything, get cited answers.

ChatWithDocs is a full-stack RAG (Retrieval-Augmented Generation) application that lets you chat with your documents using natural language. Powered by Google Gemini and Pinecone vector search, every answer comes with exact source citations so you always know where the information came from.

🔗 Live Demo


✨ Features

  • 📀 Multi-format upload: PDF, DOCX, XLSX, PPTX, TXT, CSV, MD, HTML, RTF, ODT
  • 🔍 Semantic search: Pinecone vector DB with Gemini embeddings
  • 💬 Conversational AI: warm, helpful answers with full conversation history
  • 📌 Source citations: every answer cites the exact document and page number
  • 📊 Analytics dashboard: usage stats, activity charts, top documents
  • 🕓 History browser: browse, search, and continue past conversations
  • 🔒 Auth: JWT-based register/login, per-user data isolation
  • 🐳 Docker ready: production-grade Nginx + multi-stage build

📸 Screenshots

  • Dashboard
  • Chat Interface
  • Document Library
  • Analytics
  • Conversation History


🏗️ Architecture

User
 │
 ▼
React Frontend (Vite + TailwindCSS)
 │  JWT auth · axios interceptor
 ▼
FastAPI Backend
 │
 ├── Auth Router      → JWT · bcrypt
 ├── Documents Router → upload · parse · chunk · embed
 ├── Chat Router      → semantic search · LLM generation · citations
 └── Analytics Router → usage stats · activity data
      │
      ├── PostgreSQL (Supabase)   → users · documents · conversations · messages
      ├── Pinecone                → vector embeddings (768-dim, cosine)
      └── Google Gemini           → embeddings + conversational generation
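
The Chat Router step in the diagram (semantic search, then LLM generation with citations) can be pictured with a small stdlib-only sketch. This is an illustration under assumptions, not the project's code: `RetrievedChunk`, `build_prompt`, and `format_citations` are hypothetical names, and the prompt wording is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """Hypothetical shape of one search hit; the field names are assumptions."""
    document: str
    page: int
    text: str
    score: float


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Number each chunk so the model can refer to sources as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] ({c.document}, page {c.page})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def format_citations(chunks: list[RetrievedChunk]) -> list[str]:
    """Deduplicate (document, page) pairs while keeping retrieval order."""
    seen: set[tuple[str, int]] = set()
    citations: list[str] = []
    for c in chunks:
        key = (c.document, c.page)
        if key not in seen:
            seen.add(key)
            citations.append(f"{c.document}, p. {c.page}")
    return citations
```

Keeping the document name and page number next to each chunk is what makes it possible to return a citation with every answer; the LLM call itself is left out of the sketch.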

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 18 · Vite · TailwindCSS · Recharts |
| Backend | Python 3.11 · FastAPI · SQLAlchemy |
| LLM | Google Gemini (gemini-3.1-flash-lite) |
| Embeddings | Gemini Embedding 001 (768 dims) |
| Vector DB | Pinecone (serverless · cosine similarity) |
| Database | PostgreSQL via Supabase |
| Auth | JWT (python-jose) · bcrypt (passlib) |
| DevOps | Docker · Nginx · pytest (19/19 tests) |
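
Pinecone does the similarity ranking server-side, so the application never computes it by hand. As a rough picture of what "cosine similarity" search means, here is a pure-Python sketch; `cosine_similarity` and `top_k` are hypothetical helper names, and real vectors would have 768 dimensions rather than the toy sizes used when trying this out.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], vectors: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Rank stored vectors by similarity to the query, best match first."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Cosine similarity compares direction only, so two chunks of very different length can still match closely if their embeddings point the same way.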

🚀 Quick Start

Prerequisites

1. Clone & configure

git clone https://github.com/Saanchi-Itkelwar/ChatWithDocs
cd ChatWithDocs

cp .env.example .env
# Open .env and fill in your keys (see Environment Variables below)

2. Backend

cd backend
python -m venv venv

source venv/bin/activate        # Mac / Linux
# venv\Scripts\activate         # Windows

pip install -r requirements.txt
uvicorn main:app --reload --port 8000

API available at: http://localhost:8000
Swagger docs at: http://localhost:8000/docs

3. Frontend

cd frontend
npm install
npm run dev

App available at: http://localhost:5173


🐳 Docker (Production)

docker compose up --build

Runs the full stack: FastAPI backend plus Nginx-served React frontend.


⚙️ Environment Variables

Create a .env file in the backend folder:

DATABASE_URL=postgresql://postgres:[password]@db.[ref].supabase.co:5432/postgres
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX_NAME=chatwithdocs
GEMINI_API_KEY=your_gemini_api_key
JWT_SECRET=your_random_secret_string_at_least_32_chars

| Variable | Where to get it |
| --- | --- |
| DATABASE_URL | Supabase → Project Settings → Database → Connection string |
| PINECONE_API_KEY | Pinecone console → API Keys |
| PINECONE_INDEX_NAME | Must be lowercase, e.g. chatwithdocs |
| GEMINI_API_KEY | Google AI Studio |
| JWT_SECRET | Any random string, e.g. the output of openssl rand -hex 32 |
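
A quick sanity check of these variables before starting the backend can save a confusing startup failure. The sketch below uses only the standard library and is not the project's `core/config.py` (which the project structure describes as Pydantic settings); `check_env` and `REQUIRED_VARS` are hypothetical names, and the 32-character minimum is taken from the placeholder text in the example `.env`.

```python
from collections.abc import Mapping

REQUIRED_VARS = (
    "DATABASE_URL",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "GEMINI_API_KEY",
    "JWT_SECRET",
)


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the config looks OK."""
    problems = [f"{name} is missing" for name in REQUIRED_VARS if not env.get(name)]

    index_name = env.get("PINECONE_INDEX_NAME", "")
    if index_name and index_name != index_name.lower():
        problems.append("PINECONE_INDEX_NAME must be lowercase")

    secret = env.get("JWT_SECRET", "")
    if secret and len(secret) < 32:
        problems.append("JWT_SECRET should be at least 32 characters")

    return problems
```

Calling `check_env(os.environ)` after loading the `.env` file would list anything that still needs filling in.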

🧪 Tests

cd backend
source venv/bin/activate
pytest tests/ -v

📁 Project Structure

ChatWithDocs/
├── backend/
│   ├── core/
│   │   ├── config.py        # Pydantic settings (env vars)
│   │   ├── database.py      # SQLAlchemy engine + session
│   │   ├── logger.py        # Structured logging
│   │   ├── security.py      # bcrypt + JWT
│   │   └── utils.py         # UUID helpers
│   ├── models/              # SQLAlchemy ORM models
│   │   ├── user.py
│   │   ├── document.py
│   │   ├── chunk.py
│   │   └── conversation.py
│   ├── routers/             # FastAPI route handlers
│   │   ├── auth.py
│   │   ├── documents.py
│   │   ├── chat.py
│   │   └── analytics.py
│   ├── schemas/             # Pydantic request/response schemas
│   ├── services/            # Core RAG pipeline
│   │   ├── parser.py        # Multi-format document parsing
│   │   ├── chunker.py       # Paragraph-aware overlapping chunks
│   │   ├── embedder.py      # Gemini embedding-001 (768 dims)
│   │   ├── retriever.py     # Pinecone store + semantic search
│   │   ├── ingestion.py     # Full pipeline orchestrator
│   │   └── llm.py           # Gemini conversational generation
│   ├── tests/               # Pytest suite (19 tests)
│   ├── uploads/             # Uploaded files (gitignored)
│   ├── main.py              # App entry point
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── components/      # AppLayout, Sidebar, ErrorBoundary, PrivateRoute
│   │   ├── context/         # AuthContext (JWT + user state)
│   │   ├── pages/           # Dashboard, Upload, Library, Chat, History, Analytics
│   │   └── services/        # Axios API client with JWT interceptor
│   ├── Dockerfile           # Multi-stage Nginx production build
│   └── nginx.conf
├── .env.example
├── docker-compose.yml
└── README.md

🔄 RAG Pipeline

Document Upload
      │
      ▼
   Parser          → extracts text from PDF/DOCX/XLSX/etc.
      │
      ▼
   Chunker         → paragraph-aware overlapping chunks (800 tokens, 120 overlap)
      │
      ▼
   Embedder        → Gemini embedding-001 → 768-dim vectors
      │
      ▼
   Pinecone        → stores vectors with document/user metadata
      │
      ▼  (on user question)
Semantic Search    → embed query → top-5 cosine similar chunks
      │
      ▼
  Gemini LLM       → conversational answer grounded in retrieved chunks
      │
      ▼
  Citations        → exact source document + page number returned
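
The Chunker step is the least obvious part of the pipeline, so here is a minimal sketch of paragraph-aware chunking with overlap. It is an assumption-laden illustration rather than the contents of `services/chunker.py`: `chunk_text` is a hypothetical name, whitespace-separated words stand in for tokens, and the defaults mirror the 800/120 figures quoted in the diagram.

```python
import re


def chunk_text(text: str, max_tokens: int = 800, overlap: int = 120) -> list[str]:
    """Pack paragraphs into chunks of at most max_tokens words.

    Each new chunk starts with the last `overlap` words of the previous one.
    Words stand in for tokens here, which is an assumption.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    # Paragraphs are runs of text separated by one or more blank lines.
    paragraphs = [p.split() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[list[str]] = []
    current: list[str] = []
    fresh = 0  # words in `current` that are not carried-over overlap

    def flush() -> None:
        nonlocal current, fresh
        chunks.append(current)
        current = current[-overlap:] if overlap else []
        fresh = 0

    for words in paragraphs:
        # Prefer closing the chunk at a paragraph boundary when the next
        # paragraph would not fit in the remaining space.
        if fresh and len(current) + len(words) > max_tokens:
            flush()
        for word in words:
            # Oversized paragraphs are split once the chunk is full.
            if len(current) == max_tokens:
                flush()
            current.append(word)
            fresh += 1

    if fresh:
        chunks.append(current)
    return [" ".join(c) for c in chunks]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.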

📄 Supported File Formats

| Format | Extension |
| --- | --- |
| PDF | .pdf |
| Word | .docx |
| Excel | .xlsx |
| PowerPoint | .pptx |
| Plain Text | .txt |
| Markdown | .md |
| CSV | .csv |
| HTML | .html, .htm |
| Rich Text | .rtf |
| OpenDocument | .odt |

Max file size: 50 MB
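
The format list and size limit above translate into a simple pre-upload check. The following is a sketch only: `validate_upload` and the two constants are hypothetical names, and it assumes the 50 MB limit means 50 × 1024 × 1024 bytes, which the README does not specify.

```python
from pathlib import Path

# Extensions copied from the supported formats list; the names are hypothetical.
ALLOWED_EXTENSIONS = frozenset(
    {".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".md", ".csv", ".html", ".htm", ".rtf", ".odt"}
)
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not accepted."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("file exceeds the 50 MB limit")
```

Checking the extension case-insensitively lets `Report.PDF` through; a stricter check could also inspect the file's content, which an extension test alone cannot verify.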

