A Retrieval-Augmented Generation (RAG) system for document-based Q&A.
This system lets users upload documents (PDF, TXT, URLs, or raw text) and ask questions whose answers are grounded strictly in those sources. The agent refuses to hallucinate: if the information isn't in the provided documents, it says so clearly.
Built for: Documentation Support Agent Technical Assessment
- ✅ Multi-source ingestion: PDF, TXT files, web URLs, and raw text
- ✅ Semantic chunking: Uses LangChain's SemanticChunker for intelligent text splitting
- ✅ Pure semantic search: sentence-transformers embeddings + FAISS vector store
- ✅ Cosine similarity: Normalized vectors for meaning-based retrieval
- ✅ Zero hallucination: Multi-layer guardrails prevent made-up answers
- ✅ Source highlighting: Shows exact passages with similarity scores
- ✅ Web interface: Clean Streamlit UI with document management
```
User Query
    ↓
[Embedding Model] ← all-MiniLM-L6-v2 (384-dim vectors)
    ↓
[FAISS Search] ← Cosine similarity (IndexFlatIP)
    ↓
[Top-5 Chunks] ← Most relevant passages retrieved
    ↓
[Gemini LLM] ← Answer generation (temp=0.1, strict prompt)
    ↓
[Response] ← Answer + source citations + similarity scores
```
DocumentProcessor
- Extracts text from PDFs, TXT files, URLs
- Uses LangChain SemanticChunker for context-aware splitting
- Preserves semantic coherence across chunks
VectorStore
- sentence-transformers for embeddings
- FAISS IndexFlatIP for fast cosine similarity search
- Normalized vectors for semantic (not magnitude) comparison
AnswerGenerator
- Gemini 2.5 Flash with strict source-only prompting
- Temperature: 0.1 (low creativity = high factuality)
- Mandatory source citations in responses
ChatBot
- Orchestrates the full RAG pipeline (sketched below)
- Manages document lifecycle (ingest/clear)
- Coordinates retrieval and generation
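Taken together, these four classes form a thin pipeline. A minimal orchestration sketch, with hypothetical method names (`process`, `add`, `search`, `answer`) that may differ from the actual implementation in doc_support_agent.py:

```python
# Hypothetical orchestration sketch; real method names may differ.
class ChatBot:
    def __init__(self, processor, store, generator):
        self.processor = processor  # DocumentProcessor
        self.store = store          # VectorStore
        self.generator = generator  # AnswerGenerator

    def ingest(self, source):
        chunks = self.processor.process(source)  # extract text + semantic chunking
        self.store.add(chunks)                   # embed + index in FAISS

    def query(self, question, k=5):
        chunks = self.store.search(question, k)         # top-k semantic retrieval
        return self.generator.answer(question, chunks)  # grounded, cited answer
```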
Prerequisites:
- Python 3.8 or higher
- Gemini API key (get a free key at https://ai.google.dev/)
- Install dependencies:
  ```
  pip install -r requirements.txt
  ```
- Run the application:
  ```
  streamlit run doc_support_agent.py
  ```
- Access the interface:
  - Opens automatically at http://localhost:8501
  - Enter your Gemini API key when prompted
Enter your Gemini API key in the text field. Wait for "✅ Chatbot initialized successfully!"
Choose from three options:
- Upload PDF/TXT: Select local files
- Enter URL: Paste webpage URLs for scraping
- Paste Text: Directly input text content
Multiple documents can be added sequentially.
Type your question in the text field. The system will:
- Search for relevant chunks (semantic search)
- Generate answer using only those sources
- Display answer with source citations
- Show source excerpts with similarity scores
Click "๐๏ธ Clear All Documents" to remove all ingested data and start fresh.
Layer 1: Strict Prompting (see the sketch after Layer 4)

```
"Answer ONLY using information from the sources below"
"DO NOT use any external knowledge"
"If sources don't contain enough info, say so clearly"
```
Layer 2: Low Temperature (0.1)
- Minimizes LLM creativity and randomness
- Ensures deterministic, grounded responses
- Reduces likelihood of invented information
Layer 3: Mandatory Citations
- LLM must reference [Source 1], [Source 2], etc.
- Makes grounding transparent and verifiable
- Easy to trace answers back to documents
Layer 4: Semantic Filtering
- Only retrieves chunks above relevance threshold
- Top-k retrieval (default: 5 chunks)
- Prevents irrelevant context from confusing LLM
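Layers 1 and 3 boil down to a prompt template along these lines. A sketch only, with a hypothetical helper name; the exact wording lives in doc_support_agent.py:

```python
from typing import List

# Hypothetical prompt builder; the actual wording in
# doc_support_agent.py may differ.
def build_prompt(question: str, chunks: List[str]) -> str:
    sources = "\n\n".join(
        f"[Source {i + 1}]\n{chunk}" for i, chunk in enumerate(chunks)
    )
    return (
        "Answer ONLY using information from the sources below.\n"
        "DO NOT use any external knowledge.\n"
        "If the sources don't contain enough information, say so clearly.\n"
        "Cite every claim as [Source 1], [Source 2], etc.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )
```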
Traditional fixed-size chunking (e.g., 1000 characters) often breaks mid-sentence or mid-thought. LangChain's SemanticChunker splits text based on semantic coherence:
```python
# Traditional chunking problems:
"...Python supports OOP. |CHUNK BREAK| Python has simple syntax..."
# Context lost! Each chunk lacks full meaning.

# Semantic chunking preserves context:
"...Python supports OOP. Python has simple syntax..."
# Complete thoughts stay together.
```
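A minimal sketch of wiring SemanticChunker to the same embedding model; import paths vary across LangChain versions, so treat the exact modules as an assumption:

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings

# Chunk boundaries are picked where embedding similarity between
# consecutive sentences drops, instead of at a fixed character count.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunker = SemanticChunker(embeddings)

with open("python_tutorial.txt") as f:  # illustrative input file
    chunks = chunker.split_text(f.read())
```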
Normalization is what turns FAISS's inner-product search into cosine similarity:

```python
# Without normalization (Euclidean distance)
v1 = [0.5, 0.5]  # Short vector
v2 = [5.0, 5.0]  # Long vector, SAME direction
distance = 6.36  # Seems very different!

# With normalization (cosine similarity)
faiss.normalize_L2(embeddings)
v1_norm = [0.707, 0.707]
v2_norm = [0.707, 0.707]
similarity = 1.0  # Correctly identifies them as similar!
```

Key insight: For text, we care about semantic direction (meaning), not vector magnitude (arbitrary scale). Normalization + IndexFlatIP gives us pure cosine similarity.
How retrieval works:
- Query encoding: Convert the question to a 384-dim embedding
- Normalization: L2-normalize the query vector
- FAISS search: IndexFlatIP computes dot products (= cosine similarity for normalized vectors)
- Top-k selection: Return the 5 most similar chunks
- Context building: Combine the chunks into the LLM prompt (condensed in the sketch below)
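A condensed sketch of these steps using faiss and sentence-transformers directly; the corpus and variable names are illustrative:

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

chunks = ["Python supports OOP.", "Streamlit builds web UIs in Python."]
emb = model.encode(chunks).astype("float32")
faiss.normalize_L2(emb)                  # unit length -> inner product = cosine
index = faiss.IndexFlatIP(emb.shape[1])  # exact inner-product search
index.add(emb)

query = model.encode(["Which paradigms does Python support?"]).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)     # top-k chunks with similarity scores
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {chunks[i]}")
```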
Configuration options:

```python
# Automatic semantic-based chunking
# No manual chunk_size or overlap needed
# LangChain determines optimal boundaries
```

```python
k = 5  # Number of chunks to retrieve
# Adjustable: chatbot.query(question, k=10)
```

```python
generation_config = {
    "temperature": 0.1,        # Low = factual, high = creative
    "top_p": 0.9,              # Nucleus sampling
    "max_output_tokens": 1500  # Response length limit
}
```

all-MiniLM-L6-v2 (embedding model)
- Speed: Fast inference, only 384 dimensions
- Quality: Good semantic understanding for general text
- Size: 80MB model (reasonable download)
Alternative: all-mpnet-base-v2 (768-dim, better quality, slower)
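Swapping the model is a one-line change in sentence-transformers; note that the FAISS index dimension must then match the new model's output size:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")     # 384-dim, fast, ~80MB
# model = SentenceTransformer("all-mpnet-base-v2")  # 768-dim, better quality, slower
```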
FAISS (vector store)
- Performance: Millisecond search even with 100k+ vectors
- Memory efficient: Optimized C++ implementation
- Scalable: Supports billions of vectors
- Industry standard: Developed by Meta AI
Alternative: Pinecone, Weaviate (managed services, more features)
Gemini 2.5 Flash (LLM)
- Speed: Fast response times
- Quality: Good instruction following
- Cost: Free tier available
- Reliability: Handles strict prompting well
Alternative: GPT-4, Claude (better quality, higher cost), or transformer-based open-source models such as LiquidAI/LFM2-1.2B-RAG
SemanticChunker (text splitting)
- Context preservation: Doesn't split mid-thought
- Semantic coherence: Uses embeddings to find boundaries
- Better retrieval: More meaningful chunks = better matches
Alternative: Fixed-size chunking (simpler, less accurate)
```
.
├── doc_support_agent.py  # Main Streamlit application
├── requirements.txt      # Python dependencies
├── README.md             # This file
└── .gitignore            # Git exclusions (API keys, etc.)
```
```
ChatBot
├── DocumentProcessor (ingestion + chunking)
├── VectorStore (embeddings + search)
├── AnswerGenerator (LLM interface)
└── Similar_source (formatting utilities)
```
- In-memory storage: Documents are cleared on app restart
  - Production fix: Use a persistent vector DB (Pinecone, Weaviate)
- No conversation history: Each query is independent
  - Production fix: Implement chat memory with a context window
- English-optimized: The embedding model is trained primarily on English
  - Production fix: Use multilingual models (paraphrase-multilingual)
- PDF quality dependent: Scanned PDFs won't extract text
  - Production fix: Add OCR (pytesseract, AWS Textract)
- Single-session: No user accounts or saved documents
  - Production fix: Add authentication and database storage
- Re-ranking stage: Add cross-encoder for better precision
- Conversation memory: Track dialogue context
- Document versioning: Update docs without full re-index
- Batch upload: Process multiple files simultaneously
- Query caching: Store common question-answer pairs
- Advanced filters: Filter by document source, date, etc.
- Export functionality: Save Q&A pairs as markdown/PDF
- Analytics: Track popular queries, retrieval quality
- Clear Answer Test
  - Upload a Python tutorial
  - Ask: "What is a list comprehension?"
  - ✅ Should get a detailed answer with sources
- Hallucination Prevention Test
  - Same document
  - Ask: "How do I use React hooks?"
  - ✅ Should refuse (not in the Python docs)
- Multi-source Test
  - Upload multiple documents
  - Ask a question spanning both
  - ✅ Should synthesize from multiple sources
- Edge Cases
  - Empty query → validation error
  - No documents uploaded → warning message
  - Malformed PDF → graceful error handling
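For regression purposes, the hallucination test could be automated against the hypothetical ChatBot interface sketched earlier; the method names and refusal phrasing below are assumptions, not the actual implementation:

```python
def test_hallucination_prevention(chatbot):
    chatbot.ingest("python_tutorial.txt")  # Python-only content
    answer = chatbot.query("How do I use React hooks?")
    # The agent should refuse rather than invent an answer; the exact
    # refusal wording is an assumption about the strict prompt's output.
    refusals = ("not in the provided", "don't contain", "insufficient information")
    assert any(phrase in answer.lower() for phrase in refusals)
```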
- PDF files (PyPDF2)
- TXT files (native Python)
- URLs (BeautifulSoup + requests)
- Raw text (direct input)
- Intelligent chunking (SemanticChunker)
- HuggingFace model (sentence-transformers)
- Vector database (FAISS in-memory)
- Semantic search (cosine similarity)
- No keyword matching (pure embeddings)
- Question input
- Strictly source-based answers
- Source passage highlighting
- Similarity scores displayed
- Clean web UI (Streamlit)
- Strict prompting
- Low temperature (0.1)
- Mandatory citations
- Clear "insufficient information" responses
- No invented content
- Modular structure (4 main classes)
- Clear separation of concerns
- Type hints (Pydantic models)
- Error handling
- Clean, readable code
- Modern RAG: Uses current best practices (semantic chunking, normalized vectors)
- Zero to minimal hallucination: Multiple layers of prevention
- Clean code: Well-structured, typed, documented
For setup issues:
- Check requirements.txt: are all dependencies installed?
- Python 3.8+ installed? Check with `python --version`
- Valid Gemini API key from https://ai.google.dev/
- First run downloads the model (~80MB); wait for it to complete
Documentation Support Agent - Siddhi Pandya
Built with ❤️ for accurate, trustworthy document Q&A
