A powerful retrieval-augmented generation (RAG) application that allows you to chat with your own documents (PDFs and FAQs) using Google Gemini and FAISS.
- PDF Upload: Upload any PDF and ask questions about its content.
- FAQ Support: Pre-loaded knowledge base from structured JSON FAQs.
- Local Embeddings: Fast, local vectorization using `sentence-transformers`.
- Smart Retrieval: Uses FAISS for efficient similarity search.
- Strict Answering: The AI only answers based on the provided documents to prevent "hallucinations."
- Source Attribution: See exactly which page or FAQ the AI used to generate the answer.
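As a rough illustration of how Strict Answering and Source Attribution can work together, the sketch below builds a context-only prompt and collects the source labels to show under the answer. The `Chunk` class, the function names, and the instruction wording are hypothetical and are not taken from this project's code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved piece of text plus where it came from (hypothetical shape)."""
    text: str
    source: str  # e.g. "sample.pdf, page 3" or "FAQ #12"


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a context-only prompt; the instruction wording is illustrative."""
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def list_sources(chunks: list[Chunk]) -> list[str]:
    """Return de-duplicated source labels, preserving retrieval order."""
    return list(dict.fromkeys(c.source for c in chunks))
```

Numbering each context block and keeping its source label next to the text is one simple way to let the UI show which page or FAQ an answer drew on.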
This project follows a modular RAG architecture. Here is a breakdown of "what is where":
- `app.py`: The main Streamlit interface. It orchestrates the entire flow from document upload to chat interactions.
- `config.py`: Handles environment variables and API key management using `python-dotenv`.
- `loaders/`:
  - `pdf_loader.py`: Uses `pypdf` to extract text from PDF files.
  - `text_loader.py`: Processes JSON files into a structured document format.
- `utils/`:
  - `text_cleaning.py`: Normalizes extracted text and fixes common PDF character-spacing issues.
  - `chunking.py`: Splits long documents into manageable chunks (700 chars with overlap) for better search accuracy.
- `embeddings/`:
  - `embedder.py`: Converts text into 384-dimensional vectors using the `all-MiniLM-L6-v2` model.
- `vectorstore/`:
  - `faiss_store.py`: Manages the FAISS index for high-speed vector similarity searching.
- `llm/`:
  - `gemini_client.py`: Wrapper for the Google Gemini API (`gemini-2.5-flash-lite`) to generate responses.
- `rag/`:
  - `pipeline.py`: The logic that builds the "Context + Question" prompt for the LLM.
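The chunking step described above (700-character chunks with overlap) can be sketched as a sliding window over the text. This is a minimal sketch, not the contents of `chunking.py`: the function name is hypothetical, and the overlap of 100 characters is an assumed value, since the README does not state it.

```python
def chunk_text(text: str, chunk_size: int = 700, overlap: int = 100) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters. The default of 700 matches
    the README; the overlap of 100 is an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the usual reason for overlapping windows in retrieval.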
- Frontend: Streamlit
- LLM: Google Gemini API
- Vector DB: FAISS
- Embeddings: Sentence-Transformers (`all-MiniLM-L6-v2`)
- Lang: Python 3.10+
- Clone the repository:

      git clone <repo-url>
      cd final-project

- Set up virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Configuration: Create a `.env` file in the root directory and add your Gemini API Key:

      GEMINI_API_KEY=your_api_key_here

- Run the App:

      streamlit run app.py
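For the configuration step above, a small check like the following can fail early with a readable message when the key is missing. This is only a sketch of what key handling might look like, not the actual `config.py`: the function name and error text are hypothetical, and the `python-dotenv` import is optional so the snippet still runs without it.

```python
import os
from collections.abc import Mapping
from typing import Optional

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keep the sketch importable without the package
    load_dotenv = None


def get_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise a clear error if it is missing."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # looks for a .env file and loads it into os.environ
        env = os.environ
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key
```

Passing a mapping explicitly (for example `get_gemini_api_key({"GEMINI_API_KEY": "abc"})`) makes the check easy to test without touching the real environment.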
- Open the URL provided by Streamlit (usually http://localhost:8501).
- The app will automatically load the default FAQ and sample PDF.
- Upload a new PDF using the sidebar/uploader to chat with specific documents.
- Type your question in the chat input at the bottom.
- Check the "Sources" dropdown below the AI's answer to verify its facts.