🤖 Personal AI Assistant (RAG Pipeline)

A retrieval-augmented generation (RAG) application that lets you chat with your own documents (PDFs and JSON FAQs), using FAISS for retrieval and Google Gemini for answer generation.

🚀 Features

PDF Upload: Upload any PDF and ask questions about its content.
FAQ Support: Pre-loaded knowledge base from structured JSON FAQs.
Local Embeddings: Fast, local vectorization using sentence-transformers.
Smart Retrieval: Uses FAISS for efficient similarity search.
Strict Answering: The AI is instructed to answer only from the provided documents, which reduces "hallucinations."
Source Attribution: See exactly which page or FAQ the AI used to generate the answer.
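
The Strict Answering and Source Attribution features both come down to how the prompt is assembled. The following is a minimal sketch of that idea, not the project's actual `pipeline.py`: the `Chunk` class, the `build_prompt` function, the instruction wording, and the sample FAQ text are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved passage plus where it came from (hypothetical structure)."""
    text: str
    source: str  # e.g. "sample.pdf, page 3" or "FAQ #2"


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Combine retrieved chunks and the user question into one strict prompt."""
    # Number each chunk and label it with its source so the answer can be traced back.
    context = "\n\n".join(
        f"[{i}] ({chunk.source})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up sample passage, used only to show the prompt layout.
    demo = [Chunk("Refunds are processed within 5 days.", "FAQ #2")]
    print(build_prompt("How long do refunds take?", demo))
```

Keeping the source label next to each chunk means the same list can be shown to the user as the answer's sources without a second lookup.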

🏗️ Project Architecture (A-Z)

This project follows a modular RAG architecture. Here is a breakdown of "what is where":

Core Entry Point

app.py: The main Streamlit interface. It orchestrates the entire flow from document upload to chat interactions.

Source Code (`src/`)

config.py: Handles environment variables and API key management using python-dotenv.
loaders/:
- pdf_loader.py: Uses pypdf to extract text from PDF files.
- text_loader.py: Processes JSON files into a structured document format.
utils/:
- text_cleaning.py: Normalizes extracted text and fixes common PDF character-spacing issues.
- chunking.py: Splits long documents into manageable chunks (700 chars with overlap) for better search accuracy.
embeddings/:
- embedder.py: Converts text into 384-dimensional vectors using the all-MiniLM-L6-v2 model.
vectorstore/:
- faiss_store.py: Manages the FAISS index for high-speed vector similarity searching.
llm/:
- gemini_client.py: Wrapper for the Google Gemini API (gemini-2.5-flash-lite) to generate responses.
rag/:
- pipeline.py: The logic that builds the "Context + Question" prompt for the LLM.
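
The cleaning and chunking steps listed under `utils/` can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the function names, the spaced-letter heuristic, and the 100-character overlap are hypothetical (the README only states 700 characters "with overlap").

```python
import re

# Matches runs like "H e l l o": three or more single letters separated by single spaces.
_SPACED_LETTERS = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")


def clean_text(text: str) -> str:
    """Heuristically undo letter-by-letter spacing, then collapse whitespace."""
    text = _SPACED_LETTERS.sub(lambda m: m.group(0).replace(" ", ""), text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_text(text: str, chunk_size: int = 700, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # assumed overlap of 100 gives a step of 600
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk. The spaced-letter regex is only a heuristic: it cannot recover word boundaries when spaced-out words are themselves separated by a single space.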

🛠️ Tech Stack

Frontend: Streamlit
LLM: Google Gemini API
Vector DB: FAISS
Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
Language: Python 3.10+
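
To make the roles of the embedding model and the vector index concrete, here is a pure-Python, brute-force stand-in for the similarity search step. It is not FAISS and not the project's `faiss_store.py`; it only shows the computation a flat index performs when scoring by cosine similarity. The function names and the toy 3-dimensional vectors are assumptions, and the README does not state whether the project scores by inner product or L2 distance.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query: list[float], vectors: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    q = normalize(query)
    scores = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(vectors)
    ]
    # Highest cosine similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 produces 384 dimensions.
    stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    print(top_k([1.0, 0.0, 0.0], stored, k=2))
```

A real index avoids recomputing norms on every query and uses optimized vector math, which is the reason to use FAISS rather than a loop like this.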

⚙️ Installation & Setup

Clone the repository:
```
git clone <repo-url>
cd final-project
```

Set up virtual environment:
```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Configuration: Create a .env file in the root directory and add your Gemini API Key:
```
GEMINI_API_KEY=your_api_key_here
```
Run the App:
```
streamlit run app.py
```
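
For the Configuration step above, the key has to be read from `.env` at startup. A minimal sketch of how a module like `config.py` might do this with python-dotenv is shown below; the function name `get_gemini_api_key` and the error message are assumptions, not the project's actual code.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to plain environment variables
    load_dotenv = None


def get_gemini_api_key() -> str:
    """Return the Gemini API key from .env or the environment, or fail loudly."""
    if load_dotenv is not None:
        # By default this does not override variables already set in the environment.
        load_dotenv()
    key = os.getenv("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error from the first API call.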

📄 Usage

Open the URL provided by Streamlit (usually http://localhost:8501).
The app will automatically load the default FAQ and sample PDF.
Upload a new PDF using the sidebar/uploader to chat with specific documents.
Type your question in the chat input at the bottom.
Check the "Sources" dropdown below the AI's answer to verify its facts.
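
The "Sources" dropdown can only cite a page or FAQ if each document carries that label from the moment it is loaded. Below is a minimal sketch of an FAQ loader that attaches such a label. The JSON schema (a list of objects with `question` and `answer` keys), the function name, and the output dictionary layout are assumptions; the project's `text_loader.py` may differ.

```python
import json
from pathlib import Path


def load_faq_documents(path: str) -> list[dict]:
    """Turn a JSON list of {"question", "answer"} objects into labeled documents."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    documents = []
    for number, entry in enumerate(entries, start=1):
        documents.append({
            # Keep question and answer together so retrieval can match on either.
            "text": f"Q: {entry['question']}\nA: {entry['answer']}",
            # Label shown to the user when this entry is used in an answer.
            "source": f"FAQ #{number}",
        })
    return documents
```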
