A Retrieval-Augmented Generation (RAG) assistant that combines local AI models via Ollama with cloud-hosted vector storage using Qdrant Cloud. Built with Streamlit for an intuitive web interface.
- **Document Processing**: Upload and process PDF, TXT, and CSV files with intelligent text extraction
- **Web Scraping**: Extract content from web URLs using BeautifulSoup with smart content detection
- **Semantic Search**: Advanced vector similarity search with session-based filtering
- **Interactive Chat**: Natural language querying with context-aware responses and source attribution
- **Smart Tagging**: Metadata-based document organization and retrieval
- **Session Isolation**: Each session maintains a separate document context for privacy
- **Local AI**: Fast inference using Ollama with the Mistral model (privacy-focused)
- **Cloud Vector Store**: Scalable vector storage with Qdrant Cloud
- **Modern UI**: Clean, responsive interface with loading indicators and visual feedback
- **Real-time Processing**: Live document processing with progress indicators
```
┌─────────────────────┐     ┌─────────────────────┐     ┌───────────────────┐
│    Streamlit UI     │◄───►│    RAG Pipeline     │◄───►│   Qdrant Cloud    │
│  - File Upload      │     │  - Orchestration    │     │  - Vector Store   │
│  - Chat Interface   │     │  - Session Mgmt     │     │  - Similarity     │
│  - URL Input        │     │  - Error Handling   │     │    Search         │
└─────────────────────┘     └─────────────────────┘     └───────────────────┘
                                       │
                             ┌────────────────────┐
                             │   Ollama (Local)   │
                             │  - Mistral LLM     │
                             │  - nomic-embed     │
                             │  - Privacy First   │
                             └────────────────────┘
```
- **Document Processor** (`src/document_processor.py`): Handles PDF, TXT, and CSV file processing with multiple fallback methods
- **Web Scraper** (`src/web_scraper.py`): Extracts clean content from web URLs with intelligent content detection
- **Vector Store** (`src/vector_store.py`): Qdrant integration with session-based filtering and metadata indexing
- **LLM Client** (`src/llm_client.py`): Ollama integration for local AI inference with error handling
- **Embedding Client** (`src/embeddings.py`): Generates vector embeddings using the nomic-embed-text model
- **RAG Pipeline** (`src/rag_pipeline.py`): Orchestrates the entire retrieval-augmented generation flow (sketched below)
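A rough sketch of how these modules could fit together at query time (the class and method names here are illustrative assumptions, not the project's actual interfaces):

```python
# Hypothetical glue code illustrating the retrieval-augmented flow.
# All imported names are assumptions about src/, not the real API.
from src.embeddings import EmbeddingClient
from src.llm_client import LLMClient
from src.vector_store import VectorStore

def answer(question, session_id):
    embedder = EmbeddingClient(model="nomic-embed-text")   # assumed constructor
    store = VectorStore(collection="rag_documents")        # assumed constructor
    llm = LLMClient(model="mistral")                       # assumed constructor

    # 1. Embed the user's question.
    query_vector = embedder.embed(question)
    # 2. Retrieve the most similar chunks, restricted to this session.
    chunks = store.search(query_vector, session_id=session_id, limit=5)
    # 3. Ground the prompt in the retrieved context and generate locally.
    context = "\n\n".join(chunk["content"] for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.generate(prompt)
```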
Each document chunk is stored with comprehensive metadata:

```json
{
  "source_type": "document" | "web",
  "source_name": "filename_or_url",
  "session_id": "unique_session_identifier",
  "chunk_id": "chunk_index",
  "content": "actual_text_content"
}
```

When querying, only vectors matching the current session are retrieved, ensuring complete context isolation between different users or sessions.
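With the qdrant-client library, that per-session filter could look roughly like this (`query_vector` and `session_id` are placeholders for the embedded question and the current session's ID):

```python
import os
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url=os.environ["QDRANT_URL"],
                      api_key=os.environ["QDRANT_API_KEY"])

# Restrict the similarity search to points tagged with this session's ID.
hits = client.search(
    collection_name=os.environ.get("COLLECTION_NAME", "rag_documents"),
    query_vector=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="session_id", match=MatchValue(value=session_id))]
    ),
    limit=5,
)
```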
- Python 3.8+ (Python 3.9+ recommended)
- Ollama installed and running
- Qdrant Cloud account (free tier available - 1GB storage, 100K vectors)
- Git for cloning the repository
**Linux/macOS**

- **Clone the Repository**

  ```bash
  git clone https://github.com/yourusername/rag-assistant-ollama.git
  cd rag-assistant-ollama
  ```

- **Create and Activate Virtual Environment**

  ```bash
  # Create virtual environment
  python3 -m venv venv

  # Activate virtual environment
  source venv/bin/activate
  ```

- **Install Python Dependencies**

  ```bash
  # Upgrade pip first
  pip install --upgrade pip

  # Install all required packages
  pip install -r requirements.txt
  ```

- **Install and Set Up Ollama**

  ```bash
  # Install Ollama (if not already installed)
  curl -fsSL https://ollama.ai/install.sh | sh

  # Start the Ollama service (keep this terminal open)
  ollama serve
  ```

- **Download Required AI Models**

  ```bash
  # In a new terminal window, download the models
  ollama pull mistral            # Main language model (~4GB)
  ollama pull nomic-embed-text   # Embedding model (~274MB)

  # Verify models are installed
  ollama list
  ```

- **Configure Environment Variables**

  ```bash
  # Copy the example environment file
  cp .env.example .env

  # Edit the .env file with your Qdrant credentials
  nano .env   # or use your preferred editor
  ```

- **Run the Application**

  ```bash
  # Make sure your virtual environment is activated
  streamlit run app.py
  ```
**Windows**

- **Clone the Repository**

  ```bash
  git clone https://github.com/yourusername/rag-assistant-ollama.git
  cd rag-assistant-ollama
  ```

- **Create and Activate Virtual Environment**

  ```bash
  # Create virtual environment
  python -m venv venv

  # Activate virtual environment
  venv\Scripts\activate
  ```

- **Install Python Dependencies**

  ```bash
  # Upgrade pip first
  python -m pip install --upgrade pip

  # Install all required packages
  pip install -r requirements.txt
  ```

- **Install and Set Up Ollama**
  - Download Ollama from ollama.ai
  - Install the downloaded executable
  - Open Command Prompt as Administrator and run:

    ```bash
    ollama serve
    ```

- **Download Required AI Models**

  ```bash
  # In a new Command Prompt window
  ollama pull mistral
  ollama pull nomic-embed-text

  # Verify installation
  ollama list
  ```

- **Configure Environment Variables**

  ```bash
  # Copy the example file
  copy .env.example .env

  # Edit .env with Notepad or your preferred editor
  notepad .env
  ```

- **Run the Application**

  ```bash
  # Ensure virtual environment is activated
  streamlit run app.py
  ```
Create a `.env` file in the project root with the following configuration:

```bash
# Qdrant Cloud Configuration (Required)
QDRANT_URL=https://your-cluster-url.qdrant.io
QDRANT_API_KEY=your-qdrant-api-key
COLLECTION_NAME=rag_documents

# Ollama Configuration (Local AI Models)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral
EMBEDDING_MODEL=nomic-embed-text
```

- **Create Account**: Sign up at Qdrant Cloud
- **Create Cluster**:
  - Choose the Free Tier (1GB storage, 100K vectors)
  - Select your preferred region
  - Wait for cluster creation (usually 2-3 minutes)
- **Get Credentials**:
  - Copy your Cluster URL (looks like `https://xyz.qdrant.io`)
  - Copy your API Key from the cluster dashboard
- **Update Configuration**: Add these credentials to your `.env` file (a quick way to verify they load is sketched below)
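To sanity-check that these values load correctly, a minimal snippet using python-dotenv (assuming it is among the project's dependencies; the variable names follow the template above):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("QDRANT_URL", "QDRANT_API_KEY", "COLLECTION_NAME",
            "OLLAMA_BASE_URL", "OLLAMA_MODEL", "EMBEDDING_MODEL"):
    value = os.getenv(key)
    print(f"{key} = {'<missing>' if value is None else value}")
```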
The application uses two models:
- mistral: Main language model for generating responses (~4GB)
- nomic-embed-text: Embedding model for vector generation (~274MB)
You can change models by updating the `.env` file:

```bash
OLLAMA_MODEL=llama2          # Alternatives: llama2, codellama, etc.
EMBEDDING_MODEL=all-minilm   # Alternative embedding models
```
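For reference, generating an embedding through Ollama's HTTP API looks roughly like this (a sketch using the standard `/api/embeddings` endpoint; error handling omitted):

```python
import requests

OLLAMA_BASE_URL = "http://localhost:11434"  # matches the .env default above

def embed(text, model="nomic-embed-text"):
    # Ollama's embeddings endpoint returns {"embedding": [...]}.
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("What is retrieval-augmented generation?")
print(len(vector))  # nomic-embed-text produces 768-dimensional vectors
```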
- **Activate Virtual Environment**:

  ```bash
  source venv/bin/activate   # Linux/macOS
  # or
  venv\Scripts\activate      # Windows
  ```

- **Start Ollama** (in a separate terminal):

  ```bash
  ollama serve
  ```

- **Launch Application**:

  ```bash
  streamlit run app.py
  ```

- **Access Interface**: Open your browser to `http://localhost:8501`
- Supported Formats: PDF, TXT, CSV
- Multiple Files: Upload several files at once
- Auto-Processing: Files are automatically processed when uploaded (see the chunking sketch after this list)
- Progress Tracking: Visual progress bar shows processing status
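The processing step comes down to splitting extracted text into overlapping chunks before embedding. A minimal illustration (the chunk size and overlap are illustrative; the actual parameters in `src/document_processor.py` may differ):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    # Window over whitespace-separated words with a small overlap so that
    # sentences cut at a chunk boundary still appear intact in one chunk.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```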
- Enter URL: Paste any web URL in the input field
- Click "Add URL": Button processes the content
- Smart Extraction: Automatically extracts main content, ignoring navigation and ads (sketched below)
- Loading Indicator: Shows progress while scraping and processing
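A rough sketch of that kind of content detection with requests and BeautifulSoup (the heuristics are simplified; `src/web_scraper.py` may do more):

```python
import requests
from bs4 import BeautifulSoup

def scrape(url):
    html = requests.get(url, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")

    # Drop elements that are almost never part of the main content.
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()

    # Prefer a <main> or <article> element if the page has one.
    main = soup.find("main") or soup.find("article") or soup.body
    return main.get_text(separator=" ", strip=True) if main else ""
```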
- Ask Questions: Type natural language questions about your content
- Press Enter: Submit questions using Enter key or the Send button
- View Sources: See which documents contributed to each answer
- Session Context: All questions are answered within your current session's context
- Automatic Sessions: Each browser session gets a unique ID
- Clear All: Use the "Clear All" button to start fresh
- Data Isolation: Your documents are never mixed with other users' data
- Upload research papers (PDFs) and articles (URLs), then ask: "What are the main findings across these studies?"
- Upload reports (PDFs) and company data (CSV), then ask: "What trends do you see in our quarterly data?"
- Upload textbooks (PDFs) and online tutorials (URLs), then ask: "Explain the key concepts from chapter 3"
- Upload multiple documents on a topic, then ask: "Compare the different perspectives presented"
```
rag-assistant-ollama/
├── app.py                    # Main Streamlit application with UI
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
├── .gitignore                # Git ignore patterns
├── README.md                 # This comprehensive guide
├── LICENSE                   # MIT License
└── src/                      # Source code modules
    ├── __init__.py           # Package initialization
    ├── rag_pipeline.py       # Main RAG orchestration logic
    ├── vector_store.py       # Qdrant integration and vector operations
    ├── llm_client.py         # Ollama LLM client with error handling
    ├── embeddings.py         # Embedding generation using nomic-embed-text
    ├── document_processor.py # Document processing with multiple formats
    └── web_scraper.py        # Web content extraction and cleaning
```
- Session-Based Isolation: Each user session maintains a separate document context using unique session IDs (see the sketch after this list)
- Modular Architecture: Clear separation of concerns for maintainability and testing
- Comprehensive Error Handling: Graceful degradation and user-friendly error messages
- Scalable Storage: Cloud-based vector storage for production scalability
- Privacy-Focused: Local AI model inference keeps your data private
- Responsive UI: Modern, clean interface that works on all devices
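The unique session ID behind the first point could be generated with Streamlit's session state, for example (a sketch; `app.py` may do this differently):

```python
import uuid
import streamlit as st

# st.session_state persists across script reruns within one browser session,
# so the ID is created once and then reused for every query and upload.
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())

session_id = st.session_state.session_id
```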
- **Test Ollama Connection**:

  ```bash
  curl http://localhost:11434/api/tags
  ```

- **Test with Sample Content**:
  - Upload a simple text file
  - Add a Wikipedia URL
  - Ask: "What is this content about?"

- **Verify Vector Storage**:
  - Check the Qdrant Cloud dashboard for stored vectors
  - Verify session isolation by clearing and re-adding content (a programmatic count check is sketched below)
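That count check could be done with qdrant-client (the payload key and collection name follow the metadata schema above; `session_id` is the current session's ID):

```python
import os
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url=os.environ["QDRANT_URL"],
                      api_key=os.environ["QDRANT_API_KEY"])

# Count only the points whose payload belongs to the current session.
result = client.count(
    collection_name=os.environ.get("COLLECTION_NAME", "rag_documents"),
    count_filter=Filter(
        must=[FieldCondition(key="session_id", match=MatchValue(value=session_id))]
    ),
    exact=True,
)
print(f"{result.count} vectors stored for this session")
```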
Create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

Deployment options:

- **Streamlit Cloud**: Direct deployment from GitHub
- **Heroku**: Easy deployment with buildpacks
- **AWS/GCP/Azure**: Full control with container services
- **Railway/Render**: Simple deployment platforms
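One caveat worth noting: a container built from this Dockerfile cannot reach an Ollama instance bound to the host's `localhost:11434` by default, so you would need to point `OLLAMA_BASE_URL` at a host-reachable address (for example, `http://host.docker.internal:11434` on Docker Desktop) or run the container with host networking.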
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ If you find this project useful, please consider giving it a star!