πŸ€– RAG Assistant with Ollama & Qdrant

A Retrieval-Augmented Generation (RAG) assistant that combines local AI models via Ollama with cloud-hosted vector storage using Qdrant Cloud. Built with Streamlit for an intuitive web interface.

🌟 Features

  • πŸ“ Document Processing: Upload and process PDF, TXT, and CSV files with intelligent text extraction
  • 🌐 Web Scraping: Extract content from web URLs using BeautifulSoup with smart content detection
  • πŸ” Semantic Search: Advanced vector similarity search with session-based filtering
  • πŸ’¬ Interactive Chat: Natural language querying with context-aware responses and source attribution
  • 🏷️ Smart Tagging: Metadata-based document organization and retrieval
  • πŸ”’ Session Isolation: Each session maintains separate document contexts for privacy
  • ⚑ Local AI: Fast inference using Ollama with Mistral model (privacy-focused)
  • ☁️ Cloud Vector Store: Scalable vector storage with Qdrant Cloud
  • 🎨 Modern UI: Clean, responsive interface with loading indicators and visual feedback
  • πŸš€ Real-time Processing: Live document processing with progress indicators

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    Streamlit UI   │────│  RAG Pipeline     │────│  Qdrant Cloud   β”‚
β”‚  - File Upload    β”‚    β”‚  - Orchestration  β”‚    β”‚  - Vector Store β”‚
β”‚  - Chat Interface β”‚    β”‚  - Session Mgmt   β”‚    β”‚  - Similarity   β”‚
β”‚  - URL Input      β”‚    β”‚  - Error Handling β”‚    β”‚  - Search       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                   β”‚
                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                          β”‚  Ollama (Local)  β”‚
                          β”‚  - Mistral LLM   β”‚
                          β”‚  - nomic-embed   β”‚
                          β”‚  - Privacy First β”‚
                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”§ Core Components

  1. Document Processor (src/document_processor.py): Handles PDF, TXT, and CSV file processing with multiple fallback methods
  2. Web Scraper (src/web_scraper.py): Extracts clean content from web URLs with intelligent content detection
  3. Vector Store (src/vector_store.py): Qdrant integration with session-based filtering and metadata indexing
  4. LLM Client (src/llm_client.py): Ollama integration for local AI inference with error handling
  5. Embedding Client (src/embeddings.py): Generates vector embeddings using nomic-embed-text model
  6. RAG Pipeline (src/rag_pipeline.py): Orchestrates the entire retrieval-augmented generation flow (see the sketch below)
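
A minimal end-to-end sketch of how these pieces fit together, using Ollama's HTTP API and the qdrant-client package (my own illustration; the repository's actual implementation is spread across the modules above, and the cluster URL and API key are placeholders):

import requests
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

OLLAMA = "http://localhost:11434"
qdrant = QdrantClient(url="https://your-cluster-url.qdrant.io", api_key="your-api-key")

def answer(question: str, session_id: str) -> str:
    # 1. Embed the question with nomic-embed-text.
    vec = requests.post(f"{OLLAMA}/api/embeddings",
                        json={"model": "nomic-embed-text", "prompt": question}).json()["embedding"]
    # 2. Retrieve this session's most similar chunks from Qdrant.
    hits = qdrant.search(
        collection_name="rag_documents", query_vector=vec, limit=5,
        query_filter=Filter(must=[FieldCondition(key="session_id",
                                                 match=MatchValue(value=session_id))]))
    context = "\n\n".join(hit.payload["content"] for hit in hits)
    # 3. Ask Mistral to answer from the retrieved context only.
    prompt = f"Use only this context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
    return requests.post(f"{OLLAMA}/api/generate",
                         json={"model": "mistral", "prompt": prompt,
                               "stream": False}).json()["response"]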

🏷️ Session-Based Architecture

Each document chunk is stored with comprehensive metadata:

{
  "source_type": "document" | "web",
  "source_name": "filename_or_url",
  "session_id": "unique_session_identifier",
  "chunk_id": "chunk_index",
  "content": "actual_text_content"
}

When querying, only vectors matching the current session are retrieved, ensuring complete context isolation between different users or sessions.
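
A minimal sketch of both halves of that contract with the qdrant-client package (my illustration, not the repository's src/vector_store.py; chunk_text, session_id, and the 768-dim nomic-embed-text vectors chunk_vector and query_vector are assumed to exist):

import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, PointStruct

client = QdrantClient(url="https://your-cluster-url.qdrant.io", api_key="your-api-key")

# Ingestion: store one chunk with the metadata schema above as its payload.
client.upsert(collection_name="rag_documents", points=[
    PointStruct(id=str(uuid.uuid4()), vector=chunk_vector, payload={
        "source_type": "document",
        "source_name": "report.pdf",
        "session_id": session_id,
        "chunk_id": 0,
        "content": chunk_text,
    }),
])

# Query: restrict the similarity search to this session's points.
hits = client.search(
    collection_name="rag_documents",
    query_vector=query_vector,
    query_filter=Filter(must=[
        FieldCondition(key="session_id", match=MatchValue(value=session_id)),
    ]),
    limit=5,
)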

πŸš€ Quick Start

Prerequisites

  • Python 3.8+ (Python 3.9+ recommended)
  • Ollama installed and running
  • Qdrant Cloud account (free tier available - 1GB storage, 100K vectors)
  • Git for cloning the repository

πŸ“‹ Installation Guide

🐧 Linux/macOS Setup

  1. Clone the Repository

    git clone https://github.com/yourusername/rag-assistant-ollama.git
    cd rag-assistant-ollama
  2. Create and Activate Virtual Environment

    # Create virtual environment
    python3 -m venv venv
    
    # Activate virtual environment
    source venv/bin/activate  # Linux/macOS
  3. Install Python Dependencies

    # Upgrade pip first
    pip install --upgrade pip
    
    # Install all required packages
    pip install -r requirements.txt
  4. Install and Setup Ollama

    # Install Ollama (if not already installed)
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Start Ollama service (keep this terminal open)
    ollama serve
  5. Download Required AI Models

    # In a new terminal window, download the models
    ollama pull mistral          # Main language model (~4GB)
    ollama pull nomic-embed-text # Embedding model (~274MB)
    
    # Verify models are installed
    ollama list
  6. Configure Environment Variables

    # Copy the example environment file
    cp .env.example .env
    
    # Edit the .env file with your Qdrant credentials
    nano .env  # or use your preferred editor
  7. Run the Application

    # Make sure your virtual environment is activated
    streamlit run app.py

πŸͺŸ Windows Setup

  1. Clone the Repository

    git clone https://github.com/yourusername/rag-assistant-ollama.git
    cd rag-assistant-ollama
  2. Create and Activate Virtual Environment

    # Create virtual environment
    python -m venv venv
    
    # Activate virtual environment
    venv\Scripts\activate
  3. Install Python Dependencies

    # Upgrade pip first
    python -m pip install --upgrade pip
    
    # Install all required packages
    pip install -r requirements.txt
  4. Install and Setup Ollama

    • Download Ollama from ollama.ai
    • Install the downloaded executable
    • Open Command Prompt as Administrator and run:
    ollama serve
  5. Download Required AI Models

    # In a new Command Prompt window
    ollama pull mistral
    ollama pull nomic-embed-text
    
    # Verify installation
    ollama list
  6. Configure Environment Variables

    # Copy the example file
    copy .env.example .env
    
    # Edit .env with Notepad or your preferred editor
    notepad .env
  7. Run the Application

    # Ensure virtual environment is activated
    streamlit run app.py

βš™οΈ Configuration

πŸ” Environment Variables Setup

Create a .env file in the project root with the following configuration:

# Qdrant Cloud Configuration (Required)
QDRANT_URL=https://your-cluster-url.qdrant.io
QDRANT_API_KEY=your-qdrant-api-key
COLLECTION_NAME=rag_documents

# Ollama Configuration (Local AI Models)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral
EMBEDDING_MODEL=nomic-embed-text
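
All of these are read at startup as plain strings. A minimal sketch of loading them with python-dotenv (app.py may do this differently):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

QDRANT_URL = os.environ["QDRANT_URL"]          # required; raises KeyError if missing
QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]  # required
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "rag_documents")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "mistral")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")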

☁️ Qdrant Cloud Setup

  1. Create Account: Sign up at Qdrant Cloud
  2. Create Cluster:
    • Choose the Free Tier (1GB storage, 100K vectors)
    • Select your preferred region
    • Wait for cluster creation (usually 2-3 minutes)
  3. Get Credentials:
    • Copy your Cluster URL (looks like: https://xyz.qdrant.io)
    • Copy your API Key from the cluster dashboard
  4. Update Configuration: Add these credentials to your .env file (a quick connection check is sketched below)
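
A quick credential check with the qdrant-client package (a sketch; the URL, API key, and collection name are placeholders, and 768 matches nomic-embed-text's vector size):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="https://xyz.qdrant.io", api_key="your-qdrant-api-key")

# Listing collections proves the URL and API key are valid.
existing = [c.name for c in client.get_collections().collections]
print("collections:", existing)

# Create the collection on first use; cosine distance suits these embeddings.
if "rag_documents" not in existing:
    client.create_collection(
        collection_name="rag_documents",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )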

πŸ€– Ollama Model Configuration

The application uses two models:

  • mistral: Main language model for generating responses (~4GB)
  • nomic-embed-text: Embedding model for vector generation (~274MB)

You can change models by updating the .env file:

OLLAMA_MODEL=llama2          # Alternative: llama2, codellama, etc.
EMBEDDING_MODEL=all-minilm   # Alternative embedding models
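
One caveat when swapping embedding models: the vector size must match the Qdrant collection (nomic-embed-text emits 768-dimensional vectors, all-minilm 384), so changing EMBEDDING_MODEL means recreating the collection. A quick way to probe a model's output size via Ollama's HTTP API:

import requests

resp = requests.post("http://localhost:11434/api/embeddings",
                     json={"model": "nomic-embed-text", "prompt": "dimension probe"})
print(len(resp.json()["embedding"]))  # 768 for nomic-embed-text, 384 for all-minilm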

πŸ“– Usage Guide

πŸš€ Starting the Application

  1. Activate Virtual Environment:

    source venv/bin/activate  # Linux/macOS
    # or
    venv\Scripts\activate     # Windows
  2. Start Ollama (in separate terminal):

    ollama serve
  3. Launch Application:

    streamlit run app.py
  4. Access Interface: Open your browser to http://localhost:8501

πŸ“ Adding Content

Upload Documents

  • Supported Formats: PDF, TXT, CSV
  • Multiple Files: Upload several files at once
  • Auto-Processing: Files are automatically processed when uploaded
  • Progress Tracking: Visual progress bar shows processing status

Add Web Content

  • Enter URL: Paste any web URL in the input field
  • Click "Add URL": Button processes the content
  • Smart Extraction: Automatically extracts main content, ignoring navigation and ads
  • Loading Indicator: Shows progress while scraping and processing

πŸ’¬ Chatting with Your Data

  1. Ask Questions: Type natural language questions about your content
  2. Press Enter: Submit questions using Enter key or the Send button
  3. View Sources: See which documents contributed to each answer
  4. Session Context: All questions are answered within your current session's context

πŸ”„ Session Management

  • Automatic Sessions: Each browser session gets a unique ID (see the sketch after this list)
  • Clear All: Use the "Clear All" button to start fresh
  • Data Isolation: Your documents are never mixed with other users' data
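
A minimal sketch of how such an ID can be kept per browser session in Streamlit (hypothetical; the actual logic lives in app.py):

import uuid
import streamlit as st

# st.session_state survives reruns but is unique per browser session, so a
# UUID stored here tags every uploaded chunk and query for this user only.
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())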

🎯 Example Use Cases

πŸ“š Research Assistant

Upload: Research papers (PDFs), articles (URLs)
Ask: "What are the main findings across these studies?"

πŸ“Š Business Intelligence

Upload: Reports (PDFs), company data (CSV)
Ask: "What trends do you see in our quarterly data?"

πŸ“– Learning Companion

Upload: Textbooks (PDFs), online tutorials (URLs)
Ask: "Explain the key concepts from chapter 3"

πŸ” Content Analysis

Upload: Multiple documents on a topic
Ask: "Compare the different perspectives presented"

πŸ› οΈ Development

πŸ“ Project Structure

rag-assistant-ollama/
β”œβ”€β”€ app.py                    # Main Streamlit application with UI
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ .env.example             # Environment variables template
β”œβ”€β”€ .gitignore               # Git ignore patterns
β”œβ”€β”€ README.md                # This comprehensive guide
β”œβ”€β”€ LICENSE                  # MIT License
└── src/                     # Source code modules
    β”œβ”€β”€ __init__.py          # Package initialization
    β”œβ”€β”€ rag_pipeline.py      # Main RAG orchestration logic
    β”œβ”€β”€ vector_store.py      # Qdrant integration and vector operations
    β”œβ”€β”€ llm_client.py        # Ollama LLM client with error handling
    β”œβ”€β”€ embeddings.py        # Embedding generation using nomic-embed-text
    β”œβ”€β”€ document_processor.py # Document processing with multiple formats
    └── web_scraper.py       # Web content extraction and cleaning

πŸ”§ Key Design Decisions

  1. Session-Based Isolation: Each user session maintains separate document contexts using unique session IDs
  2. Modular Architecture: Clear separation of concerns for maintainability and testing
  3. Comprehensive Error Handling: Graceful degradation and user-friendly error messages
  4. Scalable Storage: Cloud-based vector storage for production scalability
  5. Privacy-Focused: Local AI model inference keeps your data private
  6. Responsive UI: Modern, clean interface that works on all devices

πŸ§ͺ Testing Your Setup

  1. Test Ollama Connection (a combined Python check is sketched after this list):

    curl http://localhost:11434/api/tags
  2. Test with Sample Content:

    • Upload a simple text file
    • Add a Wikipedia URL
    • Ask: "What is this content about?"
  3. Verify Vector Storage:

    • Check Qdrant Cloud dashboard for stored vectors
    • Verify session isolation by clearing and re-adding content
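
Steps 1 and 3 can be combined into one script. A minimal sketch, assuming requests, python-dotenv, and qdrant-client are installed and .env is filled in:

import os
import requests
from dotenv import load_dotenv
from qdrant_client import QdrantClient

load_dotenv()

# Ollama: /api/tags lists the locally installed models.
models = requests.get("http://localhost:11434/api/tags").json()["models"]
print("Ollama models:", [m["name"] for m in models])

# Qdrant: listing collections proves the URL and API key are valid.
client = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_API_KEY"])
print("Qdrant collections:", [c.name for c in client.get_collections().collections])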

πŸš€ Production Deployment

🐳 Docker Deployment

Create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8501

CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

☁️ Cloud Deployment Options

  • Streamlit Cloud: Direct deployment from GitHub
  • Heroku: Easy deployment with buildpacks
  • AWS/GCP/Azure: Full control with container services
  • Railway/Render: Simple deployment platforms

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

⭐ If you find this project useful, please consider giving it a star!
