A Retrieval-Augmented Generation (RAG) assistant that combines local AI models via Ollama with cloud-hosted vector storage using Qdrant Cloud. Built with Streamlit for an intuitive web interface.
- **Document Processing**: Upload and process PDF, TXT, and CSV files with intelligent text extraction
- **Web Scraping**: Extract content from web URLs using BeautifulSoup with smart content detection
- **Semantic Search**: Advanced vector similarity search with session-based filtering
- **Interactive Chat**: Natural language querying with context-aware responses and source attribution
- **Smart Tagging**: Metadata-based document organization and retrieval
- **Session Isolation**: Each session maintains a separate document context for privacy
- **Local AI**: Fast inference using Ollama with the Mistral model (privacy-focused)
- **Cloud Vector Store**: Scalable vector storage with Qdrant Cloud
- **Modern UI**: Clean, responsive interface with loading indicators and visual feedback
- **Real-time Processing**: Live document processing with progress indicators
```
┌─────────────────────┐     ┌─────────────────────┐     ┌───────────────────┐
│    Streamlit UI     │◄───►│    RAG Pipeline     │◄───►│   Qdrant Cloud    │
│  - File Upload      │     │  - Orchestration    │     │  - Vector Store   │
│  - Chat Interface   │     │  - Session Mgmt     │     │  - Similarity     │
│  - URL Input        │     │  - Error Handling   │     │    Search         │
└─────────────────────┘     └─────────────────────┘     └───────────────────┘
                                       │
                             ┌────────────────────┐
                             │   Ollama (Local)   │
                             │  - Mistral LLM     │
                             │  - nomic-embed     │
                             │  - Privacy First   │
                             └────────────────────┘
```
- **Document Processor** (`src/document_processor.py`): Handles PDF, TXT, and CSV file processing with multiple fallback methods
- **Web Scraper** (`src/web_scraper.py`): Extracts clean content from web URLs with intelligent content detection
- **Vector Store** (`src/vector_store.py`): Qdrant integration with session-based filtering and metadata indexing
- **LLM Client** (`src/llm_client.py`): Ollama integration for local AI inference with error handling
- **Embedding Client** (`src/embeddings.py`): Generates vector embeddings using the nomic-embed-text model
- **RAG Pipeline** (`src/rag_pipeline.py`): Orchestrates the entire retrieval-augmented generation flow (sketched below)
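A rough sketch of how these modules could fit together at query time (the class and method names here are illustrative assumptions, not the project's actual interfaces):

```python
# Hypothetical glue code illustrating the retrieval-augmented flow.
# All imported names are assumptions about src/, not the real API.
from src.embeddings import EmbeddingClient
from src.llm_client import LLMClient
from src.vector_store import VectorStore

def answer(question, session_id):
    embedder = EmbeddingClient(model="nomic-embed-text")   # assumed constructor
    store = VectorStore(collection="rag_documents")        # assumed constructor
    llm = LLMClient(model="mistral")                       # assumed constructor

    # 1. Embed the user's question.
    query_vector = embedder.embed(question)
    # 2. Retrieve the most similar chunks, restricted to this session.
    chunks = store.search(query_vector, session_id=session_id, limit=5)
    # 3. Ground the prompt in the retrieved context and generate locally.
    context = "\n\n".join(chunk["content"] for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.generate(prompt)
```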
Each document chunk is stored with comprehensive metadata:

```json
{
  "source_type": "document" | "web",
  "source_name": "filename_or_url",
  "session_id": "unique_session_identifier",
  "chunk_id": "chunk_index",
  "content": "actual_text_content"
}
```

When querying, only vectors matching the current session are retrieved, ensuring complete context isolation between different users or sessions.
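With the qdrant-client library, that per-session filter could look roughly like this (`query_vector` and `session_id` are placeholders for the embedded question and the current session's ID):

```python
import os
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url=os.environ["QDRANT_URL"],
                      api_key=os.environ["QDRANT_API_KEY"])

# Restrict the similarity search to points tagged with this session's ID.
hits = client.search(
    collection_name=os.environ.get("COLLECTION_NAME", "rag_documents"),
    query_vector=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="session_id", match=MatchValue(value=session_id))]
    ),
    limit=5,
)
```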
- Python 3.8+ (Python 3.9+ recommended)
- Ollama installed and running
- Qdrant Cloud account (free tier available - 1GB storage, 100K vectors)
- Git for cloning the repository
**Linux/macOS**

- **Clone the Repository**

  ```bash
  git clone https://github.com/yourusername/rag-assistant-ollama.git
  cd rag-assistant-ollama
  ```

- **Create and Activate Virtual Environment**

  ```bash
  # Create virtual environment
  python3 -m venv venv

  # Activate virtual environment
  source venv/bin/activate
  ```

- **Install Python Dependencies**

  ```bash
  # Upgrade pip first
  pip install --upgrade pip

  # Install all required packages
  pip install -r requirements.txt
  ```

- **Install and Set Up Ollama**

  ```bash
  # Install Ollama (if not already installed)
  curl -fsSL https://ollama.ai/install.sh | sh

  # Start the Ollama service (keep this terminal open)
  ollama serve
  ```

- **Download Required AI Models**

  ```bash
  # In a new terminal window, download the models
  ollama pull mistral            # Main language model (~4GB)
  ollama pull nomic-embed-text   # Embedding model (~274MB)

  # Verify models are installed
  ollama list
  ```

- **Configure Environment Variables**

  ```bash
  # Copy the example environment file
  cp .env.example .env

  # Edit the .env file with your Qdrant credentials
  nano .env   # or use your preferred editor
  ```

- **Run the Application**

  ```bash
  # Make sure your virtual environment is activated
  streamlit run app.py
  ```
**Windows**

- **Clone the Repository**

  ```bash
  git clone https://github.com/yourusername/rag-assistant-ollama.git
  cd rag-assistant-ollama
  ```

- **Create and Activate Virtual Environment**

  ```bash
  # Create virtual environment
  python -m venv venv

  # Activate virtual environment
  venv\Scripts\activate
  ```

- **Install Python Dependencies**

  ```bash
  # Upgrade pip first
  python -m pip install --upgrade pip

  # Install all required packages
  pip install -r requirements.txt
  ```

- **Install and Set Up Ollama**
  - Download Ollama from ollama.ai
  - Install the downloaded executable
  - Open Command Prompt as Administrator and run:

    ```bash
    ollama serve
    ```

- **Download Required AI Models**

  ```bash
  # In a new Command Prompt window
  ollama pull mistral
  ollama pull nomic-embed-text

  # Verify installation
  ollama list
  ```

- **Configure Environment Variables**

  ```bash
  # Copy the example file
  copy .env.example .env

  # Edit .env with Notepad or your preferred editor
  notepad .env
  ```

- **Run the Application**

  ```bash
  # Ensure virtual environment is activated
  streamlit run app.py
  ```
Create a `.env` file in the project root with the following configuration:

```bash
# Qdrant Cloud Configuration (Required)
QDRANT_URL=https://your-cluster-url.qdrant.io
QDRANT_API_KEY=your-qdrant-api-key
COLLECTION_NAME=rag_documents

# Ollama Configuration (Local AI Models)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=mistral
EMBEDDING_MODEL=nomic-embed-text
```

- **Create Account**: Sign up at Qdrant Cloud
- **Create Cluster**:
  - Choose the Free Tier (1GB storage, 100K vectors)
  - Select your preferred region
  - Wait for cluster creation (usually 2-3 minutes)
- **Get Credentials**:
  - Copy your Cluster URL (looks like `https://xyz.qdrant.io`)
  - Copy your API Key from the cluster dashboard
- **Update Configuration**: Add these credentials to your `.env` file (a quick way to verify they load is sketched below)
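To sanity-check that these values load correctly, a minimal snippet using python-dotenv (assuming it is among the project's dependencies; the variable names follow the template above):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

for key in ("QDRANT_URL", "QDRANT_API_KEY", "COLLECTION_NAME",
            "OLLAMA_BASE_URL", "OLLAMA_MODEL", "EMBEDDING_MODEL"):
    value = os.getenv(key)
    print(f"{key} = {'<missing>' if value is None else value}")
```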
The application uses two models:
- mistral: Main language model for generating responses (~4GB)
- nomic-embed-text: Embedding model for vector generation (~274MB)
You can change models by updating the `.env` file:

```bash
OLLAMA_MODEL=llama2          # Alternatives: llama2, codellama, etc.
EMBEDDING_MODEL=all-minilm   # Alternative embedding models
```
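For reference, generating an embedding through Ollama's HTTP API looks roughly like this (a sketch using the standard `/api/embeddings` endpoint; error handling omitted):

```python
import requests

OLLAMA_BASE_URL = "http://localhost:11434"  # matches the .env default above

def embed(text, model="nomic-embed-text"):
    # Ollama's embeddings endpoint returns {"embedding": [...]}.
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("What is retrieval-augmented generation?")
print(len(vector))  # nomic-embed-text produces 768-dimensional vectors
```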
- **Activate Virtual Environment**:

  ```bash
  source venv/bin/activate   # Linux/macOS
  # or
  venv\Scripts\activate      # Windows
  ```

- **Start Ollama** (in a separate terminal):

  ```bash
  ollama serve
  ```

- **Launch Application**:

  ```bash
  streamlit run app.py
  ```

- **Access Interface**: Open your browser to `http://localhost:8501`
- Supported Formats: PDF, TXT, CSV
- Multiple Files: Upload several files at once
- Auto-Processing: Files are automatically processed when uploaded (see the chunking sketch after this list)
- Progress Tracking: Visual progress bar shows processing status
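The processing step comes down to splitting extracted text into overlapping chunks before embedding. A minimal illustration (the chunk size and overlap are illustrative; the actual parameters in `src/document_processor.py` may differ):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    # Window over whitespace-separated words with a small overlap so that
    # sentences cut at a chunk boundary still appear intact in one chunk.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```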
- Enter URL: Paste any web URL in the input field
- Click "Add URL": Button processes the content
- Smart Extraction: Automatically extracts main content, ignoring navigation and ads (sketched below)
- Loading Indicator: Shows progress while scraping and processing
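A rough sketch of that kind of content detection with requests and BeautifulSoup (the heuristics are simplified; `src/web_scraper.py` may do more):

```python
import requests
from bs4 import BeautifulSoup

def scrape(url):
    html = requests.get(url, timeout=15).text
    soup = BeautifulSoup(html, "html.parser")

    # Drop elements that are almost never part of the main content.
    for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
        tag.decompose()

    # Prefer a <main> or <article> element if the page has one.
    main = soup.find("main") or soup.find("article") or soup.body
    return main.get_text(separator=" ", strip=True) if main else ""
```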
- Ask Questions: Type natural language questions about your content
- Press Enter: Submit questions using Enter key or the Send button
- View Sources: See which documents contributed to each answer
- Session Context: All questions are answered within your current session's context
- Automatic Sessions: Each browser session gets a unique ID
- Clear All: Use the "Clear All" button to start fresh
- Data Isolation: Your documents are never mixed with other users' data
- Upload research papers (PDFs) and articles (URLs), then ask: "What are the main findings across these studies?"
- Upload reports (PDFs) and company data (CSV), then ask: "What trends do you see in our quarterly data?"
- Upload textbooks (PDFs) and online tutorials (URLs), then ask: "Explain the key concepts from chapter 3"
- Upload multiple documents on a topic, then ask: "Compare the different perspectives presented"
```
rag-assistant-ollama/
├── app.py                    # Main Streamlit application with UI
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
├── .gitignore                # Git ignore patterns
├── README.md                 # This comprehensive guide
├── LICENSE                   # MIT License
└── src/                      # Source code modules
    ├── __init__.py           # Package initialization
    ├── rag_pipeline.py       # Main RAG orchestration logic
    ├── vector_store.py       # Qdrant integration and vector operations
    ├── llm_client.py         # Ollama LLM client with error handling
    ├── embeddings.py         # Embedding generation using nomic-embed-text
    ├── document_processor.py # Document processing with multiple formats
    └── web_scraper.py        # Web content extraction and cleaning
```
- Session-Based Isolation: Each user session maintains a separate document context using unique session IDs (see the sketch after this list)
- Modular Architecture: Clear separation of concerns for maintainability and testing
- Comprehensive Error Handling: Graceful degradation and user-friendly error messages
- Scalable Storage: Cloud-based vector storage for production scalability
- Privacy-Focused: Local AI model inference keeps your data private
- Responsive UI: Modern, clean interface that works on all devices
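The unique session ID behind the first point could be generated with Streamlit's session state, for example (a sketch; `app.py` may do this differently):

```python
import uuid
import streamlit as st

# st.session_state persists across script reruns within one browser session,
# so the ID is created once and then reused for every query and upload.
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())

session_id = st.session_state.session_id
```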
- **Test Ollama Connection**:

  ```bash
  curl http://localhost:11434/api/tags
  ```

- **Test with Sample Content**:
  - Upload a simple text file
  - Add a Wikipedia URL
  - Ask: "What is this content about?"

- **Verify Vector Storage**:
  - Check the Qdrant Cloud dashboard for stored vectors
  - Verify session isolation by clearing and re-adding content (a programmatic count check is sketched below)
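That count check could be done with qdrant-client (the payload key and collection name follow the metadata schema above; `session_id` is the current session's ID):

```python
import os
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url=os.environ["QDRANT_URL"],
                      api_key=os.environ["QDRANT_API_KEY"])

# Count only the points whose payload belongs to the current session.
result = client.count(
    collection_name=os.environ.get("COLLECTION_NAME", "rag_documents"),
    count_filter=Filter(
        must=[FieldCondition(key="session_id", match=MatchValue(value=session_id))]
    ),
    exact=True,
)
print(f"{result.count} vectors stored for this session")
```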
Create a `Dockerfile`:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

Deployment options:

- **Streamlit Cloud**: Direct deployment from GitHub
- **Heroku**: Easy deployment with buildpacks
- **AWS/GCP/Azure**: Full control with container services
- **Railway/Render**: Simple deployment platforms
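One caveat worth noting: a container built from this Dockerfile cannot reach an Ollama instance bound to the host's `localhost:11434` by default, so you would need to point `OLLAMA_BASE_URL` at a host-reachable address (for example, `http://host.docker.internal:11434` on Docker Desktop) or run the container with host networking.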
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ If you find this project useful, please consider giving it a star!