This workshop focuses on enhancing the Retrieval component in Retrieval Augmented Generation (RAG) systems using the Qdrant vector database. Through hands-on Jupyter notebooks, you'll learn advanced vector search techniques, multiple embedding models, hybrid retrieval strategies, and practical optimizations that improve RAG system performance.
- Python 3.12 or higher
- uv - Ultra-fast Python package manager
- Docker (optional) - For running Qdrant locally
- LLM API access - Anthropic, OpenAI, or Hugging Face account
> **Tip:** 🎉 Free LLM access available! Hugging Face offers a generous free tier for its Inference API, with access to thousands of specialized models across multiple AI tasks - perfect for experimenting with this workshop without upfront costs!
Clone the repository and move into it:

```bash
git clone <repository-url>
cd workshop-improving-r-in-rag
```
Next, install uv.

On macOS and Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

On Windows:

```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Alternative (using pip):

```bash
pip install uv
```
Then install the project dependencies:

```bash
uv sync
```
You can use either a local Qdrant instance or Qdrant Cloud's free tier.
```bash
docker run -p "6333:6333" -p "6334:6334" \
  -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
  "qdrant/qdrant:v1.15.4"
```
Alternatively, sign up for Qdrant Cloud and use the free 1GB cluster.
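Whichever you choose, you can sanity-check connectivity with a few lines of Python - a minimal sketch assuming the `qdrant-client` package installed by `uv sync` (use your Cloud URL and API key if you're not running locally):

```python
# Minimal connectivity check for a local or cloud Qdrant instance.
import os

from qdrant_client import QdrantClient

client = QdrantClient(
    url=os.getenv("QDRANT_URL", "http://localhost:6333"),
    api_key=os.getenv("QDRANT_API_KEY"),  # None for a local instance
)

# Lists existing collections; an empty list on a fresh instance is expected.
print(client.get_collections())
```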
Create a `.env` file in the `notebooks/` directory with your API credentials:
```bash
# LLM provider settings
LLM_PROVIDER="huggingface"
HF_TOKEN="your-huggingface-token"

# Qdrant settings
QDRANT_URL="http://localhost:6333"
# QDRANT_API_KEY="your-cloud-api-key"  # Only needed for Qdrant Cloud
```
To get your Hugging Face token:

- Create a free account at huggingface.co
- Go to Token Settings
- Create a new token with "Read" permissions
- Copy the token to your `.env` file
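To confirm the token works before diving into the notebooks, you can query the Hub's `whoami` endpoint - a quick sketch assuming `python-dotenv` and `huggingface_hub` are available in your environment:

```python
# Sanity check that the HF_TOKEN in notebooks/.env is valid.
import os

from dotenv import load_dotenv
from huggingface_hub import HfApi

load_dotenv("notebooks/.env")  # run from the repository root

user = HfApi(token=os.environ["HF_TOKEN"]).whoami()
print(f"Authenticated as: {user['name']}")
```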
To use Anthropic or OpenAI instead, set the corresponding provider and key:

```bash
# For Anthropic
LLM_PROVIDER="anthropic"
ANTHROPIC_API_KEY="your-api-key-here"

# For OpenAI
LLM_PROVIDER="openai"
OPENAI_API_KEY="your-api-key-here"

# Qdrant settings
QDRANT_URL="http://localhost:6333"
# QDRANT_API_KEY="your-cloud-api-key"  # Only needed for Qdrant Cloud
```
All supported LLM providers:

- 🆓 Hugging Face (Recommended): Generous free tier with thousands of models - set `LLM_PROVIDER="huggingface"` and `HF_TOKEN`
- Anthropic: set `LLM_PROVIDER="anthropic"` and `ANTHROPIC_API_KEY`
- OpenAI: set `LLM_PROVIDER="openai"` and `OPENAI_API_KEY`
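Under the hood, `LLM_PROVIDER` simply selects which backend answers your prompts - the workshop's any-llm-sdk handles that dispatch for you. As a rough illustration only (not the workshop's actual code), here is what the switch looks like using each provider's own SDK; the model names are placeholders:

```python
# Illustrative provider dispatch driven by LLM_PROVIDER. The workshop's
# any-llm-sdk abstracts this away; model names below are placeholders.
import os

from dotenv import load_dotenv

load_dotenv("notebooks/.env")

def ask(prompt: str) -> str:
    provider = os.environ["LLM_PROVIDER"]
    messages = [{"role": "user", "content": prompt}]
    if provider == "huggingface":
        from huggingface_hub import InferenceClient
        client = InferenceClient(token=os.environ["HF_TOKEN"])
        out = client.chat_completion(messages, model="meta-llama/Llama-3.1-8B-Instruct")
        return out.choices[0].message.content
    if provider == "anthropic":
        from anthropic import Anthropic
        out = Anthropic().messages.create(
            model="claude-3-5-haiku-latest", max_tokens=512, messages=messages
        )
        return out.content[0].text
    if provider == "openai":
        from openai import OpenAI
        out = OpenAI().chat.completions.create(model="gpt-4o-mini", messages=messages)
        return out.choices[0].message.content
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")

print(ask("In one sentence, what is retrieval-augmented generation?"))
```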
This workshop consists of 4 progressive notebooks:
Notebook 00:

- Environment setup and dependency installation
- Qdrant connectivity testing
- Introduction to the tech stack (Qdrant, FastEmbed, any-llm-sdk)
Notebook 01:

- Loading HackerNews dataset into Qdrant
- Building a basic RAG pipeline with dense vectors
- Implementing payload-based filtering (see the sketch after this list)
- Testing retrieval quality and LLM integration
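A minimal sketch of what this notebook builds up to: dense retrieval plus a payload filter. The collection name (`hn_posts`), payload field (`type`), and embedding model here are illustrative - the notebook defines the real schema:

```python
# Dense retrieval with a payload filter (illustrative names, not the
# notebooks' actual schema).
from fastembed import TextEmbedding
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
embedder = TextEmbedding("BAAI/bge-small-en-v1.5")  # 384-dim dense model

docs = [
    {"text": "Show HN: a Rust crate for fast fuzzy search", "type": "show_hn"},
    {"text": "Ask HN: how do you evaluate RAG retrieval quality?", "type": "ask_hn"},
]

if not client.collection_exists("hn_posts"):
    client.create_collection(
        collection_name="hn_posts",
        vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
    )

client.upsert(
    collection_name="hn_posts",
    points=[
        models.PointStruct(id=i, vector=vec.tolist(), payload=doc)
        for i, (doc, vec) in enumerate(
            zip(docs, embedder.embed(d["text"] for d in docs))
        )
    ],
)

# Retrieve only "Ask HN" posts, ranked by dense similarity.
hits = client.query_points(
    collection_name="hn_posts",
    query=next(embedder.embed(["measuring retrieval quality"])).tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="type", match=models.MatchValue(value="ask_hn"))]
    ),
    limit=3,
)
for point in hits.points:
    print(point.score, point.payload["text"])
```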
Notebook 02:

- Setting up multiple vector representations per document
- Implementing sparse vectors (BM25) for keyword matching
- Multi-vector embeddings with ColBERT
- Hybrid search strategies: retrieval + reranking, and fusion methods (see the sketch after this list)
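For a feel of hybrid retrieval, here is a sketch of Qdrant's Query API fusing dense and BM25 sparse results with Reciprocal Rank Fusion (RRF). The collection name and named-vector layout are assumptions for illustration, and upserting points is omitted:

```python
# Hybrid retrieval sketch: dense + BM25 sparse prefetch, fused with RRF.
# Assumes a collection with named "dense" and "bm25" vectors already
# populated with points (upsert omitted for brevity).
from fastembed import SparseTextEmbedding, TextEmbedding
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")
bm25_model = SparseTextEmbedding("Qdrant/bm25")

if not client.collection_exists("hn_hybrid"):
    client.create_collection(
        collection_name="hn_hybrid",
        vectors_config={
            "dense": models.VectorParams(size=384, distance=models.Distance.COSINE)
        },
        sparse_vectors_config={"bm25": models.SparseVectorParams()},
    )

query = "vector database benchmarks"
dense_q = next(dense_model.embed([query])).tolist()
sparse_q = next(bm25_model.embed([query]))

hits = client.query_points(
    collection_name="hn_hybrid",
    prefetch=[
        # Over-fetch candidates from each representation...
        models.Prefetch(query=dense_q, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(
                indices=sparse_q.indices.tolist(), values=sparse_q.values.tolist()
            ),
            using="bm25",
            limit=20,
        ),
    ],
    # ...then merge the two ranked lists with reciprocal rank fusion.
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
)
for point in hits.points:
    print(point.id, point.score)
```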
Notebook 03:

- Search result diversification using Maximal Marginal Relevance (sketched after this list)
- Applying business rules and constraints
- Additional optimization techniques for production RAG systems
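MMR itself needs no special database support: you over-fetch candidates and re-rank them client-side, trading relevance to the query against similarity to results already picked. A minimal NumPy sketch, assuming unit-normalized embeddings (the `lambda_` trade-off value is illustrative):

```python
# Maximal Marginal Relevance over an over-fetched candidate set.
# candidates: (n, d) embedding matrix; query: (d,); all unit-normalized.
import numpy as np

def mmr(query: np.ndarray, candidates: np.ndarray, k: int,
        lambda_: float = 0.7) -> list[int]:
    """Return indices of k diverse-but-relevant rows of `candidates`."""
    relevance = candidates @ query          # similarity of each candidate to the query
    pairwise = candidates @ candidates.T    # candidate-to-candidate similarity
    selected = [int(np.argmax(relevance))]  # seed with the most relevant result
    while len(selected) < min(k, len(candidates)):
        remaining = [i for i in range(len(candidates)) if i not in selected]
        # Relevance, penalized by similarity to anything already selected.
        scores = [
            lambda_ * relevance[i] - (1 - lambda_) * pairwise[i, selected].max()
            for i in remaining
        ]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```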
Start Jupyter Lab:

```bash
uv run jupyter lab
```

Or Jupyter Notebook:

```bash
uv run jupyter notebook
```
Then navigate through the notebooks in order, starting with `00-set-up-environment.ipynb`.
- Qdrant - High-performance vector database for dense, sparse, and multi-vector retrieval
- FastEmbed - Efficient text-to-vector conversion with multiple embedding models
- any-llm-sdk - Unified interface for multiple LLM providers
The workshop uses a curated HackerNews submissions dataset containing tech discussions, startup ideas, and programming topics - perfect for demonstrating various retrieval challenges and solutions.
By completing this workshop, you'll understand:
- How to set up and configure Qdrant for production RAG systems
- Different embedding models and when to use each approach
- Hybrid retrieval strategies combining dense and sparse vectors
- Advanced reranking and fusion techniques
- Practical optimizations for improving retrieval relevance
- How to handle real-world RAG challenges like search diversity and business constraints