This project is a simple implementation of a RAG system over YouTube data, using LlamaIndex for efficient data retrieval and Qdrant or Chroma as the VectorDB to store and search the vectors. It also includes an optional Web Research Workflow that leverages real-time web data.
- Replace `<query>` with your query and `<youtube_url>` with the YouTube URL.

  ```bash
  yt-dlp -f bestaudio --extract-audio --audio-format mp3 <youtube_url> -o "audio/audio.mp3"
  cd src/rag
  uv run whisper.py  # Stop here if you only want to load the data
  uv run rag.py --query <query> --path "../../qdrant" --collection "yt" --qdrant
  ```
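  For context, `whisper.py` presumably transcribes the downloaded audio before it is chunked and embedded. A minimal sketch of that step, assuming the `openai-whisper` package and a `base` model size (both assumptions; the actual script may differ):

  ```python
  # Minimal sketch of the transcription step (assumes the openai-whisper
  # package; the model size and output handling are assumptions).
  import whisper

  model = whisper.load_model("base")
  result = model.transcribe("audio/audio.mp3")  # the file produced by yt-dlp above
  print(result["text"])  # transcript to be chunked and embedded by the RAG loader
  ```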
- Explanation of args for `rag.py` (a sketch of how they might be parsed follows below):
  - `--query`: The query you want to search for
  - `--path`: The path to the VectorDB on disk
  - `--collection`: The collection name in the VectorDB
  - `--qdrant`: Use Qdrant as the VectorDB (default)
  - `--chroma`: Use Chroma as the VectorDB
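  A minimal sketch of how these flags could be wired up with `argparse` (illustrative only; the actual `rag.py` may differ):

  ```python
  # Sketch of rag.py's CLI, matching the flags documented above.
  import argparse

  parser = argparse.ArgumentParser(description="Query the RAG system")
  parser.add_argument("--query", required=True, help="The query to search for")
  parser.add_argument("--path", default="../../qdrant", help="Path to the VectorDB on disk")
  parser.add_argument("--collection", default="yt", help="Collection name in the VectorDB")
  backend = parser.add_mutually_exclusive_group()
  backend.add_argument("--qdrant", action="store_true", help="Use Qdrant as the VectorDB (default)")
  backend.add_argument("--chroma", action="store_true", help="Use Chroma as the VectorDB")
  args = parser.parse_args()
  ```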
- If using Qdrant (note: Pinecone is not supported yet), copy your `QDRANT_API_KEY` and `QDRANT_URL` to the `.env` file:

  ```bash
  cp .env.example .env
  ```
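  A sketch of how those credentials might then be read at runtime, assuming `python-dotenv` and `qdrant-client` are installed (the variable names match the `.env` keys above):

  ```python
  # Sketch: load Qdrant credentials from .env (assumes python-dotenv and
  # qdrant-client; the repo's own loading code may differ).
  import os
  from dotenv import load_dotenv
  from qdrant_client import QdrantClient

  load_dotenv()  # reads QDRANT_API_KEY and QDRANT_URL from .env
  client = QdrantClient(
      url=os.environ["QDRANT_URL"],
      api_key=os.environ["QDRANT_API_KEY"],
  )
  ```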
- This workflow adds RAG to the workflow implemented in Ollama Deep Researcher; see that project for more details.
- By default, the RAG system answers the query using the `hf_docs` dataset.
- Modified to use DuckDuckGo as the search API (see the sketch below).
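  A sketch of what a DuckDuckGo search call could look like, assuming the `duckduckgo_search` package (which client library the workflow actually wraps is an assumption):

  ```python
  # Sketch: web search via the duckduckgo_search package (an assumption
  # about the client library; the workflow's wrapper may differ).
  from duckduckgo_search import DDGS

  with DDGS() as ddgs:
      results = ddgs.text("Model Context Protocol", max_results=5)

  for r in results:
      print(r["title"], r["href"])  # each result also carries a "body" snippet
  ```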
- Graph Workflow:
- Spin up the Ollama server:

  ```bash
  ollama serve
  ```

  NOTE: Pull the model you want first, for example:

  ```bash
  ollama pull deepseek-r1:8b
  ```
- See Ollama Deep Researcher for details on the environment variables.

  ```bash
  cp .env.example .env
  ```
- If you want to use your YouTube data as the dataset for the RAG system, follow the steps in the RAG-Only Usage section to load the data first. DON'T run the `rag.py` script.
- Run the workflow:

  ```bash
  uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev
  ```

  NOTE: In `graph.py`, in the `rag_research` function, see the comments if you want to use mock RAG data instead of the real data (a hypothetical sketch of such a toggle follows below).
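  A hypothetical sketch of what such a mock-data switch can look like; apart from the `rag_research` name, everything here is illustrative, not the repo's actual code:

  ```python
  # Hypothetical sketch of the mock-data switch in graph.py's rag_research
  # node; query_rag and the state keys are illustrative names.
  USE_MOCK_RAG = False  # flip to True to exercise the graph without a VectorDB

  def query_rag(topic: str) -> str:
      """Illustrative stand-in for the real retrieval call."""
      raise NotImplementedError

  def rag_research(state: dict) -> dict:
      if USE_MOCK_RAG:
          return {"rag_results": "canned context for testing the graph"}
      return {"rag_results": query_rag(state["research_topic"])}
  ```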
- RAG-Only Usage: uses the HF Docs dataset.

  ```bash
  cd src/rag
  uv run hf_docs.py
  uv run rag.py --query "How to create a pipeline object?" --path "../../qdrant" --collection "hf_docs" --qdrant
  ```

  See `llama3.1_hf_qdrant.txt` for the output.
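  Under the hood, the query step plausibly looks something like the following LlamaIndex sketch; the embedding model id, the Ollama LLM choice, and the hybrid settings are assumptions based on the Embeddings/VectorDBs notes below, not the repo's exact code:

  ```python
  # Sketch: querying an on-disk Qdrant collection with LlamaIndex
  # (model ids and hybrid settings are assumptions).
  from llama_index.core import Settings, VectorStoreIndex
  from llama_index.embeddings.huggingface import HuggingFaceEmbedding
  from llama_index.llms.ollama import Ollama
  from llama_index.vector_stores.qdrant import QdrantVectorStore
  from qdrant_client import QdrantClient

  Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-small")
  Settings.llm = Ollama(model="llama3.1")  # matches the llama3.1_hf_qdrant.txt output

  client = QdrantClient(path="../../qdrant")  # on-disk Qdrant at --path
  vector_store = QdrantVectorStore(collection_name="hf_docs", client=client, enable_hybrid=True)
  index = VectorStoreIndex.from_vector_store(vector_store)
  query_engine = index.as_query_engine(vector_store_query_mode="hybrid")
  print(query_engine.query("How to create a pipeline object?"))
  ```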
- Web Research Workflow:
  - Uses the `deepseek-r1:8b` model:

    ```bash
    ollama pull deepseek-r1:8b
    ollama serve
    uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev
    ```

  - Prompt 1: What's Model Context Protocol?
    - See `output_What's Model Context Protocol?.md` for the output.
  - Prompt 2: What are the FAANG companies?
    - See `output_What are the FAANG companies?.md` for the output.
  - Prompt 3: How to create a custom huggingface pipeline object?
    - See `output_How to create a custom huggingface pipeline object?.md` for the output.
- Embeddings (loaded from HuggingFace):
  - Dense vectors: gte-small
  - Sparse vectors: Splade_PP_en_v1
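  A sketch of loading these two models, assuming the ids `thenlper/gte-small` for the dense model and FastEmbed's `prithvida/Splade_PP_en_v1` for the sparse one (both ids are assumptions):

  ```python
  # Sketch: dense embeddings via llama-index's HuggingFace wrapper, sparse
  # embeddings via fastembed (model ids are assumptions).
  from fastembed import SparseTextEmbedding
  from llama_index.embeddings.huggingface import HuggingFaceEmbedding

  dense_model = HuggingFaceEmbedding(model_name="thenlper/gte-small")
  dense_vec = dense_model.get_text_embedding("hello world")
  print(len(dense_vec))  # gte-small produces 384-dim dense vectors

  sparse_model = SparseTextEmbedding(model_name="prithvida/Splade_PP_en_v1")
  sparse_vec = next(iter(sparse_model.embed(["hello world"])))
  print(sparse_vec.indices[:5], sparse_vec.values[:5])  # token-weighted sparse vector
  ```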
- VectorDBs:
  - Qdrant:
    - Supports hybrid vectors (dense + sparse)
    - Note: sparse vectors default to `prithvida/Splade_PP_en_v1`
  - Chroma:
    - Dense vectors: Chroma
    - Sparse vectors: BM25
    - Supports hybrid vectors (dense + sparse), see the sketch below
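  Since Chroma stores only dense vectors, hybrid retrieval there plausibly pairs it with a BM25 retriever and fuses the results. A sketch under those assumptions (the data directory, paths, and fusion settings are illustrative):

  ```python
  # Sketch: Chroma (dense) + BM25 (sparse) hybrid retrieval in LlamaIndex;
  # the "data" folder and Chroma path are illustrative.
  import chromadb
  from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
  from llama_index.core.retrievers import QueryFusionRetriever
  from llama_index.embeddings.huggingface import HuggingFaceEmbedding
  from llama_index.retrievers.bm25 import BM25Retriever
  from llama_index.vector_stores.chroma import ChromaVectorStore

  Settings.embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-small")

  docs = SimpleDirectoryReader("data").load_data()  # hypothetical docs folder
  nodes = Settings.node_parser.get_nodes_from_documents(docs)

  collection = chromadb.PersistentClient(path="../../chroma").get_or_create_collection("docs")
  vector_store = ChromaVectorStore(chroma_collection=collection)
  storage_context = StorageContext.from_defaults(vector_store=vector_store)
  index = VectorStoreIndex(nodes, storage_context=storage_context)

  dense = index.as_retriever(similarity_top_k=5)                         # Chroma-backed dense search
  sparse = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=5)  # keyword-based sparse search
  hybrid = QueryFusionRetriever([dense, sparse], similarity_top_k=5, num_queries=1)
  print(hybrid.retrieve("How to create a pipeline object?"))
  ```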
- Reranker:
- Language Models (loaded from HuggingFace):