YouTube assistant that chats over video content via Retrieval-Augmented Generation (RAG) powered by Ollama; built with Streamlit, Whisper, Chroma, and Llama 3.2.

TubeChat Pro (YouTube RAG Assistant)

Streamlit app that lets you chat with a YouTube video using a local RAG stack. The app downloads a video's audio with yt-dlp, transcribes it with Whisper, indexes the transcript with Chroma + Ollama embeddings, and answers your questions with an Ollama-served LLM.

Features

  • Paste a YouTube URL and auto-download the audio (mp3).
  • Local transcription via Whisper (base model).
  • RAG pipeline: chunking with RecursiveCharacterTextSplitter, embeddings with nomic-embed-text, retrieval via Chroma, responses with llama3.2.
  • Chat-style UI built in Streamlit; transcript is viewable in-app.
  • Audio file is cleaned up after processing; transcript is written to transcription.txt for reuse.
  • CLI sample (main.py) that demonstrates the same RAG flow against a web page.
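The download-and-transcribe step behind these features can be sketched roughly as below. This is an illustrative sketch, not the app's actual code: the function name, yt-dlp options, and file names are assumptions, while the Whisper "base" model, the mp3 cleanup, and the transcription.txt output match the behavior described above.

```python
import os

def transcribe_youtube(url: str, audio_stem: str = "audio") -> str:
    # Heavy dependencies imported here so the sketch loads without them installed.
    import whisper
    import yt_dlp

    # Download the best audio stream and convert it to mp3 via FFmpeg
    # (FFmpeg must be on PATH, as noted in Prerequisites).
    ydl_opts = {
        "format": "bestaudio/best",
        "outtmpl": audio_stem,
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

    # Transcribe with Whisper's "base" model, then remove the audio file.
    text = whisper.load_model("base").transcribe(audio_stem + ".mp3")["text"]
    os.remove(audio_stem + ".mp3")

    # Persist the transcript for reuse, matching the app's transcription.txt.
    with open("transcription.txt", "w", encoding="utf-8") as f:
        f.write(text)
    return text
```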

Prerequisites

  • Python 3.10+ and pip.
  • FFmpeg installed and available on PATH (required by yt-dlp and Whisper).
  • Ollama running locally with the models:
    • ollama pull nomic-embed-text
    • ollama pull llama3.2

Installation

pip install \
  streamlit \
  yt-dlp \
  openai-whisper \
  langchain \
  langchain-community \
  langchain-text-splitters \
  langchain-ollama \
  chromadb

Running the Streamlit app

streamlit run app.py
  1. Open the provided local URL.
  2. Paste a YouTube link and click ANALYZE ⚡.
  3. After processing, expand Show Transcript if you want to inspect the raw text.
  4. Ask questions in the chat box; answers are grounded in the transcript.

Notes:

  • Audio is removed after analysis; the transcript persists at transcription.txt (also listed in .gitignore).
  • If the page fails to load models, confirm Ollama is running and the two models above are available.
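The RAG pipeline that grounds answers in the transcript looks roughly like this sketch. It uses the packages from the install step; the chunk sizes, k value, prompt wording, and function names are illustrative assumptions, while the splitter, embedding model, vector store, and LLM are the ones named in Features.

```python
def build_answerer(transcript: str):
    # Third-party imports kept inside so the sketch loads without them installed.
    from langchain_community.vectorstores import Chroma
    from langchain_ollama import ChatOllama, OllamaEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Chunk the transcript; the sizes here are illustrative defaults.
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100
    ).split_text(transcript)

    # Embed the chunks with nomic-embed-text and index them in Chroma.
    store = Chroma.from_texts(chunks, OllamaEmbeddings(model="nomic-embed-text"))
    retriever = store.as_retriever(search_kwargs={"k": 4})
    llm = ChatOllama(model="llama3.2")

    def answer(question: str) -> str:
        # Retrieve the most relevant chunks and ground the LLM on them.
        context = "\n\n".join(d.page_content for d in retriever.invoke(question))
        prompt = (
            "Answer the question using only this transcript excerpt:\n"
            f"{context}\n\nQuestion: {question}"
        )
        return llm.invoke(prompt).content

    return answer
```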

Using the CLI example

python main.py

The script loads a sample Turkish Wikipedia page, builds the same RAG chain, and answers questions in a loop (q to quit).
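The question loop can be sketched like this; answer stands in for the RAG chain's invoke call and is hypothetical here, and ask/out default to input/print so the loop stays testable.

```python
def chat_loop(answer, ask=input, out=print):
    # Keep asking until the user enters "q", mirroring the CLI described above.
    while True:
        question = ask("Question (q to quit): ").strip()
        if question.lower() == "q":
            out("Bye.")
            return
        out(answer(question))
```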

Troubleshooting

  • ffmpeg not found: Install FFmpeg and ensure it is on PATH.
  • Ollama errors / model not found: Pull the required models and restart Ollama.
  • Long processing times: Whisper transcription and embedding creation are CPU-heavy; GPU-capable environments speed this up.
  • Permission issues writing files: Ensure the repo directory is writable; audio and transcript files are created in the project root.
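For the long-processing-times point: Whisper can be told to use a GPU when one is available. A minimal sketch, assuming torch (which ships as a Whisper dependency) and openai-whisper are installed; the helper name is illustrative.

```python
def load_whisper(size: str = "base"):
    # Imported lazily so the sketch loads without the packages installed.
    import torch
    import whisper

    # Prefer CUDA when available; transcription is much faster on GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return whisper.load_model(size, device=device)
```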

👨‍💻 Author

Batuhan Küçükaydın
Software Engineer | Computer Engineer | iOS Developer
📫 LinkedIn · GitHub · Medium

⭐️ Support

If you like this project, please consider giving it a star 🌟
It really helps me keep building and improving!
