This project is a Streamlit-based web application that extracts YouTube video transcripts and analyzes them with Natural Language Processing (NLP) techniques.
It provides:
- ✂️ Summarization using Transformer models (T5)
- 🔑 Keyword Extraction with TF-IDF & fallback lemmatization
- 🧠 Topic Modeling using LDA
- 😊 Sentiment Analysis using both VADER and TextBlob
- 📥 Export options to download the results as `.txt` or `.csv`
- Full Name: Monpara Romil Kamleshbhai
- 🎓 B.Tech in Information Technology, LJIET (Graduating in 2027)
- GitHub: https://github.com/romilmonpara
- LinkedIn: https://www.linkedin.com/in/romilmonpara
- 🔗 Input any YouTube video URL
- 📝 Extracts and cleans transcript using `youtube_transcript_api`
- 🤖 Summarizes content via Hugging Face T5 Transformer model
- 🧹 Keyword extraction using TF-IDF with fallback to word frequency
- 📚 Topic modeling via LDA (`scikit-learn`)
- ❤️ Sentiment analysis using:
  - `TextBlob` (Polarity, Subjectivity)
  - NLTK's VADER (Positive, Negative, Neutral, Compound)
- 📊 Visual representation of sentiment results
- 📤 Downloadable results (TXT for summary, CSV for full data)
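The "TF-IDF with fallback to word frequency" idea can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `extract_keywords`, the sentence-splitting regex, and the choice to treat each sentence as a document are all assumptions made for the example.

```python
import re
from collections import Counter


def extract_keywords(text, top_n=5):
    """Rank terms by summed TF-IDF score; fall back to raw word counts."""
    # Assumption: each sentence is treated as one "document" for TF-IDF.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    try:
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform(sentences)
        # Sum each term's TF-IDF weight across all sentences.
        scores = np.asarray(matrix.sum(axis=0)).ravel()
        terms = vectorizer.get_feature_names_out()
        ranked = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)
        return [term for term, _ in ranked[:top_n]]
    except (ImportError, ValueError, AttributeError):
        # Fallback: plain frequency of words with three or more letters.
        # This is crude (no stop-word filtering) but needs no dependencies.
        words = re.findall(r"[a-z]{3,}", text.lower())
        return [word for word, _ in Counter(words).most_common(top_n)]
```

The fallback triggers when scikit-learn is missing or when the vectorizer cannot build a vocabulary, for example on text made up entirely of stop words.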
- Frontend: Streamlit
- Core Libraries:
  - `youtube_transcript_api`, `pytube` – YouTube integration
  - `transformers` – T5 summarization model
  - `nltk`, `textblob` – NLP, sentiment analysis
  - `scikit-learn` – Topic modeling (LDA)
  - `matplotlib`, `pandas`, `base64` – Visuals & export
  - `streamlit` – UI & interaction
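Since `base64` is listed for export, one plausible pattern is to embed the CSV in a download link as a data URI. The sketch below uses only the standard library and is an assumption about how such an export could work, not a description of the app's code; `csv_download_link` and the sample rows are hypothetical.

```python
import base64
import csv
import io


def csv_download_link(rows, filename="analysis.csv"):
    """Return an HTML anchor that embeds the rows as a base64-encoded CSV."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    # Encode the CSV text so it can live inside a data URI.
    encoded = base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:text/csv;base64,{encoded}" '
        f'download="{filename}">Download {filename}</a>'
    )


# Hypothetical rows, shaped like a metric/value table.
rows = [
    {"metric": "polarity", "value": 0.15},
    {"metric": "subjectivity", "value": 0.45},
]
link = csv_download_link(rows)
```

In a Streamlit app, a link like this could be rendered with `st.markdown(link, unsafe_allow_html=True)`; `st.download_button` is an alternative that avoids raw HTML.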
```bash
git clone https://github.com/romilmonpara/youtube-transcript-streamlit-ui.git
```

Make sure you have Python 3.7+ installed.

```bash
pip install -r requirements.txt
```

```python
# Download the NLTK resources used for tokenization, lemmatization,
# stop-word filtering, and VADER sentiment analysis.
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')
```

```bash
streamlit run app.py
```

- Open the app in your browser after launching Streamlit.
- Paste a YouTube video URL in the input box.
- Click "Analyze Video".
- View:
  - Video metadata (title, author, views, etc.)
  - Raw transcript (optional)
  - Summary
  - Keywords
  - Topics
  - Sentiment plots
- Download:
  - 📄 `summary.txt`
  - 📊 `analysis.csv`
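Before a transcript can be fetched, the pasted URL has to be reduced to a video ID. The sketch below shows one way to do that with the standard library. The function name `extract_video_id` and the set of accepted URL shapes are assumptions; the app may accept a different set of link formats.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard watch links carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]
    return candidate or None
```

Returning `None` for anything unrecognized lets the caller show an "invalid URL" message instead of attempting a transcript request.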
- Summary: "In this video, the speaker discusses..."
- Top Keywords: data science, machine learning, deep learning...
- Topics:
  - Topic 1: ai, data, learning
  - Topic 2: video, algorithm, streamlit
- Sentiment:
  - Polarity: 0.15 | Subjectivity: 0.45
  - VADER Compound Score: 0.74
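To read the sentiment numbers above: TextBlob polarity ranges from -1 to 1 and subjectivity from 0 to 1, while the VADER compound score ranges from -1 to 1. A common convention for VADER treats a compound score of 0.05 or higher as positive and -0.05 or lower as negative. The helper below is a sketch of that convention; `label_sentiment` is a hypothetical name, and whether the app applies these thresholds is not stated.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The sample compound score of 0.74 falls well inside the positive band.
sample_label = label_sentiment(0.74)
```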
- Transcript Not Available: The video must have closed captions enabled.
- Invalid URL: Only standard YouTube links are accepted.
- Model Error / CUDA Out of Memory: Reduce summary length or input shorter videos.
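For long videos, one way to keep memory use down is to split the transcript into smaller pieces and summarize each piece separately. The sketch below assumes this approach; it is not necessarily what the app does. The names `chunk_words` and `summarize_in_chunks` are hypothetical, the 400-word default is an arbitrary choice, and `summarize` stands in for whatever function calls the model.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_in_chunks(text, summarize, max_words=400):
    """Summarize each chunk with the given callable and join the results."""
    # `summarize` is a placeholder for the real model call.
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))
```

Smaller chunks lower peak memory per model call at the cost of more calls and some loss of context between chunks.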
- 🌐 Multilingual transcript support
- ⚡ Faster summarization with GPU model serving
- 🧠 Use more advanced models like BART or Pegasus
- 🖼️ Improve UI with themes and mobile responsiveness
- Hugging Face Transformers
- NLTK & TextBlob teams
- Streamlit Community
- YouTube Transcript API developers