A comprehensive toolkit for downloading YouTube transcripts with multiple implementation approaches, AI-powered summarization, and persistent database storage capabilities.
- Multiple Transcript Methods: youtube-transcript-api, yt-dlp, and manual web scraping
- Network Resilience: Proxy support for cloud environments and restricted networks
- Format Flexibility: Accepts video IDs, full URLs, or short URLs
- Clean Web Interface: Responsive UI for viewing transcripts with mobile optimization
- AI-Powered Summarization: OpenAI GPT-4.1 integration with proper markdown formatting
- Automatic Video Import: /watch?v=VIDEO_ID URLs automatically import videos and redirect to SEO-friendly URLs
- Clickable Channel Navigation: Channel names on video pages link directly to channel overview pages
- Channel Management: Dedicated channel overview pages with clean handle-based routing (/@channelname)
- Channel Overview Pages: Comprehensive channel hubs with statistics, navigation, and recent videos
- RESTful API: JSON endpoints for programmatic access with auto-import capabilities
- Real-time Processing: AJAX-based summarization without page reloads
- Chapter Organization: Automatic video chapter detection and structured display
- Dual View Modes: Toggle between readable paragraphs and detailed timestamps
- Video Import Settings: Configurable import strategies and processing options via Settings page
- Persistent Database Storage: Supabase integration with tables for videos, transcripts, chapters, summaries, and memory snippets
- Channel Video Import: Import latest videos from YouTube channels with automatic transcript and AI summary generation
- Memory Snippets: Save and organize insights from AI summaries with formatting preservation and tagging
- Video Metadata: Automatic extraction of titles, thumbnails, duration, and uploader info
- Chapter Support: Organized transcript display by video chapters when available
- Mobile Responsive: Optimized interface for mobile devices with reduced padding
- Proxy Support: Configurable proxy settings for restricted network environments
- Structured Summaries: Organized sections including overview, key takeaways, and actionable strategies
- GPT-4.1 Integration: Latest OpenAI model with improved instruction following and context understanding
- Proper Markdown Formatting: Server-side conversion of markdown to HTML with bullet point processing
- Consistent Display: Unified formatting across video pages and channel summary pages
- Efficient Processing: Uses pre-formatted transcript data to avoid redundant API calls
- Consolidated Import Logic: Unified process_video_complete() function ensures consistent behavior
- Error Handling: Graceful fallbacks when summarization fails
Install all dependencies using the requirements file:
pip install -r requirements.txt

Or install individual packages:
pip install flask youtube-transcript-api openai python-dotenv yt-dlp supabase markdown google-api-python-client

Create a .env file or set environment variables:
# Required for AI summarization
OPENAI_API_KEY=your_openai_api_key
# Required for database storage
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key
# Required for importing channel videos (YouTube Data API v3)
YOUTUBE_API_KEY=your_youtube_api_key
# Optional OpenAI configuration
OPENAI_MODEL=gpt-4.1
OPENAI_MAX_TOKENS=100000
OPENAI_TEMPERATURE=0.7
# Optional proxy configuration
YOUTUBE_PROXY=proxy_ip:8080
# Optional Flask configuration
FLASK_HOST=0.0.0.0
FLASK_PORT=33079
FLASK_DEBUG=True

- Create a Supabase project at supabase.com
- Copy your project URL and anon key to the .env file
- Run the SQL commands from sql/create_tables.sql in your Supabase SQL editor to create the required tables
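
The environment variables listed above could be read into a typed settings object along the following lines. This is a minimal sketch, not the contents of src/config.py: the `Settings` class and the `load_settings` helper are hypothetical names, and the defaults mirror the example values above, except that `FLASK_DEBUG` is assumed to default to off.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables listed above."""
    openai_api_key: Optional[str]
    supabase_url: Optional[str]
    supabase_key: Optional[str]
    youtube_api_key: Optional[str]
    openai_model: str
    openai_max_tokens: int
    openai_temperature: float
    youtube_proxy: Optional[str]
    flask_host: str
    flask_port: int
    flask_debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to the example defaults."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        supabase_url=env.get("SUPABASE_URL"),
        supabase_key=env.get("SUPABASE_KEY"),
        youtube_api_key=env.get("YOUTUBE_API_KEY"),
        openai_model=env.get("OPENAI_MODEL", "gpt-4.1"),
        openai_max_tokens=int(env.get("OPENAI_MAX_TOKENS", "100000")),
        openai_temperature=float(env.get("OPENAI_TEMPERATURE", "0.7")),
        youtube_proxy=env.get("YOUTUBE_PROXY"),
        flask_host=env.get("FLASK_HOST", "0.0.0.0"),
        flask_port=int(env.get("FLASK_PORT", "33079")),
        # Assumption: only these spellings count as "on".
        flask_debug=env.get("FLASK_DEBUG", "False").lower() in ("1", "true", "yes"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"FLASK_PORT": "8080"})`.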
# Web application
python3 app.py
# OR using start script
./start_server.sh

The application follows a clean modular design with Flask blueprints:
- Configuration: Centralized environment management in src/config.py
- Route Organization: Separated into logical blueprints in src/routes/
- Business Logic: Core functionality in dedicated modules (src/video_processing.py, src/youtube_api.py, etc.)
- Utilities: Helper functions in src/utils/
- Database: Single source of truth with src/database_storage.py
- New Routes: Add to appropriate blueprint in src/routes/
- New Business Logic: Create new modules in src/ or extend existing ones
- New Utilities: Add helper functions to src/utils/helpers.py
- Configuration: Add environment variables to src/config.py
# Test module imports
python3 -c "from app import create_app; app = create_app(); print('✓ App created successfully')"
# Test specific modules
python3 -c "from src.video_processing import video_processor; print('✓ Video processor available')"
python3 -c "from src.youtube_api import youtube_api; print('✓ YouTube API available')"

# Main implementation with proxy support
python3 tools/download_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID" [proxy_ip]
# yt-dlp alternative
python3 tools/simple_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"
# Manual web scraping fallback
python3 tools/download_transcript_manual.py "https://www.youtube.com/watch?v=VIDEO_ID"

- Home: http://localhost:33079/
- Transcript: http://localhost:33079/watch?v=VIDEO_ID (auto-imports and redirects to SEO-friendly URL)
- SEO-Friendly Video: http://localhost:33079/@channelhandle/video-title-slug
- Memory Snippets: http://localhost:33079/memory-snippets
- Channels: http://localhost:33079/channels
- Channel Overview: http://localhost:33079/@channelhandle
- Channel Videos: http://localhost:33079/@channelhandle/videos
- Channel Summaries: http://localhost:33079/@channelhandle/summaries
- Channel Snippets: http://localhost:33079/@channelhandle/snippets
- Videos: http://localhost:33079/videos
Recent Improvements:
- Auto-import functionality: /watch?v=VIDEO_ID now automatically imports videos if not found and redirects to clean URLs
- Clickable channel names: Channel names on video pages are now clickable links to channel overview pages
- Proper summary formatting: AI summaries display with correct markdown formatting (headers, bullet points, links)
- Transcript JSON: http://localhost:33079/api/transcript/VIDEO_ID (auto-imports if not found)
- Summary with Data: POST http://localhost:33079/api/summary (with transcript data in body)
- Memory Snippets: GET/POST/DELETE http://localhost:33079/api/memory-snippets
- Channel Import: POST http://localhost:33079/api/@channelhandle/import
- Storage Stats: http://localhost:33079/api/storage/stats
Import Logic: All video import operations now use the unified process_video_complete() function for consistent behavior across all endpoints.
# Command line usage
python3 tools/download_transcript.py "https://www.youtube.com/watch?v=FjHtZnjNEBU"
# Web interface (summary generated via AJAX)
http://localhost:33079/watch?v=FjHtZnjNEBU
# API endpoints
curl http://localhost:33079/api/transcript/FjHtZnjNEBU
curl http://localhost:33079/api/storage/stats
# AJAX summary request (made automatically by web interface)
curl -X POST http://localhost:33079/api/summary \
  -H "Content-Type: application/json" \
  -d '{"video_id": "FjHtZnjNEBU", "formatted_transcript": "..."}'
# Memory snippets API
curl http://localhost:33079/api/memory-snippets
curl -X POST http://localhost:33079/api/memory-snippets \
  -H "Content-Type: application/json" \
  -d '{"video_id": "FjHtZnjNEBU", "snippet_text": "Important insight", "tags": ["key-point"]}'
# Channel video import API
curl -X POST http://localhost:33079/api/@techchannel/import \
  -H "Content-Type: application/json" \
  -d '{"max_results": 5}'

- Video ID: FjHtZnjNEBU
- Full URL: https://www.youtube.com/watch?v=FjHtZnjNEBU
- Short URL: https://youtu.be/FjHtZnjNEBU
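
The three accepted input formats above can be reduced to a bare video ID with a small helper. The sketch below is illustrative and is not the code in src/utils/helpers.py: the function name `extract_video_id` is hypothetical, and it assumes video IDs are 11 characters drawn from letters, digits, `_`, and `-`, which matches the example ID but is not a documented guarantee.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: a video ID is exactly 11 URL-safe characters.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a bare ID, full URL, or short URL, else None."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value  # already a bare ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short URL: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        # Full URL: the ID is the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if _VIDEO_ID.fullmatch(candidate) else None
```

Returning `None` for unrecognized input lets the caller decide how to report the error instead of raising inside the parser.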
- tools/download_transcript.py - Main implementation using youtube-transcript-api with proxy support
- tools/simple_transcript.py - Alternative using yt-dlp for broader compatibility
- tools/download_transcript_manual.py - Manual web scraping fallback
- app.py - Main Flask application entry point with modular blueprint architecture
The application follows a clean modular architecture with all source code organized in the /src directory:
- src/config.py - Centralized configuration management with environment variables
- src/database_storage.py - Supabase database integration for persistent storage
- src/legacy_file_storage.py - Legacy file-based storage (deprecated, replaced by database)
- src/transcript_summarizer.py - OpenAI GPT-4.1 powered summarization with chapter support
- src/video_processing.py - Complete video processing pipeline: transcript extraction, formatting, AI summarization
- src/youtube_api.py - YouTube Data API integration for channel and video information
- src/routes/main.py - Main routes: home page, /watch redirects, favicon
- src/routes/api.py - RESTful API endpoints for transcript data, summaries, and video operations
- src/routes/channels.py - Channel management: overview, videos, summaries with handle-based routing
- src/routes/videos.py - Video display and transcript viewing with SEO-friendly URLs
- src/routes/snippets.py - Memory snippets management and display
- src/utils/helpers.py - Utility functions: video ID extraction, markdown conversion, URL parsing
- templates/ - HTML templates for responsive web interface
- static/ - CSS, JavaScript, and static assets
- sql/create_tables.sql - Database schema for Supabase setup
- sql/create_memory_snippets_table.sql - Memory snippets table creation
- sql/migration_*.sql - Database migration scripts
- sql/add_*.sql - Column addition scripts
- sql/fix_*.sql - Database fixes and corrections
- scripts/update_channel_handles.py - Update YouTube channel handles in database
- tests/test_chapters.py - Test script for debugging chapter extraction
- .env - Environment variables (create from examples above)
- CLAUDE.md - Development guidelines and project documentation
- README.md - Project overview and setup instructions
The AI summarization feature provides structured summaries with the following sections:
- Overview - Brief 2-3 sentence summary of the video content
- Main Topics Covered - Primary themes and subjects discussed
- Key Takeaways & Insights - Most important points and conclusions
- Actionable Strategies - Practical advice and implementable steps
- Specific Details & Examples - Important statistics, case studies, and examples
- Warnings & Common Mistakes - Pitfalls and errors to avoid
- Resources & Next Steps - Tools, links, and recommended follow-up actions
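
One way to check that a generated summary follows the section list above is to scan its markdown headings. The sketch below is an assumption-heavy illustration: it supposes each section appears as a markdown heading whose text matches the section title exactly, which this README does not confirm, and `missing_sections` is a hypothetical helper name.

```python
import re
from typing import List

# Section titles copied from the list above.
SUMMARY_SECTIONS = [
    "Overview",
    "Main Topics Covered",
    "Key Takeaways & Insights",
    "Actionable Strategies",
    "Specific Details & Examples",
    "Warnings & Common Mistakes",
    "Resources & Next Steps",
]


def missing_sections(summary_markdown: str) -> List[str]:
    """Return the expected section titles that have no matching heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}\s+(.+?)\s*$", summary_markdown, flags=re.MULTILINE
        )
    }
    return [title for title in SUMMARY_SECTIONS if title.lower() not in headings]
```

An empty result means every expected section was found; a non-empty one could be logged or used to trigger a retry.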
The Channel Overview feature provides dedicated pages for each YouTube channel, serving as a central hub for all channel-related content.
- Comprehensive Channel Information: Channel name, handle (@channelname), description, and thumbnail
- Statistics Dashboard: Video count, AI summaries count, and memory snippets count with color-coded cards
- Navigation Hub: Direct links to videos, summaries, and snippets with descriptions
- Recent Videos Grid: Visual display of latest 6 videos with thumbnails and metadata
- Channel Actions: Import latest videos and visit YouTube channel directly
- Handle-Based URLs: Clean URLs using channel handles (e.g., /@markrober)
- Breadcrumb Navigation: Easy navigation back to channel overview from sub-pages
- Responsive Design: Mobile-optimized layout with proper breakpoints
/@channelhandle              → Channel Overview (main hub)
/@channelhandle/videos       → All Videos List  
/@channelhandle/summaries    → AI Summaries
/@channelhandle/snippets     → Memory Snippets
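
The URL scheme above can be illustrated with a small path matcher. In the application itself, routing is handled by the Flask blueprints in src/routes/; the sketch below only demonstrates the handle-based pattern using the standard library, and `ChannelRoute` and `match_channel_path` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# /@handle, optionally followed by one of the three known sub-pages.
_CHANNEL_PATH = re.compile(
    r"^/@(?P<handle>[^/]+)(?:/(?P<section>videos|summaries|snippets))?/?$"
)


class ChannelRoute(NamedTuple):
    handle: str
    view: str  # "overview", "videos", "summaries", or "snippets"


def match_channel_path(path: str) -> Optional[ChannelRoute]:
    """Map a request path to a channel view, or None if it is not a channel page."""
    match = _CHANNEL_PATH.match(path)
    if match is None:
        return None
    return ChannelRoute(match.group("handle"), match.group("section") or "overview")
```

Paths such as /@channelhandle/video-title-slug deliberately return `None` here, since they address an individual video rather than a channel page.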
- Channels Page (/channels) - Browse all channels
- Channel Overview (/@handle) - Channel hub with stats and navigation
- Sub-pages - Videos, summaries, or snippets with breadcrumb navigation back to overview
- Individual Content - Specific videos, summaries, or snippets
- Header: Gradient background with channel thumbnail, name, handle, and description
- Statistics Cards: Large numbers showing video count, summaries, and snippets
- Navigation Cards: Interactive cards with hover effects linking to different content types
- Recent Videos: Grid layout showing latest videos with thumbnails and quick access
- Actions: Import videos, visit YouTube channel, browse all channels
The Channel Video Import feature allows you to automatically fetch the latest videos from any YouTube channel and process them with transcripts and AI summaries.
The application provides comprehensive configuration options for video imports through the Settings page (/settings → Video Imports tab):
- Default Max Results: Number of videos to import per channel (default: 20)
- Default Days Back: Time range in days to look back for videos (default: 30)
- Max Results Limit: Maximum videos that can be imported in one request (default: 50)
- Import Timeout: Timeout in seconds for import operations (default: 300)
- Enable Auto Summary: Automatically generate AI summaries for imported videos
- Enable Transcript Extraction: Extract transcripts for imported videos
- Enable Chapter Extraction: Extract video chapters for imported videos
- Skip Existing Videos: Skip videos that already exist in the database
- Primary Strategy: Choose between uploads_playlist, activities_api, or search_api
- Fallback Strategies: Comma-separated list of fallback strategies to try
- Batch Processing: Process videos in batches for better performance
- Batch Size: Number of videos to process in each batch (default: 5)
- Retry Failed Imports: Retry failed video imports
- Max Retry Attempts: Maximum retry attempts for failed imports (default: 3)
- Enable Progress Tracking: Show progress updates during import operations
- Log Import Operations: Log detailed information about import operations
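
The batch and retry settings above interact roughly as in the following sketch. It is not the project's import code: `import_in_batches` and the `process_one` callback are hypothetical, and "Max Retry Attempts" is assumed to mean the total number of attempts per video.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def import_in_batches(
    video_ids: Sequence[str],
    process_one: Callable[[str], None],
    batch_size: int = 5,          # "Batch Size" default from the list above
    max_retry_attempts: int = 3,  # "Max Retry Attempts" default from the list above
) -> Dict[str, List[str]]:
    """Process videos batch by batch, retrying each failed video up to the limit."""
    results: Dict[str, List[str]] = {"processed": [], "errors": []}
    for batch in batched(video_ids, batch_size):
        for video_id in batch:
            for attempt in range(1, max_retry_attempts + 1):
                try:
                    process_one(video_id)
                except Exception:
                    # Give up only after the final attempt has failed.
                    if attempt == max_retry_attempts:
                        results["errors"].append(video_id)
                else:
                    results["processed"].append(video_id)
                    break
    return results
```

Catching a broad `Exception` keeps one bad video from aborting the whole import; a real implementation would likely narrow this and log the failure.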
- Automatic Discovery: Import up to 20 latest videos from any YouTube channel
- Smart Channel Matching: Handles various channel URL formats and name variations
- Complete Processing: Each imported video gets transcript extraction and AI summary generation
- Duplicate Prevention: Skips videos that are already in your database
- Progress Tracking: Real-time feedback showing processed, skipped, and error counts
- URL Decoding: Properly handles URL-encoded channel names
- Navigate to Channels page at /channels
- Click "Import Latest Videos" on any existing channel card
- Wait for processing - the system will fetch and process each video
- View results - notification shows how many videos were processed/skipped/errors
- Page refresh - automatically updates to show new video counts
# Import 5 latest videos from a channel
curl -X POST http://localhost:33079/api/@techchannel/import \
  -H "Content-Type: application/json" \
  -d '{"max_results": 5}'
# Response includes detailed results
{
  "success": true,
  "channel_name": "TechChannel",
  "total_videos": 5,
  "processed": 3,
  "skipped": 2,
  "errors": 0,
  "results": [...]
}

- YouTube Data API v3 Key: Required for channel video discovery
- OpenAI API Key: Optional, for AI summary generation
- Channel Must Exist: The channel must have at least one video already in your database for optimal matching
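
The same import call can be prepared from Python, and the response fields shown above turned into a one-line report. This is a sketch using only the standard library: `build_import_request` and `summarize_import` are hypothetical helpers, the base URL is taken from the examples above, and the failure branch assumes a failed import returns `"success": false`.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:33079"  # host and port from the examples above


def build_import_request(handle: str, max_results: int = 5) -> Request:
    """Prepare (but do not send) the POST request for a channel import."""
    body = json.dumps({"max_results": max_results}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/@{handle}/import",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_import(response_body: str) -> str:
    """Condense the JSON response into a short human-readable line."""
    data = json.loads(response_body)
    if not data.get("success"):
        return "Import failed"
    return (
        f"{data['channel_name']}: {data['processed']} processed, "
        f"{data['skipped']} skipped, {data['errors']} errors "
        f"(of {data['total_videos']} videos)"
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.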
The Memory Snippets feature allows you to save and organize key insights from AI summaries and transcripts, creating a personal knowledge base.
- Text Selection: Select any text from AI summaries or transcripts to save as a snippet
- Formatting Preservation: HTML formatting (headers, bullet points, bold text, links) is preserved from AI summaries
- Tagging System: Add custom tags to organize and categorize snippets
- Video Grouping: Snippets are automatically grouped by video for better organization
- Context Preservation: Saves surrounding text context for better understanding
- Search & Browse: View all snippets organized by video with visual thumbnails
- Generate an AI Summary for any video
- Select text from the summary or transcript that you want to save
- Click the "💾 Save as Memory Snippet" button that appears
- Add optional tags to categorize the snippet
- Save the snippet to your personal knowledge base
- Browse snippets at /memory-snippets grouped by video
Memory snippets are stored in the memory_snippets table with:
- snippet_text: The selected text with preserved HTML formatting
- context_before/after: Surrounding text for context
- tags: Array of custom tags for organization
- video_id: Link to the source video
- timestamps: Creation and update times
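
The fields above map naturally onto a small record type. The sketch below is illustrative and not taken from src/database_storage.py: the `MemorySnippet` class is hypothetical, and the column names `created_at` and `updated_at` are assumptions standing in for the unspecified timestamp columns.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class MemorySnippet:
    """Hypothetical in-memory mirror of one memory_snippets row."""
    video_id: str
    snippet_text: str                     # may contain preserved HTML formatting
    context_before: Optional[str] = None
    context_after: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=_utc_now)  # assumed column name
    updated_at: datetime = field(default_factory=_utc_now)  # assumed column name

    def to_row(self) -> Dict[str, Any]:
        """Return a JSON-friendly dict with ISO 8601 timestamps."""
        row = asdict(self)
        row["created_at"] = self.created_at.isoformat()
        row["updated_at"] = self.updated_at.isoformat()
        return row
```

A dict in this shape is the kind of payload a database client insert would typically accept, though the exact call depends on the storage layer.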
- Persistent Storage: Supabase database with five main tables (videos, transcripts, chapters, summaries, memory_snippets)
- Automatic Relationships: Foreign keys linking videos, transcripts, chapters, summaries, and memory snippets
- Performance Indexes: Optimized queries for video_id, timestamps, and memory snippet tags
- Storage Statistics: Monitor database performance via API endpoints
- Memory Snippets: Personal knowledge base with text selection, formatting preservation, and tagging
- Cloud provider IPs are commonly blocked by YouTube
- Proxy support available for restricted environments via the YOUTUBE_PROXY environment variable
- Multiple fallback methods for different network conditions
- Reduced API calls through database storage and caching
- AJAX summarization: No page reloads for AI summary generation
- Efficient transcript processing: Uses formatted text for summarization
- Database caching: Persistent storage eliminates redundant API calls
- Mobile-optimized UI: Reduced padding and responsive design
- Chapter organization: Structured content display for better readability
Build and run using Docker:
# Build Docker image
docker build -t youtube-deep-search .
# Run with environment variables
docker run -p 33079:33079 \
  -e OPENAI_API_KEY=your_openai_key \
  -e SUPABASE_URL=your_supabase_url \
  -e SUPABASE_KEY=your_supabase_key \
  youtube-deep-search

# Clone repository
git clone <repository-url>
cd youtube-deep-search
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
# Initialize database
# Run sql/create_tables.sql in your Supabase SQL editor
# Start application
python3 app.py