YouTube Deep Summary

A comprehensive toolkit for downloading YouTube transcripts with multiple implementation approaches, AI-powered summarization, and persistent database storage capabilities.

Features

Core Functionality

Multiple Transcript Methods: youtube-transcript-api, yt-dlp, and manual web scraping
Network Resilience: Proxy support for cloud environments and restricted networks
Format Flexibility: Accepts video IDs, full URLs, or short URLs

Web Application

Clean Web Interface: Responsive UI for viewing transcripts with mobile optimization
AI-Powered Summarization: OpenAI GPT-4.1 integration with proper markdown formatting
Automatic Video Import: /watch?v=VIDEO_ID URLs automatically import videos and redirect to SEO-friendly URLs
Clickable Channel Navigation: Channel names on video pages link directly to channel overview pages
Channel Management: Dedicated channel overview pages with clean handle-based routing (/@channelname)
Channel Overview Pages: Comprehensive channel hubs with statistics, navigation, and recent videos
RESTful API: JSON endpoints for programmatic access with auto-import capabilities
Real-time Processing: AJAX-based summarization without page reloads
Chapter Organization: Automatic video chapter detection and structured display
Dual View Modes: Toggle between readable paragraphs and detailed timestamps
Video Import Settings: Configurable import strategies and processing options via Settings page

Advanced Features

Persistent Database Storage: Supabase integration with tables for videos, transcripts, chapters, summaries, and memory snippets
Channel Video Import: Import latest videos from YouTube channels with automatic transcript and AI summary generation
Memory Snippets: Save and organize insights from AI summaries with formatting preservation and tagging
Video Metadata: Automatic extraction of titles, thumbnails, duration, and uploader info
Chapter Support: Organized transcript display by video chapters when available
Mobile Responsive: Optimized interface for mobile devices with reduced padding
Proxy Support: Configurable proxy settings for restricted network environments

Summarization Features

Structured Summaries: Organized sections including overview, key takeaways, and actionable strategies
GPT-4.1 Integration: Latest OpenAI model with improved instruction following and context understanding
Proper Markdown Formatting: Server-side conversion of markdown to HTML with bullet point processing
Consistent Display: Unified formatting across video pages and channel summary pages
Efficient Processing: Uses pre-formatted transcript data to avoid redundant API calls
Consolidated Import Logic: Unified process_video_complete() function ensures consistent behavior
Error Handling: Graceful fallbacks when summarization fails

Setup

Dependencies

Install all dependencies using the requirements file:

pip install -r requirements.txt

Or install individual packages:

pip install flask youtube-transcript-api openai python-dotenv yt-dlp supabase markdown google-api-python-client

Environment Configuration

Create a .env file or set environment variables:

# Required for AI summarization
OPENAI_API_KEY=your_openai_api_key

# Required for database storage
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_anon_key

# Required for importing channel videos (YouTube Data API v3)
YOUTUBE_API_KEY=your_youtube_api_key

# Optional OpenAI configuration
OPENAI_MODEL=gpt-4.1
OPENAI_MAX_TOKENS=100000
OPENAI_TEMPERATURE=0.7

# Optional proxy configuration
YOUTUBE_PROXY=proxy_ip:8080

# Optional Flask configuration
FLASK_HOST=0.0.0.0
FLASK_PORT=33079
FLASK_DEBUG=True
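
The variables above could be read in Python roughly as follows. This is a minimal sketch, not the contents of src/config.py: the `Settings` class and `load_settings` function are hypothetical, and the fallback values are simply the example values shown above, so the application's real defaults may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment variables listed above."""
    openai_api_key: Optional[str]
    supabase_url: Optional[str]
    supabase_key: Optional[str]
    youtube_api_key: Optional[str]
    openai_model: str
    openai_max_tokens: int
    openai_temperature: float
    youtube_proxy: Optional[str]
    flask_host: str
    flask_port: int
    flask_debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping; fallbacks are the example values (assumed)."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        supabase_url=env.get("SUPABASE_URL"),
        supabase_key=env.get("SUPABASE_KEY"),
        youtube_api_key=env.get("YOUTUBE_API_KEY"),
        openai_model=env.get("OPENAI_MODEL", "gpt-4.1"),
        openai_max_tokens=int(env.get("OPENAI_MAX_TOKENS", "100000")),
        openai_temperature=float(env.get("OPENAI_TEMPERATURE", "0.7")),
        youtube_proxy=env.get("YOUTUBE_PROXY"),
        flask_host=env.get("FLASK_HOST", "0.0.0.0"),
        flask_port=int(env.get("FLASK_PORT", "33079")),
        # Environment values are strings, so "True" has to be parsed explicitly.
        flask_debug=env.get("FLASK_DEBUG", "False").lower() in ("1", "true", "yes"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.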

Database Setup

Create a Supabase project at supabase.com
Copy your project URL and anon key to the .env file
Run the SQL commands from sql/create_tables.sql in your Supabase SQL editor to create the required tables

Starting the Application

# Web application
python3 app.py

# OR using start script
./start_server.sh

Development

Modular Architecture

The application follows a clean modular design with Flask blueprints:

Configuration: Centralized environment management in src/config.py
Route Organization: Separated into logical blueprints in src/routes/
Business Logic: Core functionality in dedicated modules (src/video_processing.py, src/youtube_api.py, etc.)
Utilities: Helper functions in src/utils/
Database: Single source of truth with src/database_storage.py

Adding New Features

New Routes: Add to appropriate blueprint in src/routes/
New Business Logic: Create new modules in src/ or extend existing ones
New Utilities: Add helper functions to src/utils/helpers.py
Configuration: Add environment variables to src/config.py

Testing

# Test module imports
python3 -c "from app import create_app; app = create_app(); print('✓ App created successfully')"

# Test specific modules
python3 -c "from src.video_processing import video_processor; print('✓ Video processor available')"
python3 -c "from src.youtube_api import youtube_api; print('✓ YouTube API available')"

Usage

Command Line Scripts

# Main implementation with proxy support
python3 tools/download_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID" [proxy_ip]

# yt-dlp alternative
python3 tools/simple_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Manual web scraping fallback
python3 tools/download_transcript_manual.py "https://www.youtube.com/watch?v=VIDEO_ID"

Web Interface

Home: http://localhost:33079/
Transcript: http://localhost:33079/watch?v=VIDEO_ID (auto-imports and redirects to SEO-friendly URL)
SEO-Friendly Video: http://localhost:33079/@channelhandle/video-title-slug
Memory Snippets: http://localhost:33079/memory-snippets
Channels: http://localhost:33079/channels
Channel Overview: http://localhost:33079/@channelhandle
Channel Videos: http://localhost:33079/@channelhandle/videos
Channel Summaries: http://localhost:33079/@channelhandle/summaries
Channel Snippets: http://localhost:33079/@channelhandle/snippets
Videos: http://localhost:33079/videos
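
The SEO-friendly video URL above has the shape /@channelhandle/video-title-slug. The sketch below shows one plausible way to derive such a path from a title. The slug rules (ASCII only, lowercase, hyphen-separated) and the function names are assumptions for illustration; the application may build its slugs differently.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase the title, drop accents, and join alphanumeric runs with hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def video_path(channel_handle: str, title: str) -> str:
    """Build a path of the form /@channelhandle/video-title-slug."""
    return f"/@{channel_handle.lstrip('@')}/{slugify(title)}"
```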

Recent Improvements:

Auto-import functionality: /watch?v=VIDEO_ID now automatically imports videos if not found and redirects to clean URLs
Clickable channel names: Channel names on video pages are now clickable links to channel overview pages
Proper summary formatting: AI summaries display with correct markdown formatting (headers, bullet points, links)

API Endpoints

Transcript JSON: http://localhost:33079/api/transcript/VIDEO_ID (auto-imports if not found)
Summary with Data: POST http://localhost:33079/api/summary (with transcript data in body)
Memory Snippets: GET/POST/DELETE http://localhost:33079/api/memory-snippets
Channel Import: POST http://localhost:33079/api/@channelhandle/import
Storage Stats: http://localhost:33079/api/storage/stats

Import Logic: All video import operations now use the unified process_video_complete() function for consistent behavior across all endpoints.

Examples

# Command line usage
python3 tools/download_transcript.py "https://www.youtube.com/watch?v=FjHtZnjNEBU"

# Web interface (summary generated via AJAX)
http://localhost:33079/watch?v=FjHtZnjNEBU

# API endpoints
curl http://localhost:33079/api/transcript/FjHtZnjNEBU
curl http://localhost:33079/api/storage/stats

# AJAX summary request (made automatically by web interface)
curl -X POST http://localhost:33079/api/summary \
  -H "Content-Type: application/json" \
  -d '{"video_id": "FjHtZnjNEBU", "formatted_transcript": "..."}'

# Memory snippets API
curl http://localhost:33079/api/memory-snippets
curl -X POST http://localhost:33079/api/memory-snippets \
  -H "Content-Type: application/json" \
  -d '{"video_id": "FjHtZnjNEBU", "snippet_text": "Important insight", "tags": ["key-point"]}'

# Channel video import API
curl -X POST http://localhost:33079/api/@techchannel/import \
  -H "Content-Type: application/json" \
  -d '{"max_results": 5}'
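
The POST calls above can also be issued from Python with only the standard library. The sketch below assumes the server is running on the default port 33079; `build_json_post` and `send` are hypothetical helper names, and the payload fields are copied from the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:33079"  # assumes the default FLASK_PORT


def build_json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given API path."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Same body as the memory snippets curl example.
snippet_request = build_json_post(
    "/api/memory-snippets",
    {"video_id": "FjHtZnjNEBU", "snippet_text": "Important insight", "tags": ["key-point"]},
)
```

Calling `send(snippet_request)` performs the request. Building and sending are kept separate so the request can be inspected without a running server.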

Supported Input Formats

Video ID: FjHtZnjNEBU
Full URL: https://www.youtube.com/watch?v=FjHtZnjNEBU
Short URL: https://youtu.be/FjHtZnjNEBU
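
A minimal sketch of how the three formats can be normalized to a video ID. This is not the code in src/utils/helpers.py: the function name, the accepted host list, and the assumption that IDs are 11 characters from the set A-Z, a-z, 0-9, "_" and "-" are illustrative.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 URL-safe characters.
VIDEO_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{11}$")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID, a full watch URL, or a youtu.be URL."""
    value = value.strip()
    if VIDEO_ID_PATTERN.match(value):
        return value  # already a bare video ID

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short URL: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        # Full URL: the ID is the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        candidate = ""

    if not VIDEO_ID_PATTERN.match(candidate):
        raise ValueError(f"Could not extract a video ID from {value!r}")
    return candidate
```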

Project Structure

Command-Line Tools (`/tools`)

tools/download_transcript.py - Main implementation using youtube-transcript-api with proxy support
tools/simple_transcript.py - Alternative using yt-dlp for broader compatibility
tools/download_transcript_manual.py - Manual web scraping fallback

Web Application

app.py - Main Flask application entry point with modular blueprint architecture

Core Modules (`/src`)

The application follows a clean modular architecture with all source code organized in the /src directory:

Configuration & Infrastructure

src/config.py - Centralized configuration management with environment variables
src/database_storage.py - Supabase database integration for persistent storage
src/legacy_file_storage.py - Legacy file-based storage (deprecated, replaced by database)

Business Logic

src/transcript_summarizer.py - OpenAI GPT-4.1 powered summarization with chapter support
src/video_processing.py - Complete video processing pipeline: transcript extraction, formatting, AI summarization
src/youtube_api.py - YouTube Data API integration for channel and video information

Web Interface (`/src/routes`)

src/routes/main.py - Main routes: home page, /watch redirects, favicon
src/routes/api.py - RESTful API endpoints for transcript data, summaries, and video operations
src/routes/channels.py - Channel management: overview, videos, summaries with handle-based routing
src/routes/videos.py - Video display and transcript viewing with SEO-friendly URLs
src/routes/snippets.py - Memory snippets management and display

Utilities

src/utils/helpers.py - Utility functions: video ID extraction, markdown conversion, URL parsing

Frontend & Assets

templates/ - HTML templates for responsive web interface
static/ - CSS, JavaScript, and static assets

Database & SQL

sql/create_tables.sql - Database schema for Supabase setup
sql/create_memory_snippets_table.sql - Memory snippets table creation
sql/migration_*.sql - Database migration scripts
sql/add_*.sql - Column addition scripts
sql/fix_*.sql - Database fixes and corrections

Utility Scripts (`/scripts`)

scripts/update_channel_handles.py - Update YouTube channel handles in database

Testing (`/tests`)

tests/test_chapters.py - Test script for debugging chapter extraction

Configuration & Documentation

.env - Environment variables (create from examples above)
CLAUDE.md - Development guidelines and project documentation
README.md - Project overview and setup instructions

AI Summarization Structure

The AI summarization feature provides structured summaries with the following sections:

Overview - Brief 2-3 sentence summary of the video content
Main Topics Covered - Primary themes and subjects discussed
Key Takeaways & Insights - Most important points and conclusions
Actionable Strategies - Practical advice and implementable steps
Specific Details & Examples - Important statistics, case studies, and examples
Warnings & Common Mistakes - Pitfalls and errors to avoid
Resources & Next Steps - Tools, links, and recommended follow-up actions
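
The section list above can be kept as data and used both to instruct the model and to check a generated summary. The sketch below is hypothetical: the instruction wording and the function names are assumptions and are not taken from src/transcript_summarizer.py. Only the section titles and descriptions come from the list above.

```python
from typing import List

# Section titles and descriptions copied from the list above.
SUMMARY_SECTIONS = [
    ("Overview", "Brief 2-3 sentence summary of the video content"),
    ("Main Topics Covered", "Primary themes and subjects discussed"),
    ("Key Takeaways & Insights", "Most important points and conclusions"),
    ("Actionable Strategies", "Practical advice and implementable steps"),
    ("Specific Details & Examples", "Important statistics, case studies, and examples"),
    ("Warnings & Common Mistakes", "Pitfalls and errors to avoid"),
    ("Resources & Next Steps", "Tools, links, and recommended follow-up actions"),
]


def build_section_instructions() -> str:
    """Render the section list as prompt instructions (hypothetical wording)."""
    lines = ["Structure the summary with these markdown headings:"]
    lines += [f"## {title}\n({description})" for title, description in SUMMARY_SECTIONS]
    return "\n".join(lines)


def missing_sections(summary_markdown: str) -> List[str]:
    """Return the section titles that do not appear as headings in a summary."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in summary_markdown.splitlines()
        if line.startswith("#")
    }
    return [title for title, _ in SUMMARY_SECTIONS if title.lower() not in headings]
```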

Channel Overview Pages

The Channel Overview feature provides dedicated pages for each YouTube channel, serving as a central hub for all channel-related content.

Features

Comprehensive Channel Information: Channel name, handle (@channelname), description, and thumbnail
Statistics Dashboard: Video count, AI summaries count, and memory snippets count with color-coded cards
Navigation Hub: Direct links to videos, summaries, and snippets with descriptions
Recent Videos Grid: Visual display of latest 6 videos with thumbnails and metadata
Channel Actions: Import latest videos and visit YouTube channel directly
Handle-Based URLs: Clean URLs using channel handles (e.g., /@markrober)
Breadcrumb Navigation: Easy navigation back to channel overview from sub-pages
Responsive Design: Mobile-optimized layout with proper breakpoints

URL Structure

/@channelhandle              → Channel Overview (main hub)
/@channelhandle/videos       → All Videos List  
/@channelhandle/summaries    → AI Summaries
/@channelhandle/snippets     → Memory Snippets
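
The URL structure above can be illustrated with a small path parser. This is a sketch under assumptions rather than the routing code in src/routes/channels.py: the function name and return shape are hypothetical, and any other single segment after the handle is treated as a video slug, following the /@channelhandle/video-title-slug form listed under Web Interface.

```python
from typing import Optional, Tuple

CHANNEL_SECTIONS = {"videos", "summaries", "snippets"}


def parse_channel_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (handle, page) for a handle-based path, or None if it is not one."""
    parts = [part for part in path.split("/") if part]
    if not parts or not parts[0].startswith("@") or len(parts[0]) == 1:
        return None  # not a /@handle path
    handle = parts[0][1:]
    if len(parts) == 1:
        return handle, "overview"
    if len(parts) == 2:
        # Known sub-pages first; anything else is assumed to be a video slug.
        return handle, parts[1] if parts[1] in CHANNEL_SECTIONS else "video"
    return None
```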

Navigation Flow

Channels Page (/channels) - Browse all channels
Channel Overview (/@handle) - Channel hub with stats and navigation
Sub-pages - Videos, summaries, or snippets with breadcrumb navigation back to overview
Individual Content - Specific videos, summaries, or snippets

Features by Section

Header: Gradient background with channel thumbnail, name, handle, and description
Statistics Cards: Large numbers showing video count, summaries, and snippets
Navigation Cards: Interactive cards with hover effects linking to different content types
Recent Videos: Grid layout showing latest videos with thumbnails and quick access
Actions: Import videos, visit YouTube channel, browse all channels

Channel Video Import

The Channel Video Import feature allows you to automatically fetch the latest videos from any YouTube channel and process them with transcripts and AI summaries.

Import Settings Configuration

The application provides comprehensive configuration options for video imports through the Settings page (/settings → Video Imports tab):

Basic Settings

Default Max Results: Number of videos to import per channel (default: 20)
Default Days Back: Time range in days to look back for videos (default: 30)
Max Results Limit: Maximum videos that can be imported in one request (default: 50)
Import Timeout: Timeout in seconds for import operations (default: 300)
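
The basic settings and their defaults can be modelled as a small data class. The sketch below is hypothetical: the class, the field names, and the choice to clamp an out-of-range request (rather than reject it) are assumptions. Only the default values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportSettings:
    """Hypothetical model of the basic import settings and their defaults."""
    default_max_results: int = 20
    default_days_back: int = 30
    max_results_limit: int = 50
    import_timeout: int = 300  # seconds

    def effective_max_results(self, requested: Optional[int] = None) -> int:
        """Use the default when nothing is requested and cap at the limit (assumed)."""
        value = self.default_max_results if requested is None else requested
        return max(1, min(value, self.max_results_limit))
```

With the defaults, a request for 5 videos yields 5, no value yields 20, and a request for 500 is capped at 50.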

Processing Options

Enable Auto Summary: Automatically generate AI summaries for imported videos
Enable Transcript Extraction: Extract transcripts for imported videos
Enable Chapter Extraction: Extract video chapters for imported videos
Skip Existing Videos: Skip videos that already exist in the database

Import Strategy

Primary Strategy: Choose between uploads_playlist, activities_api, or search_api
Fallback Strategies: Comma-separated list of fallback strategies to try
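
A primary strategy with a comma-separated fallback list suggests a simple "try in order" loop. The sketch below shows that pattern under assumptions: the function names are hypothetical, and the fetchers are placeholders supplied by the caller rather than real YouTube Data API calls.

```python
from typing import Callable, Dict, List


def parse_fallbacks(raw: str) -> List[str]:
    """Split a comma-separated strategy list, ignoring blanks and whitespace."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def fetch_with_fallbacks(
    primary: str,
    fallbacks: List[str],
    fetchers: Dict[str, Callable[[], List[str]]],
) -> List[str]:
    """Try the primary strategy, then each fallback, returning the first result."""
    errors = []
    for name in [primary] + [f for f in fallbacks if f != primary]:
        fetcher = fetchers.get(name)
        if fetcher is None:
            errors.append(f"{name}: unknown strategy")
            continue
        try:
            return fetcher()
        except Exception as exc:  # record the failure and move to the next strategy
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All import strategies failed: " + "; ".join(errors))
```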

Advanced Settings

Batch Processing: Process videos in batches for better performance
Batch Size: Number of videos to process in each batch (default: 5)
Retry Failed Imports: Retry failed video imports
Max Retry Attempts: Maximum retry attempts for failed imports (default: 3)
Enable Progress Tracking: Show progress updates during import operations
Log Import Operations: Log detailed information about import operations
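
Batch processing and retries can be sketched with two small helpers. The names are hypothetical, and the sketch assumes that "Max Retry Attempts" counts total attempts, including the first one; the application may count retries differently.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Run task, re-running it on failure until max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
```

An import loop could then iterate over `batched(video_ids)` and wrap each per-video call in `with_retries`.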

Features

Automatic Discovery: Import the latest videos from any YouTube channel (20 by default, up to the configured Max Results Limit)
Smart Channel Matching: Handles various channel URL formats and name variations
Complete Processing: Each imported video gets transcript extraction and AI summary generation
Duplicate Prevention: Skips videos that are already in your database
Progress Tracking: Real-time feedback showing processed, skipped, and error counts
URL Decoding: Properly handles URL-encoded channel names

Usage

Navigate to Channels page at /channels
Click "Import Latest Videos" on any existing channel card
Wait for processing - the system will fetch and process each video
View results - notification shows how many videos were processed/skipped/errors
Page refresh - automatically updates to show new video counts

API Usage

# Import 5 latest videos from a channel
curl -X POST http://localhost:33079/api/@techchannel/import \
  -H "Content-Type: application/json" \
  -d '{"max_results": 5}'

# Response includes detailed results
{
  "success": true,
  "channel_name": "TechChannel",
  "total_videos": 5,
  "processed": 3,
  "skipped": 2,
  "errors": 0,
  "results": [...]
}
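
The response fields above can be turned into the kind of processed/skipped/errors notification the web interface shows. This is a minimal sketch: `summarize_import` is a hypothetical helper, and it relies only on the keys present in the example response.

```python
import json


def summarize_import(response_body: str) -> str:
    """Format the counts from a channel import response as a one-line message."""
    result = json.loads(response_body)
    if not result.get("success"):
        raise RuntimeError("Channel import did not succeed")
    return (
        f"{result['channel_name']}: {result['processed']} processed, "
        f"{result['skipped']} skipped, {result['errors']} errors "
        f"(of {result['total_videos']} videos)"
    )
```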

Requirements

YouTube Data API v3 Key: Required for channel video discovery
OpenAI API Key: Optional, for AI summary generation
Channel Must Exist: The channel must have at least one video already in your database for optimal matching

Memory Snippets

The Memory Snippets feature allows you to save and organize key insights from AI summaries and transcripts, creating a personal knowledge base.

Features

Text Selection: Select any text from AI summaries or transcripts to save as a snippet
Formatting Preservation: HTML formatting (headers, bullet points, bold text, links) is preserved from AI summaries
Tagging System: Add custom tags to organize and categorize snippets
Video Grouping: Snippets are automatically grouped by video for better organization
Context Preservation: Saves surrounding text context for better understanding
Search & Browse: View all snippets organized by video with visual thumbnails

Usage

Generate an AI Summary for any video
Select text from the summary or transcript that you want to save
Click the "💾 Save as Memory Snippet" button that appears
Add optional tags to categorize the snippet
Save the snippet to your personal knowledge base
Browse snippets at /memory-snippets grouped by video

Database Structure

Memory snippets are stored in the memory_snippets table with:

snippet_text: The selected text with preserved HTML formatting
context_before/after: Surrounding text for context
tags: Array of custom tags for organization
video_id: Link to the source video
timestamps: Creation and update times
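
The fields above map naturally onto a small record type. The sketch below is hypothetical: it assumes the context columns are named context_before and context_after, and the tag clean-up (trimming, dropping blanks and duplicates) is an illustrative choice, not documented behaviour. The payload keys match the memory snippets curl example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemorySnippet:
    """Hypothetical in-memory view of a row in the memory_snippets table."""
    video_id: str
    snippet_text: str  # may contain preserved HTML formatting
    context_before: str = ""
    context_after: str = ""
    tags: List[str] = field(default_factory=list)


def normalize_tags(tags: List[str]) -> List[str]:
    """Trim tags and drop blanks and duplicates, keeping the original order."""
    cleaned: List[str] = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag not in cleaned:
            cleaned.append(tag)
    return cleaned


def to_payload(snippet: MemorySnippet) -> Dict[str, object]:
    """Build a JSON-ready body with the keys used in the curl example."""
    return {
        "video_id": snippet.video_id,
        "snippet_text": snippet.snippet_text,
        "tags": normalize_tags(snippet.tags),
    }
```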

Performance & Storage

Database Storage System

Persistent Storage: Supabase database with five main tables (videos, transcripts, chapters, summaries, memory_snippets)
Automatic Relationships: Foreign keys linking videos, transcripts, chapters, summaries, and memory snippets
Performance Indexes: Optimized queries for video_id, timestamps, and memory snippet tags
Storage Statistics: Monitor database performance via API endpoints
Memory Snippets: Personal knowledge base with text selection, formatting preservation, and tagging

Network Considerations

Cloud provider IPs are commonly blocked by YouTube
Proxy support available for restricted environments via YOUTUBE_PROXY env var
Multiple fallback methods for different network conditions
Reduced API calls through database storage and caching

Performance Optimizations

AJAX summarization: No page reloads for AI summary generation
Efficient transcript processing: Uses formatted text for summarization
Database caching: Persistent storage eliminates redundant API calls
Mobile-optimized UI: Reduced padding and responsive design
Chapter organization: Structured content display for better readability

Deployment

Docker Deployment

Build and run using Docker:

# Build Docker image
docker build -t youtube-deep-search .

# Run with environment variables
docker run -p 33079:33079 \
  -e OPENAI_API_KEY=your_openai_key \
  -e SUPABASE_URL=your_supabase_url \
  -e SUPABASE_KEY=your_supabase_key \
  -e YOUTUBE_API_KEY=your_youtube_api_key \
  youtube-deep-search

Manual Deployment

# Clone repository
git clone <repository-url>
cd youtube-deepsearch

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# Initialize database
# Run sql/create_tables.sql in your Supabase SQL editor

# Start application
python3 app.py

dennisimoo/youtube-deepsearch
