AI-powered YouTube transcript extraction and analysis tool. Extract transcripts, generate summaries, and create structured documentation from any YouTube video. Perfect for researchers, educators, and content creators.

nelgonzalez1/-youtube-transcript-analyzer


🎬 YouTube Content Analyzer

A powerful, modular tool for automatically extracting, analyzing, and documenting YouTube video content using AI. Perfect for researchers, content creators, educators, and professionals who need to process and analyze video content from any domain at scale.

🌟 Key Features

🎯 Core Functionality

  • Transcript Extraction: Automatic extraction of video transcripts with timestamps
  • AI-Powered Analysis: Intelligent content summarization using OpenAI GPT models
  • Multi-language Support: UI and outputs in English and Spanish (extensible)
  • Flexible Output Formats: Markdown, JSON, and plain text exports
  • Batch Processing: Process multiple videos efficiently

🛠️ Advanced Features

  • Transcript-Only Mode: Extract transcripts without AI analysis for manual processing
  • Configurable AI Analysis: Toggle AI summarization on/off
  • Separate Transcript Files: Export clean transcripts for external tools (ChatGPT, Claude, etc.)
  • Modular Architecture: Clean, maintainable codebase with separate services
  • Comprehensive Documentation: Structured analysis with key points, tools, concepts, and more

📊 Output Options

  • Full Analysis Reports: Complete markdown documents with AI insights
  • Raw Transcripts: Clean text files with optional timestamps
  • JSON Data Exports: Structured data for further processing
  • Master Index: Overview of all processed videos with statistics
  • Multi-format Support: Choose between markdown, JSON, or both

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key (optional, for AI analysis)
  • Internet connection for YouTube access

Installation

  1. Clone the repository:
git clone https://github.com/your-username/youtube-content-analyzer.git
cd youtube-content-analyzer
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure the tool:
cp config/config_template.json config.json
  4. Edit configuration (add your OpenAI API key):
{
  "openai_api_key": "your-openai-api-key-here",
  "features": {
    "generate_ai_summary": true,
    "export_full_transcript": true,
    "transcript_only_mode": false
  },
  "output": {
    "language": "en",
    "formats": ["markdown", "json"],
    "separate_transcript_file": true
  }
}
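
As a rough illustration of how a file like this could be read, the sketch below loads `config.json` and fills in missing keys from defaults that mirror the example above. The names `DEFAULT_CONFIG`, `merge_config`, and `load_config` are hypothetical and are not taken from the project's source; the real loader may behave differently.

```python
import json
from copy import deepcopy
from pathlib import Path

# Defaults mirror the example config in this README; the merge behaviour is an assumption.
DEFAULT_CONFIG = {
    "openai_api_key": "",
    "features": {
        "generate_ai_summary": True,
        "export_full_transcript": True,
        "transcript_only_mode": False,
    },
    "output": {
        "language": "en",
        "formats": ["markdown", "json"],
        "separate_transcript_file": True,
    },
}


def merge_config(defaults, overrides):
    """Recursively overlay user overrides onto defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(path="config.json"):
    """Read a JSON config file and fill in any missing keys from DEFAULT_CONFIG."""
    config_path = Path(path)
    if not config_path.exists():
        return deepcopy(DEFAULT_CONFIG)
    with config_path.open(encoding="utf-8") as handle:
        return merge_config(DEFAULT_CONFIG, json.load(handle))
```

With this approach a config file that only sets `"output": {"language": "es"}` would still end up with the default `formats` and feature toggles.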

Basic Usage

Process a single video with AI analysis:

python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

Extract transcript only (no AI analysis):

python main.py "https://www.youtube.com/watch?v=VIDEO_ID" --transcript-only

Process multiple videos:

python main.py URL1 URL2 URL3 --batch-size 5

Custom configuration:

python main.py URL1 --config my_config.json --language es

📁 Project Structure

youtube-content-analyzer/
├── src/                          # Source code
│   ├── core/                     # Core application logic
│   │   └── youtube_research_tool.py
│   ├── services/                 # Service modules
│   │   ├── transcript_service.py # YouTube transcript extraction
│   │   ├── openai_service.py     # AI analysis service
│   │   └── document_service.py   # Document generation
│   └── utils/                    # Utilities
│       └── i18n.py               # Internationalization
├── config/                       # Configuration files
│   └── config_template.json      # Configuration template
├── locales/                      # Language files
│   ├── en.json                   # English translations
│   └── es.json                   # Spanish translations
├── docs/                         # Documentation
├── examples/                     # Usage examples
├── tests/                        # Unit tests
├── main.py                       # CLI entry point
├── requirements.txt              # Python dependencies
└── README.md                     # This file

⚙️ Configuration Options

Core Settings

{
  "openai_api_key": "your-key-here",
  "openai_model": "gpt-4o-mini",
  "output_directory": "./research_output",
  "languages": ["en", "es"],
  "focus_topics": [
    "Technology", "Programming", "AI & Machine Learning",
    "Business", "Education", "Science"
  ]
}

Feature Toggles

Note: the // comments in the Feature Toggles and Output Configuration snippets are explanatory only. Standard JSON does not allow comments, so leave them out of config.json.

{
  "features": {
    "generate_ai_summary": true,      // Enable/disable AI analysis
    "export_full_transcript": true,   // Include full transcript in output
    "transcript_only_mode": false,    // Only extract transcripts
    "include_timestamp_transcript": true
  }
}
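
One plausible way these toggles could combine is sketched below. The function name `should_run_ai_analysis` and the precedence rules (transcript-only mode wins, and a missing API key disables analysis) are assumptions made for illustration, not behaviour confirmed by the project.

```python
def should_run_ai_analysis(config, cli_transcript_only=False):
    """Decide whether the AI summary step should run for this invocation.

    Assumed precedence: transcript-only mode (flag or config) overrides
    everything, then the generate_ai_summary toggle, then API key presence.
    """
    features = config.get("features", {})
    if cli_transcript_only or features.get("transcript_only_mode", False):
        return False
    if not features.get("generate_ai_summary", True):
        return False
    # The API key is optional per the prerequisites, so skip analysis without one.
    return bool(config.get("openai_api_key"))
```

Under these assumptions, running with `--transcript-only` skips analysis even when `generate_ai_summary` is `true` in the config file.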

Output Configuration

{
  "output": {
    "language": "en",                 // UI language (en/es)
    "formats": ["markdown", "json"],  // Output formats
    "separate_transcript_file": true  // Create separate .txt files
  }
}

📖 Usage Examples

Example 1: Research Mode (Full Analysis)

python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

Output:

  • VIDEO_ID_analysis.md - Full analysis with AI insights
  • VIDEO_ID_transcript.txt - Clean transcript file
  • VIDEO_ID_data.json - Structured data export
  • INDEX.md - Master index with summary
  • results.json - Processing results
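
The file names listed above could be derived from the video ID along these lines. This is a minimal sketch: the helper `output_paths` is hypothetical, and placing every file directly inside the configured output directory is an assumption.

```python
from pathlib import Path


def output_paths(video_id, output_dir="./research_output"):
    """Return the per-video and shared output files for one processed video."""
    base = Path(output_dir)
    return {
        "analysis": base / f"{video_id}_analysis.md",      # full AI analysis
        "transcript": base / f"{video_id}_transcript.txt",  # clean transcript
        "data": base / f"{video_id}_data.json",             # structured export
        "index": base / "INDEX.md",                         # shared master index
        "results": base / "results.json",                   # shared processing results
    }
```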

Example 2: Transcript-Only Mode

python main.py "https://www.youtube.com/watch?v=VIDEO_ID" --transcript-only

Use cases:

  • Manual analysis with ChatGPT or Claude
  • Content preparation for other AI tools
  • Quick transcript extraction for note-taking
  • Research without API costs

Example 3: Batch Processing

python main.py URL1 URL2 URL3 URL4 URL5 --batch-size 3

Features:

  • Processes videos in batches to reduce the risk of hitting rate limits
  • Progress tracking and error handling
  • Comprehensive reporting across all videos
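
The batching behaviour described above can be approximated with a small helper like the following. It is a sketch under assumptions: `chunked`, `process_in_batches`, the `process_one` callback, and the optional pause between batches are all hypothetical names and choices, not the project's actual implementation.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(urls, process_one, batch_size=3, pause_seconds=0.0):
    """Run `process_one` on every URL, batch by batch, recording failures."""
    results = []
    batches = list(chunked(urls, batch_size))
    for index, batch in enumerate(batches):
        for url in batch:
            try:
                results.append({"url": url, "ok": True, "result": process_one(url)})
            except Exception as error:  # keep going so one bad video does not stop the run
                results.append({"url": url, "ok": False, "error": str(error)})
        # Optionally wait between batches (but not after the last one).
        if pause_seconds and index < len(batches) - 1:
            time.sleep(pause_seconds)
    return results
```

Collecting a per-URL success or error record keeps a single failing video from aborting the batch and gives the final report something to summarise.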

Example 4: Custom Configuration

python main.py URL1 --config research_config.json --language es --output-dir ./my_research
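
For orientation, the command-line flags used in these examples could be declared with `argparse` roughly as follows. The flag names come from this README; the defaults, help texts, and the function name `build_parser` are assumptions rather than the contents of the real `main.py`.

```python
import argparse


def build_parser():
    """Declare the CLI options that appear in this README's examples."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Extract and analyze YouTube transcripts.",
    )
    parser.add_argument("urls", nargs="+", help="one or more YouTube video URLs")
    parser.add_argument("--transcript-only", action="store_true",
                        help="extract transcripts without AI analysis")
    parser.add_argument("--batch-size", type=int, default=5,
                        help="number of videos per batch (default is assumed)")
    parser.add_argument("--config", default="config.json",
                        help="path to a JSON configuration file")
    parser.add_argument("--language", default=None,
                        help="output language code, e.g. en or es")
    parser.add_argument("--output-dir", default=None,
                        help="directory for generated files")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```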

🌐 Multi-language Support

Supported Languages

  • English (en): Default language
  • Spanish (es): Full translation available
  • Extensible: Easy to add new languages via JSON files

Adding New Languages

  1. Create locales/[language_code].json
  2. Copy structure from locales/en.json
  3. Translate all strings
  4. Update configuration to use new language
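
A locale loader that follows these steps might look like the sketch below. It assumes flat key/value JSON files and falls back to English for missing strings; `load_locale` and `translate` are hypothetical names, and the project's `i18n.py` may work differently.

```python
import json
from pathlib import Path


def load_locale(language, locales_dir="locales", fallback="en"):
    """Load strings for `language`, filling gaps from the fallback locale."""
    base = Path(locales_dir)

    def read(code):
        path = base / f"{code}.json"
        if not path.exists():
            return {}
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)

    strings = read(fallback)
    if language != fallback:
        # Shallow update: assumes the locale files use flat keys.
        strings.update(read(language))
    return strings


def translate(strings, key):
    """Return the translated string, or the key itself if it is missing."""
    return strings.get(key, key)
```

Falling back to English means a partially translated locale file still produces readable output while the remaining strings are being translated.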

🔧 API Integration

OpenAI Integration

The tool uses OpenAI's GPT models for intelligent content analysis:

  • Default Model: gpt-4o-mini (cost-effective)
  • Alternative Models: gpt-4, gpt-3.5-turbo
  • Configurable: Easily switch models in config

Transcript Extraction

The tool uses the youtube-transcript-api library for transcript extraction:

  • Multiple language support
  • Automatic fallback to available languages
  • Error handling for unavailable transcripts
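
Before a transcript can be requested, the video ID has to be pulled out of the URL, and the returned segments have to be turned into text. The sketch below shows both steps using only the standard library. The function names are hypothetical, and the segment shape (a dict with `text` and `start` in seconds) is an assumption that should be checked against the installed version of youtube-transcript-api.

```python
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com"}


def extract_video_id(url):
    """Return the video ID from common YouTube URL forms, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
            return parts[1]
    return None


def format_timestamp(seconds):
    """Format seconds as MM:SS, or H:MM:SS once the video passes an hour."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def format_transcript(segments, include_timestamps=True):
    """Join transcript segments into text, one segment per line."""
    lines = []
    for segment in segments:
        text = segment["text"].strip()
        if include_timestamps:
            lines.append(f"[{format_timestamp(segment['start'])}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```

Returning `None` for unrecognised URLs lets the caller report an unsupported link instead of sending a malformed ID to the transcript service.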

📊 Output Formats

Markdown Analysis Report

# 📹 Video Title

**URL:** https://www.youtube.com/watch?v=VIDEO_ID
**Language:** en
**Analysis Date:** 2024-01-15 10:30

## 🎯 Executive Summary
AI-generated summary of the video content...

## 🚀 Key Points
- Main insight 1
- Main insight 2
- Main insight 3

## 🛠️ Tools Mentioned
- Tool 1
- Tool 2

## 📝 Full Transcript
Complete transcript with timestamps...
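
A report with this layout could be rendered from a dictionary of analysis results as sketched below. The function `render_report` and the dictionary keys are hypothetical; the project's `document_service.py` may structure its data differently.

```python
def render_report(video):
    """Render one video's analysis as a Markdown report in the layout shown above."""
    lines = [
        f"# 📹 {video['title']}",
        "",
        f"**URL:** {video['url']}",
        f"**Language:** {video['language']}",
        f"**Analysis Date:** {video['analysis_date']}",
        "",
        "## 🎯 Executive Summary",
        video["summary"],
        "",
        "## 🚀 Key Points",
        *[f"- {point}" for point in video["key_points"]],
        "",
        "## 🛠️ Tools Mentioned",
        *[f"- {tool}" for tool in video["tools"]],
        "",
        "## 📝 Full Transcript",
        video["transcript"],
    ]
    return "\n".join(lines) + "\n"
```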

🔍 Use Cases

📚 Research & Academia

  • Literature Reviews: Process educational and research videos at scale
  • Lecture Analysis: Extract insights from academic presentations
  • Documentation: Create structured notes from any video content

💼 Business & Professional

  • Market Research: Analyze industry trend videos and competitor content
  • Training Analysis: Process corporate training and educational materials
  • Content Strategy: Extract insights from marketing and business videos

🎓 Education & Learning

  • Course Analysis: Extract key points from online courses and tutorials
  • Study Materials: Convert video lectures to structured study notes
  • Knowledge Management: Process educational content for documentation

🔬 General Research

  • Content Analysis: Analyze videos from any domain or topic
  • Information Extraction: Pull key insights from interviews and presentations
  • Cross-Domain Research: Process videos from technology, science, business, arts, etc.

🤝 Contributing

We welcome contributions!

Development Setup

git clone https://github.com/your-username/youtube-content-analyzer.git
cd youtube-content-analyzer
pip install -r requirements.txt

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

  • YouTube Transcript API: For reliable transcript extraction
  • OpenAI: For powerful AI analysis capabilities

Made with ❤️ for researchers, educators, and content creators worldwide
