This repository was archived by the owner on Jan 5, 2026. It is now read-only.

Root-FTW/YT_DB_Trending

298 Commits

🚀 YouTube Trending Analytics & Data Intelligence

Automated data collection and analysis of YouTube trending videos across 27 countries





🎯 Project Overview

YouTube Trending Analytics is an automated data collection and analysis system that tracks trending videos across 27 countries, refreshed once a day. It captures video metadata, engagement metrics, and channel information to support content performance analysis.

🎯 Core Objectives

| Objective | Status | Description |
|---|---|---|
| 📊 Data Collection | Active | Daily automated collection from 27 countries |
| 🗄️ Historical Database | Active | 999-day retention of collected metrics |
| 🔍 Performance Analysis | Ready | 65+ fields per video for ML analysis |
| 🌐 Multi-Language Support | Active | Spanish- and English-speaking countries |
| 🚀 Scheduled Processing | Active | Daily runs via GitHub Actions |

🌍 Countries Analyzed

Our system monitors trending videos across 27 countries, organized into two language groups:

🇪🇸 Spanish-Speaking Countries (19)

🇦🇷 Argentina 🇧🇴 Bolivia 🇨🇱 Chile 🇨🇴 Colombia 🇨🇷 Costa Rica
🇩🇴 Dominican Rep. 🇪🇨 Ecuador 🇪🇸 Spain 🇬🇹 Guatemala 🇭🇳 Honduras
🇲🇽 Mexico 🇳🇮 Nicaragua 🇵🇦 Panama 🇵🇪 Peru 🇵🇷 Puerto Rico
🇵🇾 Paraguay 🇸🇻 El Salvador 🇺🇾 Uruguay 🇻🇪 Venezuela

🇬🇧 English-Speaking Countries (8)

🇦🇺 Australia 🇨🇦 Canada 🇬🇧 United Kingdom 🇮🇪 Ireland
🇯🇲 Jamaica 🇳🇿 New Zealand 🇸🇬 Singapore 🇺🇸 United States
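
The two groups can be expressed as a lookup table. This is a minimal sketch, not code from the repository: LANGUAGE_GROUP and language_group are hypothetical names, and the ES/EN labels are assumed to match the folder names under assets/meta/trending/languages/.

```python
# Country codes grouped as listed above; the ordering mirrors
# TRENDING_COUNTRY_CODES in config.json.
SPANISH_SPEAKING = [
    "AR", "BO", "CL", "CO", "CR", "DO", "EC", "ES", "GT", "HN",
    "MX", "NI", "PA", "PE", "PR", "PY", "SV", "UY", "VE",
]
ENGLISH_SPEAKING = ["AU", "CA", "GB", "IE", "JM", "NZ", "SG", "US"]

# Hypothetical lookup table: country code -> language-group folder name.
LANGUAGE_GROUP = {code: "ES" for code in SPANISH_SPEAKING}
LANGUAGE_GROUP.update({code: "EN" for code in ENGLISH_SPEAKING})


def language_group(country_code: str) -> str:
    """Return 'ES' or 'EN' for a monitored country; raise KeyError otherwise."""
    return LANGUAGE_GROUP[country_code.upper()]


if __name__ == "__main__":
    assert len(SPANISH_SPEAKING) == 19 and len(ENGLISH_SPEAKING) == 8
    assert len(LANGUAGE_GROUP) == 27  # no code appears in both groups
    print(language_group("mx"), language_group("US"))  # prints: ES EN
```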

⚡ Quick Start

📊 Live Data Access

| Data Type | Location | Update Frequency |
|---|---|---|
| 🎬 Trending Videos | assets/meta/trending/ | Daily at 09:00 UTC (1 AM PST) |
| 📈 Video Statistics | assets/meta/video_stats/ | Daily, after the trending run |
| 📋 Consolidated CSV | db/ods/trending_videos.csv | Daily |
| 🌍 Worldwide Data | assets/meta/trending/languages/www/ | Daily |
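
To work with the consolidated CSV locally, a few lines of standard-library Python are enough. This sketch assumes the CSV has a header row using the column names from the "Trending Video Data" field list below (country_code, collection_date, trending_position); the exact columns written by trending_db.py are not documented here, and load_trending is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional

# Path from the table above; column names are assumed, not confirmed.
CSV_PATH = Path("db/ods/trending_videos.csv")


def load_trending(path: Path = CSV_PATH,
                  country: Optional[str] = None) -> List[Dict[str, str]]:
    """Read the consolidated CSV, optionally keeping one country's rows."""
    with open(path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    if country is not None:
        rows = [r for r in rows if r.get("country_code") == country.upper()]
    # Order by collection date, then by trending position within each day.
    rows.sort(key=lambda r: (r.get("collection_date", ""),
                             int(r.get("trending_position") or 0)))
    return rows
```

For example, load_trending(country="US")[:10] would return the top ten US rows from the earliest collection date in the file.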

🔧 Local Setup

```bash
# Clone the repository
git clone https://github.com/Root-FTW/YT_DB_Trending.git
cd YT_DB_Trending

# Install dependencies
pip install -r src/requirements.txt

# Set up YouTube API key
export YOUTUBE_API_KEY="your_api_key_here"

# Run the pipeline manually, in the same order as the scheduled workflows
python src/collection/trending.py
python src/collection/trending_consolidator.py
python src/processing/trending_db.py
python src/collection/video_stats.py
```

🔄 Data Pipeline

Our automated data collection pipeline runs daily and processes data in four stages:

```mermaid
graph TD
    A[🎬 trending.py<br/>Fetch Trending Videos<br/>5 API Parts] -->|27 Country JSONs| B[🌐 trending_consolidator.py<br/>Language Consolidation]
    A -->|Daily Files| C[📊 trending_db.py<br/>Aggregate & Clean Data]

    B -->|Spanish Group| D[🇪🇸 Spanish Consolidated<br/>19 Countries]
    B -->|English Group| E[🇬🇧 English Consolidated<br/>8 Countries]
    B -->|Global| F[🌍 Worldwide Consolidated<br/>All 27 Countries]

    C -->|Unified CSV| G[📈 video_stats.py<br/>Detailed Video Analytics<br/>10 API Parts + Channel Data]
    G -->|65+ Fields| H[📋 Daily Video Stats JSON<br/>Ready for ML Analysis]

    style A fill:#ff6b6b,color:#fff
    style B fill:#4ecdc4,color:#fff
    style C fill:#45b7d1,color:#fff
    style G fill:#96ceb4,color:#fff
    style H fill:#feca57,color:#000
    style F fill:#ff9ff3,color:#000
```

🔄 Pipeline Stages

| Stage | Script | Input | Output | Frequency |
|---|---|---|---|---|
| 1️⃣ Collection | trending.py | YouTube API | 27 country JSON files | Daily, 1 AM PST |
| 2️⃣ Consolidation | trending_consolidator.py | Country JSONs | Language group files | After stage 1 |
| 3️⃣ Aggregation | trending_db.py | All JSONs | Unified CSV (999 days) | After stage 2 |
| 4️⃣ Enhancement | video_stats.py | CSV video IDs | Detailed stats JSON | After stage 3 |
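
The four stages can be reproduced locally by running the scripts in order and stopping at the first failure. This is a sketch, not the repository's workflow: it assumes each script runs from the repository root without arguments (as in Local Setup), and run_pipeline is a hypothetical helper. In the scheduled setup, video_stats.py runs in a separate workflow rather than in the same job.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Stage order from the table above; paths follow the project structure.
PIPELINE: List[str] = [
    "src/collection/trending.py",
    "src/collection/trending_consolidator.py",
    "src/processing/trending_db.py",
    "src/collection/video_stats.py",
]


def run_pipeline(scripts: Sequence[str] = PIPELINE,
                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                 ) -> List[str]:
    """Run each stage in order; check=True raises CalledProcessError on the
    first non-zero exit, so later stages never see incomplete output."""
    completed: List[str] = []
    for script in scripts:
        run([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling run_pipeline() from the repository root runs all four stages; the run parameter exists so the sequencing can be tested without invoking the real scripts.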

📊 Data Structure

Our system stores data at several levels, from per-country files to processed datasets:

📁 File Organization

```
YT_DB_Trending/
├── 📂 assets/meta/trending/
│   ├── 📂 countries/          # Individual country data
│   │   ├── 📂 AR/            # Argentina files
│   │   ├── 📂 US/            # United States files
│   │   └── ...               # (27 countries total)
│   └── 📂 languages/         # Consolidated data
│       ├── 📂 ES/            # Spanish-speaking consolidation
│       ├── 📂 EN/            # English-speaking consolidation
│       └── 📂 www/           # Worldwide consolidation
├── 📂 assets/meta/video_stats/ # Detailed video analytics
├── 📂 db/ods/                # Processed datasets
│   └── 📄 trending_videos.csv # Unified trending data
└── 📂 src/                   # Source code
    ├── 📂 collection/        # Data collection scripts
    └── 📂 processing/        # Data processing scripts
```

📋 Data Fields Collected

🎬 Trending Video Data (per country)

| Field | Type | Description |
|---|---|---|
| id | String | Unique YouTube video ID |
| trending_position | Integer | Position in the trending list (1-50) |
| collection_date | Date | Date the data was collected |
| country_code | String | ISO 3166-1 alpha-2 country code (AR, US, etc.) |
| title | String | Video title |
| channelTitle | String | Channel name |
| viewCount | Integer | Total views |
| likeCount | Integer | Total likes |
| commentCount | Integer | Total comments |
| categoryId | String | YouTube video category ID |
| publishedAt | DateTime | Video publication date and time |
| thumbnail_url | String | High-quality thumbnail URL |

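
Each trending record flattens one item from the YouTube Data API's videos.list response. The sketch below shows one plausible mapping onto the fields in the table; to_trending_record is a hypothetical helper, and using the "high" thumbnail size for thumbnail_url is an assumption. One detail worth handling explicitly: the API returns statistics counts as strings, and likeCount or commentCount can be missing.

```python
from datetime import date
from typing import Any, Dict


def to_trending_record(item: Dict[str, Any], position: int,
                       country_code: str, collected: date) -> Dict[str, Any]:
    """Flatten one videos.list item into the trending-record fields."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    thumbs = snippet.get("thumbnails", {})

    def count(key: str) -> int:
        # Counts arrive as strings; a missing count (e.g. hidden likes or
        # disabled comments) becomes 0 here, which conflates "hidden" with
        # "zero". Returning None instead is a reasonable alternative.
        return int(stats.get(key, 0))

    return {
        "id": item["id"],
        "trending_position": position,
        "collection_date": collected.isoformat(),
        "country_code": country_code,
        "title": snippet.get("title", ""),
        "channelTitle": snippet.get("channelTitle", ""),
        "viewCount": count("viewCount"),
        "likeCount": count("likeCount"),
        "commentCount": count("commentCount"),
        "categoryId": snippet.get("categoryId", ""),
        "publishedAt": snippet.get("publishedAt", ""),
        "thumbnail_url": thumbs.get("high", {}).get("url", ""),
    }
```
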
📈 Enhanced Video Statistics (65+ fields)

Basic Metrics:

  • Views, likes, comments, favorites
  • Duration, resolution, category
  • Publication date, language

Channel Intelligence:

  • Subscriber count, total videos
  • Channel country, keywords
  • Topic categories, status

Calculated Metrics:

  • engagement_rate: (likes + comments) / views
  • views_to_subscribers_ratio: views / subscriber_count
  • likes_to_views_ratio: likes / views
  • comments_to_views_ratio: comments / views
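
All four calculated metrics are plain ratios, so the main implementation concern is division by zero (a video with 0 views, or a subscriber count that is hidden or 0). A minimal sketch, assuming missing values should become None rather than raising; ratio and calculated_metrics are hypothetical names:

```python
from typing import Dict, Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else None


def calculated_metrics(views: int, likes: int, comments: int,
                       subscriber_count: int) -> Dict[str, Optional[float]]:
    """Compute the four calculated metrics listed above."""
    return {
        "engagement_rate": ratio(likes + comments, views),
        "views_to_subscribers_ratio": ratio(views, subscriber_count),
        "likes_to_views_ratio": ratio(likes, views),
        "comments_to_views_ratio": ratio(comments, views),
    }
```

With hypothetical counts of 90K likes and 10K comments on a 2M-view video from a 10K-subscriber channel, calculated_metrics(2_000_000, 90_000, 10_000, 10_000) gives an engagement rate of 0.05 and a views-to-subscribers ratio of 200.0.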

Technical Details:

  • File size, container format
  • Video/audio streams, bitrate
  • Processing status, quality indicators

🤖 Automation

⏰ Scheduled Workflows

Our system runs automatically using GitHub Actions:

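| Workflow | Trigger | Schedule | Duration |
|---|---|---|---|
| 🎬 Trending Collection | Daily | 1:00 AM PST (09:00 UTC) | ~5 minutes |
| 📈 Video Stats Collection | After trending | Runs when trending succeeds | ~10 minutes |
| 🧪 Code Quality | Push/PR | On each push or pull request | ~2 minutes |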

🔄 Workflow Details

📊 Daily YouTube Data Pipeline

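Trigger: cron 0 9 * * * (GitHub Actions schedules run in UTC, so this is 09:00 UTC daily: 1 AM PST, or 2 AM during Pacific daylight time)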

Steps:

  1. 🔄 Check out repository
  2. 🐍 Set up Python 3.9 environment
  3. 📦 Install dependencies
  4. 🔑 Configure YouTube API key
  5. 🎬 Run trending.py (fetch trending videos)
  6. 🌐 Run trending_consolidator.py (language consolidation)
  7. 📊 Run trending_db.py (aggregate data)
  8. 💾 Commit and push changes

Output Files:

  • 27 country JSON files
  • 2 language-group consolidation files (ES, EN)
  • 1 worldwide consolidation file (www)
  • 1 unified CSV file

📈 Daily Video Statistics Collector

Trigger: After "Daily YouTube Data Pipeline" completes successfully

Steps:

  1. 🔄 Check out repository
  2. 🐍 Set up Python 3.9 environment
  3. 📦 Install dependencies
  4. 🔑 Configure YouTube API key
  5. 📈 Run video_stats.py (detailed analytics)
  6. 💾 Commit and push changes

Output Files:

  • Daily video statistics JSON (65+ fields per video)

📊 Data Retention

| Data Type | Retention Period | Storage Location |
|---|---|---|
| Trending JSONs | 999 days (~2.7 years) | assets/meta/trending/ |
| Video Stats JSONs | 999 days (~2.7 years) | assets/meta/video_stats/ |
| Consolidated CSV | All historical data | db/ods/trending_videos.csv |
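
A 999-day retention window reduces to a date comparison. How the repository actually prunes old files is not described here; this sketch assumes each file's collection date is already known (for example, parsed from its name) and only computes which files fall outside the window. expired_files is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict, List

RETENTION_DAYS = 999  # from the retention table above


def expired_files(file_dates: Dict[str, date], today: date,
                  retention_days: int = RETENTION_DAYS) -> List[str]:
    """Return file names whose collection date is older than the window.

    A file collected exactly `retention_days` days ago is still kept.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, collected in file_dates.items()
                  if collected < cutoff)
```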

📈 Analytics Features

🔍 Performance Analysis Ready

Our dataset supports content performance analysis through the following metric groups:

| Metric Category | Available Metrics | Use Case |
|---|---|---|
| 📊 Engagement | Likes, comments, engagement rate | Audience interaction analysis |
| 👥 Audience Reach | Views-to-subscribers ratio | Content reach analysis |
| 🌍 Geographic Spread | Trending positions across countries | Global appeal measurement |
| 📺 Channel Context | Subscriber count, channel history | Relative performance analysis |
| ⚡ Content Quality | Technical specs, processing status | Content optimization insights |

🎯 Key Performance Indicators

| Indicator | Formula | Interpretation |
|---|---|---|
| Engagement Rate | (likes + comments) / views | Higher = more engaging content |
| Audience Reach | views / subscriber_count | Higher = greater reach beyond the subscriber base |
| Geographic Appeal | countries_trending / 27 | Higher = broader global relevance |
| Trending Velocity | average trending_position | Lower = better (closer to #1) |
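
Geographic Appeal and Trending Velocity can both be derived from trending rows alone. A sketch using only the standard library, assuming rows shaped like the Trending Video Data fields (id, country_code, trending_position); video_kpis is a hypothetical name. If the rows span several days, the country count covers the whole period rather than a single day.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Set

TOTAL_COUNTRIES = 27


def video_kpis(rows: Iterable[Dict[str, object]]) -> Dict[str, Dict[str, float]]:
    """Compute Geographic Appeal and Trending Velocity for each video ID."""
    positions: Dict[str, List[int]] = defaultdict(list)
    countries: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        video_id = str(row["id"])
        positions[video_id].append(int(row["trending_position"]))
        countries[video_id].add(str(row["country_code"]))
    return {
        video_id: {
            # Share of the 27 monitored countries where the video trended.
            "geographic_appeal": len(countries[video_id]) / TOTAL_COUNTRIES,
            # Mean trending position; lower means closer to #1.
            "trending_velocity": fmean(positions[video_id]),
        }
        for video_id in positions
    }
```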

📊 Analysis Examples

High-Performance Video Detection:

  • A video from a channel with 10K subscribers that gets 2M views (200 views per subscriber) → High performance potential
  • Video trending in 15+ countries → Global appeal
  • Engagement rate > 5% → Highly engaging content
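
The three detection rules above translate into simple threshold checks. In this sketch, the 15-country and 5% engagement thresholds come from the examples; the cutoff of 100 views per subscriber is a hypothetical choice (the 10K-subscriber, 2M-view example is a ratio of 200), as are the function and flag names.

```python
from typing import List

MIN_COUNTRIES = 15                # "trending in 15+ countries"
MIN_ENGAGEMENT = 0.05             # "engagement rate > 5%"
MIN_VIEWS_PER_SUBSCRIBER = 100.0  # hypothetical cutoff, not from the source


def performance_flags(views: int, likes: int, comments: int,
                      subscribers: int, countries_trending: int) -> List[str]:
    """Return the names of the detection rules a video satisfies."""
    flags: List[str] = []
    if subscribers and views / subscribers >= MIN_VIEWS_PER_SUBSCRIBER:
        flags.append("high_performance_potential")
    if countries_trending >= MIN_COUNTRIES:
        flags.append("global_appeal")
    if views and (likes + comments) / views > MIN_ENGAGEMENT:
        flags.append("highly_engaging")
    return flags
```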

Channel Performance:

  • Compare views-to-subscribers across similar channels
  • Analyze trending frequency by country/language
  • Track engagement patterns over time
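
Trending frequency by language group can be counted by tagging each row with its group. This sketch assumes rows carry channelTitle and country_code as in the field table and skips any code outside the 27 monitored countries; trending_frequency is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

# Language groups as listed under "Countries Analyzed".
SPANISH_SPEAKING = {
    "AR", "BO", "CL", "CO", "CR", "DO", "EC", "ES", "GT", "HN",
    "MX", "NI", "PA", "PE", "PR", "PY", "SV", "UY", "VE",
}
ENGLISH_SPEAKING = {"AU", "CA", "GB", "IE", "JM", "NZ", "SG", "US"}


def trending_frequency(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count trending appearances per (channelTitle, language group)."""
    counts: Counter = Counter()
    for row in rows:
        code = row["country_code"].upper()
        if code in SPANISH_SPEAKING:
            group = "ES"
        elif code in ENGLISH_SPEAKING:
            group = "EN"
        else:
            continue  # not one of the 27 monitored countries
        counts[(row["channelTitle"], group)] += 1
    return counts
```

Calling .most_common(10) on the result lists the channels that trend most often in each language group.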

🛠️ Technical Details

🔧 Technology Stack

| Component | Technology | Version | Purpose |
|---|---|---|---|
| Language | Python | 3.9 | Core development |
| API | YouTube Data API | v3 | Data collection |
| Automation | GitHub Actions | Latest | Workflow orchestration |
| Data Processing | pandas | Latest | Data manipulation |
| Configuration | JSON | - | Settings management |

📋 API Usage

YouTube Data API Parts Used:

🎬 Trending Collection (5 parts)
  • snippet - Basic video information
  • statistics - View, like, comment counts
  • contentDetails - Duration, resolution
  • status - Privacy settings
  • topicDetails - Content categorization
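📈 Video Statistics (10 parts)
  • All 5 trending parts, plus:
  • fileDetails - Technical file information
  • processingDetails - Processing status
  • suggestions - Quality recommendations
  • localizations - Multi-language content
  • liveStreamingDetails - Live streaming data

Note: the YouTube Data API documents fileDetails, processingDetails, and suggestions as available only to the video's owner, so for other channels' trending videos these parts may come back empty or be rejected.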
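
Both collection stages call the YouTube Data API's videos.list endpoint: trending collection uses chart=mostPopular with a regionCode, while detailed statistics can be requested by video ID. The sketch below only builds request URLs for those two cases. It is not the repository's client code, the helper names are hypothetical, and batching 50 IDs per request reflects the commonly documented per-request limit on the id parameter.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/videos"

# Part lists from the section above.
TRENDING_PARTS = ["snippet", "statistics", "contentDetails", "status", "topicDetails"]
STATS_PARTS = TRENDING_PARTS + ["fileDetails", "processingDetails", "suggestions",
                                "localizations", "liveStreamingDetails"]


def trending_url(region_code: str, api_key: str, max_results: int = 50) -> str:
    """URL for one page of the mostPopular chart in a region."""
    query = {
        "part": ",".join(TRENDING_PARTS),
        "chart": "mostPopular",
        "regionCode": region_code,
        "maxResults": max_results,  # the API accepts at most 50 per page
        "key": api_key,
    }
    return f"{API_URL}?{urlencode(query)}"


def stats_urls(video_ids: Sequence[str], api_key: str,
               batch_size: int = 50) -> Iterator[str]:
    """Yield one videos.list URL per batch of up to `batch_size` IDs."""
    for start in range(0, len(video_ids), batch_size):
        batch: List[str] = list(video_ids[start:start + batch_size])
        query = {"part": ",".join(STATS_PARTS), "id": ",".join(batch), "key": api_key}
        yield f"{API_URL}?{urlencode(query)}"
```

Fetching a page is then a single GET, for example urllib.request.urlopen(trending_url("US", key)), with the JSON body decoded via json.load.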

⚙️ Configuration

The system is configured via config.json:

```json
{
    "TRENDING_METADATA_LOC": "assets/meta/trending",
    "TRENDING_ODS_DIR": "db/ods/",
    "TRENDING_COUNTRY_CODES": [
        "AR", "BO", "CL", "CO", "CR", "DO", "EC", "ES",
        "GT", "HN", "MX", "NI", "PA", "PE", "PR", "PY",
        "SV", "UY", "VE", "AU", "CA", "GB", "IE", "JM",
        "NZ", "SG", "US"
    ],
    "VIDEO_STATS_METADATA_LOC": "assets/meta/video_stats"
}
```
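
Scripts that read config.json can fail early when a key is missing. A minimal loader sketch; load_config and country_dir are hypothetical helpers, and country_dir assumes the countries/<CODE> layout shown in the file tree rather than anything stated in the config itself.

```python
import json
from pathlib import Path
from typing import Any, Dict

REQUIRED_KEYS = ("TRENDING_METADATA_LOC", "TRENDING_ODS_DIR",
                 "TRENDING_COUNTRY_CODES", "VIDEO_STATS_METADATA_LOC")


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Load config.json and check the keys documented above."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    codes = config["TRENDING_COUNTRY_CODES"]
    if len(set(codes)) != len(codes):
        raise ValueError("duplicate country codes in TRENDING_COUNTRY_CODES")
    return config


def country_dir(config: Dict[str, Any], code: str) -> Path:
    # Hypothetical layout: <TRENDING_METADATA_LOC>/countries/<CODE>.
    return Path(config["TRENDING_METADATA_LOC"]) / "countries" / code
```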

📁 Project Structure

```
YT_DB_Trending/
├── 📄 README.md                    # This documentation
├── 📄 LICENSE                      # MIT License
├── 📄 config.json                  # Configuration settings
├── 📂 .github/workflows/           # GitHub Actions
│   ├── 📄 tube_data_collection_pipeline.yml
│   ├── 📄 daily_video_stats_collector.yml
│   └── 📄 python-app.yml
├── 📂 src/                         # Source code
│   ├── 📄 requirements.txt         # Python dependencies
│   ├── 📂 collection/              # Data collection scripts
│   │   ├── 📄 trending.py          # Fetch trending videos
│   │   ├── 📄 trending_consolidator.py # Language consolidation
│   │   └── 📄 video_stats.py       # Detailed video analytics
│   └── 📂 processing/              # Data processing scripts
│       └── 📄 trending_db.py       # Data aggregation & cleaning
├── 📂 assets/meta/                 # Generated data
│   ├── 📂 trending/                # Trending video data
│   │   ├── 📂 countries/           # Per-country files
│   │   └── 📂 languages/           # Consolidated files
│   └── 📂 video_stats/             # Detailed analytics
├── 📂 db/ods/                      # Processed datasets
│   └── 📄 trending_videos.csv      # Unified trending data
└── 📂 analysis/                    # Analysis notebooks
    ├── 📄 README.md
    └── 📄 YouTube Performance Predictor.ipynb
```

🚀 Ready for Analysis

This system provides a comprehensive foundation for YouTube content performance research and machine learning applications.

Data is automatically updated daily • 27 countries • 65+ fields per video • 999-day retention


Built with ❤️ for YouTube Analytics Research

About

Youtube Trending DB

Contributors 2