Audio Extraction Analysis v2.1.0

🎥➡️🎵➡️📝 Professional Audio-to-Transcript Pipeline with Multiple Providers

CI · Python 3.11+ · Code style: black · Ruff · Security: bandit

A comprehensive, production-ready Python package that transforms video recordings into structured, actionable documentation. It provides a unified CLI with an interactive TUI, robust error handling, event streaming, and in-depth transcription analysis, including speaker diarization, topic detection, and sentiment analysis. Supported transcription providers include Deepgram Nova 3, ElevenLabs, OpenAI Whisper, and NVIDIA Parakeet.

🚀 Quick Start (3 Steps)

1. Install

# Clone and install
git clone https://github.com/edlsh/audio-extraction-analysis.git
cd audio-extraction-analysis
uv sync

# Install FFmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg  
# Windows: choco install ffmpeg

2. Configure API Key (or install Whisper)

# Option 1: Use Deepgram (cloud-based, full features)
# Get your API key from: https://console.deepgram.com/
export DEEPGRAM_API_KEY='your-key-here'

# Option 2: Use Whisper (local processing, no API key needed)
uv add openai-whisper torch

# Or create .env file for API keys
echo "DEEPGRAM_API_KEY=your-key-here" > .env

3. Process Your First Video

# Complete pipeline (concise default): video → audio → transcript → single analysis
audio-extraction-analysis process meeting.mp4

# Process from URL (YouTube, Vimeo, etc.)
audio-extraction-analysis process --url "https://youtube.com/watch?v=..." --output-dir ./results

# Custom output directory
audio-extraction-analysis process video.mp4 --output-dir ./results

# Full 5-file analysis output
audio-extraction-analysis process video.mp4 --analysis-style full --output-dir ./results

🎯 Complete Workflow

MP4/URL → Audio Extraction → Transcription → AI Analysis → Structured Output
   ↓            ↓                 ↓              ↓              ↓
FFmpeg    Quality Presets   4 Providers    Smart Analysis  Actionable Docs
         (speech/high)    (Cloud/Local)   (GPT/Gemini)    (MD/HTML/JSON)

✨ New in v2.1.0

  • Standardized Health Checks: Consistent response format across all providers
  • Simplified Codebase: Reduced duplication with shared helpers and centralized exceptions
  • Better CLI: Consolidated transcription options with shared argument helpers
  • Improved Maintainability: ~175 lines of redundant code removed

✨ v2.0.0 Highlights

  • URL Ingestion: Direct processing from YouTube, Vimeo, and other platforms
  • Multiple Providers: Deepgram, ElevenLabs, Whisper, Parakeet support
  • Interactive TUI: Terminal UI with live progress and health monitoring
  • Event Streaming: JSONL output for integration and monitoring
  • Enhanced CI/CD: Full test coverage with black, ruff, and bandit checks
  • Security Hardening: Path sanitization, input validation, secure temp files

📋 CLI Commands

Main Commands

# Full pipeline (recommended)
audio-extraction-analysis process video.mp4              # Complete workflow (concise)
audio-extraction-analysis process video.mp4 --output-dir ./results
audio-extraction-analysis process video.mp4 --analysis-style full --output-dir ./results  # five files

# Interactive TUI (Terminal User Interface) - NEW!
audio-extraction-analysis tui                            # Launch interactive interface
audio-extraction-analysis tui --input video.mp4          # Pre-populate input file
audio-extraction-analysis tui --output-dir ./results     # Pre-set output directory

# Individual steps
audio-extraction-analysis extract video.mp4              # Audio extraction only  
audio-extraction-analysis transcribe audio.mp3           # Transcription only

# Event streaming (for monitoring/integration) - NEW!
audio-extraction-analysis process video.mp4 --jsonl      # Stream events as JSONL
audio-extraction-analysis process video.mp4 --jsonl | jq '.type'  # Monitor event types

# Help and info
audio-extraction-analysis --help                         # Show all commands
audio-extraction-analysis --version                      # Show version

# Markdown export (timestamps + speakers by default)
audio-extraction-analysis export-markdown audio.mp3 \
  --output-dir ./output \
  --template default \
  --timestamps --speakers

# Or add Markdown export to existing commands
audio-extraction-analysis transcribe audio.mp3 --export-markdown --md-template detailed
audio-extraction-analysis process video.mp4 --export-markdown --md-no-speakers --md-template minimal

Common Options

| Option | Values | Description |
| --- | --- | --- |
| --quality | speech, standard, high, compressed | Audio extraction quality |
| --language | en, es, fr, de, etc. | Transcription language |
| --output-dir | Directory path | Where to save results |
| --analysis-style | concise, full | Output style: single analysis vs. 5 files |
| --verbose | Flag | Detailed logging |
| --jsonl | Flag | Stream events as JSONL for monitoring |
| --provider | auto, deepgram, elevenlabs, whisper, parakeet | Transcription provider selection |

🖥️ Interactive TUI (Terminal User Interface)

For a guided, interactive experience with live progress monitoring, use the TUI mode:

# Launch interactive TUI
audio-extraction-analysis tui

# Or with pre-populated paths
audio-extraction-analysis tui --input video.mp4 --output-dir ./results

TUI Features

  • 📊 Live Progress Monitoring: Real-time progress cards for extraction, transcription, and analysis with ETAs
  • 📝 Log Streaming: Filterable, color-coded logs (DEBUG, INFO, WARNING, ERROR)
  • 🏥 Provider Health: Monitor transcription provider status (Deepgram, ElevenLabs, Whisper, Parakeet)
  • 💾 Auto-Save Settings: Configuration and recent files persist across sessions
  • ⌨️ Keyboard Shortcuts: Full keyboard navigation (press h or ? for help)
  • 🎯 File Browser: Browse and select files with recent files quick access

TUI Keyboard Shortcuts

| Key | Action |
| --- | --- |
| h, ? | Show help screen |
| q | Quit application |
| d | Toggle dark/light mode |
| Enter | Select/Confirm |
| Tab | Switch panes (Home Screen) |
| / | Filter files (Home Screen) |
| Esc | Go back / Close screen |

In Run Screen:

  • c - Cancel pipeline
  • o - Open output directory
  • a/d/i/w/e - Filter logs (all/debug+/info+/warning+/error)

TUI Workflow

Welcome → Select File → Configure → Run → View Results
   ↓           ↓            ↓         ↓        ↓
 Start      Browse       Settings  Progress  Open Output

💡 CLI Usage Examples

Basic Usage

# Process a local video file
audio-extraction-analysis process team-meeting.mp4

# Process from URL (YouTube, Vimeo, etc.)
audio-extraction-analysis process --url "https://youtube.com/watch?v=VIDEO_ID"

# Process with custom settings
audio-extraction-analysis process interview.mp4 \
  --output-dir ./transcripts \
  --quality high \
  --language en

# Extract audio only (for manual transcription)
audio-extraction-analysis extract presentation.mp4 --quality speech

# Transcribe existing audio file
audio-extraction-analysis transcribe recording.mp3 --language es

Batch Processing

# Process multiple videos
for video in *.mp4; do
  audio-extraction-analysis process "$video" --output-dir "./results/${video%.*}"
done
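The same loop can be driven from Python when you need more control (error handling, logging, parallelism). A minimal sketch mirroring the shell snippet's per-video output layout; it assumes the audio-extraction-analysis CLI is on PATH:

```python
from pathlib import Path
import subprocess

def build_command(video: str, results_root: str = "./results") -> list[str]:
    # Mirror the shell loop: each video's output goes to ./results/<stem>/
    # (e.g. meeting.mp4 -> ./results/meeting).
    out_dir = Path(results_root) / Path(video).stem
    return ["audio-extraction-analysis", "process", video, "--output-dir", str(out_dir)]

def process_all(videos: list[str]) -> None:
    # Sequential processing; check=True aborts on the first failing video.
    for video in videos:
        subprocess.run(build_command(video), check=True)
```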

Quality Settings

  • speech (default): Optimized for meetings, interviews
  • standard: Balanced quality for general content
  • high: Maximum quality for archival
  • compressed: Smaller files for quick tests


🔄 Event Streaming for Monitoring

Stream structured events in JSONL format for monitoring, logging, or integration with other tools.

Enable Event Streaming

# Stream events to stdout
audio-extraction-analysis process video.mp4 --jsonl

# Monitor specific event types
audio-extraction-analysis process video.mp4 --jsonl | jq 'select(.type=="stage_progress")'

# Save events to file
audio-extraction-analysis process video.mp4 --jsonl > events.jsonl

# Real-time monitoring with jq
audio-extraction-analysis process video.mp4 --jsonl | jq -r '
  if .type == "stage_start" then
    "▶ Starting: " + .stage
  elif .type == "stage_progress" then
    "⏳ Progress: " + .stage + " " + (.data.completed/.data.total*100|tostring) + "%"
  elif .type == "stage_end" then
    "✓ Completed: " + .stage
  else . end'

Event Types

  • stage_start: Stage beginning (extract, transcribe, analyze)
  • stage_progress: Progress updates with completed/total counts
  • stage_end: Stage completion with duration
  • artifact: File created with path and metadata
  • log, warning, error: Log messages at various levels
  • summary: Final metrics and statistics
  • cancelled: Pipeline cancellation
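For integration, the event stream is easy to consume from Python. A sketch that tallies event types and collects artifact paths; the field names (type, stage, and an artifact's data.path) follow the jq examples above but should be treated as assumptions — verify them against your own --jsonl output:

```python
import json

def summarize_events(lines):
    """Tally event types and collect artifact paths from a JSONL stream.

    Accepts any iterable of lines, e.g. sys.stdin when piping from the CLI."""
    counts: dict[str, int] = {}
    artifacts: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        event = json.loads(line)
        counts[event["type"]] = counts.get(event["type"], 0) + 1
        if event["type"] == "artifact":
            # Where the path lives inside the event is an assumption.
            artifacts.append(event.get("data", {}).get("path", ""))
    return counts, artifacts
```

Piped usage would look like `audio-extraction-analysis process video.mp4 --jsonl | python your_monitor.py`, with the script passing `sys.stdin` to `summarize_events`.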

πŸ“ Output Structure

The output layout depends on the --analysis-style flag:

Concise (default)

  • {video}.mp3 - Extracted audio
  • {video}_analysis.md - Single comprehensive analysis
  • {video}_transcript.txt - Provider-formatted transcript

Full

Each processed video generates 5 structured markdown files:

| File | Purpose | Target Audience |
| --- | --- | --- |
| 01_executive_summary.md | High-level overview with metadata | Executives, managers |
| 02_chapter_overview.md | Detailed content breakdown by topic | Project managers, team leads |
| 03_key_topics_and_intents.md | Technical analysis of discussion themes | Analysts, researchers |
| 04_full_transcript_with_timestamps.md | Complete searchable record | All stakeholders, archives |
| 05_key_insights_and_takeaways.md | Strategic insights and action items | Decision makers, implementers |

Sample Output

./output/
├── meeting_2024.mp3                      # Extracted audio
├── 01_executive_summary.md               # 2-3 KB overview
├── 02_chapter_overview.md                # 4-5 KB chapters
├── 03_key_topics_and_intents.md          # 5-6 KB analysis
├── 04_full_transcript_with_timestamps.md # 100+ KB transcript
└── 05_key_insights_and_takeaways.md      # 3-4 KB insights

Markdown Export Output

When exporting markdown, files are organized as:

./output/
└── <source_name>/
    ├── transcript.md
    ├── metadata.json
    └── segments.json

βš™οΈ Configuration

Environment Variables

# Required for cloud providers
export DEEPGRAM_API_KEY='your-api-key-here'     # Get from console.deepgram.com
export ELEVENLABS_API_KEY='your-api-key-here'   # Get from elevenlabs.io/api

# Optional - Whisper configuration (local processing)
export WHISPER_MODEL='base'                     # tiny, base, small, medium, large
export WHISPER_DEVICE='cuda'                    # cuda or cpu
export WHISPER_COMPUTE_TYPE='float16'           # float16 or float32

# Optional - Parakeet configuration (NVIDIA models)
export PARAKEET_MODEL='stt_en_conformer_ctc_large'  # stt_en_conformer_ctc_large, stt_en_conformer_transducer_large, stt_en_fastconformer_ctc_large
export PARAKEET_DEVICE='auto'                   # auto, cuda or cpu
export PARAKEET_BATCH_SIZE=8                    # Batch size for processing
export PARAKEET_BEAM_SIZE=10                    # Beam size for decoding
export PARAKEET_USE_FP16=true                   # Use FP16 for faster processing
export PARAKEET_CHUNK_LENGTH=30                 # Audio chunk length in seconds
export PARAKEET_MODEL_CACHE_DIR='~/.cache/parakeet'  # Model cache directory

# Optional - General configuration
export LOG_LEVEL='INFO'                         # DEBUG, INFO, WARNING, ERROR
export TEMP_DIR='/custom/temp/path'             # Custom temporary directory

# Optional - TUI configuration persistence
# TUI settings are automatically saved to platform-specific config directory:
# - macOS: ~/Library/Application Support/audio-extraction-analysis/
# - Linux: ~/.config/audio-extraction-analysis/
# - Windows: %APPDATA%\audio-extraction-analysis\

.env File (Alternative)

# Create .env file in project root
echo "DEEPGRAM_API_KEY=your-key-here" > .env
echo "ELEVENLABS_API_KEY=your-key-here" >> .env
echo "WHISPER_MODEL=base" >> .env
echo "WHISPER_DEVICE=cuda" >> .env
echo "LOG_LEVEL=DEBUG" >> .env
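The .env format above is simple enough to load by hand when scripting around the package; a minimal reader sketch (illustrative only — the package's own .env handling may use a library such as python-dotenv):

```python
import os

def load_env(path: str = ".env") -> dict[str, str]:
    # Parse KEY=VALUE lines, skipping blanks and comments, and strip
    # optional surrounding quotes from values; export into os.environ.
    values: dict[str, str] = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    os.environ.update(values)
    return values
```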

Supported Languages

  • en - English (default)
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • auto - Auto-detect

Whisper Model Sizes

Whisper supports multiple model sizes with different performance characteristics:

| Model | Parameters | Disk Space | RAM Usage | VRAM Usage | Quality |
| --- | --- | --- | --- | --- | --- |
| tiny | 39M | 75MB | ~1GB | ~1GB | Basic |
| base | 74M | 142MB | ~1GB | ~1GB | Good |
| small | 244M | 461MB | ~2GB | ~2GB | Better |
| medium | 769M | 1.5GB | ~5GB | ~5GB | Great |
| large | 1.5B | 2.9GB | ~10GB | ~10GB | Best |

Set model size with: export WHISPER_MODEL=medium
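The table also doubles as a rough sizing guide; a small helper that picks the best-quality model fitting a RAM budget (figures copied from the table above, approximate):

```python
# Approximate RAM requirements in GB, taken from the table above.
WHISPER_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]

def largest_model_for(ram_gb: float) -> str:
    # Models are ordered by quality, so the last one that fits wins.
    fitting = [m for m in MODEL_ORDER if WHISPER_RAM_GB[m] <= ram_gb]
    if not fitting:
        raise ValueError("no Whisper model fits in %.1f GB" % ram_gb)
    return fitting[-1]
```

For example, largest_model_for(6) returns "medium", which you could then export as WHISPER_MODEL.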

Parakeet Model Options

Parakeet supports multiple model architectures with different performance characteristics:

| Model | Type | Accuracy | Speed | Memory | Languages |
| --- | --- | --- | --- | --- | --- |
| stt_en_conformer_ctc_large | CTC | High | Fast | 4GB | English |
| stt_en_conformer_transducer_large | RNN-T | Highest | Medium | 6GB | English |
| stt_en_fastconformer_ctc_large | CTC | Medium | Fastest | 2GB | English |

Set model with: export PARAKEET_MODEL=stt_en_conformer_ctc_large

🧩 Templates

Templates are defined in src/formatters/templates.py:

  • default: Rich header, timestamps, speaker prefixes
  • minimal: Title only + text
  • detailed: Report-style header, stats, bold timestamps

Each template accepts placeholders in strings:

  • Header: {title}, {source}, {duration}, {processed_at}, {provider}, {segment_count}, {avg_confidence}
  • Segment: {timestamp}, {speaker_prefix}, {text}, {confidence}

To customize, add a new entry to TEMPLATES with keys: header, segment, speaker_prefix, timestamp_format.
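Using those placeholders, a hypothetical custom entry might look like the sketch below. The key names (header, segment, speaker_prefix, timestamp_format) come from the description above; the exact dict shape expected by src/formatters/templates.py is an assumption:

```python
# Hypothetical "compact" template; only the key names come from the docs above.
COMPACT = {
    "header": "# {title}\n_Source: {source} · {duration} · {provider}_\n",
    "segment": "{timestamp} {speaker_prefix}{text}\n",
    "speaker_prefix": "{speaker}: ",
    "timestamp_format": "[%M:%S]",
}

def render_segment(template: dict, timestamp: str, speaker: str, text: str) -> str:
    # Expand the speaker prefix first, then the full segment line.
    prefix = template["speaker_prefix"].format(speaker=speaker)
    return template["segment"].format(timestamp=timestamp, speaker_prefix=prefix, text=text)
```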

🎯 Key Features

Intelligent Content Organization

  • Automatic Chapter Detection: Identifies topic transitions and creates logical sections
  • Speaker Separation: Maintains clear attribution with timestamps
  • Topic Extraction: Identifies and ranks discussion themes by frequency and importance

Advanced Analysis

  • Intent Detection: Recognizes underlying purposes in discussions
  • Sentiment Analysis: Tracks positive, neutral, and negative segments
  • Insight Generation: Extracts actionable takeaways with supporting evidence

Production Ready

  • Unified CLI with TUI: Simple commands and interactive interface for complex workflows
  • Quality Presets: Optimized audio extraction for different needs
  • Provider Health Checking: Automatic validation and fallback to working providers
  • Circuit Breaker Pattern: Fault tolerance with automatic retry and failover
  • Event Streaming: Real-time monitoring via JSONL events
  • Path Sanitization: Security-hardened file operations
  • Error Handling: Robust processing with detailed logging
  • Fast Processing: 2-hour meetings processed in ~5-7 minutes
  • Configuration Persistence: TUI settings saved across sessions
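The circuit-breaker pattern mentioned above can be sketched in a few lines; this illustrates the idea only and is not the package's actual implementation:

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch.

    After max_failures consecutive errors the circuit opens and calls fail
    fast until reset_after seconds pass; then one trial call is allowed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: provider temporarily disabled")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```

In a provider-failover setup, an open circuit on one provider would trigger a fallback to the next healthy one.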

πŸ› οΈ System Requirements

Prerequisites

  • Python 3.11+ (3.12 recommended)
  • FFmpeg (for audio extraction)
  • API Key for cloud providers (Deepgram or ElevenLabs) OR local models (Whisper/Parakeet)
  • Internet connection (for cloud API calls)

Installation Steps

  1. Install FFmpeg: brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu)
  2. Clone repository: git clone <repository-url>
  3. Install package with desired features:
    # Basic installation (cloud providers only)
    uv sync
    
    # With TUI (Terminal User Interface)
    uv sync --extra tui
    
    # With Whisper (local transcription)
    uv add openai-whisper torch
    
    # With Parakeet (NVIDIA models)
    uv sync --extra parakeet
    
    # With all features (recommended for development)
    uv sync --dev --extra tui --extra parakeet --extra yaml --extra redis
  4. Configure API keys (for cloud providers):
    • Deepgram: export DEEPGRAM_API_KEY='your-key' (get from console.deepgram.com)
    • ElevenLabs: export ELEVENLABS_API_KEY='your-key' (get from elevenlabs.io)
  5. Test: audio-extraction-analysis --version

Supported Formats

  • Input: MP4, MOV, AVI, MKV, MP3, WAV, M4A
  • Output: MP3 (audio), Markdown (transcripts)
  • File size: Up to 2GB
  • Duration: No limit

🔧 Troubleshooting

Common Issues

"Input file not found"

# Check file path and permissions
ls -la your-file.mp4
# Use absolute path if needed
audio-extraction-analysis process /full/path/to/video.mp4

"Deepgram API key not configured"

# Set API key
export DEEPGRAM_API_KEY="your-key-here"
# Or create .env file
echo "DEEPGRAM_API_KEY=your-key-here" > .env
# Get API key from: https://console.deepgram.com/

"FFmpeg not found"

# Install FFmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Windows: choco install ffmpeg
# Verify: ffmpeg -version

"Whisper dependencies not installed"

# Install Whisper and PyTorch
uv add openai-whisper torch

# For GPU acceleration (recommended):
uv add openai-whisper torch torchaudio --index-url https://download.pytorch.org/whl/cu118

# Verify installation:
python -c "import whisper; print('Whisper installed successfully')"

"Parakeet dependencies not installed"

# Install Parakeet and NeMo dependencies
uv sync --extra parakeet
# Or directly: uv add "nemo-toolkit[asr]@1.20.0" --extra parakeet

# For GPU acceleration (recommended):
uv add "nemo-toolkit[asr]@1.20.0" torch torchaudio --extra parakeet

# Verify installation:
python -c "import nemo; print('Parakeet installed successfully')"

"TUI not working / Textual not found"

# Install TUI dependencies
uv sync --extra tui

# Or install Textual directly
uv add "textual>=0.47.0" --extra tui

# Verify TUI works
audio-extraction-analysis tui --help

"Permission denied"

# Check output directory permissions
ls -la /path/to/output/
# Create directory if needed
mkdir -p /path/to/output

Performance Tips

  • Use --quality speech for faster processing
  • Process large files in smaller segments
  • Ensure stable internet for API calls
  • Monitor disk space for output files

📊 Use Cases

🏢 Business Meetings

  • Input: 2-hour team meeting recording
  • Output: Executive summary, action items, decisions
  • Time: ~5-7 minutes processing

🎓 Training Sessions

  • Input: Multi-hour training video
  • Output: Searchable reference, key concepts, Q&A
  • Benefits: Reusable training materials

👥 Customer Interviews

  • Input: Interview recordings
  • Output: Insights, pain points, feature requests
  • Benefits: Structured feedback analysis

🎧 Podcast/Webinar Analysis

  • Input: Long-form content
  • Output: Chapter breakdown, topics, quotes
  • Benefits: Content repurposing, highlights

Performance Metrics

  • Accuracy: 95%+ with Deepgram Nova 3, 85%+ with Whisper large
  • Speed: Real-time processing capability (cloud), 0.5-5x real-time (Whisper)
  • Output: 5 files, 100-150KB total
  • Languages: 10+ supported languages (Whisper supports 100+ languages)

📚 Documentation

For detailed information:

  • CLI Help (audio-extraction-analysis --help) - Discover the latest command syntax directly from the tool
  • HTML Dashboard Guide - Learn how to render interactive dashboards from analysis output
  • Examples Directory - Browse runnable samples and generated markdown outputs

🤝 Contributing

# Development setup
git clone https://github.com/edlsh/audio-extraction-analysis.git
cd audio-extraction-analysis
uv sync --dev

# Install Whisper for testing
uv add openai-whisper torch

# Install Parakeet for testing
uv add "nemo-toolkit[asr]@1.20.0" --extra parakeet

# Run tests (default - unit tests only)
pytest

# Run specific test profiles
./scripts/run_tests.sh --profile fast       # Fast unit tests only
./scripts/run_tests.sh --profile integration # Integration tests
./scripts/run_tests.sh --profile e2e        # End-to-end tests
./scripts/run_tests.sh --profile benchmark  # Performance benchmarks
./scripts/run_tests.sh --profile all        # Complete test suite

# Run with mock provider (no API keys needed)
AUDIO_TEST_MODE=1 pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Code quality checks (CI/CD compliant)
black src/ tests/                    # Format code
ruff check src/ tests/                # Lint code
bandit -r src -ll                     # Security analysis
./scripts/run_static_checks.sh       # Run all static checks

📄 License

This project is provided as-is for professional use. Adapt and modify according to your organization's needs.


Transform your recordings into structured, actionable documentation.
