A Python tool to generate high-accuracy, line-by-line transcripts from any YouTube video with audio. While optimized for song lyrics, it works great for podcasts, interviews, lectures, and other speech content.
- Downloads and transcribes any YouTube video with audio
- Shows progress bars for download and transcription
- Intelligently formats content with natural line breaks
- Supports multiple Whisper model sizes for different accuracy levels
- Special optimization for music and lyrics transcription
- Parallel processing using multiple CPU cores
- GPU acceleration with CUDA support
- Fast mode for quicker transcription
- Beautifully animated progress bars
- Advanced logging with file output support
- Python 3.7+
- FFmpeg (required for audio processing)
- Windows: Download from the FFmpeg website or install with Chocolatey:
choco install ffmpeg
- macOS: Install using Homebrew:
brew install ffmpeg
- Linux: Install using your package manager, e.g., `apt install ffmpeg` or `dnf install ffmpeg`
- Clone or download this repository
- Install the required Python packages:
pip install -r requirements.txt
Basic usage:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
The tool works with any YouTube content containing speech or music:
- Song lyrics
- Podcasts
- Interviews
- Lectures
- Speeches
- Any other audio content
| Flag | Shortcut | Description | Default |
|---|---|---|---|
| `--output FILE` | `-o` | Save lyrics to this file | stdout |
| `--model SIZE` | `-m` | Choose model size (tiny, base, small, medium, large) | medium |
| `--keep-audio` | | Keep the downloaded audio file after processing | False |
| `--audio-output PATH` | | Specify path to save the downloaded audio | |
| `--min-line-duration N` | | Minimum pause duration (seconds) for line breaks | 1.0 |
| `--raw-data` | | Save raw transcription data as JSON | False |
| `--device TYPE` | | Device to use for inference (cpu or cuda) | auto |
| `--chunk-size N` | | Process audio in chunks of this size in seconds | |
| `--fast` | | Enable fast mode (sacrifices some accuracy for speed) | False |
| `--max-duration N` | | Maximum duration (seconds) to transcribe from video start | |
| `--verbose` | `-v` | Enable verbose logging for debugging | False |
| `--log-file FILE` | | Save logs to specified file in addition to console | |
| `--safe-mode` | | Use safe mode without word-level timestamps | False |
Transcribe content with higher accuracy:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -m large -o transcript.txt
Customize line breaks with longer pauses:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --min-line-duration 2.0 -o transcript.txt
Get raw data with timestamps:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --raw-data -o transcript.txt
For fastest transcription with slightly lower quality:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --fast
Use GPU acceleration (if you have a CUDA-compatible GPU):
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --device cuda
Process a long audio file in chunks:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --chunk-size 30
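Conceptually, chunked processing splits the audio timeline into fixed-length windows that can be transcribed independently. A minimal sketch of the boundary arithmetic (illustrative only, not the script's internals):

```python
def chunk_offsets(total_duration, chunk_size):
    """Yield (start, end) offsets in seconds covering the full duration,
    with a shorter final chunk if it doesn't divide evenly."""
    start = 0
    while start < total_duration:
        end = min(start + chunk_size, total_duration)
        yield (start, end)
        start = end

# A 75-second clip processed with --chunk-size 30:
print(list(chunk_offsets(75, 30)))  # [(0, 30), (30, 60), (60, 75)]
```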
Only transcribe the first minute of a video:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --max-duration 60
Fastest possible transcription (GPU + fast mode):
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --device cuda --fast
Enable detailed debug logs:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -v
Save logs to a file for later analysis:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --log-file transcription.log
Both verbose console output and file logging:
python transcribe.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -v --log-file debug.log
This tool is specially optimized for creating lyrics for Genius.com:
- Lines are intelligently separated based on natural pauses in the song
- Default model is set to "medium" for better lyrics accuracy
- Output is properly formatted and cleaned for easy copy/paste to Genius
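The pause-based line separation can be illustrated with a small sketch. This is not the tool's actual code; it assumes Whisper-style word entries with `text`, `start`, and `end` fields:

```python
def break_into_lines(words, min_line_duration=1.0):
    """Group word-level timestamps into lines, starting a new line
    whenever the silent gap between words exceeds the threshold."""
    lines, current = [], []
    prev_end = None
    for word in words:
        if prev_end is not None and word["start"] - prev_end >= min_line_duration:
            lines.append(" ".join(current))
            current = []
        current.append(word["text"])
        prev_end = word["end"]
    if current:
        lines.append(" ".join(current))
    return lines

# Example: the 1.5 s pause before "up" triggers a line break
words = [
    {"text": "Never", "start": 0.0, "end": 0.4},
    {"text": "gonna", "start": 0.4, "end": 0.8},
    {"text": "give", "start": 0.8, "end": 1.1},
    {"text": "you", "start": 1.1, "end": 1.4},
    {"text": "up", "start": 2.9, "end": 3.2},
]
print(break_into_lines(words))  # ['Never gonna give you', 'up']
```

Raising `--min-line-duration` above the longest pause would merge everything back into a single line.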
- GPU Acceleration: If you have an NVIDIA GPU, use `--device cuda` for significantly faster transcription
- Parallel Processing: For long videos, use `--chunk-size 30` to process the audio in 30-second chunks
- Fast Mode: The `--fast` flag will:
  - Use beam size 1 for faster inference
  - Automatically downgrade from the medium model to the small model for better speed
  - Disable word-level timestamps for faster processing
- Limit Duration: For testing, or when you only need part of a song, use `--max-duration 60` to process only the first minute
- Model Selection: For quick drafts, use `-m small` or `-m base`
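As a rough illustration, the fast-mode adjustments above amount to a handful of settings changes (the option names here are hypothetical, not the script's real internals):

```python
def apply_fast_mode(options):
    """Trade some accuracy for speed, mirroring the --fast behaviour
    described above (hypothetical option names, for illustration)."""
    options["beam_size"] = 1              # narrower beam, faster decoding
    if options.get("model") == "medium":  # smaller model, quicker inference
        options["model"] = "small"
    options["word_timestamps"] = False    # skip per-word alignment
    return options

print(apply_fast_mode({"model": "medium", "beam_size": 5,
                       "word_timestamps": True}))
# {'model': 'medium' -> 'small', beam_size 1, word_timestamps False}
```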
For best results, install in a dedicated virtual environment:
# Create a virtual environment
python -m venv youtube_transcriber_env
# Activate it (Windows)
youtube_transcriber_env\Scripts\activate
# Activate it (Linux/Mac)
source youtube_transcriber_env/bin/activate
# Install dependencies
pip install -r requirements.txt
Alternatively, use uv for faster installation:
# Install uv
pip install uv
# Create and activate virtual environment with uv
uv venv
source .venv/bin/activate # On Linux/Mac
# OR
.venv\Scripts\activate # On Windows
# Install dependencies
uv pip install -r requirements.txt
- The script requires an internet connection to download the YouTube video and (first time only) to download the Whisper model.
- For transcribing speech (interviews, lectures, etc.), the "base" or "small" model often provides sufficient accuracy
- For song lyrics, the "medium" or "large" model provides much better accuracy and is highly recommended
- Adjust the `--min-line-duration` parameter based on content type:
  - Fast speech or songs: try lower values (0.8-1.2 seconds)
  - Slow speech or songs: try higher values (2.0-3.0 seconds)
- Progress bars show download and transcription progress in real time
- Additional dependencies: parallel processing requires `librosa` (`pip install librosa`)
Contributions are welcome! If you'd like to contribute to this project:
- Fork the repository
- Create a new branch (`git checkout -b feature/your-feature-name`)
- Make your changes
- Commit your changes (`git commit -m 'Add some feature'`)
- Push to the branch (`git push origin feature/your-feature-name`)
- Open a Pull Request
Please ensure your code follows the existing style and includes appropriate documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper - The AI speech recognition model that powers the transcription
- yt-dlp - For YouTube video downloading capabilities
- PyTorch - For the underlying machine learning framework
- tqdm - For the progress bar visualizations