SubGen-WhisperX

A powerful subtitle generation tool using WhisperX for accurate speech-to-text transcription with precise timestamp alignment.

Features

  • 🎯 Accurate speech recognition using WhisperX
  • ⚡ GPU acceleration support (CUDA)
  • 🎵 Handles both video and audio files
  • 📁 Batch processing support
  • ⏱️ Performance timing and logging
  • 🔧 Multiple language model options
  • 🎛️ Configurable compute device selection
  • 🌐 Multi-language support with auto-detection
  • 📑 Text file input support for batch processing
  • 🔄 Automatic model size selection

Prerequisites

  • Python 3.10 or later
  • NVIDIA GPU with CUDA 12 support (optional, for GPU acceleration)
  • Latest NVIDIA driver (very important when using the cuda flag)
  • FFmpeg
  • Git

Installation

The easy way

  1. Clone the repository:

git clone https://github.com/protik09/subgen-whisperx.git --depth=1
cd subgen-whisperx

  2. Create and activate a virtual environment:

In PowerShell:

.\uv_init.ps1

Or in Bash:

./uv_init.sh
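
If you prefer not to use the helper scripts, an equivalent manual setup with uv looks roughly like the following (this is a sketch that assumes the project's dependencies are declared in a requirements.txt; check the repository for the actual dependency file):

uv venv
source .venv/bin/activate          # PowerShell: .venv\Scripts\Activate.ps1
uv pip install -r requirements.txt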

Usage

Basic Usage

Generate subtitles for a single video file:

python subgen_whisperx.py -f path/to/video.mp4

Advanced Options

Process all media files in a directory:

python subgen_whisperx.py -l en -d path/to/directory

Specify compute device and model size:

python subgen_whisperx.py -f video.mp4 -c cuda -m medium
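
The -m flag takes a WhisperX model name. The accepted values are not enumerated in this README, but the standard Whisper sizes (tiny, base, small, medium, large-v2, large-v3, plus English-only variants such as small.en) are a reasonable assumption, for example:

python subgen_whisperx.py -f video.mp4 -c cpu -m tiny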

Set logging level:

python subgen_whisperx.py -f video.mp4 -log DEBUG

Command Line Arguments

Argument               Description                                Default
-f, --file             Path to input media file                   None
-d, --directory        Path to directory containing media files   None
-c, --compute_device   Device for computation (cuda or cpu)       Auto
-m, --model_size       WhisperX model size                        Auto
-l, --language         Subtitle language                          None
-log, --log-level      Logging level                              ERROR
-t, --txt              Text file with file/folder paths           None
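
For batch jobs, the -t/--txt option takes a text file listing inputs. The file format is not documented here; a plausible layout is one file or folder path per line, for example a hypothetical paths.txt:

path/to/video1.mp4
path/to/audio.mp3
path/to/media_folder/

Then run:

python subgen_whisperx.py -t paths.txt -l en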

Output

The script generates SRT subtitle files in the same directory as the input media:

  • Format: filename.{language}-ai.srt
  • Example: Meetings-0822.en-ai.srt for a video called Meetings-0822.mp4

Troubleshooting

If automatic model selection leads to a CUDA Out of Memory (OOM) error, manually select the next smaller model with the -m flag. For example, if -m medium causes a CUDA OOM, use -m small.en or -m small.
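
For example, stepping down after an OOM with the medium model on a CUDA device:

python subgen_whisperx.py -f video.mp4 -c cuda -m small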

Performance

  • GPU acceleration provides significantly faster processing
  • Falls back to the CPU if GPU access fails
  • Progress indicators show real-time status
  • Performance timing information is displayed after completion

License

This project is licensed under the MIT License.

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Acknowledgments

  • WhisperX for the core transcription technology
  • FFmpeg for media processing capabilities