Subdub is a command-line tool for creating subtitles from video, translating them, generating dubbed audio and syncing dubbed audio with the original video. It was created to enhance the dubbing functionality of Pandrator, but can be used on its own, albeit with limited functionality. Pandrator provides a GUI that makes it possible to preview, edit, and regenerate subtitle audio before aligning and synchronising it, as well as manage the entire workflow.
Dubbing sample, including translation from Russian (video source):
pandrator_example_dubbing.mp4
Installation:
- Clone the repository:
git clone https://github.com/lukaszliniewicz/Subdub.git
- Move to the Subdub directory:
cd Subdub
- Install requirements:
pip install -r requirements.txt
- Make sure WhisperX is available on your system.
- Ensure the XTTS API Server is running.
Basic usage:
python Subdub.py -i [input_file] -sl [source_language] -tl [target_language] -task [task_name] -tts_voice [path to a 6-12 s, 22050 Hz, mono .wav file]
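The voice sample passed to -tts_voice is expected to be a short mono WAV. As a rough pre-flight check, the sketch below uses Python's standard wave module to compare a file against the 6-12 s, 22050 Hz, mono guideline from the usage line. The helper name check_voice_sample and its parameters are illustrative and are not part of Subdub.

```python
import wave


def check_voice_sample(path, min_s=6.0, max_s=12.0, rate=22050):
    """Return a list of problems found in a reference WAV; empty means it looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        framerate = wav.getframerate()
        duration = wav.getnframes() / framerate
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if framerate != rate:
        problems.append(f"expected {rate} Hz, found {framerate} Hz")
    if not min_s <= duration <= max_s:
        problems.append(f"expected {min_s}-{max_s} s, found {duration:.1f} s")
    return problems
```

Calling check_voice_sample("voice.wav") before a long run may catch a stereo or 44.1 kHz recording early. The wave module only reads uncompressed PCM files, so other encodings raise wave.Error.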
To perform translation, you need an Anthropic, OpenAI or DeepL API key, which you provide as an argument. The local LLM API (Text Generation WebUI's) doesn't require an API key. You can also set the keys as environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, DEEPL_API_KEY).
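The key lookup described above can be sketched as follows. The environment variable names come from this README. The provider labels ("anthropic", "openai", "deepl", "local"), the function name, and the rule that a command-line value wins over the environment are assumptions made for illustration and may not match how Subdub resolves keys internally.

```python
import os

# Environment variable names as listed in this README.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "deepl": "DEEPL_API_KEY",
}


def resolve_api_key(provider, cli_value=None, environ=None):
    """Pick an API key for `provider` (hypothetical helper, not Subdub's own code)."""
    environ = os.environ if environ is None else environ
    if provider == "local":
        # The local Text Generation WebUI API needs no key.
        return None
    if cli_value:
        # Assumption: an explicit argument takes precedence over the environment.
        return cli_value
    key = environ.get(ENV_VARS[provider])
    if not key:
        raise RuntimeError(f"No API key given and {ENV_VARS[provider]} is not set")
    return key
```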
Subdub offers several task modes to suit different needs:

full (default)
- Description: Performs all steps: transcription, translation, TTS generation, and audio synchronization.
- Input: Video file
- Output: Translated subtitles, TTS audio, and final dubbed video
- Usage:
python Subdub.py -i video.mp4 -sl English -tl Spanish -task full

transcribe
- Description: Transcribes the audio from a video file.
- Input: Video file
- Output: SRT subtitle file in the source language
- Usage:
python Subdub.py -i video.mp4 -sl English -task transcribe
You can also choose the Whisper model with -whisper_model.

translate
- Description: Translates existing subtitles to the target language.
- Input: SRT subtitle file
- Output: Translated SRT subtitle file
- Usage:
python Subdub.py -i subtitles.srt -sl English -tl French -task translate
You can also choose the translation API and, for Anthropic and OpenAI, the model (-llm-model).

tts
- Description: Generates Text-to-Speech audio from existing subtitles.
- Input: SRT subtitle file
- Output: WAV audio files for each subtitle block
- Usage:
python Subdub.py -i subtitles.srt -tl Spanish -task tts -tts_voice voice.wav

speech_blocks
- Description: Creates a speech blocks JSON file from subtitles for advanced audio processing.
- Input: SRT subtitle file
- Output: JSON file containing speech blocks
- Usage:
python Subdub.py -i subtitles.srt -task speech_blocks

sync
- Description: Synchronizes existing TTS audio with the original video.
- Input: Video file, speech blocks JSON, and TTS audio files
- Output: Final dubbed video
- Usage:
python Subdub.py -i video.mp4 -task sync -session existing_session
Argument | Description | Default |
---|---|---|
-i, --input | Input video or subtitle file path (required) | - |
-sl, --source_language | Source language | English |
-tl, --target_language | Target language for translation | - |
-task | Task to perform | full |
-session | Session name or path | - |
-llm-char | Character limit for translation | 4000 |
-ant_api | Anthropic API key | - |
-evaluate | Perform evaluation of translations | False |
-translation_memory | Enable translation memory/glossary feature | False |
-tts_voice | Path to TTS voice WAV file | - |
-whisper_model | Whisper model for transcription | large-v2 |
-llm-model | LLM model for translation (sonnet, haiku, gpt-4o or gpt-4o-mini) | sonnet |
-merge_threshold | Max time (ms) between subtitles to merge | 1 |
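To make -merge_threshold more concrete, here is a minimal sketch of merging adjacent subtitles whose gap is at most the threshold. It assumes subtitles are (start_ms, end_ms, text) tuples sorted by start time and that merged text is joined with a space. Both the data layout and the function name are illustrative, and Subdub's actual merging logic may differ.

```python
def merge_close_subtitles(subs, threshold_ms=1):
    """Merge consecutive subtitles separated by at most `threshold_ms` milliseconds.

    `subs` is a list of (start_ms, end_ms, text) tuples sorted by start time.
    """
    merged = []
    for start, end, text in subs:
        if merged and start - merged[-1][1] <= threshold_ms:
            # Gap is small enough: extend the previous entry instead of adding a new one.
            prev_start, prev_end, prev_text = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), f"{prev_text} {text}")
        else:
            merged.append((start, end, text))
    return merged
```

With the default of 1 ms, only subtitles that follow each other almost without a pause are merged. A larger value produces longer segments.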
- Evaluate (-evaluate): When enabled, Subdub makes an additional pass over the translated subtitles, using the AI model to review and improve the initial translations for quality and consistency. Only larger models such as Sonnet and GPT-4o adhere to the instructions consistently. Use it when the output must be as good as AI translation can achieve, and be mindful of the additional cost. Generally, Sonnet performs best.
- Translation Memory/Glossary (-translation_memory): This feature maintains a glossary of terms and their translations. It helps keep the translation consistent, especially for domain-specific terms or recurring phrases. The glossary is updated throughout the translation process and can be reused in future sessions.
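The idea behind a reusable glossary can be sketched as a small JSON file that is loaded before translation, consulted for each block, and saved afterwards. The JSON layout (a flat source-term to target-term mapping) and all function names are assumptions for illustration. Subdub's own storage format is not documented here and may differ.

```python
import json
from pathlib import Path


def load_glossary(path):
    """Load a {source_term: target_term} mapping, or start empty if the file is missing."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}


def relevant_terms(glossary, text):
    """Return only the glossary entries whose source term occurs in `text`."""
    lowered = text.lower()
    return {src: tgt for src, tgt in glossary.items() if src.lower() in lowered}


def save_glossary(path, glossary):
    """Persist the glossary so a later session can reuse it."""
    Path(path).write_text(
        json.dumps(glossary, ensure_ascii=False, indent=2, sort_keys=True),
        encoding="utf-8",
    )
```

Passing only the relevant entries along with each block keeps the prompt short while still steering recurring terms toward the same translation.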
Subdub follows a logical workflow to process videos and generate dubbed audio:
- SRT Generation: If the input is a video file, Subdub uses WhisperX to transcribe the audio and generate an SRT file.
- Translation Blocks: The SRT file is divided into translation blocks, considering character limits and sentence structures.
- Translation: Each block is translated using an API. If translation memory is enabled, a glossary is used and updated during this process.
- Translated SRT: The translated blocks are reassembled into a new SRT file in the target language.
- Speech Blocks: The translated SRT is processed to create speech blocks, which are optimized segments for Text-to-Speech generation.
- TTS Generation: Audio is generated for each speech block using the XTTS API Server.
- Alignment Blocks: Speech blocks are aligned with the original video timing.
- Synchronization and Mixing: The generated audio is synchronized with the video. During this process, the original audio volume is lowered when dubbed audio is playing.
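The Translation Blocks step can be illustrated with a small greedy grouping routine: subtitles are accumulated until the character limit would be exceeded, and the cut is then placed after the most recent subtitle that ends a sentence. This is a sketch under stated assumptions, not Subdub's actual algorithm. The function name, the sentence-ending characters, and the input shape (a plain list of subtitle texts) are all hypothetical.

```python
SENTENCE_END = (".", "!", "?", "…")


def build_translation_blocks(subtitles, char_limit=4000):
    """Group subtitle texts into blocks of at most `char_limit` characters.

    A single subtitle longer than the limit still gets a block of its own.
    """
    blocks, current, size = [], [], 0
    for text in subtitles:
        while current and size + len(text) > char_limit:
            # Prefer to cut after the last subtitle that closes a sentence.
            cut = len(current)
            for i in range(len(current) - 1, -1, -1):
                if current[i].rstrip().endswith(SENTENCE_END):
                    cut = i + 1
                    break
            blocks.append(current[:cut])
            current = current[cut:]
            size = sum(len(t) for t in current)
        current.append(text)
        size += len(text)
    if current:
        blocks.append(current)
    return blocks
```

Cutting at sentence boundaries gives the translation model complete sentences to work with, which tends to matter more than filling each block to the limit.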
Examples:
- Full process with translation memory:
python Subdub.py -i video.mp4 -sl English -tl Spanish -task full -translation_memory
- Translate existing subtitles and evaluate the translation:
python Subdub.py -i subtitles.srt -sl English -tl French -task translate -evaluate
- Generate TTS for translated subtitles with a custom voice:
python Subdub.py -i translated.srt -tl German -task tts -tts_voice custom_voice.wav
Dependencies:
- FFmpeg
- WhisperX
- Anthropic, OpenAI, DeepL or Text Generation WebUI API (for translation)
- XTTS API Server (for Text-to-Speech)
Ensure all dependencies are installed and properly configured before running Subdub.
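A quick way to check the command-line dependencies is to look them up on PATH with the standard library. The sketch below assumes FFmpeg and WhisperX are exposed as executables named ffmpeg and whisperx. If WhisperX lives in a separate environment, it may be reported as missing even though it is installed. The helper name missing_tools is illustrative.

```python
import shutil


def missing_tools(tools=("ffmpeg", "whisperx")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All command-line dependencies were found.")
```

This does not verify that the XTTS API Server or a translation API is reachable. Those need to be checked separately.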