Automatically transcribe, translate, and dub videos into different languages using AI-powered text-to-speech.
- Speech Recognition: Transcribe audio using OpenAI Whisper
- Translation: Translate to 100+ languages via Google Translate
- Three TTS Engines:
  - Edge TTS: High-quality Microsoft voices (recommended)
  - Silero: Fast Russian TTS (offline after first download)
  - XTTS: Voice cloning from 6-10 second samples
- Video Preservation: Keeps the original video, mixes original audio (20%) with dubbed audio (150%)
- Subtitle Generation: Creates SRT files for translated text
```bash
# Fedora/RHEL
sudo dnf install ffmpeg python3.10 python3.10-devel

# Ubuntu/Debian
sudo apt install ffmpeg python3.10 python3.10-dev

# macOS
brew install ffmpeg
```

```bash
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install openai-whisper pysrt edge-tts deep-translator soundfile tqdm
pip install TTS  # Only needed for XTTS voice cloning
```

AutoDub now supports fully offline translation using Ollama. This is ideal for privacy, avoiding API limits, and achieving more context-aware translations.
For Linux (Fedora/Ubuntu/etc.):

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
```

## Quick Start Guide
Follow these three steps to set up and start dubbing your videos:

- Prepare Files

  Ensure you have the following files in your project directory:
  - setup.sh (the installer)
  - autodub_v4_1.py (the main engine)
  - install.txt (list of dependencies)

- Run Installation

  Open your terminal in the project folder and execute:

```bash
chmod +x setup.sh && ./setup.sh
```

- Dub a Video

```bash
# Dub to Russian (default)
python autodub.py video.mp4
# Dub to English
python autodub.py video.mp4 --target_lang en
# Dub to German
python autodub.py video.mp4 --target_lang de
```

```bash
# Default voice (aidar)
python autodub.py video.mp4 --tts silero
# Female voice
python autodub.py video.mp4 --tts silero --silero_voice xenia
# Available voices: aidar, baya, kseniya, xenia, eugene
```

```bash
# Requires 6-10 second clean voice sample
python autodub.py video.mp4 --tts xtts --ref_voice my_voice.wav --target_lang en
```

```bash
# Use Ollama with default llama3 model
./run.sh video.mp4 --translator ollama
# Use a specific model (e.g., Mistral)
./run.sh video.mp4 --translator ollama --ollama_model mistral
```

```
positional arguments:
  video                 Input video file

options:
  -h, --help            Show help message
  --tts {edge,silero,xtts}
                        TTS engine (default: edge)
  --target_lang LANG    Target language code (default: ru)
                        Supports: ru, en, de, fr, es, it, pt, ja, zh, etc.
  --silero_voice {aidar,baya,kseniya,xenia,eugene}
                        Silero voice for Russian (default: aidar)
  --ref_voice FILE      Reference WAV for XTTS voice cloning
  --keep-temp           Keep temporary files after processing
```
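For reference, a minimal argparse sketch that would reproduce the options above (illustrative only; the real parser lives in the script itself, and anything not shown in the help text is an assumption):

```python
import argparse

# Illustrative parser mirroring the help text above; not the script's actual code.
parser = argparse.ArgumentParser(description="AutoDub: transcribe, translate, and dub videos")
parser.add_argument("video", help="Input video file")
parser.add_argument("--tts", choices=["edge", "silero", "xtts"], default="edge",
                    help="TTS engine (default: edge)")
parser.add_argument("--target_lang", default="ru", metavar="LANG",
                    help="Target language code (default: ru)")
parser.add_argument("--silero_voice", choices=["aidar", "baya", "kseniya", "xenia", "eugene"],
                    default="aidar", help="Silero voice for Russian (default: aidar)")
parser.add_argument("--ref_voice", metavar="FILE",
                    help="Reference WAV for XTTS voice cloning")
parser.add_argument("--keep-temp", action="store_true",
                    help="Keep temporary files after processing")
args = parser.parse_args()
```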
Edge TTS supports 100+ languages. Common codes:

- `ru` - Russian
- `en` - English
- `de` - German
- `fr` - French
- `es` - Spanish
- `it` - Italian
- `pt` - Portuguese
- `ja` - Japanese
- `zh` - Chinese

Full list: https://speech.microsoft.com/portal/voicegallery
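To check which Edge TTS voices exist for a given language code, here is a minimal sketch using the edge-tts Python API (the voice name en-US-AriaNeural and the output file name are assumptions for illustration):

```python
import asyncio
import edge_tts

async def main() -> None:
    # List all available voices and filter by locale prefix (language code).
    voices = await edge_tts.list_voices()
    print([v["ShortName"] for v in voices if v["Locale"].startswith("en-")][:5])

    # Synthesize one sample line; voice name and path are illustrative.
    await edge_tts.Communicate("Hello from AutoDub.", "en-US-AriaNeural").save("sample.mp3")

asyncio.run(main())
```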
The script generates:
- `{video}_dubbed.mp4` - Video with dubbed audio
- `{video}_{lang}.srt` - Subtitle file with translations
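A minimal pysrt sketch of how translated segments could be written out as the SRT file (the segment timings and text below are made up for illustration; the script's own writer may differ):

```python
import pysrt

# Hypothetical translated segments as (start_seconds, end_seconds, text).
segments = [(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This line was translated.")]

subs = pysrt.SubRipFile()
for i, (start, end, text) in enumerate(segments, start=1):
    subs.append(pysrt.SubRipItem(
        index=i,
        start=pysrt.SubRipTime.from_ordinal(int(start * 1000)),  # ordinal = milliseconds
        end=pysrt.SubRipTime.from_ordinal(int(end * 1000)),
        text=text,
    ))

subs.save("video_en.srt", encoding="utf-8")  # illustrative file name
```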
| Engine | Speed | Quality | Languages | Notes |
|---|---|---|---|---|
| Edge TTS | Fast | ⭐⭐⭐⭐⭐ | 100+ | Best quality, requires internet |
| Silero | Very Fast | ⭐⭐⭐ | Russian only | Offline, robotic |
| XTTS | Slow | ⭐⭐⭐⭐⭐ | 16 | Voice cloning, GPU recommended |
```bash
pip install soundfile
```

This is already patched in the code. If you still see it, update PyTorch:

```bash
pip install --upgrade torch torchaudio
```

The script will auto-download on first run (~40MB). Check your internet connection.
Use CPU mode or reduce video length. For long videos, split into segments.
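One way to split is ffmpeg's segment muxer, shown here as a hedged sketch (the 10-minute chunk length and file names are arbitrary choices, not something the script does for you):

```python
import subprocess

# Split video.mp4 into ~10-minute chunks without re-encoding.
subprocess.run([
    "ffmpeg", "-i", "video.mp4",
    "-c", "copy", "-map", "0",
    "-f", "segment", "-segment_time", "600",
    "-reset_timestamps", "1",
    "part_%03d.mp4",
], check=True)
```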
Use Edge TTS or XTTS instead. Silero is designed for speed, not quality.
- Privacy: Your transcripts and translations never leave your local machine.
- Custom Context: LLMs can handle nuances, slang, and technical terms better than basic translators.
- Cost: 100% free with no character limits or subscription fees.
- Offline Workflow: Combined with Silero or XTTS, you can dub videos without an active internet connection.
| Feature | Google Translate | Ollama (Local LLM) |
|---|---|---|
| Speed | Instant | Depends on your GPU/RAM |
| Setup | Zero setup | Requires model download |
| Internet | Required | Not required |
| Quality | Literal / Standard | Contextual / Natural |
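A minimal sketch of what one translation call against a local Ollama server looks like over its REST API (the prompt wording is an assumption; the script may phrase it differently):

```python
import json
import urllib.request

def ollama_translate(text: str, target_lang: str = "ru", model: str = "llama3") -> str:
    """Translate one segment via Ollama's local /api/generate endpoint."""
    prompt = (f"Translate the following text to {target_lang}. "
              f"Reply with the translation only:\n{text}")
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()

print(ollama_translate("Hello, world!", "de"))
```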
The processing pipeline:

- Extract Audio: FFmpeg extracts mono 16 kHz WAV
- Transcribe: Whisper "base" model transcribes with timestamps
- Translate: Google Translate API translates segments
- Synthesize: TTS engine generates speech for each subtitle
- Merge: FFmpeg mixes original (20%) + dubbed (150%) audio with video
- Original audio: 20% volume (background)
- Dubbed audio: 150% volume (foreground)
- Output: AAC 128kbps, video copied without re-encoding
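The merge step corresponds to an ffmpeg filtergraph along these lines (a sketch with assumed file names, not the script's exact command; note that amix rescales its inputs by default, so the real volume tuning may differ):

```python
import subprocess

# Mix original audio at 20% and dubbed audio at 150%, copy the video
# stream untouched, and encode the mixed track as 128 kbps AAC.
subprocess.run([
    "ffmpeg", "-i", "video.mp4", "-i", "dubbed.wav",
    "-filter_complex",
    "[0:a]volume=0.2[orig];[1:a]volume=1.5[dub];"
    "[orig][dub]amix=inputs=2:duration=first[aout]",
    "-map", "0:v", "-map", "[aout]",
    "-c:v", "copy", "-c:a", "aac", "-b:a", "128k",
    "video_dubbed.mp4",
], check=True)
```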
MIT License - see LICENSE file
- OpenAI Whisper - Speech recognition
- Edge-TTS - Microsoft TTS
- Silero Models - Russian TTS
- Coqui TTS - XTTS voice cloning
Issues and pull requests welcome!