Professional video dubbing solution with automated transcription, translation, and text-to-speech generation using Parakeet-TDT-0.6b-v2, Gemini AI, and Edge TTS.
- Automatic Transcription: Extract and transcribe audio from videos using Parakeet-TDT-0.6b-v2
- AI Translation: Translate content using Google Gemini AI
- Manual Translation: Support for custom translations in JSON format
- TTS Generation: High-quality text-to-speech with multiple voice options
- Video Synchronization: Automatically sync dubbed audio with original video
- Multiple Audio Processing: Upload one video and multiple audio files
- Batch Output: Generate multiple dubbed videos automatically
- Efficient Workflow: Process multiple variations quickly
- Install through Pinokio platform
- Click "Install" to set up dependencies
- Click "Start Application" to launch
# Clone repository
git clone <repository-url>
cd video-dubbing-pipeline
# Create virtual environment
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start application
python app.py
- Python 3.8+
- FFmpeg (for video/audio processing)
- CUDA GPU (recommended for optimal performance)
- 4GB+ VRAM (for ASR model)
- Google Gemini API key (for translation and TTS)
  - Get your API key from Google AI Studio
  - Multiple keys supported for higher rate limits
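
As an illustration of how several keys can raise the effective rate limit, here is a minimal sketch that parses keys entered one per line and hands them out round-robin. The function names `parse_api_keys` and `key_rotation` are hypothetical and are not taken from this project's code; the application may distribute requests differently.

```python
from itertools import cycle
from typing import Iterator, List


def parse_api_keys(raw_text: str) -> List[str]:
    """Split multi-line input into keys, dropping blank lines and duplicates."""
    keys: List[str] = []
    for line in raw_text.splitlines():
        key = line.strip()
        if key and key not in keys:
            keys.append(key)
    return keys


def key_rotation(keys: List[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the given keys."""
    if not keys:
        raise ValueError("At least one Gemini API key is required")
    return cycle(keys)


if __name__ == "__main__":
    # Placeholder strings, not real keys.
    rotation = key_rotation(parse_api_keys("key-a\n\nkey-b\n"))
    for _ in range(3):
        print(next(rotation))
```

Calling `next(rotation)` before each API request spreads the requests evenly across the configured keys.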
- Configure API Keys: Enter your Gemini API keys (one per line)
- Upload Video: Select your video file for dubbing
- Choose Voice: Select voice name (e.g., Kore, Puck, Zephyr)
- Select Mode:
  - Automatic: AI-powered translation
  - Manual: Provide custom JSON translation
- Run Pipeline: Click "Run Dubbing Pipeline"
- Download Results: Get dubbed video and audio files
- Configure API Keys: Enter your Gemini API keys
- Upload Video: Select base video file
- Upload Audio Files: Select multiple audio files
- Choose Voice: Select voice configuration
- Create Batch: Click "Create Batch Videos"
- Download All: Get all generated videos
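
The batch workflow above (one base video, several audio tracks, one output per track) can be sketched as follows. This is an assumption about how such a step could be done with FFmpeg, not the project's actual implementation; `build_mux_command` and `build_batch_commands` are hypothetical names, and only the `batch_dubbed_videos` directory name comes from this README.

```python
from pathlib import Path
from typing import List, Sequence


def build_mux_command(video: Path, audio: Path, output: Path) -> List[str]:
    """Build an FFmpeg command that replaces the video's audio track."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),   # input 0: base video
        "-i", str(audio),   # input 1: dubbed audio
        "-map", "0:v:0",    # take the video stream from input 0
        "-map", "1:a:0",    # take the audio stream from input 1
        "-c:v", "copy",     # keep the video stream as-is (no re-encode)
        "-c:a", "aac",      # encode audio as AAC for the MP4 container
        "-shortest",        # stop at the end of the shorter stream
        str(output),
    ]


def build_batch_commands(
    video: Path,
    audio_files: Sequence[Path],
    out_dir: Path = Path("batch_dubbed_videos"),
) -> List[List[str]]:
    """Build one mux command per audio file, named <video>_<audio>.mp4."""
    commands = []
    for audio in audio_files:
        output = out_dir / f"{video.stem}_{audio.stem}.mp4"
        commands.append(build_mux_command(video, audio, output))
    return commands


if __name__ == "__main__":
    for cmd in build_batch_commands(Path("talk.mp4"), [Path("hindi_a.wav")]):
        print(" ".join(cmd))
```

Each command can be run with `subprocess.run(cmd, check=True)` once the output directory exists. Because `-c:v copy` keeps the source codec, a source that is not already H.264 would need `-c:v libx264` instead to produce H.264 output.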
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
├── README.md # This file
├── PINOKIO.MD # Pinokio platform documentation
├── install.js # Pinokio installation script
├── start.js # Pinokio startup script
├── pinokio.js # Pinokio configuration
├── real_gemini_service.py # Gemini AI translation service
├── final_working_tts.py # TTS generation service
├── simple_edge_tts.py # Edge TTS integration
└── batch_dubbed_videos/ # Output directory for batch processing
- Kore: Balanced, natural voice
- Puck: Energetic, youthful voice
- Zephyr: Calm, professional voice
- Custom: Specify your own voice name
- Target Language: Currently optimized for Hindi
- Tone: Neutral, professional tone
- Dialect: Hindi Devanagari script
- Genre: General content adaptation
- Input video: MP4, AVI, MOV, MKV, WebM
- Input audio: WAV, MP3, FLAC, M4A, OGG
- Output video: MP4 (H.264 + AAC)
- Output audio: WAV (16-bit, 16 kHz)
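
To make the format lists above concrete, the sketch below classifies a file by extension and builds an FFmpeg command that extracts 16-bit, 16 kHz WAV audio from a video. The helper names are hypothetical, and the mono downmix (`-ac 1`) is an assumption that this README does not state.

```python
from pathlib import Path
from typing import List

# Extensions taken from the supported-format lists in this README.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def classify_media(path: Path) -> str:
    """Return "video" or "audio" based on the file extension."""
    suffix = path.suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    raise ValueError(f"Unsupported file type: {path.name}")


def build_extract_command(
    video: Path, wav_out: Path, sample_rate: int = 16000
) -> List[str]:
    """Build an FFmpeg command that writes 16-bit PCM WAV audio."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        "-ar", str(sample_rate),  # target sample rate in Hz
        "-ac", "1",               # mono (assumption, see lead-in)
        str(wav_out),
    ]


if __name__ == "__main__":
    print(classify_media(Path("clip.MKV")))
    print(" ".join(build_extract_command(Path("clip.mp4"), Path("clip.wav"))))
```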
- Model Loading Errors: Ensure sufficient VRAM (4GB+)
- FFmpeg Not Found: Install FFmpeg and add to PATH
- API Key Errors: Verify Gemini API key validity
- CUDA Issues: Install CUDA toolkit for GPU acceleration
- Use GPU for faster transcription
- Provide multiple API keys for higher rate limits
- Process shorter videos for faster results
- Ensure stable internet connection for API calls
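
Several of the issues above (FFmpeg missing from PATH, no usable CUDA device, an old Python version) can be checked before launching the app. The following is a minimal sketch; `check_environment` is a hypothetical helper, and the CUDA check assumes PyTorch is the framework in use, which this README does not confirm.

```python
import shutil
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether basic prerequisites appear to be satisfied."""
    report = {
        "python": sys.version_info >= (3, 8),          # Python 3.8+
        "ffmpeg": shutil.which("ffmpeg") is not None,  # FFmpeg on PATH
        "cuda": False,
    }
    try:
        import torch  # assumption: the ASR model runs on PyTorch
        report["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch not installed; leave "cuda" as False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A `False` entry for `cuda` does not necessarily block the pipeline, since the GPU is listed as recommended rather than required, but transcription will likely be slower.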
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- Check the troubleshooting section
- Review the documentation
- Open an issue on GitHub
Made with ❤️ for content creators and developers