This project creates lip-synced videos from a static face image and a text prompt. The pipeline first converts the text into speech (TTS), then uses the Wav2Lip model to animate the image so that its lip movements match the generated audio.
- Convert any text into audio using TTS (Text-to-Speech)
- Use Wav2Lip to generate a lip-synced video from a static face image and audio
- Automatic audio extraction and video frame preparation
- Resize factor customization to fit different GPU memory requirements
- CUDA support for fast inference
git clone https://github.com/your-bchachar/lip_sync_video_generator.git
cd lip_sync_video_generator
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
Make sure you have ffmpeg installed and accessible from the command line.
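One way to confirm the ffmpeg requirement before running anything is a small standard-library check. This is only a sketch: the helper names `find_ffmpeg` and `ffmpeg_version_line` are made up for illustration and are not part of this project.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(path):
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it before running the pipeline.")
    else:
        print(ffmpeg_version_line(path))
```

If the script reports that ffmpeg is missing, install it with your system package manager and open a new terminal so the updated PATH is picked up.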
Clone the official Wav2Lip repository into the project root directory:
git clone https://github.com/Rudrabha/Wav2Lip.git
Download the wav2lip.pth file from this Hugging Face link and place it in ./Wav2Lip/checkpoints/
- Input text is converted into audio using a TTS system.
- The audio is saved to ./audio/output.wav
- The image and audio are passed to the Wav2Lip model.
- A video is generated where the lips of the image move in sync with the spoken audio.
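The steps above can be sketched roughly as follows. This is not the project's actual main.py: `build_wav2lip_command` and `run_pipeline` are hypothetical helpers, and `synthesize` stands in for whichever TTS backend is used (the README does not name one). The flags passed to inference.py (`--checkpoint_path`, `--face`, `--audio`, `--outfile`, `--resize_factor`) are those of the upstream Wav2Lip script, and the default paths are taken from the project layout.

```python
import subprocess
import sys
from pathlib import Path


def build_wav2lip_command(face, audio, outfile,
                          checkpoint="checkpoints/wav2lip.pth",
                          resize_factor=1):
    """Build the argument list for Wav2Lip's inference.py (run from inside Wav2Lip/)."""
    return [
        sys.executable, "inference.py",
        "--checkpoint_path", str(checkpoint),
        "--face", str(face),
        "--audio", str(audio),
        "--outfile", str(outfile),
        "--resize_factor", str(resize_factor),
    ]


def run_pipeline(text, synthesize,
                 image="images/sample.jpg",
                 audio="audio/output.wav",
                 outfile="video/result_voice.mp4",
                 wav2lip_dir="Wav2Lip"):
    """Text -> speech -> lip-synced video.

    `synthesize(text, wav_path)` is an assumed callable that writes a WAV file;
    plug in the TTS library of your choice.
    """
    audio_path = Path(audio).resolve()
    audio_path.parent.mkdir(parents=True, exist_ok=True)
    synthesize(text, str(audio_path))  # step 1: TTS writes the audio file

    out_path = Path(outfile).resolve()
    out_path.parent.mkdir(parents=True, exist_ok=True)

    # Absolute paths are used because the command runs with cwd set to Wav2Lip/.
    cmd = build_wav2lip_command(Path(image).resolve(), audio_path, out_path)
    subprocess.run(cmd, cwd=wav2lip_dir, check=True)  # step 2: Wav2Lip inference
    return out_path
```

Keeping the command construction separate from the subprocess call makes it easy to print or inspect the exact Wav2Lip invocation when something goes wrong.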
python main.py
- Your GPU (e.g., RTX 4090) may not be supported by the current PyTorch installation.
- Fix: Reinstall PyTorch with support for compute capability 8.9:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
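To see whether the installed PyTorch build covers your GPU, you can compare the device's compute capability with the architectures the build was compiled for. The sketch below uses `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`; the helper `capability_supported` is hypothetical and applies a simplified rule of thumb (native `sm_XY` code runs on devices of the same major version with an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for newer devices), so treat its answer as a hint rather than a guarantee.

```python
def capability_supported(capability, arch_list):
    """Rule-of-thumb check of a (major, minor) capability against a build's arch list."""
    major, minor = capability
    for arch in arch_list:
        kind, _, number = arch.partition("_")
        if not number.isdigit():
            continue  # skip entries such as "sm_90a" that this sketch does not model
        arch_major, arch_minor = int(number[:-1]), int(number[-1])
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True  # native code for the same major version
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True  # PTX that can be JIT-compiled for a newer device
    return False


def report_gpu_support():
    """Print what the local PyTorch build reports; requires torch to be installed."""
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build.")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("Device capability:", capability)
    print("Build architectures:", arch_list)
    print("Likely supported:", capability_supported(capability, arch_list))
```

Call `report_gpu_support()` inside the project's virtual environment; if it reports the device as unsupported, the reinstall command above is the next thing to try.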
- Occurs when the wrong model file is used (e.g., a TorchScript .pt file instead of a PyTorch .pth checkpoint)
- Fix: Use the .pth file from the correct source (e.g., Hugging Face link above)
- Fix: Use the --resize_factor argument (e.g., 2 or 4)
- Fix: Ensure ffmpeg is installed and accessible from the command line.
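For the --resize_factor fix above, it can help to estimate how much a given factor shrinks the input. As far as I can tell, the upstream Wav2Lip inference script divides the frame width and height by the factor using integer division; the sketch below assumes that behavior, and `resized_shape` is a hypothetical helper, not part of Wav2Lip.

```python
def resized_shape(width, height, resize_factor):
    """Return the (width, height) after dividing both by resize_factor."""
    if resize_factor < 1:
        raise ValueError("resize_factor must be a positive integer")
    return width // resize_factor, height // resize_factor


def pixel_fraction(width, height, resize_factor):
    """Fraction of the original pixel count that remains after resizing."""
    new_w, new_h = resized_shape(width, height, resize_factor)
    return (new_w * new_h) / (width * height)
```

Under this assumption, a factor of 2 leaves about a quarter of the pixels per frame and a factor of 4 about a sixteenth, which is why modest factors are often enough to avoid running out of GPU memory.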
.
├── Wav2Lip
│ ├── checkpoints
│ │ └── wav2lip.pth
├── audio
│ └── output.wav
├── images
│ └── sample.jpg
├── video
│ └── result_voice.mp4
├── generate_lipsync_video.py
└── requirements.txt
MIT License. See the LICENSE file for more information.