Yu Guo¹, Ying Shan², Fei Wang¹
CVPR 2023
TL;DR: single portrait image 🙎♂️ + audio 🎤 = talking head video 🎞.
- 🔥 The extension for stable-diffusion-webui is online. Check out more details here.
  (demo video: sadtalker-webui.mp4)
- 🔥 Full image mode is online! Check out here for more details.
| still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815 |
|---|---|---|
| still_e_n.mp4 | full_body_2.bus_chinese_enhanced.mp4 | |
- 🔥 Several new modes, e.g., still mode, reference mode, and resize mode, are online for better and more customized applications; see the example sketch after this list.
- 🔥 Happy to see more community demos on Bilibili, YouTube, and Twitter #sadtalker.
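The sketch below shows how these modes are typically selected on the command line. The flag names (`--still`, `--ref_eyeblink`, `--ref_pose`, `--preprocess`) are our reading of the current inference.py options and may change; run `python inference.py --help` for the authoritative list.

```bash
# Sketch only: flag names assumed from inference.py, verify with `python inference.py --help`.

# still mode: keep head motion small (often combined with --preprocess full)
python inference.py --driven_audio input.wav --source_image photo.png --still

# reference mode: borrow eye blinks and/or head pose from existing videos
python inference.py --driven_audio input.wav --source_image photo.png \
    --ref_eyeblink blink_ref.mp4 --ref_pose pose_ref.mp4

# resize mode: resize the whole input image instead of cropping the face region
python inference.py --driven_audio input.wav --source_image photo.png --preprocess resize
```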
📋 Changelog (The previous changelog can be found here)
- [2023.04.12]: Fixed the sd-webui security issues caused by third-party packages and optimized the output path in sd-webui-extension.
- [2023.04.08]: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.
- [2023.04.08]: v0.0.2: full image animation, added a Baidu Netdisk link for downloading checkpoints, and optimized the enhancer logic.
Previous TODOs
- Generating 2D face from a single Image.
- Generating 3D face from Audio.
- Generating 4D free-view talking examples from audio and a single image.
- Gradio/Colab Demo.
- Full body/image Generation.
- Training code of each component.
- Audio-driven Anime Avatar.
- Integrate ChatGPT for a conversation demo 🤔
- Integrate with stable-diffusion-web-ui. (stay tuned!)
⚙️ Installation (Chinese Windows tutorial | Japanese guide)
git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
conda activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt
### TTS is optional for the gradio demo.
### pip install TTS
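As an optional sanity check (not part of the original instructions), you can confirm that the CUDA 11.3 build of PyTorch is active before moving on:

```bash
# Optional check: should print something like "1.12.1+cu113 True" on a CUDA machine.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```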
More tips about installation on Windows and the Docker file can be found here
Installing as an sd-webui extension.
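A minimal sketch of the usual way to install a stable-diffusion-webui extension, assuming the standard `extensions` directory layout; follow the linked extension instructions for the authoritative steps and checkpoint placement:

```bash
# Sketch, assuming the standard stable-diffusion-webui layout.
cd stable-diffusion-webui/extensions
git clone https://github.com/Winfredy/SadTalker.git
# Restart the webui so the extension is picked up.
```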
You can run the following script to put all the models in the right place.
bash scripts/download_models.sh
OR download our pre-trained models from Google Drive or our latest GitHub release page, and then put them in ./checkpoints.
OR download the models from Baidu Netdisk (百度云盘), extraction code: sadt.
| Model | Description |
|---|---|
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2Lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in face-alignment. |
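After downloading, a quick way to confirm the main files from the table above are in place is a small check like the following (file names copied from the table; adjust if the release layout changes):

```bash
# Sanity check: report which expected checkpoint files are present under ./checkpoints.
for f in \
    auido2exp_00300-model.pth \
    auido2pose_00140-model.pth \
    mapping_00229-model.pth.tar \
    mapping_00109-model.pth.tar \
    facevid2vid_00189-model.pth.tar \
    epoch_20.pth \
    wav2lip.pth \
    shape_predictor_68_face_landmarks.dat; do
  if [ -e "./checkpoints/$f" ]; then echo "OK      $f"; else echo "MISSING $f"; fi
done
```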
🔮 Quick Start (Best Practice)
python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png> --enhancer gfpgan
The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
More examples, configurations, and tips can be found in the >>> best practice documents <<<.
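To drive the same portrait with several audio clips, a plain shell loop over the basic command works; the paths below are placeholders, not files shipped with the repo:

```bash
# Illustrative only: generate one video per .wav file for the same portrait.
for wav in ./audio_clips/*.wav; do
  python inference.py --driven_audio "$wav" \
      --source_image portrait.png \
      --enhancer gfpgan
done
```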
Use --still to generate a natural full-body video. You can add --enhancer to improve the quality of the generated video.
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--result_dir <a folder to store results> \
--still \
--preprocess full \
--enhancer gfpgan
A local gradio demo similar to our hugging-face demo can be run by:
## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
python app.py
If you find our work useful in your research, please consider citing:
@article{zhang2022sadtalker,
title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
journal={arXiv preprint arXiv:2211.12194},
year={2022}
}
Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and PIRender. We thank the authors for sharing their wonderful code. In the training process, we also use models from Deep3DFaceReconstruction and Wav2Lip. We thank them for their wonderful work.
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
LOGO: color and font suggestion: ChatGPT; logo font: Montserrat Alternates.
All the demo images and audio are copyrighted by community users or generated by Stable Diffusion. Feel free to contact us if you feel uncomfortable.