[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Wenxuan Zhang *,1,2, Xiaodong Cun *,2, Xuan Wang 3, Yong Zhang 2, Xi Shen 2,
Yu Guo 1, Ying Shan 2, Fei Wang 1

1 Xi'an Jiaotong University   2 Tencent AI Lab   3 Ant Group  

CVPR 2023

TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.


🔥 Highlight

  • 🔥 Full image mode is online! Check out the Full body/image Generation section for more details.
    [Video comparison: still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image by @bagbag1815]
  • 🔥 Several new modes (e.g., still mode, reference mode, and resize mode) are online for better and more customizable applications.

  • 🔥 We are happy to see more community demos on bilibili, YouTube, and Twitter (#sadtalker).

📋 Changelog (the previous changelog can be found here)

  • [2023.04.12]: Fixed the sd-webui safety issues caused by third-party packages; optimized the output path in sd-webui-extension.

  • [2023.04.08]: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.

  • [2023.04.08]: v0.0.2: full image animation; added a Baidu drive link for downloading checkpoints; optimized the enhancer logic.

🚧 TODO

Previous TODOs
  • Generating 2D face from a single Image.
  • Generating 3D face from Audio.
  • Generating 4D free-view talking examples from audio and a single image.
  • Gradio/Colab Demo.
  • Full body/image Generation.
  • Training code of each component.
  • Audio-driven Anime Avatar.
  • Integrate with ChatGPT for a conversation demo 🤔
  • Integrate with stable-diffusion-web-ui. (stay tuned!)

Installing SadTalker on Linux:

git clone https://github.com/Winfredy/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### tts is optional for gradio demo. 
### pip install TTS
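
After running the commands above, it can help to confirm that the pinned PyTorch packages actually ended up in the active environment. The following is a minimal sketch, not part of the repository: the function name `check_versions` and the `EXPECTED` table are hypothetical, and the version numbers are copied from the `pip install` line above.

```python
from importlib import metadata

# Versions pinned by the install command above. The "+cu113" suffix is a
# local build tag, so only the leading release number is compared.
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "check"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

A package reported as `installed=None` usually means the wrong conda environment is active.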

More tips about installation on Windows and the Dockerfile can be found here.

Installing in Sd-Webui-Extension.

Download Trained Models

You can run the following script to put all the models in the right place.

bash scripts/download_models.sh

OR download our pre-trained models from Google Drive or our latest GitHub release page, and then put them in ./checkpoints.

OR use the downloaded models we provide on Baidu Netdisk (百度云盘); extraction code: sadt.

| Model | Description |
| --- | --- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library file. |
| checkpoints/hub | Face detection models used in face alignment. |
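
Before running inference, a quick check that every entry from the model list is present under ./checkpoints can save a failed run. This is a minimal sketch and not a script shipped with the repository: `missing_checkpoints` and `EXPECTED_CHECKPOINTS` are hypothetical names, and the file names are copied verbatim from the list above (including the `auido2...` spelling).

```python
from pathlib import Path

# File and directory names copied from the model list above.
EXPECTED_CHECKPOINTS = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "mapping_00229-model.pth.tar",
    "mapping_00109-model.pth.tar",
    "facevid2vid_00189-model.pth.tar",
    "epoch_20.pth",
    "wav2lip.pth",
    "shape_predictor_68_face_landmarks.dat",
    "BFM",
    "hub",
]


def missing_checkpoints(root="./checkpoints", names=EXPECTED_CHECKPOINTS):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from ./checkpoints:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected checkpoint entries are present.")
```

The check only tests for existence; it does not verify file sizes or hashes, so a partially downloaded file would still pass.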

🔮 Quick Start (Best Practice)

Animating Portrait Image from default config.

python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png> --enhancer gfpgan 

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
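
Because each run writes into a new timestamped subdirectory, locating the newest output by hand gets tedious. The sketch below assumes only the layout stated above (results/<timestamp>/*.mp4); the helper name `latest_result` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def latest_result(results_dir="results"):
    """Return the most recently modified .mp4 one level below results_dir, or None."""
    videos = list(Path(results_dir).glob("*/*.mp4"))
    if not videos:
        return None
    # Pick the file with the newest modification time.
    return max(videos, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    newest = latest_result()
    print(newest if newest else "No .mp4 files found under results/")
```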

More examples, configuration options, and tips can be found in the >>> best practice documents <<<.

Full body/image Generation

Use --still to generate a natural full body video. You can add --enhancer to improve the quality of the generated video.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a directory to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan 
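
To animate the same image with several audio clips, the command above can be assembled programmatically. The following is a hedged sketch, not a utility from the repository: `build_command`, `animate_all`, and the `dry_run` and `audio_dir` parameters are hypothetical, and only the flags shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio, image, result_dir, full_image=True, enhancer="gfpgan"):
    """Assemble the inference.py argument list using only the flags shown above."""
    cmd = [
        sys.executable, "inference.py",
        "--driven_audio", str(audio),
        "--source_image", str(image),
        "--result_dir", str(result_dir),
    ]
    if full_image:
        cmd += ["--still", "--preprocess", "full"]
    if enhancer:
        cmd += ["--enhancer", enhancer]
    return cmd


def animate_all(audio_dir, image, result_dir, dry_run=True):
    """Run (or, with dry_run, only print) one inference call per .wav in audio_dir."""
    commands = [build_command(wav, image, result_dir)
                for wav in sorted(Path(audio_dir).glob("*.wav"))]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumes the current working directory is the repository root.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default here) nothing is executed, which makes it easy to inspect the commands before committing GPU time.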

Local Gradio demo.

A local Gradio demo, similar to our Hugging Face demo, can be run with:

## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` in advance.

python app.py

🛎 Citation

If you find our work useful in your research, please consider citing:

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

💗 Acknowledgements

The Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender. We thank the authors for sharing their wonderful code. In the training process, we also use the models from Deep3DFaceReconstruction and Wav2lip. We thank the authors for their wonderful work.

🥂 Related Works

📢 Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

LOGO: color and font suggestion: ChatGPT; logo font: Montserrat Alternates.

The copyright of all demo images and audio belongs to community users or comes from Stable Diffusion generations. Feel free to contact us if you feel uncomfortable.
