<div align="center">

<h1>🎬 VideoReTalking: <br/> Audio-based Lip Synchronization for Talking Head Video Magic! 🪄</h1>

<a href='https://arxiv.org/abs/2211.14758'><img src='https://img.shields.io/badge/📚_ArXiv-2211.14758-red'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href='https://vinthony.github.io/video-retalking/'><img src='https://img.shields.io/badge/🚀_Project_Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb)

<div>
<a target='_blank'>🌟 Kun Cheng <sup>*,1,2</sup> </a>&emsp;
<a href='https://vinthony.github.io/' target='_blank'>🎩 Xiaodong Cun <sup>*,2</sup></a>&emsp;
<a href='https://yzhang2016.github.io/yongnorriszhang.github.io/' target='_blank'>👾 Yong Zhang <sup>2</sup></a>&emsp;
<a href='https://menghanxia.github.io/' target='_blank'>🌠 Menghan Xia <sup>2</sup></a>&emsp;
<a href='https://feiiyin.github.io/' target='_blank'>🎨 Fei Yin <sup>2,3</sup></a>&emsp;<br/>
<a href='https://web.xidian.edu.cn/mrzhu/en/index.html' target='_blank'>🔮 Mingrui Zhu <sup>1</sup></a>&emsp;
<a href='https://xuanwangvc.github.io/' target='_blank'>🧙‍♂️ Xuan Wang <sup>2</sup></a>&emsp;
<a href='https://juewang725.github.io/' target='_blank'>🌟 Jue Wang <sup>2</sup></a>&emsp;
<a href='https://web.xidian.edu.cn/nnwang/en/index.html' target='_blank'>🪄 Nannan Wang <sup>1</sup></a>
</div>
<br>
<div>
<sup>1</sup> Xidian University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Tsinghua University
</div>
<br>
<i><strong><a href='https://sa2022.siggraph.org/' target='_blank'>🌟 SIGGRAPH Asia 2022 Conference Track</a></strong></i>
<br>
<br>
<img src="./docs/static/images/teaser.png?raw=true" width="768px">

<div align="justify"> We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-syncing video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps and all our modules can be tackled in a sequential pipeline without any user intervention.</div>
<div align="justify">Prepare to be amazed! 🤩 Introducing VideoReTalking, a groundbreaking system that lets you edit the faces in talking head videos, all driven by audio input. It produces high-quality, lip-synced videos, even with different emotions. Our system splits this into three enchanting tasks: (1) Face video creation with a standard expression; (2) Audio-driven lip-sync; and (3) Face enhancement for mind-blowing realism. These magic tricks are powered by learning-based approaches and can be seamlessly combined in a magical sequence without any user intervention! ✨</div>
<br>

<p>
<img alt='pipeline' src="./docs/static/images/pipeline.png?raw=true" width="768px"><br>
<em align='center'>The VideoReTalking pipeline ✨</em>
</p>

</div>
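
If you like to think in code, here is a minimal sketch of how the three stages compose. This is a hedged illustration only: the function names below are hypothetical stand-ins, not the project's actual module APIs.

```python
from typing import List

def edit_expression(frame, template: str = "neutral"):
    # Stage 1 (stub): re-render the frame with a canonical expression.
    return frame  # placeholder for the expression-editing network

def lip_sync(frames: List, audio):
    # Stage 2 (stub): regenerate the mouth region to match the audio.
    return frames  # placeholder for the lip-sync network

def enhance_face(frame):
    # Stage 3 (stub): identity-aware enhancement and post-processing.
    return frame  # placeholder for the enhancement network

def video_retalking(frames: List, audio) -> List:
    canonical = [edit_expression(f) for f in frames]   # (1) normalize expression
    synced = lip_sync(canonical, audio)                # (2) audio-driven lip-sync
    return [enhance_face(f) for f in synced]           # (3) photo-realism pass
```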

## Results in the Wild 🌍 (includes audio)
https://user-images.githubusercontent.com/4397546/224310754-665eb2dd-aadc-47dc-b1f9-2029a937b20a.mp4

## Setup Your Wizardry Workstation 🧙‍♂️

```shell
git clone https://github.com/vinthony/video-retalking.git
cd video-retalking
conda create -n video_retalking python=3.8
conda activate video_retalking

conda install ffmpeg

# Dive into the magic realm of PyTorch! 🔮
# Follow the magical instructions at https://pytorch.org/get-started/previous-versions/
# This spell only works with CUDA 11.1 🪄
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt
```
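
To confirm the spell took hold, here is a quick sanity check (assuming the CUDA 11.1 wheels installed above):

```python
# Quick sanity check after installation.
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 1.9.0+cu111
print("torchvision:", torchvision.__version__)  # expected: 0.10.0+cu111
print("CUDA available:", torch.cuda.is_available())
```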

## Quick Inference - Instant Wizardry! 🪄

#### Unleash the Power of Pretrained Models 🌟
Please download our [pre-trained models](https://drive.google.com/drive/folders/18rhjMpxK8LVVxf7PI6XwOidt8Vouv_H0?usp=share_link) and place them in `./checkpoints`.

<!-- We also provide some [example videos and audio](https://drive.google.com/drive/folders/14OwbNGDCAMPPdY-l_xO1axpUjkPxI9Dv?usp=share_link). Please put them in `./examples`. -->
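
If you prefer scripting the download, one possible route is the third-party `gdown` package (`pip install gdown`); this sketch assumes the Drive folder stays publicly shared:

```python
# Possible scripted download via the third-party gdown package.
# Assumes the Google Drive folder above remains publicly accessible.
import gdown

url = "https://drive.google.com/drive/folders/18rhjMpxK8LVVxf7PI6XwOidt8Vouv_H0"
gdown.download_folder(url=url, output="./checkpoints", quiet=False)
```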

#### Cast Your Spell

```shell
python3 inference.py \
--face examples/face/1.mp4 \
--audio examples/audio/1.wav \
--outfile results/1_1.mp4
```

You can also control the expression by adding the following parameters:

```--exp_img```: Pre-defined expression template. The default is "neutral". You can choose "smile" or an image path.

```--up_face```: You can choose "surprise" or "angry" to modify the expression of the upper face with [GANimation](https://github.com/donydchen/ganimation_replicate).
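
For instance, both controls can ride along with the basic command above. A hypothetical invocation, sketched here through Python's `subprocess` module (the output filename is just an example):

```python
# Hypothetical example: the basic inference call with both expression
# controls added. The output filename is illustrative.
import subprocess

subprocess.run([
    "python3", "inference.py",
    "--face", "examples/face/1.mp4",
    "--audio", "examples/audio/1.wav",
    "--outfile", "results/1_1_smile.mp4",
    "--exp_img", "smile",     # "neutral" (default), "smile", or an image path
    "--up_face", "surprise",  # "surprise" or "angry", applied via GANimation
], check=True)
```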

## Citation - Magical Incantation 🪄

If you find our wizardry useful in your magical research, please consider citing:

```
@misc{cheng2022videoretalking,
title={VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild},
author={Kun Cheng and Xiaodong Cun and Yong Zhang and Menghan Xia and Fei Yin and Mingrui Zhu and Xuan Wang and Jue Wang and Nannan Wang},
year={2022},
eprint={2211.14758},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

## Wizard's Gratitude 🙏

Special thanks to these sorcerers and enchanters for sharing their magic spells:

- [Wav2Lip](https://github.com/Rudrabha/Wav2Lip)
- [PIRenderer](https://github.com/RenYurui/PIRender)
- [GFP-GAN](https://github.com/TencentARC/GFPGAN)
- [GPEN](https://github.com/yangxy/GPEN)
- [ganimation_replicate](https://github.com/donydchen/ganimation_replicate)
- [STIT](https://github.com/rotemtzaban/STIT)

## Related Work - Explore More Magical Realms 🌌

- [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- [SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)](https://github.com/Winfredy/SadTalker)
- [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)

## Disclaimer - Our Magic Spell! 🪄

This is not an official product of Tencent.


```
1. Please carefully read and comply with the open-source license applicable to this code before using it.
2. Please carefully read and comply with the intellectual property declaration applicable to this code before using it.
3. This open-source code runs completely offline and does not collect any personal information or other data. If you use this code to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (such as publishing privacy policies, adopting necessary data security strategies, etc.). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent.
4. Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
5. This open-source code does not have the ability to directly provide services to end-users. If you need to use this code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent.
6. It is prohibited to use this open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights, etc.), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including providing incorrect or false information, spreading pornographic, terrorist, and violent information, etc.). Otherwise, you may be liable for legal responsibilities.

```
