*Results.mp4 (demo video of generated results)*
This repository is the official PyTorch implementation of our paper: Emotionally Enhanced Talking Face Generation. We introduce a multimodal framework that generates lip-synced videos for any arbitrary identity, language, and emotion. The framework also comes with a user-friendly web interface that offers a real-time experience for talking face generation with emotions.
| 📑 Original Paper | 📰 Project Page | 🌀 Demo | ⚡ Live Testing |
|---|---|---|---|
| Paper | Project Page | Demo Video | Interactive Demo |
All results from this open-source code or our demo website should be used for research/academic/personal purposes only.
- ffmpeg: `sudo apt-get install ffmpeg`
- Install the necessary packages using `pip install -r requirements.txt`.
- The face detection pre-trained model should be downloaded to `face_detection/detection/sfd/s3fd.pth`. Alternative link if the above does not work.
Download the data from this repo.
```bash
python convertFPS.py -i <raw_video_folder> -o <folder_to_save_25fps_videos>
python preprocess_crema-d.py --data_root <folder_of_25fps_videos> --preprocessed_root preprocessed_dataset/
```
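For reference, below is a minimal sketch of what the 25 fps conversion step does conceptually: re-encoding each raw video with ffmpeg at a fixed frame rate. The actual `convertFPS.py` in this repo may differ in its arguments and implementation, and the folder paths are placeholders.

```python
# Minimal sketch of the FPS-conversion step; the real convertFPS.py may differ.
import os
import subprocess

def convert_to_25fps(raw_dir, out_dir):
    """Re-encode every video in raw_dir to 25 fps and save it to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(raw_dir):
        if not name.lower().endswith((".mp4", ".avi", ".mov", ".flv")):
            continue
        src = os.path.join(raw_dir, name)
        dst = os.path.join(out_dir, os.path.splitext(name)[0] + ".mp4")
        # -r 25 forces a constant 25 fps output, which the preprocessing expects
        subprocess.run(["ffmpeg", "-y", "-i", src, "-r", "25", dst], check=True)

# convert_to_25fps("<raw_video_folder>", "<folder_to_save_25fps_videos>")
```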
There are three major steps: (i) train the expert lip-sync discriminator, (ii) train the emotion discriminator, and (iii) train the EmoGen model.
Train the expert lip-sync discriminator:

```bash
python color_syncnet_train.py --data_root preprocessed_dataset/ --checkpoint_dir <folder_to_save_checkpoints>
```
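As a point of reference, the expert lip-sync discriminator is trained with a SyncNet-style objective: cosine similarity between audio and face embeddings, supervised with binary cross-entropy on in-sync vs. off-sync pairs (following Wav2Lip, whose code structure this repo adopts). The snippet below is an illustrative sketch, not the exact code in `color_syncnet_train.py`; the tensor shapes and the clamping are assumptions.

```python
# Sketch of a SyncNet-style training objective for the expert lip-sync discriminator.
import torch
import torch.nn as nn
import torch.nn.functional as F

logloss = nn.BCELoss()

def cosine_bce_loss(audio_emb, face_emb, labels):
    """audio_emb, face_emb: (B, D) embeddings from the two branches.
    labels: (B, 1) with 1 for in-sync pairs and 0 for off-sync pairs."""
    d = F.cosine_similarity(audio_emb, face_emb)        # (B,) similarity scores
    return logloss(d.unsqueeze(1).clamp(0, 1), labels)  # treat similarity as a sync probability

# Example with random tensors:
a, v = torch.randn(4, 512), torch.randn(4, 512)
y = torch.ones(4, 1)
print(cosine_bce_loss(a, v, y))
```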
Train the emotion discriminator:

```bash
python emotion_disc_train.py -i preprocessed_dataset/ -o <folder_to_save_checkpoints>
```
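The emotion discriminator is, at its core, a classifier over the six categorical emotions used in this repo. The sketch below shows the standard cross-entropy objective such a classifier would use; the actual architecture and loss in `emotion_disc_train.py` may differ.

```python
# Sketch of a 6-way emotion classification objective (HAP, SAD, FEA, ANG, DIS, NEU).
import torch
import torch.nn as nn

EMOTIONS = ["HAP", "SAD", "FEA", "ANG", "DIS", "NEU"]
criterion = nn.CrossEntropyLoss()

def emotion_loss(logits, emotion_labels):
    """logits: (B, 6) raw scores; emotion_labels: (B,) integer indices into EMOTIONS."""
    return criterion(logits, emotion_labels)

# Example with random values:
logits = torch.randn(4, len(EMOTIONS))
labels = torch.randint(0, len(EMOTIONS), (4,))
print(emotion_loss(logits, labels))
```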
Train the EmoGen model, pointing it to the two discriminator checkpoints:

```bash
python train.py --data_root preprocessed_dataset/ --checkpoint_dir <folder_to_save_checkpoints> --syncnet_checkpoint_path <path_to_expert_disc_checkpoint> --emotion_disc_path <path_to_emotion_disc_checkpoint>
```
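Conceptually, the full-model training step combines a reconstruction loss on the generated frames with penalties from the expert lip-sync discriminator and the emotion discriminator. The weighting scheme and values below are purely illustrative assumptions; see `train.py` and `hparams.py` for the actual objective.

```python
# Illustrative sketch of how the generator's objectives could be combined.
import torch

def total_loss(recon_loss, sync_loss, emotion_loss,
               sync_wt=0.03, emotion_wt=0.1):
    # Weighted sum; the generator is updated to minimise this combined objective.
    return (1.0 - sync_wt - emotion_wt) * recon_loss \
           + sync_wt * sync_loss + emotion_wt * emotion_loss

print(total_loss(torch.tensor(0.5), torch.tensor(0.7), torch.tensor(1.2)))
```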
You can also set additional, less commonly used hyper-parameters at the bottom of the `hparams.py` file.
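Assuming `hparams.py` follows the Wav2Lip-style `HParams` object (this repo's code structure is based on Wav2Lip), values can be inspected or overridden as below; the attribute name is an example and may not match the repo exactly.

```python
# Example of inspecting/overriding a hyper-parameter; attribute name is illustrative.
from hparams import hparams

print(hparams.batch_size)   # current training batch size
hparams.batch_size = 16     # override for a quick experiment
```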
Comment out these code lines before running inference: line1 and line2.
```bash
python inference.py --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source> --emotion <categorical emotion>
```
The result is saved (by default) in `results/{emotion}.mp4`. You can specify the output location as an argument, similar to several other available options. The audio source can be any file supported by FFmpeg that contains audio data: `*.wav`, `*.mp3`, or even a video file, from which the code will automatically extract the audio. Choose the categorical emotion from this list: [HAP, SAD, FEA, ANG, DIS, NEU].
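For example, a run could be scripted as follows; the checkpoint, face video, and audio paths are placeholders you would replace with your own files.

```python
# Example of invoking the inference script programmatically; all paths are placeholders.
import subprocess

subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/emogen.pth",  # placeholder checkpoint path
    "--face", "input/face_video.mp4",               # placeholder face video
    "--audio", "input/speech.wav",                  # any FFmpeg-readable audio (or video) file
    "--emotion", "HAP",                             # one of HAP, SAD, FEA, ANG, DIS, NEU
], check=True)
# By default, the output lands in results/HAP.mp4 for this emotion choice.
```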
- Experiment with the `--pads` argument to adjust the detected face bounding box; this often leads to improved results. You might need to increase the bottom padding to include the chin region, e.g. `--pads 0 20 0 0` (see the sketch after this list).
- If the mouth position looks dislocated or you see weird artifacts such as two mouths, the face detections may be over-smoothed. Use the `--nosmooth` argument and try again.
- Experiment with the `--resize_factor` argument to get a lower-resolution video. Why? The models are trained on faces at a lower resolution, so you might get better, more visually pleasing results for 720p videos than for 1080p videos (in many cases, the latter works well too).
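To make the `--pads` behaviour concrete, here is a hedged sketch of how padding values (top, bottom, left, right) would typically be applied to a detected face box; the real inference code may smooth and clip boxes differently, so treat this as illustrative.

```python
# Sketch of applying --pads (top, bottom, left, right) to a detected face box.
def pad_box(x1, y1, x2, y2, pads, frame_h, frame_w):
    pad_top, pad_bottom, pad_left, pad_right = pads
    y1 = max(0, y1 - pad_top)
    y2 = min(frame_h, y2 + pad_bottom)  # e.g. --pads 0 20 0 0 extends the box 20 px downwards (chin)
    x1 = max(0, x1 - pad_left)
    x2 = min(frame_w, x2 + pad_right)
    return x1, y1, x2, y2

print(pad_box(100, 120, 220, 260, (0, 20, 0, 0), frame_h=720, frame_w=1280))
```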
Please check the `evaluation/` folder for the instructions.
This repository can be used for personal/research/non-commercial purposes only. Please cite the following paper if you use this repository:
```bibtex
@misc{goyal2023emotionally,
      title={Emotionally Enhanced Talking Face Generation},
      author={Sahil Goyal and Shagun Uppal and Sarthak Bhagat and Yi Yu and Yifang Yin and Rajiv Ratn Shah},
      year={2023},
      eprint={2303.11548},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
The code structure is inspired by Wav2Lip; we thank the authors for the wonderful code. The code for face detection has been taken from the face_alignment repository; we thank the authors for releasing their code and models. The demo website was developed by @ddhroov10 and @SakshatMali.