Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation


Training code of Diffused Heads

Here you can find information about training and evaluating Diffused Heads. To test our model on CREMA, please switch back to the `main` branch.

Note: No checkpoints or datasets are provided here. This code was only roughly cleaned and may contain bugs. Please raise an issue to open a discussion about your problem. We apologize for the delay in publishing the code.

Checkpoints

The CREMA checkpoint can be downloaded here. No LRW checkpoint will be provided due to license restrictions.

Data

Alignment

Our model works best on videos that share the same alignment. To prepare your videos, please use the face processor. You can experiment with different offset values.
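To illustrate what the offset controls, here is a minimal, hypothetical cropping helper (the linked face processor does the real work; the function name, signature, and nearest-neighbour resize below are ours, not the repo's). The offset expands the detected face box by a fraction of its size before cropping:

```python
import numpy as np

def crop_face(frame: np.ndarray, box: tuple, offset: float = 0.1, size: int = 128) -> np.ndarray:
    """Crop a square face region from `frame` given an (x0, y0, x1, y1) box,
    expanded by `offset` (a fraction of the box size) on each side.
    Hypothetical sketch -- the repo's linked face processor should be used instead.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    # Expand the larger box side by `offset` on each side.
    side = int(max(w, h) * (1 + 2 * offset))
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    xs = max(cx - side // 2, 0)
    ys = max(cy - side // 2, 0)
    crop = frame[ys:ys + side, xs:xs + side]
    # Naive nearest-neighbour resize to the target resolution.
    rows = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[np.ix_(rows, cols)]
```

Larger offsets keep more background context around the face; the key point is that every clip in the dataset should be cropped with the same settings.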

Audio embeddings

Precompute audio embeddings for your dataset.

We worked with the audio encoder from SDA. You can use part of the demo code from the `main` branch, where a scripted checkpoint is provided.

You are free to use any suitable audio encoder. A better (and easier) choice may be Whisper Large. Remember to change the dimension of the audio embeddings in the config file if needed.
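The precomputation step above can be sketched as follows. This is our own illustrative helper, not code from the repo: it walks file_list.txt, runs an arbitrary encoder callable (a wrapper around the SDA encoder or Whisper, for example), and caches each embedding as a `.npy` file next to the audio. The file layout and naming are assumptions:

```python
import numpy as np
from pathlib import Path

def precompute_embeddings(file_list: str, data_root: str, encode, emb_dim: int) -> None:
    """Hypothetical sketch: for each relative clip path in file_list.txt,
    encode the matching audio file and cache the embedding as .npy.

    `encode` is any callable mapping an audio path to a (T, emb_dim) array,
    e.g. a wrapper around the SDA encoder or Whisper's encoder.
    `emb_dim` must match the audio embedding dimension in the config file.
    """
    root = Path(data_root)
    for rel in Path(file_list).read_text().splitlines():
        if not rel.strip():
            continue
        # Assumed layout: audio lives under data_root/audio/ with a .wav extension.
        audio_path = root / "audio" / Path(rel).with_suffix(".wav").name
        emb = np.asarray(encode(audio_path))
        assert emb.ndim == 2 and emb.shape[1] == emb_dim, "embedding dim must match config"
        np.save(audio_path.with_suffix(".npy"), emb)
```

Precomputing once up front keeps the training loop free of any audio-encoder dependency.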

Folder structure

The provided dataset class works on a predefined file_list.txt containing relative paths to video clips. Examples can be found in datasets/. The data folder should contain subfolders audio/ and video/ with the separate audio and video files.
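Putting that together, the expected layout looks roughly like this (the clip names are illustrative):

```
data/
├── file_list.txt        # one relative clip path per line, e.g. clip001.mp4
├── audio/
│   ├── clip001.wav
│   └── clip002.wav
└── video/
    ├── clip001.mp4
    └── clip002.mp4
```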

Scripts

To train the model, specify paths and parameters in ./configs/config.yaml.

python train.py
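For orientation, a config for training might look like the fragment below. Every key here is illustrative only; the authoritative schema is the ./configs/config.yaml shipped with the repo:

```yaml
# Illustrative sketch -- consult ./configs/config.yaml for the real keys.
data_root: /path/to/data
file_list: datasets/file_list.txt
audio_emb_dim: 512     # must match your precomputed audio embeddings
image_size: 128
batch_size: 16
lr: 1.0e-4
```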

To generate multiple test videos, specify paths and parameters in ./configs/config_gen_test.yaml.

python generate.py

To generate a video from any image/video and audio, specify paths and parameters in ./configs/config_gen_custom.yaml.

python custom_video.py

Evaluation

The test splits we used for CREMA and LRW can be found in datasets/.

Metrics used:

W&B

Our code supports W&B logging. The relevant code in the main scripts is left commented out.

Citation

@inproceedings{stypulkowski2024diffused,
  title={Diffused heads: Diffusion models beat gans on talking-face generation},
  author={Stypu{\l}kowski, Micha{\l} and Vougioukas, Konstantinos and He, Sen and Zi{\k{e}}ba, Maciej and Petridis, Stavros and Pantic, Maja},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5091--5100},
  year={2024}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

CC BY-NC-SA 4.0
