Here you can find information about training and evaluating Diffused Heads. If you want to test our model on CREMA, please switch back to the `main` branch.
Note: no checkpoints or datasets are provided here. The code was only roughly cleaned and may contain bugs; please open an issue to discuss any problem you run into. We apologize for the delay in publishing the code.
The CREMA checkpoint can be downloaded here. No LRW checkpoint will be provided due to license restrictions.
Our model works best on videos with consistent face alignment. To prepare your videos, please use the face processor; you can experiment with different offset values.
Precompute audio embeddings for your dataset. We used the audio encoder from SDA; you can reuse part of the demo code from the `main` branch, where a scripted checkpoint is provided. You are free to use any suitable audio encoder instead, and a better (and easier) choice may be Whisper Large. Remember to change the audio embedding dimension in the config file if needed.
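Whichever encoder you choose, the embeddings generally need to be aligned one-per-video-frame. A minimal sketch of slicing a waveform into per-frame windows before encoding (the function name and windowing scheme are our assumptions, not the repo's actual preprocessing):

```python
import numpy as np

def frame_audio(waveform: np.ndarray, sample_rate: int, fps: float) -> np.ndarray:
    """Slice a mono waveform into one fixed-size window per video frame.

    Hypothetical helper: the actual Diffused Heads preprocessing may differ
    (e.g. overlapping windows or encoder-specific feature extraction).
    """
    samples_per_frame = int(round(sample_rate / fps))
    n_frames = len(waveform) // samples_per_frame
    # Drop trailing samples that do not fill a whole frame window.
    usable = waveform[: n_frames * samples_per_frame]
    return usable.reshape(n_frames, samples_per_frame)

# Example: 1 second of 16 kHz audio at 25 fps -> 25 windows of 640 samples.
chunks = frame_audio(np.zeros(16000, dtype=np.float32), 16000, 25.0)
print(chunks.shape)  # (25, 640)
```

Each window can then be passed through the encoder of your choice, and the resulting per-frame embeddings saved next to the video clip.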
The provided dataset class works on a predefined `file_list.txt` containing relative paths to video clips. Examples can be found in `datasets/`. The data folder should contain the subfolders `audio/` and `video/` with separate audio and video files.
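A file list for this layout can be generated with a short script. A sketch, assuming clips live under `video/` as `.mp4` files (the exact path pattern and list format the repo expects may differ):

```python
from pathlib import Path

def write_file_list(data_root: str, out_path: str) -> int:
    """Write relative paths of all clips under video/ to a file list.

    Sketch only: check the examples in datasets/ for the exact format
    the Diffused Heads dataset class expects.
    """
    root = Path(data_root)
    clips = sorted(p.relative_to(root) for p in (root / "video").rglob("*.mp4"))
    with open(out_path, "w") as f:
        for clip in clips:
            f.write(f"{clip.as_posix()}\n")
    return len(clips)

# Usage:
# write_file_list("data", "datasets/file_list.txt")
```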
To train the model, specify paths and parameters in `./configs/config.yaml`, then run:

```shell
python train.py
```
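For orientation, such a config typically collects dataset paths and training hyperparameters. The keys below are illustrative placeholders only, not the actual schema of `./configs/config.yaml`:

```yaml
# Hypothetical sketch -- check ./configs/config.yaml for the real key names.
data_root: /path/to/data          # folder containing audio/ and video/
file_list: datasets/file_list.txt
audio_emb_dim: 512                # must match your audio encoder's output size
batch_size: 16
lr: 1.0e-4
```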
To generate multiple test videos, specify paths and parameters in `./configs/config_gen_test.yaml`, then run:

```shell
python generate.py
```
To generate a video from any image/video and audio, specify paths and parameters in `./configs/config_gen_custom.yaml`, then run:

```shell
python custom_video.py
```
The test splits we used for CREMA and LRW can be found in `datasets/`.
Metrics used:
- FVD: Laughing Matters repo
- FID: torchmetrics
- Blinks/s and blink duration: https://github.com/DinoMan/blink-detector
- OFM and F-MSE: `./smoothness_eval.py`
- AV offset and AV confidence: https://github.com/joonson/syncnet_python
- WER: a pretrained lipreading model that we cannot share; you can use any available one.
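As a rough illustration of what the frame-wise smoothness metrics measure, an F-MSE-style score can be computed as the mean squared difference between consecutive frames. This is a simplified sketch, not the exact definition used in `./smoothness_eval.py`:

```python
import numpy as np

def frame_mse(frames: np.ndarray) -> float:
    """Mean squared difference between consecutive frames.

    Simplified sketch of an F-MSE-style smoothness score; the repo's
    ./smoothness_eval.py may define it differently.
    frames: (T, H, W, C) array of floats in [0, 1].
    """
    diffs = frames[1:] - frames[:-1]
    return float(np.mean(diffs ** 2))

# A perfectly static clip scores 0; lower generally means smoother motion.
static = np.zeros((10, 4, 4, 3), dtype=np.float32)
print(frame_mse(static))  # 0.0
```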
Our code supports W&B logging; the relevant code is left commented out in the main scripts.
```bibtex
@inproceedings{stypulkowski2024diffused,
  title={Diffused heads: Diffusion models beat GANs on talking-face generation},
  author={Stypu{\l}kowski, Micha{\l} and Vougioukas, Konstantinos and He, Sen and Zi{\k{e}}ba, Maciej and Petridis, Stavros and Pantic, Maja},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5091--5100},
  year={2024}
}
```
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.