EpiX-1/FaceDiffuser

Fork changes

This repository is a fork of the official FaceDiffuser repository, created to fix a few bugs and make slight improvements to the code structure. Please check out the official repository at this link.

Installation

For Python 3.9:

  • Install PyTorch 1.10.1+cu111 from the PyTorch website (an example command is given after this list).

  • Install the packages listed in requirements.txt with the following command:

      pip install -r requirements.txt

  • (Optional) Install ffmpeg if you want to render the predicted sequences.
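
For reference, the following pip command should install this specific PyTorch build; the companion torchvision and torchaudio versions are an assumption here, so double-check the PyTorch "previous versions" page if it fails:

    pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html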

Usage

To simplify the reproduction of results, I created a main file that relies on a global config file and can launch the training/testing/evaluation processes. Here's how to use it:

    python ./main.py --script <name_of_script> --config <path_to_config_file>

For instance, if you want to train and evaluate a model you would use the following:

    python ./main.py --script train --config ./configs/vocaset.yaml
    python ./main.py --script test --config ./configs/vocaset.yaml
    python ./main.py --script evaluation.compute_objective_metrics --config ./configs/vocaset.yaml
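
The config files in configs/ gather the per-dataset settings in one place. As a rough illustration only (the key names below are assumptions; refer to configs/vocaset.yaml in this repository for the authoritative ones), such a file maps the usual command-line arguments to values:

    # Hypothetical sketch of a config file; the actual keys may differ.
    dataset: vocaset
    vertice_dim: 15069
    output_fps: 30
    feature_dim: 256
    gru_dim: 256
    diff_steps: 1000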

Results

With the config files included in this repository, I achieved the following results:

| Model | Trained | MVE $(\times 10^{-5})$ (↓) | LVE $(\times 10^{-5})$ (↓) | FDD $(\times 10^{-5})$ (↓) | ABS FDD $(\times 10^{-5})$ (↓) | Diversity $(\times 10^{-5})$ (↑) |
| --- | --- | --- | --- | --- | --- | --- |
| Official weights (grabbed from the official repo) | | 104.90 | 7.8633 | 0.02171 | 0.02539 | 74.436 |
| vocaset.yaml | ✔️ | 103.04 | 7.6469 | 0.02083 | 0.02489 | 73.042 |

Table 1: Results obtained on the VOCASET dataset

| Model | Trained | MVE $(\times 10^{-5})$ (↓) | LVE $(\times 10^{-5})$ (↓) | FDD $(\times 10^{-5})$ (↓) | ABS FDD $(\times 10^{-5})$ (↓) | Diversity $(\times 10^{-5})$ (↑) |
| --- | --- | --- | --- | --- | --- | --- |
| Official weights (grabbed from the official repo) | | 6442.1 | 934.96 | 16.587 | 18.157 | 1548.1 |
| multiface.yaml | ✔️ | 9618.7 | 1728.7 | -13.310 | 24.876 | 9447.7 |
| multiface_no_conditioning.yaml | ✔️ | 633.73 | 74.228 | 7.5171 | 8.6142 | / |

Table 2: Results obtained on the MULTIFACE dataset

Disclaimers

I've only tested the train/test and compute_objective_metrics scripts. I can't guarantee that the proposed modifications didn't break the rest of the code. I also only tested the code with the VOCASET and MULTIFACE datasets.

License

This project is subject to the original authors' license.




FaceDiffuser (MIG '23)

Code repository for the implementation of: FaceDiffuser: Speech-Driven Facial Animation Synthesis Using Diffusion.

This GitHub repository contains the PyTorch implementation of the work presented above. FaceDiffuser generates facial animations based on raw audio input of speech sequences. By employing the diffusion mechanism, our model produces different results for every new inference.

We recommend visiting the project website and watching the supplementary video.

Paper | Project Website

Environment

  • Linux and Windows (tested on Windows 10 and 11)
  • Python 3.9+
  • PyTorch 1.10.1+cu111

Dependencies

  • ffmpeg
  • Check the required Python packages and libraries in requirements.txt.
  • Install them by running the command: pip install -r requirements.txt

Data

BIWI

The Biwi 3D Audiovisual Corpus of Affective Communication dataset is available upon request for research or academic purposes.

BIWI Data Preparation and Data Pre-process

In the interest of a fair comparison with previous works, the BIWI dataset was prepared according to the data processing done in CodeTalker. Please follow the instructions at this link to prepare the dataset. After processing, the *.npy files should be in the data/BIWI/vertices_npy/ folder, whereas the *.wav files should be in the data/BIWI/wav/ folder. This processing only prepares the emotional subset of sequences. The results reported in the paper are based on this pre-processed data.
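
For reference, after preprocessing the data folder should look roughly like this (only the two folders mentioned above are required by the model):

    data/BIWI/
        vertices_npy/   # one *.npy file per sequence
        wav/            # one *.wav file per sequence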

P.S.: FaceXHuBERT also provides a data processing workflow that processes the full BIWI dataset (including neutral and emotional sequences).

VOCASET

Download the training data from: https://voca.is.tue.mpg.de/download.php.

Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl and subj_seq_to_idx.pkl in the data/vocaset/ folder. Then read the downloaded data and convert it to the .npy and .wav formats accepted by the model by running the following commands:

cd data/vocaset
python process_voca_data.py

Multiface

Download the Multiface dataset by following the instructions here: https://github.com/facebookresearch/multiface.

Keep in mind that only the mesh and audio data are needed for training the model.

cd data/multiface
python convert_topology.py
python preprocess.py

BEAT

Download the BEAT dataset from here: https://pantomatrix.github.io/BEAT/. Keep in mind that only the facial motion (stored in JSON files) and the audio (stored in WAV files) are needed for training the model.

Follow the instructions in data/beat for preprocessing the data before training.

Model Training

Training and Testing

| Arguments | BIWI | VOCASET | Multiface | UUDaMM | BEAT |
| --- | --- | --- | --- | --- | --- |
| --dataset | BIWI | vocaset | multiface | damm | beat |
| --vertice_dim | 70110 | 15069 | 18516 | 192 | 51 |
| --output_fps | 25 | 30 | 30 | 30 | 30 |
  • Train the model by running the following command (a dataset-specific example is sketched after this list):
     python main.py
    
    The predicted results for the test split will be saved in the result/ folder. The trained models (saved at 25-epoch intervals) will be saved in the save/ folder.
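
For example, combining the arguments from the table above, a VOCASET training run might look like the command below; the --feature_dim and --gru_dim values are taken from the prediction command further down and are an assumption here, since the script's defaults may already be appropriate:

    python main.py --dataset vocaset --vertice_dim 15069 --output_fps 30 --feature_dim 256 --gru_dim 256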

Predictions

  • Download the trained weights from here and add them to the folder pretrained_models.
  • To generate predictions, use the following commands:

BIWI

python predict.py --dataset BIWI --vertice_dim 70110 --feature_dim 512 --output_fps 25 --train_subjects "F2 F3 F4 M3 M4 M5" --test_subjects "F2 F3 F4 M3 M4 M5" --model_name "pretrained_BIWI" --fps 25 --condition "F2" --subject "F2" --diff_steps 500 --gru_dim 512 --wav_path "test.wav"  

Vocaset

python predict.py --dataset vocaset --vertice_dim 15069 --feature_dim 256 --output_fps 30 --train_subjects "FaceTalk_170728_03272_TA FaceTalk_170904_00128_TA FaceTalk_170725_00137_TA FaceTalk_170915_00223_TA FaceTalk_170811_03274_TA FaceTalk_170913_03279_TA FaceTalk_170904_03276_TA FaceTalk_170912_03278_TA" --test_subjects "FaceTalk_170809_00138_TA FaceTalk_170731_00024_TA" --model_name "pretrained_vocaset" --fps 30 --condition "FaceTalk_170728_03272_TA" --subject "FaceTalk_170731_00024_TA" --diff_steps 1000 --gru_dim 256 --wav_path "test.wav"

Multiface

python predict.py --dataset multiface --vertice_dim 18516 --feature_dim 256 --output_fps 30 --train_subjects "2 3 6 7 9 10 11 12 13" --test_subjects "1 4 5 8" --model_name "pretrained_multiface" --fps 30 --condition "2" --subject "1" --diff_steps 1000 --gru_dim 256 --wav_path "test.wav"

Visualization

  • Run the following command to render the predicted test sequences stored in result/:

    python render_result.py
    

    The rendered videos will be saved in the renders/videos/ folder.

Training and Testing on BEAT

  • To train the model on BEAT and obtain test results, use the following command:
    python .\main.py --dataset beat --train_subjects "2 4 6 8" --test_subject "2 4 6 8" --val_subjects "2 4 6 8" --vertice_dim 51 --gru_dim 256 --output_fps 30 --feature_dim 256
    
  • To visualise the results, open the Blender project data\beat\BeatVisualise.blend. In the Scripting tab, set the res_path variable to the .npy sequence you want to render (see the sketch after this list).
  • To add the audio, open the Video Editor tab and drag and drop your audio file.
  • To render, use Render -> Render Animation or press CTRL + F12.
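
For illustration, the edit in the Scripting tab amounts to something like the following; the path is hypothetical and depends on where your predicted sequence was saved:

    # Point res_path at the predicted .npy sequence you want to render (hypothetical path).
    res_path = "result/beat/predicted_sequence.npy"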

Trained Weights

The trained weights can be downloaded from THIS link.

Acknowledgements

We borrow and adapt the code from FaceXHuBERT, MDM, EDGE, CodeTalker. Thanks for making their code available and facilitating future research. Additional thanks to huggingface-transformers for the implementation of HuBERT.

We are also grateful for the publicly available datasets used during this project:

  • ETHZ-CVL for providing the B3D(AC)2 dataset
  • MPI-IS for releasing the VOCASET dataset.
  • Facebook Research for releasing the Multiface dataset.
  • Utrecht University for the UUDaMM dataset.
  • The authors of the BEAT dataset.

Any third-party packages are owned by their respective authors and must be used under their respective licenses.

License

This repository is released under the CC BY-NC 4.0 International License.
