Dynamic Facial Expression Recognition (DFER) is facing a supervised dilemma. On the one hand, current efforts in DFER focus on developing various deep supervised models but achieve only incremental progress, which is mainly attributed to the longstanding lack of large-scale, high-quality datasets. On the other hand, due to the ambiguity and subjectivity in facial expression perception, acquiring large-scale, high-quality DFER samples is highly time-consuming and labor-intensive. Considering that there are massive unlabeled facial videos on the Internet, this work explores a new way (i.e., self-supervised learning) to fully exploit large-scale unlabeled data and largely advance the development of DFER.
Inspired by the recent success of VideoMAE, MAE-DFER makes an early attempt to devise a novel masked-autoencoder-based self-supervised framework for DFER. It improves VideoMAE by developing an efficient LGI-Former as the encoder and introducing joint masked appearance and motion modeling. With these two core designs, MAE-DFER largely reduces the computational cost (about 38% FLOPs) during fine-tuning while achieving comparable or even better performance.
Figure: The architecture of LGI-Former.
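The figure covers the encoder; the second core design, joint masked appearance and motion modeling, can be illustrated with a minimal sketch. The details below are assumptions for illustration only: appearance is taken as the raw frames, motion as temporal frame differences, and `lambda_motion` is a hypothetical weighting hyperparameter. The actual MAE-DFER objective may differ in its target construction and normalization.

```python
import torch.nn.functional as F

def joint_masked_loss(pred_app, pred_mot, video, mask_app, mask_mot, lambda_motion=0.5):
    """Masked reconstruction loss over appearance and motion targets (illustrative sketch).

    video:    (B, T, C, H, W) input clip
    pred_app: predicted raw frames, same shape as video
    pred_mot: predicted frame differences, shape (B, T-1, C, H, W)
    mask_app / mask_mot: boolean masks selecting the masked (reconstructed) positions
    lambda_motion: assumed name for the appearance/motion trade-off weight
    """
    target_app = video                            # appearance target: the frames themselves
    target_mot = video[:, 1:] - video[:, :-1]     # motion target: temporal frame differences
    loss_app = F.mse_loss(pred_app[mask_app], target_app[mask_app])
    loss_mot = F.mse_loss(pred_mot[mask_mot], target_mot[mask_mot])
    return (1.0 - lambda_motion) * loss_app + lambda_motion * loss_mot
```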
Extensive experiments on six DFER datasets show that our MAE-DFER consistently outperforms the previous best supervised methods by significant margins (+5∼8% UAR on three in-the-wild datasets and +7∼12% WAR on three lab-controlled datasets), which demonstrates that it can learn powerful dynamic facial representations for DFER via large-scale self-supervised pre-training. We believe MAE-DFER has paved a new way for the advancement of DFER and can inspire more relevant research in this field and even other related tasks (e.g., dynamic micro-expression recognition and facial action unit detection).
- UAR (Unweighted Average Recall):

  $\text{UAR} = \frac{1}{C} \sum_{i=1}^{C} \text{Accuracy}_i$

- WAR (Weighted Average Recall):

  $\text{WAR} = \sum_{i=1}^{C} \frac{n_i}{N} \, \text{Accuracy}_i$

where:

- $C$ is the number of classes
- $n_i$ is the number of samples in class $i$
- $N$ is the total number of samples
- $\text{Accuracy}_i$ is the accuracy (per-class recall) for class $i$

WAR is the more commonly used metric; with these definitions it equals the overall accuracy.
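For reference, here is a small sketch of how the two metrics can be computed from predictions and ground-truth labels (function and variable names are illustrative, not from this repo):

```python
import numpy as np

def compute_uar_war(y_true, y_pred, num_classes):
    """UAR: mean of per-class accuracies; WAR: overall (sample-weighted) accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class_acc = []
    for c in range(num_classes):
        idx = (y_true == c)
        if idx.sum() > 0:                       # skip classes absent from this split
            per_class_acc.append((y_pred[idx] == c).mean())
    uar = float(np.mean(per_class_acc))
    war = float((y_pred == y_true).mean())      # equivalent to sum_i (n_i / N) * Accuracy_i
    return uar, war

# Example: uar, war = compute_uar_war([0, 0, 1, 2], [0, 1, 1, 2], num_classes=3)
```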
The environment has been tested with both Python 3.8 and Python 3.10.
```bash
conda create -n <your_env_name> python=3.10
pip install -r requirement.txt
```
- Clone the VideoMamba repo:

```bash
git clone https://github.com/OpenGVLab/VideoMamba.git
```

- Install its dependencies:

```bash
cd VideoMamba
pip install -e causal-conv1d
pip install -e mamba
```
Please follow the files (e.g., dfew.py) in preprocess for data preparation.
Specifically, you need to generate annotations for the dataloader ("<path_to_video> <video_class>" in annotations); a hedged sketch of generating such a file is given after the examples below.
The annotation usually includes train.csv, val.csv and test.csv. The format of a *.csv file is like:

```
dataset_root/video_1 label_1
dataset_root/video_2 label_2
dataset_root/video_3 label_3
...
dataset_root/video_N label_N
```
An example of train.csv of DFEW fold 1 (fd1) is shown as follows:

```
/mnt/data1/brain/AC/Dataset/DFEW/Clip/jpg_256/02522 5
/mnt/data1/brain/AC/Dataset/DFEW/Clip/jpg_256/02536 5
/mnt/data1/brain/AC/Dataset/DFEW/Clip/jpg_256/02578 6
```
Note that the label for the pre-training dataset (i.e., VoxCeleb2) is a dummy label; you can simply use 0 (see voxceleb2.py).
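The sketch below illustrates how such an annotation file could be generated by scanning a dataset root; the directory layout and label mapping are placeholders, so adapt it to your own data (the provided preprocess scripts, e.g. dfew.py, remain the reference):

```python
import os

def write_annotation(dataset_root, label_map, out_csv):
    """Write one "<path_to_video> <video_class>" line per clip, space-separated.

    dataset_root: folder containing one sub-folder (or video file) per clip
    label_map:    dict mapping clip name -> integer class (use 0 for pre-training data)
    """
    with open(out_csv, "w") as f:
        for name in sorted(os.listdir(dataset_root)):
            label = label_map.get(name, 0)               # dummy label 0 (e.g. for VoxCeleb2)
            f.write(f"{os.path.join(dataset_root, name)} {label}\n")

# Example (hypothetical paths/labels):
# write_annotation("dataset_root", {"video_1": 5, "video_2": 6}, "train.csv")
```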
- VoxCeleb2

```bash
python run_pretraining_with_yacs.py \
    --config configs/voxceleb2_pretrain.yaml \
    --output_dir output/voxceleb2_pretrain/
```
You can download our pre-trained model on VoxCeleb2 from here and put it into this folder. Put the pre-trained model at saved/model/pretraining/voxceleb2/videomae_pretrain_xxx for fine-tuning.
- DFEW

```bash
python run_finetuning_with_yacs.py \
    --config configs/dfew_finetune.yaml \
    --output_dir output/dfew_finetune/
```
- FERV39k
Dataset not available yet.
- MAFW

```bash
python run_finetuning_with_yacs.py \
    --config configs/mafw_finetune.yaml \
    --output_dir output/mafw_finetune/
```
Not available yet.
- Download the Gaze 360 dataset from Gaze 360 to the current folder.
- Run the preprocess/data_prepocessing_gaze360.py script to normalize the dataset and labels.
- Run preprocess/preprocess_gaze360.py to align the dataset labels; the aligned labels will be generated under saved/data/gaze360/.
For an emotion dataset, we need the following steps.

Step one: convert the csvs from video-level representation to frame-level representation (a hedged sketch is given after the examples below). The original csv is like this (an example is dfew_224):
```
dataset_root/video_1 label_1
dataset_root/video_2 label_2
...
```

After step one, it should be:

```
dataset_root/video_1/00001 label_1
dataset_root/video_1/00002 label_1
...
dataset_root/video_2/00001 label_2
...
```
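A minimal sketch of step one, assuming each video has already been extracted into a folder of frames named 00001.jpg, 00002.jpg, ... (the frame-naming pattern and column layout are inferred from the examples above):

```python
import os

def video_csv_to_frame_csv(video_csv, frame_csv):
    """Expand each "video_dir label" line into one "video_dir/frame_id label" line per frame."""
    with open(video_csv) as fin, open(frame_csv, "w") as fout:
        for line in fin:
            video_dir, label = line.strip().split()
            for frame in sorted(os.listdir(video_dir)):        # e.g. 00001.jpg, 00002.jpg, ...
                frame_id = os.path.splitext(frame)[0]          # keep the id, drop the extension
                fout.write(f"{os.path.join(video_dir, frame_id)} {label}\n")

# Example (hypothetical paths):
# video_csv_to_frame_csv("train.csv", "train_frames.csv")
```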
Step two: split the whole csv into several csvs by video (you can check this script; a hedged sketch is also given after the directory tree below). After step two, it should look like gaze360T:
```
- test
    - test_00000_0.csv
        dataset_root/video_1/00001 label_1
        dataset_root/video_1/00002 label_1
        ...
    - test_00000_1.csv
        dataset_root/video_1/00101 label_1
        dataset_root/video_1/00102 label_1
        ...
    ...
- train
    ...
```
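A minimal sketch of step two, assuming the per-video csvs are additionally chunked into pieces of 100 frames and named `<split>_<video-index>_<chunk-index>.csv` (both assumptions are inferred from the example tree above and may differ from the provided script):

```python
import os
from collections import defaultdict

def split_frame_csv(frame_csv, out_dir, split="test", chunk_size=100):
    """Split a frame-level csv into one csv per (video, chunk of `chunk_size` frames)."""
    os.makedirs(out_dir, exist_ok=True)
    per_video = defaultdict(list)
    with open(frame_csv) as f:
        for line in f:
            path = line.split()[0]
            per_video[os.path.dirname(path)].append(line)      # group lines by video folder
    for vid_idx, (_, lines) in enumerate(sorted(per_video.items())):
        for start in range(0, len(lines), chunk_size):
            name = f"{split}_{vid_idx:05d}_{start // chunk_size}.csv"
            with open(os.path.join(out_dir, name), "w") as out:
                out.writelines(lines[start:start + chunk_size])

# Example (hypothetical paths):
# split_frame_csv("test_frames.csv", "saved/data/dfew/test", split="test")
```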
Step three: run relbl_with_gaze.py to relabel the emotion datasets with gaze information (check the paths in the file first); you will then get a csv like this (a hedged sketch follows the example):
```
dataset_root/video_1 label_1 pitch yaw
dataset_root/video_2 label_2 pitch yaw
...
```
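For illustration, a minimal sketch of appending gaze angles to an emotion csv, assuming the pitch/yaw values come from a per-video dictionary of gaze predictions (the dictionary and its source are hypothetical; the provided relbl_with_gaze.py remains the reference):

```python
def append_gaze_labels(emotion_csv, gaze_by_video, out_csv):
    """Append "pitch yaw" to each "video label" line using per-video gaze predictions."""
    with open(emotion_csv) as fin, open(out_csv, "w") as fout:
        for line in fin:
            video, label = line.strip().split()
            pitch, yaw = gaze_by_video[video]        # hypothetical: {video_path: (pitch, yaw)}
            fout.write(f"{video} {label} {pitch} {yaw}\n")

# Example (hypothetical values):
# append_gaze_labels("train.csv", {"dataset_root/video_1": (0.1, -0.2)}, "train_gaze.csv")
```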
Repeat step two to split this csv into several ones by video. A final example is dfew_combine.
```bash
python run_finetuning_with_yacs.py --config configs/dfew_combine.yaml --output_dir output/dfew_combine/
```
You should note these configs:
```yaml
data:
  num_classes_cls: 7        # Number of classes for classification
  num_dim_reg: 2            # Number of dimensions for regression
training:
  combine_loss_alpha: 0.0   # Weight for classification loss in combined loss
```
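For intuition, here is a minimal sketch of how such a combined objective could look, assuming combine_loss_alpha weights the classification term and the remainder weights the gaze (pitch/yaw) regression term; the exact formulation in this repo may differ:

```python
import torch.nn.functional as F

def combined_loss(cls_logits, cls_target, reg_pred, reg_target, alpha=0.0):
    """alpha ~ combine_loss_alpha: weight on the classification term (assumed interpretation)."""
    loss_cls = F.cross_entropy(cls_logits, cls_target)    # 7-way expression classification
    loss_reg = F.mse_loss(reg_pred, reg_target)           # 2-dim (pitch, yaw) regression
    return alpha * loss_cls + (1.0 - alpha) * loss_reg

# Example shapes (hypothetical): cls_logits (B, 7), cls_target (B,), reg_pred/reg_target (B, 2)
```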
```bash
python run_finetuning_with_yacs.py \
    --config configs/gaze360_finetune.yaml \
    --output_dir output/gaze360_finetune/
```