[ICCV2025] [FakeSTormer] Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection
This is an official implementation of FakeSTormer! [📜Paper]
- Coming soon: Release of code and pretrained weights ⏳.
- 08/07/2025: First version of this open-source code pre-released 🌱.
- 26/06/2025: FakeSTormer has been accepted to ICCV2025 🎉.
Detecting deepfake videos is highly challenging given the complexity of characterizing spatio-temporal artifacts. Most existing methods rely on binary classifiers trained on real and fake image sequences, which hinders their generalization to unseen generation methods. Moreover, with the constant progress of generative Artificial Intelligence (AI), deepfake artifacts are becoming imperceptible at both the spatial and the temporal levels, making them extremely difficult to capture. To address these issues, we propose a fine-grained deepfake video detection approach called FakeSTormer that enforces the modeling of subtle spatio-temporal inconsistencies while avoiding overfitting. Specifically, we introduce a multi-task learning framework that incorporates two auxiliary branches for explicitly attending to artifact-prone spatial and temporal regions. Additionally, we propose a video-level data synthesis strategy that generates pseudo-fake videos with subtle spatio-temporal artifacts, providing high-quality samples and hands-free annotations for our additional branches. Extensive experiments on several challenging benchmarks demonstrate the superiority of our approach compared to recent state-of-the-art methods.
Results on six datasets (CDF2, DFW, DFD, DFDC, DFDCP, and DiffSwap) under the cross-dataset evaluation setting, reported as video-level AUC (%).
| CDF2 | DFW | DFD | DFDC | DFDCP | DiffSwap |
|---|---|---|---|---|---|
For experimental purposes, we recommend installing the following libraries; either Conda or a Python virtual environment should work. A minimal installation sketch is given after the list.
- CUDA: 11.4
- Python: >= 3.8.x
- PyTorch: 1.8.0
- TensorboardX: 2.5.1
- ImgAug: 0.4.0
- Scikit-image: 0.17.2
- Torchvision: 0.9.0
- Albumentations: 1.1.0
- mmcv: 1.6.1
- natsort: 8.4.0
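
A minimal setup sketch, assuming a Conda environment (named `fakestormer` here only for illustration) and the pinned versions listed above; pick the PyTorch build matching your CUDA installation (e.g. a `+cu111` wheel from the PyTorch index) if needed:

```bash
# Hypothetical environment setup based on the version list above.
conda create -n fakestormer python=3.8 -y
conda activate fakestormer
# PyTorch 1.8.0 / torchvision 0.9.0; choose the wheel matching your CUDA version.
pip install torch==1.8.0 torchvision==0.9.0
pip install tensorboardX==2.5.1 imgaug==0.4.0 scikit-image==0.17.2 \
    albumentations==1.1.0 natsort==8.4.0
# mmcv 1.6.1 is best built from source (see "Prepare environment" below).
```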
- 📌 The pre-trained weights will be released soon!
We further provide an optional Dockerfile that can be used to build a working environment with Docker. More detailed steps can be found here.
- Install Docker on the system (skip this step if Docker has already been installed):

  ```bash
  sudo apt install docker
  ```
- To set up your Docker environment, please go to the folder `dockerfiles`:

  ```bash
  cd dockerfiles
  ```

- Create a Docker image (you can use any name you want):

  ```bash
  docker build --tag 'fakestormer' .
  ```
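
  Once the image is built, a container can be started along the following lines. This is only a sketch: `--gpus all` assumes the NVIDIA Container Toolkit is installed, and the mounted paths are illustrative (the dataset root follows the structure used later in this README).

  ```bash
  # Hypothetical run command; adjust the mounted paths to your setup.
  docker run --gpus all -it --shm-size 8g \
      -v /path/to/FakeSTormer:/workspace/FakeSTormer \
      -v /data/deepfake_cluster/datasets_df:/data/deepfake_cluster/datasets_df \
      fakestormer /bin/bash
  ```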
- Preparation

  - Prepare environment

    Install the main packages listed in the recommended environment above. Note that we recommend building mmcv from source, as below (a quick sanity check is sketched after the commands):
    ```bash
    git clone https://github.com/open-mmlab/mmcv.git
    cd mmcv
    git checkout v1.6.1
    MMCV_WITH_OPS=1 pip install -e .
    ```
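
    As a minimal check (it only assumes the editable install above completed), you can verify the installed version:

    ```bash
    python -c "import mmcv; print(mmcv.__version__)"  # expected to print 1.6.1
    ```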
  - Prepare dataset

    - Download the FF++ Original dataset for training data preparation. Following the original split convention, it is first used to randomly extract frames and facial crops (example invocations are sketched after the directory tree below):
      ```bash
      python package_utils/images_crop.py -d {dataset} \
          -c {compression} \
          -n {num_frames} \
          -t {task}
      ```

      (This script can also be used for cropping faces in other datasets such as CDF2, DFD, DFDCP, and DFDC for the cross-dataset evaluation. You do not need to run cropping for DFW, as that data is already preprocessed.)
      | Parameter | Value | Definition |
      |---|---|---|
      | -d | ['Face2Face', 'Deepfakes', 'FaceSwap', 'NeuralTextures', ...] | Subfolder in each dataset; you can use one of those datasets. |
      | -c | ['raw', 'c23', 'c40'] | You can use one of those compressions. |
      | -n | 256 | Number of frames (default 32 for val/test and 256 for train). |
      | -t | ['train', 'val', 'test'] | Split to process; default train. |

      The cropped faces are saved for online pseudo-fake generation in the training process, following the data structure below:
      ```
      ROOT = '/data/deepfake_cluster/datasets_df'
      ├── Celeb-DFv2
      │   └── ...
      └── FF++
          ├── c0
          ├── c23
          │   ├── test
          │   │   ├── videos
          │   │   │   ├── Deepfakes
          │   │   │   │   ├── 000_003
          │   │   │   │   ├── 044_945
          │   │   │   │   ├── 138_142
          │   │   │   │   └── ...
          │   │   │   ├── Face2Face
          │   │   │   ├── FaceSwap
          │   │   │   ├── NeuralTextures
          │   │   │   └── original
          │   │   └── frames
          │   ├── train
          │   │   ├── videos
          │   │   │   ├── aligned
          │   │   │   │   ├── 001
          │   │   │   │   ├── 002
          │   │   │   │   └── ...
          │   │   │   └── original
          │   │   │       ├── 001
          │   │   │       ├── 002
          │   │   │       └── ...
          │   │   └── frames
          │   └── val
          │       ├── videos
          │       │   ├── aligned
          │       │   └── original
          │       └── frames
          └── c40
      ```
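
      For example (a hedged illustration; the parameter values are taken from the table above, and 'original' is assumed to be a valid -d value since it appears in the directory tree):

      ```bash
      # Crop 256 frames per video from the FF++ original c23 training split.
      python package_utils/images_crop.py -d original -c c23 -n 256 -t train
      # Crop 32 frames per video from the FF++ Deepfakes c23 test split.
      python package_utils/images_crop.py -d Deepfakes -c c23 -n 32 -t test
      ```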
    - Download the pretrained Dlib [81] facial landmark detector and place it into `/pretrained/` for SBI synthesis.
    - Landmark detection. After running the following script, a file storing the metadata of the data is saved at `processed_data/c23/{SPLIT}_FaceForensics_videos_<n_landmarks>.json`:

      ```bash
      python package_utils/geo_landmarks_extraction.py \
          --config configs/data_preprocessing_c23.yaml \
          --extract_landmarks
      ```
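
      To confirm the metadata file was written (a trivial check; the wildcards stand in for the split name and landmark count):

      ```bash
      ls processed_data/c23/*_FaceForensics_videos_*.json
      ```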
- Training script
  We offer a number of config files for different compression levels of the training data. For c23, open `configs/temporal/FakeSTormer_base_c23.yaml`, make sure you set `TRAIN: True` and `FROM_FILE: True`, and run:

  ```bash
  ./scripts/fakestormer_sbi.sh
  ```

  Otherwise, for c0 or c40, the config file is `configs/temporal/FakeSTormer_base_[c0, c40].yaml`. You can also find configs for other network architectures in the `configs/` folder. A short launch sketch is given below.
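
  A minimal launch sketch (assuming the `TRAIN` and `FROM_FILE` flags appear verbatim in the YAML; all paths are taken from the instructions above):

  ```bash
  # Confirm the training flags before launching.
  grep -nE "TRAIN|FROM_FILE" configs/temporal/FakeSTormer_base_c23.yaml
  # Launch training with the provided script.
  ./scripts/fakestormer_sbi.sh
  ```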
- Testing script

  Open `configs/temporal/FakeSTormer_base_c23.yaml` and set `subtask: eval` in the test section to enable evaluation mode. Please also set `TRAIN: False` and `FROM_FILE: False`, then run:

  ```bash
  ./scripts/test_fakestormer.sh
  ```

  For other settings (e.g., data compression levels, network architectures), please change the path to the corresponding config file.
  ⚠️ Please make sure you set the correct path to your downloaded pre-trained weights in the config files.

  ℹ️ Flip test can be enabled by setting `flip_test: True`.

  ℹ️ A single-video inference mode is also provided: set `sub_task: test_vid` and pass a video path as an argument to `test.py` (an illustrative call is sketched below).
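
  A purely hypothetical invocation (the argument names are not documented here, so please check `test.py` for its actual interface; the config path and mode come from the notes above):

  ```bash
  # Illustrative only: the --cfg flag and the positional video argument are assumptions.
  python test.py --cfg configs/temporal/FakeSTormer_base_c23.yaml /path/to/video.mp4
  ```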
Please contact dat.nguyen@uni.lu. Any questions or discussions are welcome!
This software is © University of Luxembourg and is licensed under the SnT academic license. See LICENSE.
We acknowledge the excellent implementation from OpenMMLab (mmengine, mmcv), SBI, and LAA-Net.
Please kindly consider citing our paper in your publications.
```bibtex
@article{nguyen2025vulnerability,
  title={Vulnerability-Aware Spatio-Temporal Learning for Generalizable and Interpretable Deepfake Video Detection},
  author={Nguyen, Dat and Astrid, Marcella and Kacem, Anis and Ghorbel, Enjie and Aouada, Djamila},
  journal={arXiv preprint arXiv:2501.01184},
  year={2025}
}
```