Face Forgery Video Detection (FFVD) is a critical yet challenging task: determining whether a digital facial video is authentic or forged. Existing FFVD methods typically focus on isolated spatial information or coarsely fused spatiotemporal information, and so fail to exploit temporal forgery cues, which results in unsatisfactory performance. We strive to unravel these cues across three progressive levels: momentary anomaly, gradual inconsistency, and cumulative distortion. Accordingly, we design a consecutive correlate module that captures momentary anomaly cues by correlating interactions among consecutive frames. Then, we devise a future guide module that unravels inconsistency cues by iteratively aggregating historical anomaly cues and gradually propagating them into future frames. Finally, we introduce a historical review module that unravels distortion cues via momentum accumulation from future to historical frames. These three modules form our Temporal Forgery Cue Unraveling (TFCU) framework, which sequentially highlights spatially discriminative features by unraveling temporal forgery cues bidirectionally between historical and future frames. Extensive experiments and ablation studies demonstrate the effectiveness of our TFCU method, which achieves state-of-the-art performance across diverse unseen datasets and manipulation methods.
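The three cue levels can be pictured with a toy sketch. This is not the TFCU implementation: it assumes one scalar score per frame instead of feature maps, uses an absolute difference as a stand-in for the consecutive-frame correlation, and uses an exponential moving average as a stand-in for the aggregation and momentum steps. The function name `unravel_temporal_cues` and the `momentum` parameter are hypothetical.

```python
def unravel_temporal_cues(frame_scores, momentum=0.9):
    """Toy illustration of the three cue levels on scalar per-frame scores.

    Assumption: real models operate on feature maps; scalars are used here
    only to make the forward and backward temporal flow easy to follow.
    """
    if not frame_scores:
        return [], [], []

    # Level 1 (momentary anomaly): compare each frame with its predecessor.
    anomaly = [0.0] + [abs(b - a) for a, b in zip(frame_scores, frame_scores[1:])]

    # Level 2 (gradual inconsistency): aggregate past anomalies and carry
    # them forward, from historical frames toward future frames.
    forward = []
    state = 0.0
    for value in anomaly:
        state = momentum * state + (1.0 - momentum) * value
        forward.append(state)

    # Level 3 (cumulative distortion): accumulate with momentum in the
    # opposite direction, from future frames back to historical frames.
    backward = [0.0] * len(forward)
    state = 0.0
    for t in range(len(forward) - 1, -1, -1):
        state = momentum * state + (1.0 - momentum) * forward[t]
        backward[t] = state

    return anomaly, forward, backward


if __name__ == "__main__":
    # A sudden jump between the second and third frame.
    for name, values in zip(
        ("anomaly", "forward", "backward"),
        unravel_temporal_cues([1.0, 1.0, 3.0, 3.0], momentum=0.5),
    ):
        print(name, values)
```

In this toy version a single abrupt change shows up first as a spike, then spreads into later frames in the forward pass, and finally reaches earlier frames in the backward pass, which is the bidirectional flow the paragraph above describes.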
This project is implemented with Python version >= 3.10 and CUDA version >= 11.3.
It is recommended to follow the steps below to configure the environment:
conda create -n tfcu python=3.10
conda activate tfcu
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
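After installation, a quick sanity check can confirm that the interpreter and the PyTorch build meet the stated requirements (Python >= 3.10, CUDA >= 11.3). The script below is a minimal sketch and not part of the repository; the helper names `version_tuple` and `check_environment` are hypothetical.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # from the requirement stated above
MIN_CUDA = (11, 3)    # from the requirement stated above


def version_tuple(text):
    """Turn a version string such as '11.6' or '1.13.0+cu116' into ints."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment():
    """Return a small report on the Python version and the PyTorch build."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}

    # Do not fail hard when PyTorch is missing; just report it.
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    cuda = torch.version.cuda  # None for CPU-only builds
    report["cuda_build_ok"] = cuda is not None and version_tuple(cuda) >= MIN_CUDA
    report["gpu_visible"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `tfcu` environment. If `gpu_visible` is `False` while `cuda_build_ok` is `True`, the driver or the visible-device settings are the likely cause rather than the installed wheel.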
Before training, follow the steps below to prepare the data:
- Download datasets.
- Frame and Landmarks Extraction: Extract frames and landmarks from the video files.
- Face Alignment and Cropping: Following FTCN, RetinaFace is used for face detection, followed by cropping and alignment. When multiple faces appear in a video, the face that appears for the longest time is tracked and kept.
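The rule of keeping the face with the longest appearance time can be sketched as follows. This is an illustration under assumptions, not the repository's preprocessing code: it assumes per-frame face boxes are already available (for example from a detector such as RetinaFace) as `(x1, y1, x2, y2)` tuples, and it links boxes across frames with a simple greedy IoU match. The function names and the `iou_threshold` value are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def longest_face_track(detections, iou_threshold=0.5):
    """Return the longest track as a list of (frame_index, box).

    detections: one list of boxes per frame, in frame order.
    A track ends as soon as it finds no matching box in a frame.
    """
    tracks = []  # every track ever started
    active = []  # tracks that were extended in the previous frame
    for frame_index, boxes in enumerate(detections):
        next_active = []
        unmatched = list(boxes)
        for track in active:
            if not unmatched:
                break
            last_box = track[-1][1]
            # Greedy match: take the box that overlaps the track's last box most.
            best = max(unmatched, key=lambda box: iou(last_box, box))
            if iou(last_box, best) >= iou_threshold:
                track.append((frame_index, best))
                unmatched.remove(best)
                next_active.append(track)
        # Any box left over starts a new track.
        for box in unmatched:
            track = [(frame_index, box)]
            tracks.append(track)
            next_active.append(track)
        active = next_active
    return max(tracks, key=len) if tracks else []


if __name__ == "__main__":
    detections = [
        [(0, 0, 10, 10), (100, 100, 110, 110)],  # two faces in frame 0
        [(1, 0, 11, 10)],                         # second face disappears
        [(2, 0, 12, 10), (100, 100, 110, 110)],  # second face reappears
    ]
    track = longest_face_track(detections)
    print([frame for frame, _ in track])
```

Only the boxes on the returned track would then be passed on to cropping and alignment. A greedy match is the simplest choice; a production pipeline might prefer optimal assignment or tolerate short detection gaps.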
Download the weights from Baidu Cloud (code: ffvd) and put them into 'checkpoints/Final_TFCU_Model/ckpt'.
Infer a single video by running python Inference_demo.py.
Download the weights from Baidu Cloud (code: ffvd) and put them into 'checkpoints/Final_TFCU_Model/ckpt'. Then run:
bash test.sh 0 1 12345 checkpoints/Final_TFCU_Model/video_level_c_lm.yaml
| | Celeb-DF | DFDC | FFIW | Checkpoints |
|---|---|---|---|---|
| Ours | 93.18% | 86.05% | 91.27% | Baidu(code: ffvd) | 
@InProceedings{Guo_2025_CVPR,
    author    = {Guo, Zonghui and Liu, Yingjie and Zhang, Jie and Zheng, Haiyong and Shan, Shiguang},
    title     = {Face Forgery Video Detection via Temporal Forgery Cue Unraveling},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {7396-7405}
}