Eleanor Tursman,
Marilyn George,
Seny Kamara,
James Tompkin
Brown University
Media Forensics CVPR Workshop 2020
If you find our work useful for your research, please cite:
@InProceedings{Tursman_2020_CVPR_Workshops,
author = {Tursman, Eleanor and George, Marilyn and Kamara, Seny and Tompkin, James},
title = {Towards Untrusted Social Video Verification to Combat Deepfakes via Face Geometry Consistency},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}
}
The functions `cpca`, `screeplot`, `mahalanobis`, `kernelEVD`, `greatsort`, and `classSVD` are from the LIBRA toolbox, which we pulled from their repo at commit 2e1c400e953caf3daa3037305cbd5df09b14bde3 in Feb. 2020, as described in the following papers:
@article{verboven2005libra,
title={LIBRA: a MATLAB library for robust analysis},
author={Verboven, Sabine and Hubert, Mia},
journal={Chemometrics and Intelligent Laboratory Systems},
volume={75},
number={2},
pages={127--136},
year={2005},
publisher={Elsevier}
}
@article{verboven2010matlab,
title={Matlab library LIBRA},
author={Verboven, Sabine and Hubert, Mia},
journal={Wiley Interdisciplinary Reviews: Computational Statistics},
volume={2},
number={4},
pages={509--515},
year={2010},
publisher={Wiley Online Library}
}
LipGAN is described in the following paper:
@inproceedings{KR:2019:TAF:3343031.3351066,
author = {K R, Prajwal and Mukhopadhyay, Rudrabha and Philip, Jerin and Jha, Abhishek and Namboodiri, Vinay and Jawahar, C V},
title = {Towards Automatic Face-to-Face Translation},
booktitle = {Proceedings of the 27th ACM International Conference on Multimedia},
series = {MM '19},
year = {2019},
isbn = {978-1-4503-6889-6},
location = {Nice, France},
pages = {1428--1436},
numpages = {9},
url = {http://doi.acm.org/10.1145/3343031.3351066},
doi = {10.1145/3343031.3351066},
acmid = {3351066},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {cross-language talking face generation, lip synthesis, neural machine translation, speech to speech translation, translation systems, voice transfer},
}
This section will explain how to process your own data with our scripts. If you'd like to use our data, which includes the raw video and the post-processed landmark matrices, please see Downloading our Dataset.
- You will need to download the dlib face detector and put it in `./Data-Processing/` before running the bash script. You can download it here.
- You will also need to install FAN, by running `pip install face-alignment`.
- You will need to install the requirements for LipGAN. Since it is under an MIT License, I have included the code in this repo, minus their models, which you will need to download yourself and place in `./LipGAN/logs/`. I made modifications to their `batch_inference.py` script so that it uses our pre-computed bounding boxes and adds a little blending to the final result to make it look a bit nicer. We use the LipGAN repo's commit 03b540c68aa5ab871baa4e64f6ade6736131b1b9, which we pulled Feb 11th, 2020.
If you are using your own data, the format the experiment code expects is a struct with entries `cam1`, `cam2`, `cam3`, `cam4`, `cam5`, `cam6`, and `fake`, where the fake is a manipulated version of one of the real cameras. Each field contains an `f x 40` matrix, where there are `f` frames and each row contains mouth landmarks `[x_1, y_1, x_2, y_2, ..., x_20, y_20]` (though you may use any subset of landmarks you like; the 20 mouth landmarks are what we used for all our experiments). Each `f x 40` matrix should be normalized by subtracting the mean and dividing by the standard deviation. We generate our .mat files using `./Data-Processing/landmark-npy-to-mat.py`. Here is an example of how to run that script to generate one of the mat files:
`python3 landmark-npy-to-mat.py 4 ID6 /path/to/dataset/`

where the fake camera is camera four, and we're looking at person six from the dataset.
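For reference, here is a minimal sketch of producing a .mat file in this format. It is not the repo's `landmark-npy-to-mat.py`: the input npy paths, the top-level struct name, and the per-column normalization axis are assumptions, while the field names, the 20 dlib mouth landmarks, and the mean/std normalization follow the description above.

```python
# Minimal sketch, NOT the repo's landmark-npy-to-mat.py.
# Assumes one (f, 68, 2) FAN landmark array per camera saved as .npy;
# file names and the top-level struct name 'data' are hypothetical.
import numpy as np
import scipy.io

def mouth_matrix(landmarks):
    """Flatten the 20 dlib mouth points (indices 48-67) into an f x 40 matrix
    with rows [x_1, y_1, ..., x_20, y_20], then z-score each column
    (the normalization axis is an assumption; the text only says mean/std)."""
    mouth = landmarks[:, 48:68, :]                # (f, 20, 2)
    flat = mouth.reshape(mouth.shape[0], -1)      # (f, 40)
    return (flat - flat.mean(axis=0)) / flat.std(axis=0)

fields = {f'cam{i}': mouth_matrix(np.load(f'ID6/cam{i}_landmarks.npy'))
          for i in range(1, 7)}
fields['fake'] = mouth_matrix(np.load('ID6/fake_cam4_landmarks.npy'))

# savemat turns the nested dict into a MATLAB struct named 'data'.
scipy.io.savemat('ID6_fake4.mat', {'data': fields})
```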
The process for pre-processing the data is outlined in `data-pipeline.bash`. To run it:

`bash -i data-pipeline.bash <path-to-video-folder> <path-to-this-script-folder> <audio filename for lipgan>`
The bash script's steps are as follows:

1. Convert the video to frames using ffmpeg.
2. Get bounding box coordinates for the face in each frame using dlib, and save the results to a txt file. If you want to save the cropped images themselves, swap in `saveCrop = True` on line 62 of `cnn_face_detector.py`. Dlib needs to be built with CUDA support for this to run reasonably fast.
3. Run 2D landmark detection using FAN given the saved bounding boxes, and save npy files with 68 landmarks per frame (see the sketch after this list).
4. Create a visualization of the landmarks on top of every camera view, to verify everything has worked properly.
5. Run LipGAN to create a fake for each input camera, using the audio you specify. The audio must be pre-processed as directed in LipGAN using their MATLAB scripts before this step, and placed in `./LipGAN/audio/` (this isn't automated in the bash script). The bash script expects that you've made a conda environment called 'lipgan' for running LipGAN.
6. Process the LipGAN fakes as in steps 1-3.
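As a rough illustration of the landmark step (step 3), and not the repo's `cnn_face_detector.py` or FAN wrapper, the sketch below shows how FAN landmarks could be computed from a pre-computed bounding box. It assumes the 2020-era `face-alignment` API; the paths and box values are made up.

```python
# Rough illustration of step 3 only; not the repo's scripts.
# Assumes the 2020-era face-alignment API and one pre-computed dlib
# bounding box per frame as [x1, y1, x2, y2]; paths/values are hypothetical.
import numpy as np
import face_alignment
from skimage import io

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cuda')

frame = io.imread('frames/00001.png')
bbox = [120, 80, 360, 320]  # hypothetical pre-computed dlib box

# Passing detected_faces reuses our box instead of re-running face detection.
preds = fa.get_landmarks_from_image(frame, detected_faces=[bbox])
landmarks = preds[0]  # (68, 2) array of (x, y) points
np.save('landmarks/00001.npy', landmarks)
```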
Before running experimental code, you will have to go into the MATLAB function `clusterdata.m` and change line one from `function [T] = clusterdata(X, varargin)` to `function [T,Z] = clusterdata(X, varargin)`. `Z` is the actual tree, which we will cluster ourselves.
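For intuition only (the experiments themselves call the patched MATLAB `clusterdata`), here is a rough SciPy analog of why `Z` is needed: the linkage tree can be cut at whatever threshold we choose, rather than only at the cut `clusterdata` picks. The linkage method and threshold below are illustrative, not the paper's settings.

```python
# Rough SciPy analog, for intuition only; the experiments use MATLAB.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(7, 40)          # e.g. one landmark row per camera view
Z = linkage(X, method='single')    # the full hierarchical tree (MATLAB's Z)
T = fcluster(Z, t=1.0, criterion='distance')  # cut the tree ourselves
print(T)                           # cluster label per camera
```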
To generate the results, make sure you populate your Data folder with the normalized landmark matrices. Then run the script `./Experiments/full_sequence_accuracy_experiment.m`.
To generate the results, make sure you populate your Data folder with the normalized landmark matrices. Then pick your method on line 10 and run the script `./Experiments/windowed_accuracy_experiment.m`.
To re-create the accuracy and ROC plots of Fig. 5, set `accOn = true;` and `rocOn = true;` in `Experiments/make_plots.m`. In the paper, we display results for the DWT baseline and for our method. To re-create the histograms in Fig. 6, set `histogramOn = true;`. In the paper, we display results for our method.
For output created using:

- The simple mouth baseline, use `datasetName = 'simpleMouth';`
- The DWT baseline, use `datasetName = 'noPCA';`
- The PCA method, use `datasetName = 'onlyPCA';`
The numbers for Table 1 were taken from the numerical output for the mean accuracy and the ROC curves.
This experiment tests whether all sets of cameras separated by the same real-world angular distance are more similar to one another than to a fake video. To run it, use `Experiments/angular_experiment.m`. If you do not want to use MATLAB's CPU threading, change the line `parfor t=1:length(threshes)` to `for t=1:length(threshes)`.
To visualize your results (re-creating Fig. 7), set `angle = true;`, all other flags to false, and `datasetName = 'angle';` in `./Experiments/make_plots.m`, then run the script.
The script to re-create Fig. 4 is `./Experiments/3d_model_experiment.py`. It expects the parameter data to be structured as in our dataset, where FLAME and 3DMM parameters are stored in npy files. An example of running the code:

`python3 3d_model_experiment.py 3 ID5 /path/to/dataset/`

where the fake camera is camera three, and we're looking at person five from the dataset.
Our video data and post-processed landmarks are available via the Brown library.
To reproduce our results, please use the post-processed landmark .mat files with the experimental scripts on GitHub. Once the .zip is downloaded, extract the contents of `Dataset/Processed-Results/` into `./Experiments/Data/` from the codebase root directory.
All original work is under the GNU General Public License v3.0. LipGAN is under the MIT License, and the LIBRA MATLAB functions are under the academic license specified by LIBRA_LICENSE.
- 2020 August 26th: Moved document downloads to a separate webpage. Added errata from the paper, thanks to Harman Suri.
- 2020 August 26th: Added dataset URL.
- 2020 July 15th: Uploaded the code for running our experiments and processing our data.