Official implementation of the paper "AI-Generated Video Detection via Perceptual Straightening", accepted at NeurIPS 2025.
Figure 1: The ReStraV method. Video frames are processed by a self-supervised encoder (DINOv2) to get embeddings. In this representation space, natural videos trace "straighter" paths than AI-generated ones. The trajectory's geometry, especially its curvature, serves as a powerful signal for a lightweight classifier to distinguish real from fake.
Important (local setup knobs): several scripts include hard-coded values for `device` (e.g. `cuda:1`), `batch_size`, `num_workers`, paths, and download worker counts. You will likely need to open the files and change these values to match your machine (GPU index, RAM/VRAM, CPU cores, filesystem layout).
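For reference, the knobs typically look like the following (variable names and defaults here are illustrative, not copied from any one script):

```python
# Illustrative only: the exact variable names and defaults differ per script.
# Open the script you are running and look for assignments like these near the top.
device = "cuda:1"      # change to "cuda:0", "cpu", etc. to match your machine
batch_size = 64        # lower this if you run out of GPU/CPU memory
num_workers = 8        # DataLoader / download workers; match your CPU core count
data_root = "DATA/"    # adjust paths to your filesystem layout
```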
Core idea:
- Sample a short clip from each video (default: ~2 seconds, 24 frames).
- Encode frames with a pretrained vision backbone (DINOv2 ViT-S/14 via `torch.hub`).
- Treat the per-frame embeddings as a trajectory in representation space.
- Compute temporal geometry features: stepwise distances and curvature/turning angles across time.
- Train a lightweight classifier (an MLP) on a 21-D feature vector per video.
- Use the trained model to predict whether a new video is REAL or FAKE.
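A minimal sketch of the embedding step, assuming frames are already decoded to PIL images (the repo's `dinov2_features.py` handles video decoding and batching itself):

```python
import torch
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
# DINOv2 ViT-S/14 backbone via torch.hub, as used by the feature extractor
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

preprocess = T.Compose([
    T.Resize(224),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_frames(frames):
    """frames: list of PIL images (e.g. 24 frames from a ~2 s clip) -> [T, 384] trajectory."""
    batch = torch.stack([preprocess(f) for f in frames]).to(device)
    return model(batch).cpu()  # one embedding per frame
```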
- `dinov2_features.py` — video decoding + DINOv2 embedding extraction + 21-D feature computation
- `train.py` — trains the MLP classifier; saves `model.pt`, `mean.npy`, `std.npy`, `best_tau.npy`
- `demo.py` — Gradio demo (upload video or paste URL; uses `yt-dlp` to download)
- `DATA/` — data + helper scripts (download/extract features) and generated artifacts
The feature builder in `dinov2_features.py` computes:

- 7 early stepwise distances: `d[0:7]`
- 6 early turning angles: `theta[0:6]`
- 8 summary statistics (mean/min/max/variance) for distances and angles: μ_d, min_d, max_d, var_d, μ_θ, min_θ, max_θ, var_θ
Total: 7 + 6 + 8 = 21 features per video.
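As a rough NumPy sketch of how such a vector could be derived from a `[T, D]` trajectory (the actual implementation is in `dinov2_features.py` and may order or compute the entries differently):

```python
import numpy as np

def trajectory_features(X):
    """Sketch of a 21-D feature vector from a [T, D] frame-embedding trajectory."""
    diffs = np.diff(X, axis=0)                      # displacement vectors, [T-1, D]
    d = np.linalg.norm(diffs, axis=1)               # stepwise distances, [T-1]
    unit = diffs / (d[:, None] + 1e-8)
    cos = np.clip((unit[:-1] * unit[1:]).sum(1), -1.0, 1.0)
    theta = np.arccos(cos)                          # turning angles, [T-2]
    stats = [d.mean(), d.min(), d.max(), d.var(),
             theta.mean(), theta.min(), theta.max(), theta.var()]
    return np.concatenate([d[:7], theta[:6], stats])  # 7 + 6 + 8 = 21
```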
Installation:

- `git clone https://github.com/ChristianInterno/ReStraV.git`
- `cd ReStraV`
- `pip install -r requirements.txt`

Training data sources:

- REAL videos: pulled from the Video Similarity Challenge URL list, filtered by a local reference list file
- FAKE videos: pulled from VidProM (often the `example/` subset from Hugging Face)
`python DATA/download_training_data.py`

- Downloads a subset of REAL mp4s by matching filenames from a `ref_file_paths.txt` list
- Downloads FAKE examples from the VidProM dataset and extracts `.tar` files into `FAKE/`
Things you may need to edit inside the script:

- `MAX_WORKERS` (default may be too high for your network / OS)
- `TIMEOUT`
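For orientation, these two constants typically bound a parallel download loop; a hedged illustration (not the script's actual code, and `requests` plus the placeholder `jobs` list are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor
import requests  # assumed HTTP client; the actual script may use something else

MAX_WORKERS = 8    # concurrent downloads; lower on slow networks or strict OS limits
TIMEOUT = 30       # per-request timeout in seconds

jobs = [("https://example.com/clip.mp4", "REAL/clip.mp4")]  # placeholder (url, destination) pairs

def download(url, dest):
    # Stream the file to disk, giving up after TIMEOUT seconds without a response.
    r = requests.get(url, timeout=TIMEOUT, stream=True)
    r.raise_for_status()
    with open(dest, "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    list(pool.map(lambda job: download(*job), jobs))
```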
`python DATA/extract_training_features.py`

This writes an HDF5 file with the following datasets:

- `path` (string)
- `label` (int; 1 = real, 0 = fake)
- `features` (float; shape `[N, 21]`)
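For a quick sanity check, the file can be read back with `h5py` (the filename below is a placeholder; use whatever path the script actually writes):

```python
import h5py
import numpy as np

# "train_features.h5" is a placeholder; point this at the file the script produced.
with h5py.File("train_features.h5", "r") as f:
    paths = f["path"][:]          # one entry per video
    labels = f["label"][:]        # 1 = real, 0 = fake
    features = f["features"][:]   # shape [N, 21]

print(features.shape, np.bincount(labels.astype(int)))
```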
Things you may need to edit inside this script:

- `batch_size`
- `device`
`python train.py`

- Loads all samples from the HDF5 file
- Balances classes by subsampling to equal priors
- Normalizes features (saves `mean.npy` and `std.npy`)
- Splits 50/50 train/test with stratification
- Trains a small MLP for a fixed number of epochs
- Picks an operating threshold τ* maximizing F1 on the training set
- Evaluates on the test set; writes `test_predictions_all.csv`
- Saves model weights to `model.pt`
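The threshold step is a simple sweep over candidate values; a sketch of the idea using scikit-learn's `f1_score` (`train.py` may sweep a different grid or compute F1 itself):

```python
import numpy as np
from sklearn.metrics import f1_score

def pick_threshold(scores, labels):
    """Scan candidate thresholds over the training-set scores and keep the
    one with the highest F1 (labels: 1 = real, 0 = fake)."""
    candidates = np.unique(scores)
    f1s = [f1_score(labels, (scores >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(f1s))]

# best_tau = pick_threshold(train_scores, train_labels)
# np.save("best_tau.npy", best_tau)
```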
Things you may need to edit inside `train.py`:

- `device`
- DataLoader `batch_size`, `num_workers`
- epochs, learning rate, hidden sizes
Outputs written to the working directory by default:

- `model.pt`
- `mean.npy`
- `std.npy`
- `best_tau.npy`
- `test_predictions_all.csv`
Once you have `model.pt`, `mean.npy`, `std.npy`, and `best_tau.npy` in the repo root:

`python demo.py`

The demo supports:

- Uploading a video file, or
- Pasting a URL; the video is downloaded via `yt-dlp` into a temp folder
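If you want to score a video outside the Gradio UI, the saved artifacts are sufficient; a minimal sketch, assuming `model.pt` deserializes to a callable module and that the MLP emits a single logit (check `train.py` for how the model is actually saved):

```python
import numpy as np
import torch

# Normalization stats and operating threshold produced by train.py.
mean = np.load("mean.npy")
std = np.load("std.npy")
tau = float(np.load("best_tau.npy"))

# Assumption: model.pt holds a full pickled module. If it holds a state_dict,
# rebuild the MLP from train.py first (newer torch may also need weights_only=False).
model = torch.load("model.pt", map_location="cpu")
model.eval()

def classify(features_21d):
    """features_21d: the 21-D vector from dinov2_features.py for one video."""
    x = torch.tensor((features_21d - mean) / std, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        score = torch.sigmoid(model(x)).item()  # assumes a single-logit output
    return "REAL" if score >= tau else "FAKE"
```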
If you find our work useful in your research, please consider citing our paper:
@misc{internò2025aigeneratedvideodetectionperceptual,
title={AI-Generated Video Detection via Perceptual Straightening},
author={Christian Internò and Robert Geirhos and Markus Olhofer and Sunny Liu and Barbara Hammer and David Klindt},
year={2025},
eprint={2507.00583},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.00583},
}

This research was partly funded by Honda Research Institute Europe and Cold Spring Harbor Laboratory. We would like to thank Eero Simoncelli, as well as all our colleagues at Google DeepMind, the Machine Learning Group at Bielefeld University, and Honda Research Institute, for their insightful discussions and feedback.
All code in this repository was contributed by Sam Pagon (@sampagon).