This repo contains the source code for the baselines in the paper "SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval", published at ICCV 2019. The authors are Qing-Yuan Jiang, Yi He, Gen Li, Jian Lin, Lei Li and Wu-Jun Li. If you have any questions about the source code, please contact: linj#lamda.nju.edu.cn.
Requirements:
- Python 3
- PyTorch
path/to/data
│
└───videos/
│ │ xxx.mp4
│ │ ...
│
└───frames/
│ │ xxxx.mp4/0000.jpg
│ │ xxxx.mp4/0001.jpg
│ │ ...
│ │ xxxx.mp4/xxxx.jpg
│ │ ...
│ │ xxxy.mp4/0000.jpg
│ │ xxxy.mp4/0001.jpg
│ │ ...
│ │ xxxy.mp4/xxxx.jpg
│ │ ...
│
└───features/
│ │ frames-features.h5
│ │ videos-features.h5
│ │ ...
Required files: videos in the folder: /path/to/data/videos/*.mp4.
Run the following command:
python videoprocess/frame_extraction.py --dataname svd
The extracted frames will be saved in the folder: /path/to/data/frames/. When fps=1, the frames take about 400 GB of storage in total (358 GB on my device).
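To make the fps=1 setting concrete, here is a minimal sketch of how one frame per second could be selected from a video and named to match the frames/ layout above. It is not taken from videoprocess/frame_extraction.py; the function names `sample_frame_indices` and `frame_path` are hypothetical, and the nearest-frame selection rule is an assumption.

```python
def sample_frame_indices(num_frames, native_fps, target_fps=1.0):
    """Return indices of the source frames kept when downsampling to target_fps.

    Assumption: the frame closest to each sampling instant k / target_fps is kept.
    """
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    step = native_fps / target_fps  # source frames per kept frame
    indices = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= num_frames:
            break
        indices.append(idx)
        k += 1
    return indices


def frame_path(video_name, k):
    """Relative output path of the k-th kept frame, e.g. xxx.mp4/0003.jpg."""
    return "{}/{:04d}.jpg".format(video_name, k)


if __name__ == "__main__":
    # A 3-second clip recorded at 30 fps yields three frames at fps=1.
    kept = sample_frame_indices(num_frames=90, native_fps=30.0)
    for k, idx in enumerate(kept):
        print(idx, "->", frame_path("xxx.mp4", k))
```

Since one frame is kept per second of video, the storage cost quoted above grows roughly linearly with the chosen fps.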
Required files: frames/xxx.mp4/xxxx.jpg in the folder: /path/to/data/frames.
Run the following command:
CUDA_VISIBLE_DEVICES=1 python videoprocess/deepfeatures_extraction.py --dataname svd
The extracted deep features for each video will be saved in the file: /path/to/data/features/frames-features.h5. This file is about 153 GB when fps=1.
Required files: frames-features.h5 in the folder: /path/to/data/features.
Run the following command:
python videoprocess/videofeatures_extraction.py --dataname svd
The aggregated features for each video will be stored in the file: /path/to/data/features/videos-features.h5. This file is about 8.8 GB when fps=1.
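The aggregation step turns a variable number of frame features into one fixed-size vector per video. The sketch below shows one common way to do this, average pooling followed by L2 normalization. This pooling choice is an assumption made for illustration; the function name `aggregate_frame_features` is hypothetical and the actual rule in videoprocess/videofeatures_extraction.py may differ.

```python
import math


def aggregate_frame_features(frame_features):
    """Average per-frame vectors into one video vector, then L2-normalize it.

    frame_features: list of equal-length lists of floats, one per frame.
    """
    if not frame_features:
        raise ValueError("need at least one frame feature")
    dim = len(frame_features[0])
    mean = [0.0] * dim
    for feat in frame_features:
        if len(feat) != dim:
            raise ValueError("all frame features must have the same dimension")
        for i, value in enumerate(feat):
            mean[i] += value
    mean = [value / len(frame_features) for value in mean]
    norm = math.sqrt(sum(value * value for value in mean))
    if norm == 0.0:
        return mean  # an all-zero vector cannot be normalized
    return [value / norm for value in mean]


if __name__ == "__main__":
    # Two 2-d frame features; their mean is [3, 4], which has norm 5.
    print(aggregate_frame_features([[3.0, 0.0], [3.0, 8.0]]))
```

Collapsing all frames of a video into a single vector is consistent with the video-level file being much smaller than the frame-level one.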
Required files: videos-features.h5 in the folder: /path/to/data/features.
Run the following command:
python demos/bfs_demo.py --dataname svd
The mAP is: 0.7537
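For readers unfamiliar with the metric, the following sketch shows how a brute-force search and a mean average precision (mAP) score can be computed from video-level features. It is an illustration, not the code in demos/bfs_demo.py: the inner-product similarity, the function names, and the dictionary-based inputs are all assumptions, and the evaluation protocol used in the paper may differ in detail.

```python
def rank_database(query, database):
    """Return database ids sorted by descending inner product with the query."""
    scores = {
        vid: sum(q * d for q, d in zip(query, feat))
        for vid, feat in database.items()
    }
    return sorted(scores, key=lambda vid: scores[vid], reverse=True)


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(queries, database, ground_truth):
    """Mean of the per-query average precision values."""
    aps = [
        average_precision(rank_database(feat, database), ground_truth[qid])
        for qid, feat in queries.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    database = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    queries = {"q1": [1.0, 0.0]}
    ground_truth = {"q1": ["a", "b"]}
    print(mean_average_precision(queries, database, ground_truth))
```

Brute-force search compares each query against every database video, so it serves as the accuracy reference for the hashing-based demos.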
Required files: videos-features.h5 in the folder: /path/to/data/features.
Run the following command:
python demos/lsh_demo.py --dataname svd --approach lsh --bit 16
The mAP@16bits is: 0.0370
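As background for the --bit 16 option, here is a minimal sketch of sign-random-projection LSH, which maps a real-valued feature to a short binary code and compares codes by Hamming distance. Whether demos/lsh_demo.py uses exactly this LSH variant is an assumption; the function names and the seed parameter are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(feature, hyperplanes):
    """Pack the sign of each projection into an integer, one bit per hyperplane."""
    code = 0
    for plane in hyperplanes:
        projection = sum(w * x for w, x in zip(plane, feature))
        code = (code << 1) | (1 if projection >= 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which the two codes differ."""
    return bin(code_a ^ code_b).count("1")


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16, seed=0)
    query = [0.5, -0.2, 0.1, 0.9]
    near = [0.5, -0.2, 0.1, 0.8]   # small perturbation of the query
    far = [-0.5, 0.2, -0.1, -0.9]  # the query negated
    q = lsh_code(query, planes)
    print(hamming_distance(q, lsh_code(near, planes)))
    print(hamming_distance(q, lsh_code(far, planes)))
```

With only 16 bits there are at most 65536 distinct codes, so many database videos share a code with the query, which helps explain why short hash codes score far below brute-force search.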
Run the following command:
python demos/itq_demo.py --dataname svd --approach itq --bit 16
The mAP@16bits is: 0.0560
Run the following command:
python demos/isoh_demo.py --dataname svd --approach isoh --bit 16
The mAP@16bits is: 0.0562
- Step 1: sampling frames for clustering:
Required files: frames-features.h5 in the folder: /path/to/data/features.
python videoprocess/cnnlv_keyframe_sampling.py --dataname svd --approach cfs
Sampled frames are stored at: /path/to/data/features/cnnlv-sampling-features.h5
- Step 2: clustering
Required files: cnnlv-sampling-features.h5 in the folder: /path/to/data/features.
python videoprocess/cnnv_clustering.py --dataname svd --approach cnnvcluster
The learned centers are stored at: /path/to/data/features/cnnv-centers.h5
- Step 3: generating cnnv features
Required files: cnnv-centers.h5 in the folder: /path/to/data/features.
python videoprocess/cnnv_feature_aggregation.py --dataname svd --approach cnnvfa
The learned features are stored at: /path/to/data/features/cnnv-agg-features.h5
- Step 4: evaluating cnnv
Required files: cnnv-agg-features.h5 in the folder: /path/to/data/features.
python demos/cnnv_demo.py --dataname svd --approach cnnv
The mAP is: 0.1895
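The four steps above follow a cluster-then-aggregate pattern. One way to picture Step 3 is as a bag-of-visual-words histogram: each frame feature is assigned to its nearest learned center, and the per-video counts are normalized. This reading is an assumption and the sketch is not taken from videoprocess/cnnv_feature_aggregation.py; the function names are hypothetical.

```python
def nearest_center(feature, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    best_idx, best_dist = 0, float("inf")
    for idx, center in enumerate(centers):
        dist = sum((f - c) ** 2 for f, c in zip(feature, center))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bow_histogram(frame_features, centers):
    """Normalized histogram of nearest-center assignments for one video."""
    hist = [0.0] * len(centers)
    for feat in frame_features:
        hist[nearest_center(feat, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]  # stand-in for the learned centers
    frames = [[1.0, 1.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0]]
    print(bow_histogram(frames, centers))
```

Under this reading, the length of each video descriptor equals the number of centers learned in Step 2.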
- frame extraction
- deep feature extraction
- brute-force demo
- LSH demo
- ITQ demo
- IsoH demo
- CNNV demo