If our open-source code is helpful for your research, please cite our paper:
@inproceedings{lin2023deepsvc,
title={DeepSVC: Deep Scalable Video Coding for Both Machine and Human Vision},
author={Lin, Hongbin and Chen, Bolin and Zhang, Zhichen and Lin, Jielian and Wang, Xu and Zhao, Tiesong},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={9205--9214},
year={2023}
}
- See env.txt for the required environment.
- The detection code runs on top of mmtracking. Copy the code in temporal_roi_align.py into selsa.py, then run test.py with the config file selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py (a command sketch is given after this list). Please see the instructions in the docs.
- Run test_video.py; change the data path in the file first (see the sketch after this list).
- Training also runs on top of mmtracking. Copy the code in temporal_roi_align.py into selsa.py, then run train.py with the config file selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py (see the training sketch after this list). Please see the instructions in the docs.
- Download the training data. We train the models on the Vimeo90k dataset (Download link); see the dataset layout sketch after this list.
- Run main.py to train the PSNR/MS-SSIM models. We first pretrain the model with the key frame coded with BPG and lambda=2048, then load the pretrained weights and train with the key frame coded with the AI codecs (in image_model.py); see the two-stage sketch after this list. See main.py for more details.
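For reference, a minimal sketch of the detection test step above, assuming a standard mmtracking checkout; the config location, checkpoint path, and the `--checkpoint`/`--eval` flags follow typical mmtracking usage and may differ in your setup:

```python
# Minimal sketch of the detection test step (paths and flags are assumptions
# based on typical mmtracking usage; adjust them to your checkout).
import subprocess

subprocess.run(
    [
        "python", "tools/test.py",
        "configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py",
        "--checkpoint", "checkpoints/selsa_troialign_vid.pth",  # placeholder checkpoint
        "--eval", "bbox",
    ],
    check=True,  # raise if the test script exits with an error
)
```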
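A minimal sketch of the test_video.py step, assuming the script reads a hard-coded dataset root; the path below is a placeholder and must match the one edited inside the script:

```python
# Minimal sketch: verify the dataset path, then launch test_video.py.
# The path is a placeholder and must match the data path set inside test_video.py.
import os
import subprocess

data_root = "/path/to/test_sequences"  # placeholder test-set location
assert os.path.isdir(data_root), f"data path not found: {data_root}"

subprocess.run(["python", "test_video.py"], check=True)
```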
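The detection training step mirrors the test command; again, the tools/train.py entry point and the work directory are assumptions based on typical mmtracking usage:

```python
# Minimal sketch of the detection training step (paths are assumptions based on
# typical mmtracking usage; adjust them to your checkout).
import subprocess

subprocess.run(
    [
        "python", "tools/train.py",
        "configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py",
        "--work-dir", "work_dirs/selsa_troialign_vid",  # placeholder output directory
    ],
    check=True,
)
```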
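After downloading Vimeo90k, a quick sanity check of the septuplet layout can save a failed run. This assumes the standard vimeo_septuplet release with a sequences/ folder and sep_trainlist.txt; the root path is a placeholder:

```python
# Minimal sketch: check the Vimeo90k septuplet layout before training.
# Assumes the standard release: <root>/sequences/xxxxx/xxxx/im1.png..im7.png
# plus <root>/sep_trainlist.txt. The root path is a placeholder.
from pathlib import Path

root = Path("/path/to/vimeo_septuplet")  # placeholder dataset root
clips = (root / "sep_trainlist.txt").read_text().split()

first = root / "sequences" / clips[0]
frames = sorted(first.glob("im*.png"))
print(f"{len(clips)} training clips; first clip {clips[0]} has {len(frames)} frames")
```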
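Finally, a minimal sketch of the two-stage schedule for main.py. All flag names below are placeholders, not main.py's real interface; check main.py's argument parser for the actual options:

```python
# Minimal sketch of the two-stage training schedule described above. All flag
# names are placeholders; check main.py's argument parser for the real ones.
import subprocess

# Stage 1: pretrain with key frames coded by BPG at lambda = 2048.
subprocess.run(
    ["python", "main.py", "--key-frame-codec", "bpg", "--lmbda", "2048"],
    check=True,
)

# Stage 2: load the stage-1 weights and switch the key-frame codec to the
# learned image codec defined in image_model.py.
subprocess.run(
    ["python", "main.py", "--key-frame-codec", "ai",
     "--pretrained", "checkpoints/stage1.pth"],
    check=True,
)
```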