forked from open-mmlab/mmagic
[Feature] Add a config of TOFlow (open-mmlab#811)
* [Feature] Add config of TOFlow
* Update
* Update
* Update
* Update
* Update
* Update
* Update
Showing 9 changed files with 719 additions and 5 deletions.
@@ -0,0 +1,46 @@
# TOFlow (IJCV'2019)

> [Video Enhancement with Task-Oriented Flow](https://arxiv.org/abs/1711.09078)

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.

<!-- [IMAGE] -->
<div align=center >
<img src="https://user-images.githubusercontent.com/7676947/144035477-2480d580-1409-4a7c-88d5-c13a3dbd62ac.png" width="400"/>
</div >

## Results and models

Evaluated on RGB channels.
The metrics are `PSNR / SSIM`.
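
As a rough illustration of what "evaluated on RGB channels" means (no conversion to the Y channel first), the minimal NumPy sketch below computes PSNR over all three channels of two 8-bit images, with an optional border crop analogous to `crop_border` in the configs' `test_cfg`. It assumes 8-bit inputs with a peak value of 255; `psnr_rgb` is a hypothetical helper, not MMEditing's own metric code.

```python
import numpy as np


def psnr_rgb(img1, img2, crop_border=0, max_value=255.0):
    """PSNR over all RGB channels of two (H, W, 3) images (sketch)."""
    img1 = np.asarray(img1, dtype=np.float64)
    img2 = np.asarray(img2, dtype=np.float64)
    if img1.shape != img2.shape:
        raise ValueError(f'shape mismatch: {img1.shape} vs {img2.shape}')
    if crop_border > 0:
        # Ignore `crop_border` pixels on every side before comparing.
        img1 = img1[crop_border:-crop_border, crop_border:-crop_border, ...]
        img2 = img2[crop_border:-crop_border, crop_border:-crop_border, ...]
    # Mean squared error averaged over H, W and all three channels.
    mse = np.mean((img1 - img2) ** 2)
    if mse == 0:
        return float('inf')
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Example: an error of one grey level on every pixel and channel.
ref = np.zeros((4, 4, 3), dtype=np.uint8)
out = np.ones((4, 4, 3), dtype=np.uint8)
print(round(psnr_rgb(ref, out), 4))  # 48.1308
```

Because the squared error is averaged over all channels together, a single PSNR value summarizes the whole RGB image.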

| Method | Pretrained SPyNet | Vimeo90k-triplet | Download |
| :---: | :---: | :---: | :---: |
| [tof_vfi_spynet_chair_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py) | [spynet_chairs_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth) | 33.3294 / 0.9465 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.log.json) |
| [tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py) | [spynet_kitti](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_kitti_20220321-dbcc1cc1.pth) | 33.3339 / 0.9466 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.log.json) |
| [tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k.py) | [spynet_sintel_clean](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_sintel_clean_20220321-0756630b.pth) | 33.3170 / 0.9464 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.log.json) |
| [tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k.py) | [spynet_sintel_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_sintel_final_20220321-5e89dcec.pth) | 33.3237 / 0.9465 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.log.json) |
| [tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k.py) | [spynet_pytoflow](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_pytoflow_20220321-5bab842d.pth) | 33.3426 / 0.9467 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.log.json) |

Note: These pretrained SPyNets contain no BN layers because training uses `batch_size=1`, which is consistent with [pytoflow](https://github.com/Coldog2333/pytoflow).
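
One way to see why BN adds little at `batch_size=1`: in training mode, BatchNorm computes its mean and variance over the batch and spatial dimensions, so with a single sample it reduces to normalizing each channel of that one sample, which is exactly instance normalization. The NumPy sketch below checks this numerically; affine parameters and running averages are omitted, and both function names are hypothetical.

```python
import numpy as np


def batch_norm_train(x, eps=1e-5):
    """Training-mode BatchNorm without affine params; x is (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shared across the batch
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def instance_norm(x, eps=1e-5):
    """InstanceNorm without affine params: statistics per sample and channel."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x1 = rng.normal(size=(1, 3, 8, 8))  # batch of one
x4 = rng.normal(size=(4, 3, 8, 8))  # batch of four
print(np.allclose(batch_norm_train(x1), instance_norm(x1)))  # True
print(np.allclose(batch_norm_train(x4), instance_norm(x4)))  # False
```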

## Citation

```bibtex
@article{xue2019video,
  title={Video enhancement with task-oriented flow},
  author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal={International Journal of Computer Vision},
  volume={127},
  number={8},
  pages={1106--1125},
  year={2019},
  publisher={Springer}
}
```
@@ -0,0 +1,74 @@
Collections:
- Metadata:
    Architecture:
    - TOFlow
  Name: TOFlow
  Paper:
  - https://arxiv.org/abs/1711.09078
  README: configs/video_interpolators/tof/README.md
Models:
- Config: configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_chair_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3294
        SSIM: 0.9465
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3339
        SSIM: 0.9466
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.317
        SSIM: 0.9464
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3237
        SSIM: 0.9465
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3426
        SSIM: 0.9467
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.pth
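
Each model entry repeats its name in `Name`, `Config` and `Weights`, so copy-paste mismatches between these fields are easy to introduce. The hypothetical helper below (not part of MMEditing) flags such mismatches. It assumes the YAML has already been loaded into a list of dicts, for example via PyYAML's `yaml.safe_load(f)['Models']`, and that weight files are named `<Name>_<date>-<hash>.pth` as in the entries above.

```python
import posixpath


def check_model_entries(models):
    """Return a list of consistency problems in metafile-style model entries."""
    problems = []
    seen = set()
    for entry in models:
        name = entry['Name']
        config_stem = posixpath.splitext(posixpath.basename(entry['Config']))[0]
        weights_file = posixpath.basename(entry['Weights'])
        if name in seen:
            problems.append(f'duplicate Name: {name}')
        seen.add(name)
        if config_stem != name:
            problems.append(f'{name}: Config points to {config_stem}.py')
        # Weight files are assumed to start with "<Name>_".
        if not weights_file.startswith(name + '_'):
            problems.append(f'{name}: Weights file {weights_file} does not match')
    return problems


base = 'configs/video_interpolators/tof/'
url = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow/'
models = [
    dict(Name='tof_vfi_spynet_chair_nobn_1xb1_vimeo90k',
         Config=base + 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py',
         Weights=url + 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth'),
    # A copy-pasted entry: Name and Config still say "chair", Weights say "kitti".
    dict(Name='tof_vfi_spynet_chair_nobn_1xb1_vimeo90k',
         Config=base + 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py',
         Weights=url + 'tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth'),
]
for problem in check_model_entries(models):
    print(problem)
```

The second entry is reported twice: once as a duplicate `Name`, once because its weight file belongs to a different model.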
configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py
118 changes: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
exp_name = 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k'

# pretrained SPyNet
source = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
spynet_file = 'pretrained_spynet_chair_20220321-4d82e91b.pth'
load_pretrained_spynet = f'{source}/{spynet_file}'

# model settings
model = dict(
    type='BasicInterpolator',
    generator=dict(
        type='TOFlowVFI',
        rgb_mean=[0.485, 0.456, 0.406],
        rgb_std=[0.229, 0.224, 0.225],
        flow_cfg=dict(norm_cfg=None, pretrained=load_pretrained_spynet)),
    pixel_loss=dict(type='CharbonnierLoss', loss_weight=1.0, reduction='mean'))
# model training and testing settings
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=0)

# dataset settings
train_dataset_type = 'VFIVimeo90KDataset'

train_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(
        type='LoadImageFromFile',
        io_backend='disk',
        key='target',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
    dict(type='FramesToTensor', keys=['inputs']),
    dict(type='ImageToTensor', keys=['target']),
    dict(
        type='Collect',
        keys=['inputs', 'target'],
        meta_keys=['inputs_path', 'target_path', 'key'])
]

demo_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs']),
    dict(type='FramesToTensor', keys=['inputs']),
    dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
]

root_dir = 'data/vimeo_triplet'
data = dict(
    workers_per_gpu=1,
    train_dataloader=dict(samples_per_gpu=1, drop_last=True),  # 1 gpu
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),

    # train
    train=dict(
        type='RepeatDataset',
        times=1000,
        dataset=dict(
            type=train_dataset_type,
            folder=f'{root_dir}/sequences',
            ann_file=f'{root_dir}/tri_trainlist.txt',
            pipeline=train_pipeline,
            test_mode=False)),
    # val
    val=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/sequences',
        ann_file=f'{root_dir}/tri_validlist.txt',
        pipeline=train_pipeline,
        test_mode=True),
    # test
    test=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/sequences',
        ann_file=f'{root_dir}/tri_testlist.txt',
        pipeline=train_pipeline,
        test_mode=True),
)

# optimizer
optimizers = dict(
    generator=dict(type='Adam', lr=5e-5, betas=(0.9, 0.99), weight_decay=1e-4))

# learning policy
total_iters = 1000000
lr_config = dict(
    policy='Step',
    by_epoch=False,
    gamma=0.5,
    step=[200000, 400000, 600000, 800000])

checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)
# remove gpu_collect=True in non distributed training
evaluation = dict(interval=5000, save_image=True, gpu_collect=True)
log_config = dict(
    interval=100, hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
    ])
visual_config = None

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/{exp_name}'
load_from = None
resume_from = None
workflow = [('train', 1)]
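
The `train_pipeline` above turns each Vimeo-90K triplet into arrays of a fixed layout. The NumPy stand-ins below approximate what `RescaleToZeroOne`, `FramesToTensor` and `ImageToTensor` do to the value range and shapes; the real transforms produce PyTorch tensors. The sketch assumes two input frames and one target frame per triplet and a 256 x 448 (H x W) frame size, both for illustration only, and the helper names are hypothetical.

```python
import numpy as np


def rescale_to_zero_one(img):
    """uint8 image in [0, 255] -> float32 image in [0, 1]."""
    return img.astype(np.float32) / 255.0


def frames_to_array(frames):
    """List of T frames shaped (H, W, C) -> one array shaped (T, C, H, W)."""
    return np.stack([frame.transpose(2, 0, 1) for frame in frames], axis=0)


def image_to_array(img):
    """Single image shaped (H, W, C) -> array shaped (C, H, W)."""
    return img.transpose(2, 0, 1)


# Assumed triplet layout: frames 1 and 3 are inputs, frame 2 is the target.
h, w = 256, 448  # assumed frame size (H, W)
rng = np.random.default_rng(0)
inputs = [rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8) for _ in range(2)]
target = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)

results = dict(
    inputs=frames_to_array([rescale_to_zero_one(f) for f in inputs]),
    target=image_to_array(rescale_to_zero_one(target)))
print(results['inputs'].shape, results['target'].shape)
# (2, 3, 256, 448) (3, 256, 448)
```

The inputs gain a leading time dimension while the target stays a single CHW image, which is why the pipeline uses `FramesToTensor` for `inputs` and `ImageToTensor` for `target`.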
configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py
118 changes: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
exp_name = 'tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k'

# pretrained SPyNet
source = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
spynet_file = 'pretrained_spynet_kitti_20220321-dbcc1cc1.pth'
load_pretrained_spynet = f'{source}/{spynet_file}'

# model settings
model = dict(
    type='BasicInterpolator',
    generator=dict(
        type='TOFlowVFI',
        rgb_mean=[0.485, 0.456, 0.406],
        rgb_std=[0.229, 0.224, 0.225],
        flow_cfg=dict(norm_cfg=None, pretrained=load_pretrained_spynet)),
    pixel_loss=dict(type='CharbonnierLoss', loss_weight=1.0, reduction='mean'))
# model training and testing settings
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=0)

# dataset settings
train_dataset_type = 'VFIVimeo90KDataset'

train_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(
        type='LoadImageFromFile',
        io_backend='disk',
        key='target',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
    dict(type='FramesToTensor', keys=['inputs']),
    dict(type='ImageToTensor', keys=['target']),
    dict(
        type='Collect',
        keys=['inputs', 'target'],
        meta_keys=['inputs_path', 'target_path', 'key'])
]

demo_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs']),
    dict(type='FramesToTensor', keys=['inputs']),
    dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
]

root_dir = 'data/vimeo_triplet'
data = dict(
    workers_per_gpu=1,
    train_dataloader=dict(samples_per_gpu=1, drop_last=True),  # 1 gpu
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),

    # train
    train=dict(
        type='RepeatDataset',
        times=1000,
        dataset=dict(
            type=train_dataset_type,
            folder=f'{root_dir}/sequences',
            ann_file=f'{root_dir}/tri_trainlist.txt',
            pipeline=train_pipeline,
            test_mode=False)),
    # val
    val=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/sequences',
        ann_file=f'{root_dir}/tri_validlist.txt',
        pipeline=train_pipeline,
        test_mode=True),
    # test
    test=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/sequences',
        ann_file=f'{root_dir}/tri_testlist.txt',
        pipeline=train_pipeline,
        test_mode=True),
)

# optimizer
optimizers = dict(
    generator=dict(type='Adam', lr=5e-5, betas=(0.9, 0.99), weight_decay=1e-4))

# learning policy
total_iters = 1000000
lr_config = dict(
    policy='Step',
    by_epoch=False,
    gamma=0.5,
    step=[200000, 400000, 600000, 800000])

checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)
# remove gpu_collect=True in non distributed training
evaluation = dict(interval=5000, save_image=True, gpu_collect=True)
log_config = dict(
    interval=100, hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
    ])
visual_config = None

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/{exp_name}'
load_from = None
resume_from = None
workflow = [('train', 1)]
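
With `policy='Step'`, `by_epoch=False`, `gamma=0.5` and the milestones in `lr_config`, the learning rate is halved at each milestone iteration. The sketch below assumes the common step rule (the rate is multiplied by `gamma` once for every milestone the current iteration has reached, as in mmcv's `StepLrUpdaterHook`) and ignores warmup and any minimum learning rate; `step_lr` is a hypothetical helper.

```python
from bisect import bisect_right

BASE_LR = 5e-5  # optimizers.generator.lr
GAMMA = 0.5     # lr_config.gamma
STEPS = [200000, 400000, 600000, 800000]  # lr_config.step (iterations)


def step_lr(iteration, base_lr=BASE_LR, gamma=GAMMA, steps=STEPS):
    """Learning rate at `iteration` under a step decay policy."""
    # Number of milestones already reached (a milestone counts once iteration >= it).
    num_decays = bisect_right(steps, iteration)
    return base_lr * gamma ** num_decays


for it in (0, 199999, 200000, 500000, 999999):
    print(it, step_lr(it))
# 0 5e-05
# 199999 5e-05
# 200000 2.5e-05
# 500000 1.25e-05
# 999999 3.125e-06
```

Over `total_iters = 1000000`, the rate therefore spends the last 200k iterations at 5e-5 / 16.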