[Feature] Add a config of TOFlow (open-mmlab#811)
* [Feature] Add config of TOFlow

* Update

* Update

* Update

* Update

* Update

* Update

* Update
Yshuo-Li authored Apr 1, 2022
1 parent 347538b commit 3300f27
Showing 9 changed files with 719 additions and 5 deletions.
13 changes: 8 additions & 5 deletions .dev_scripts/github/update_model_index.py
@@ -234,11 +234,14 @@ def parse_md(md_file):
 except ValueError:
 metrics_data = metrics_data.replace(' ', '')
 else:
-metrics_data = [
-float(d) for d in metrics_data.split('/')
-]
-metrics[key] = dict(
-PSNR=metrics_data[0], SSIM=metrics_data[1])
+try:
+metrics_data = [
+float(d) for d in metrics_data.split('/')
+]
+metrics[key] = dict(
+PSNR=metrics_data[0], SSIM=metrics_data[1])
+except ValueError:
+pass
 
 model = {
 'Name':
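
The hunk above wraps the `PSNR / SSIM` split in a `try` so that a table cell that contains `/` but is not a pair of numbers (for example a cell holding a URL) no longer aborts the script. The following is a minimal standalone sketch of that idea, not the code of `update_model_index.py`: the function name `parse_metric_cell` and the choice to return `None` for unparseable cells are assumptions made for illustration.

```python
def parse_metric_cell(cell):
    """Parse one README table cell that may hold metrics.

    Hypothetical helper, not part of update_model_index.py.
    Returns a float for a single number, a dict with PSNR and SSIM for a
    'PSNR / SSIM' pair, and None for anything that is not numeric.
    """
    text = cell.strip()
    if '/' not in text:
        try:
            return float(text)
        except ValueError:
            return None
    try:
        values = [float(part) for part in text.split('/')]
    except ValueError:
        # The cell contains '/' but is not numeric, e.g. a Markdown link.
        return None
    if len(values) != 2:
        return None
    return dict(PSNR=values[0], SSIM=values[1])


print(parse_metric_cell('33.3294 / 0.9465'))
print(parse_metric_cell('[model](https://example.com/a/b.pth)'))
```

Returning `None` keeps the caller in control of how to treat non-metric columns; the actual script simply skips them with `pass`.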
46 changes: 46 additions & 0 deletions configs/video_interpolators/tof/README.md
@@ -0,0 +1,46 @@
# TOFlow (IJCV'2019)

> [Video Enhancement with Task-Oriented Flow](https://arxiv.org/abs/1711.09078)
<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.

<!-- [IMAGE] -->
<div align=center >
<img src="https://user-images.githubusercontent.com/7676947/144035477-2480d580-1409-4a7c-88d5-c13a3dbd62ac.png" width="400"/>
</div >

## Results and models

Evaluated on RGB channels.
The metrics are `PSNR / SSIM`.

| Method | Pretrained SPyNet | Vimeo90k-triplet | Download |
| :----------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [tof_vfi_spynet_chair_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py) | [spynet_chairs_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth) | 33.3294 / 0.9465 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.log.json) |
| [tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py) | [spynet_kitti_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_kitti_20220321-dbcc1cc1.pth) | 33.3339 / 0.9466 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.log.json) |
| [tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k.py) | [spynet_sintel_clean](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_sintel_clean_20220321-0756630b.pth) | 33.3170 / 0.9464 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.log.json) |
| [tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k.py) | [spynet_sintel_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_sintel_final_20220321-5e89dcec.pth) | 33.3237 / 0.9465 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.log.json) |
| [tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k.py) | [spynet_pytoflow](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_pytoflow_20220321-5bab842d.pth) | 33.3426 / 0.9467 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.log.json) |

Note: These pretrained SPyNets do not contain BN layers because `batch_size=1`, which is consistent with https://github.com/Coldog2333/pytoflow.

## Citation

```bibtex
@article{xue2019video,
title={Video enhancement with task-oriented flow},
author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
journal={International Journal of Computer Vision},
volume={127},
number={8},
pages={1106--1125},
year={2019},
publisher={Springer}
}
```
74 changes: 74 additions & 0 deletions configs/video_interpolators/tof/metafile.yml
@@ -0,0 +1,74 @@
Collections:
- Metadata:
    Architecture:
    - TOFlow
  Name: TOFlow
  Paper:
  - https://arxiv.org/abs/1711.09078
  README: configs/video_interpolators/tof/README.md
Models:
- Config: configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_chair_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3294
        SSIM: 0.9465
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3339
        SSIM: 0.9466
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.317
        SSIM: 0.9464
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3237
        SSIM: 0.9465
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.pth
- Config: configs/video_interpolators/tof/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k.py
  In Collection: TOFlow
  Metadata:
    Training Data: VIMEO90K
  Name: tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 33.3426
        SSIM: 0.9467
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.pth
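
Because each metafile entry repeats the model name in `Name`, `Config` and `Weights`, the three fields can drift apart when entries are copied. The sketch below shows one way to cross-check them. It is not part of this commit: the helper `check_entry` is hypothetical, and the rule that a weights file name starts with the model name followed by `_` is an assumption read off the URLs listed above.

```python
from pathlib import PurePosixPath


def check_entry(entry):
    """Return a list of consistency problems for one metafile model entry.

    Hypothetical helper. Assumes the config file stem equals the model
    name and the weights file name starts with '<name>_'.
    """
    problems = []
    name = entry['Name']
    config_stem = PurePosixPath(entry['Config']).stem
    if config_stem != name:
        problems.append(
            f'Config {config_stem!r} does not match Name {name!r}')
    weights_file = entry['Weights'].rsplit('/', 1)[-1]
    if not weights_file.startswith(f'{name}_'):
        problems.append(
            f'Weights {weights_file!r} does not start with {name!r}')
    return problems


BASE = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
chair = 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k'
kitti = 'tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k'

consistent = dict(
    Name=chair,
    Config=f'configs/video_interpolators/tof/{chair}.py',
    Weights=f'{BASE}/{chair}_20220321-2fc9e258.pth')
# Illustrative mismatch: chair name and config paired with kitti weights.
mismatched = dict(consistent, Weights=f'{BASE}/{kitti}_20220321-3f7ca4cd.pth')

print(check_entry(consistent))
print(check_entry(mismatched))
```

In practice the entries would come from loading `metafile.yml` with a YAML parser; plain dicts are used here so the sketch has no third-party dependency.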
118 changes: 118 additions & 0 deletions configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py
@@ -0,0 +1,118 @@
exp_name = 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k'

# pretrained SPyNet
source = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
spynet_file = 'pretrained_spynet_chair_20220321-4d82e91b.pth'
load_pretrained_spynet = f'{source}/{spynet_file}'

# model settings
model = dict(
type='BasicInterpolator',
generator=dict(
type='TOFlowVFI',
rgb_mean=[0.485, 0.456, 0.406],
rgb_std=[0.229, 0.224, 0.225],
flow_cfg=dict(norm_cfg=None, pretrained=load_pretrained_spynet)),
pixel_loss=dict(type='CharbonnierLoss', loss_weight=1.0, reduction='mean'))
# model training and testing settings
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=0)

# dataset settings
train_dataset_type = 'VFIVimeo90KDataset'

train_pipeline = [
dict(
type='LoadImageFromFileList',
io_backend='disk',
key='inputs',
channel_order='rgb',
backend='pillow'),
dict(
type='LoadImageFromFile',
io_backend='disk',
key='target',
channel_order='rgb',
backend='pillow'),
dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
dict(type='FramesToTensor', keys=['inputs']),
dict(type='ImageToTensor', keys=['target']),
dict(
type='Collect',
keys=['inputs', 'target'],
meta_keys=['inputs_path', 'target_path', 'key'])
]

demo_pipeline = [
dict(
type='LoadImageFromFileList',
io_backend='disk',
key='inputs',
channel_order='rgb',
backend='pillow'),
dict(type='RescaleToZeroOne', keys=['inputs']),
dict(type='FramesToTensor', keys=['inputs']),
dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
]

root_dir = 'data/vimeo_triplet'
data = dict(
workers_per_gpu=1,
train_dataloader=dict(samples_per_gpu=1, drop_last=True), # 1 gpu
val_dataloader=dict(samples_per_gpu=1),
test_dataloader=dict(samples_per_gpu=1),

# train
train=dict(
type='RepeatDataset',
times=1000,
dataset=dict(
type=train_dataset_type,
folder=f'{root_dir}/sequences',
ann_file=f'{root_dir}/tri_trainlist.txt',
pipeline=train_pipeline,
test_mode=False)),
# val
val=dict(
type=train_dataset_type,
folder=f'{root_dir}/sequences',
ann_file=f'{root_dir}/tri_validlist.txt',
pipeline=train_pipeline,
test_mode=True),
# test
test=dict(
type=train_dataset_type,
folder=f'{root_dir}/sequences',
ann_file=f'{root_dir}/tri_testlist.txt',
pipeline=train_pipeline,
test_mode=True),
)

# optimizer
optimizers = dict(
generator=dict(type='Adam', lr=5e-5, betas=(0.9, 0.99), weight_decay=1e-4))

# learning policy
total_iters = 1000000
lr_config = dict(
policy='Step',
by_epoch=False,
gamma=0.5,
step=[200000, 400000, 600000, 800000])

checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)
# remove gpu_collect=True in non distributed training
evaluation = dict(interval=5000, save_image=True, gpu_collect=True)
log_config = dict(
interval=100, hooks=[
dict(type='TextLoggerHook', by_epoch=False),
])
visual_config = None

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/{exp_name}'
load_from = None
resume_from = None
workflow = [('train', 1)]
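
The `lr_config` in this config halves the learning rate at four milestones over one million iterations. As a rough illustration of what that schedule yields, the sketch below computes the learning rate at a given iteration. It assumes the decay takes effect once the iteration count reaches a milestone; the exact boundary behaviour of the `Step` policy in the training framework may differ by one iteration, and `lr_at` is a hypothetical helper, not a framework API.

```python
from bisect import bisect_right

# Values copied from the config above.
BASE_LR = 5e-5
GAMMA = 0.5
STEPS = [200000, 400000, 600000, 800000]


def lr_at(iteration, base_lr=BASE_LR, gamma=GAMMA, steps=STEPS):
    """Learning rate after `iteration` iterations of a step schedule.

    Hypothetical helper: multiplies base_lr by gamma once for every
    milestone in `steps` that is <= iteration.
    """
    passed = bisect_right(steps, iteration)
    return base_lr * gamma ** passed


for it in (0, 200000, 400000, 600000, 800000):
    print(it, lr_at(it))
```

Under these assumptions the final 200k iterations run at `5e-5 * 0.5 ** 4`, one sixteenth of the initial rate.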
118 changes: 118 additions & 0 deletions configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py
@@ -0,0 +1,118 @@
exp_name = 'tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k'

# pretrained SPyNet
source = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
spynet_file = 'pretrained_spynet_kitti_20220321-dbcc1cc1.pth'
load_pretrained_spynet = f'{source}/{spynet_file}'

# model settings
model = dict(
type='BasicInterpolator',
generator=dict(
type='TOFlowVFI',
rgb_mean=[0.485, 0.456, 0.406],
rgb_std=[0.229, 0.224, 0.225],
flow_cfg=dict(norm_cfg=None, pretrained=load_pretrained_spynet)),
pixel_loss=dict(type='CharbonnierLoss', loss_weight=1.0, reduction='mean'))
# model training and testing settings
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=0)

# dataset settings
train_dataset_type = 'VFIVimeo90KDataset'

train_pipeline = [
dict(
type='LoadImageFromFileList',
io_backend='disk',
key='inputs',
channel_order='rgb',
backend='pillow'),
dict(
type='LoadImageFromFile',
io_backend='disk',
key='target',
channel_order='rgb',
backend='pillow'),
dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
dict(type='FramesToTensor', keys=['inputs']),
dict(type='ImageToTensor', keys=['target']),
dict(
type='Collect',
keys=['inputs', 'target'],
meta_keys=['inputs_path', 'target_path', 'key'])
]

demo_pipeline = [
dict(
type='LoadImageFromFileList',
io_backend='disk',
key='inputs',
channel_order='rgb',
backend='pillow'),
dict(type='RescaleToZeroOne', keys=['inputs']),
dict(type='FramesToTensor', keys=['inputs']),
dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
]

root_dir = 'data/vimeo_triplet'
data = dict(
workers_per_gpu=1,
train_dataloader=dict(samples_per_gpu=1, drop_last=True), # 1 gpu
val_dataloader=dict(samples_per_gpu=1),
test_dataloader=dict(samples_per_gpu=1),

# train
train=dict(
type='RepeatDataset',
times=1000,
dataset=dict(
type=train_dataset_type,
folder=f'{root_dir}/sequences',
ann_file=f'{root_dir}/tri_trainlist.txt',
pipeline=train_pipeline,
test_mode=False)),
# val
val=dict(
type=train_dataset_type,
folder=f'{root_dir}/sequences',
ann_file=f'{root_dir}/tri_validlist.txt',
pipeline=train_pipeline,
test_mode=True),
# test
test=dict(
type=train_dataset_type,
folder=f'{root_dir}/sequences',
ann_file=f'{root_dir}/tri_testlist.txt',
pipeline=train_pipeline,
test_mode=True),
)

# optimizer
optimizers = dict(
generator=dict(type='Adam', lr=5e-5, betas=(0.9, 0.99), weight_decay=1e-4))

# learning policy
total_iters = 1000000
lr_config = dict(
policy='Step',
by_epoch=False,
gamma=0.5,
step=[200000, 400000, 600000, 800000])

checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)
# remove gpu_collect=True in non distributed training
evaluation = dict(interval=5000, save_image=True, gpu_collect=True)
log_config = dict(
interval=100, hooks=[
dict(type='TextLoggerHook', by_epoch=False),
])
visual_config = None

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/{exp_name}'
load_from = None
resume_from = None
workflow = [('train', 1)]
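
The two configs shown in this diff differ only in `exp_name` and `spynet_file`. The sketch below makes that relationship explicit by deriving both settings from a variant key. It is illustrative only: `variant_settings` is a hypothetical helper that does not exist in the repository, and the SPyNet file names are taken from the download links in the README table.

```python
SOURCE = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'

# Pretrained SPyNet checkpoints, as linked in the README table.
SPYNET_FILES = {
    'chair': 'pretrained_spynet_chair_20220321-4d82e91b.pth',
    'kitti': 'pretrained_spynet_kitti_20220321-dbcc1cc1.pth',
    'sintel_clean': 'pretrained_spynet_sintel_clean_20220321-0756630b.pth',
    'sintel_final': 'pretrained_spynet_sintel_final_20220321-5e89dcec.pth',
    'pytoflow': 'pretrained_spynet_pytoflow_20220321-5bab842d.pth',
}


def variant_settings(variant):
    """Return the settings that change between TOFlow VFI variants.

    Hypothetical helper; raises KeyError for an unknown variant.
    """
    exp_name = f'tof_vfi_spynet_{variant}_nobn_1xb1_vimeo90k'
    return dict(
        exp_name=exp_name,
        load_pretrained_spynet=f'{SOURCE}/{SPYNET_FILES[variant]}',
        work_dir=f'./work_dirs/{exp_name}')


for variant in SPYNET_FILES:
    print(variant_settings(variant)['exp_name'])
```

Keeping one explicit file per variant, as the commit does, matches the one-config-per-experiment convention of the repository; the helper only shows which fields vary.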