[Feature] Add a config of TOFlow (open-mmlab#811)

* [Feature] Add config of TOFlow
* Update
* Update
* Update
* Update
* Update
* Update
* Update
VongolaWu · Apr 1, 2022 · 3300f27
1 parent 347538b

Showing 9 changed files with 719 additions and 5 deletions.
diff --git a/.dev_scripts/github/update_model_index.py b/.dev_scripts/github/update_model_index.py
@@ -234,11 +234,14 @@ def parse_md(md_file):
                             except ValueError:
                                 metrics_data = metrics_data.replace(' ', '')
                         else:
-                            metrics_data = [
-                                float(d) for d in metrics_data.split('/')
-                            ]
-                            metrics[key] = dict(
-                                PSNR=metrics_data[0], SSIM=metrics_data[1])
+                            try:
+                                metrics_data = [
+                                    float(d) for d in metrics_data.split('/')
+                                ]
+                                metrics[key] = dict(
+                                    PSNR=metrics_data[0], SSIM=metrics_data[1])
+                            except (ValueError, IndexError):
+                                pass
 
                     model = {
                         'Name':

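The `try`/`except` added to `parse_md` skips README cells that are not of the form `PSNR / SSIM`. The following is a minimal standalone sketch of the same idea. It assumes a cell is valid only when it splits into exactly two floats on `/`. The helper name `parse_psnr_ssim` is hypothetical and is not part of the script:

```python
from typing import Dict, Optional


def parse_psnr_ssim(cell: str) -> Optional[Dict[str, float]]:
    """Parse a README metric cell such as '33.3294 / 0.9465'.

    Hypothetical helper: returns None instead of raising when the cell
    is not exactly two '/'-separated floats (e.g. '-' or a lone value).
    """
    parts = cell.split('/')
    if len(parts) != 2:
        # A lone value (or extra separators) would otherwise raise an
        # IndexError or silently mis-assign the metrics.
        return None
    try:
        # float() tolerates the surrounding spaces in '33.3294 / 0.9465'.
        psnr, ssim = (float(p) for p in parts)
    except ValueError:
        # Non-numeric cells such as '-' or 'N/A'.
        return None
    return dict(PSNR=psnr, SSIM=ssim)


print(parse_psnr_ssim('33.3294 / 0.9465'))
print(parse_psnr_ssim('-'))
```

Returning `None` rather than passing silently inside the loop lets the caller decide whether a malformed cell should be skipped or reported.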
diff --git a/configs/video_interpolators/tof/README.md b/configs/video_interpolators/tof/README.md
@@ -0,0 +1,46 @@
+# TOFlow (IJCV'2019)
+
+> [Video Enhancement with Task-Oriented Flow](https://arxiv.org/abs/1711.09078)
+
+<!-- [ALGORITHM] -->
+
+## Abstract
+
+<!-- [ABSTRACT] -->
+
+Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
+
+<!-- [IMAGE] -->
+<div align=center >
+ <img src="https://user-images.githubusercontent.com/7676947/144035477-2480d580-1409-4a7c-88d5-c13a3dbd62ac.png" width="400"/>
+</div >
+
+## Results and models
+
+Evaluated on RGB channels.
+The metrics are `PSNR / SSIM`.
+
+|                                                  Method                                                      |                                                           Pretrained SPyNet                                                              | Vimeo90k-triplet |                                                                                                                                Download                                                                                                                                    |
+| :----------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | :--------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
+| [tof_vfi_spynet_chair_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py) | [spynet_chairs_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth) | 33.3294 / 0.9465 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.log.json) |
+| [tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py) | [spynet_kitti](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_kitti_20220321-dbcc1cc1.pth) | 33.3339 / 0.9466 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.log.json) |
+| [tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k.py) | [spynet_sintel_clean](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_sintel_clean_20220321-0756630b.pth) | 33.3170 / 0.9464 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.log.json) |
+| [tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k.py) | [spynet_sintel_final](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_sintel_final_20220321-5e89dcec.pth) | 33.3237 / 0.9465 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.log.json) |
+| [tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k](/configs/video_interpolators/tof/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k.py) | [spynet_pytoflow](https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_pytoflow_20220321-5bab842d.pth) | 33.3426 / 0.9467 | [model](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.log.json) |
+
+Note: These pretrained SPyNets contain no BN layers because training uses `batch_size=1`, consistent with [pytoflow](https://github.com/Coldog2333/pytoflow).
+
+## Citation
+
+```bibtex
+@article{xue2019video,
+  title={Video enhancement with task-oriented flow},
+  author={Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
+  journal={International Journal of Computer Vision},
+  volume={127},
+  number={8},
+  pages={1106--1125},
+  year={2019},
+  publisher={Springer}
+}
+```
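The README reports PSNR on RGB channels, and the configs set `crop_border=0` in `test_cfg`. To show what that number measures, here is a minimal pure-Python PSNR sketch over flattened pixel values (all three channels). The `max_value=255.0` default assumes 8-bit images. This is an illustration, not MMEditing's own implementation:

```python
import math
from typing import Sequence


def psnr(img1: Sequence[float], img2: Sequence[float],
         max_value: float = 255.0) -> float:
    """PSNR between two equally sized, flattened images (all RGB values)."""
    if len(img1) != len(img2) or not img1:
        raise ValueError('images must be non-empty and the same size')
    # Mean squared error over every pixel of every channel (no border crop).
    mse = sum((a - b) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([0, 0, 0], [0, 0, 255]))
```

Because PSNR is logarithmic in the MSE, the roughly 0.03 dB spread between the TOFlow variants in the table corresponds to a very small change in mean squared error.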
diff --git a/configs/video_interpolators/tof/metafile.yml b/configs/video_interpolators/tof/metafile.yml
@@ -0,0 +1,74 @@
+Collections:
+- Metadata:
+    Architecture:
+    - TOFlow
+  Name: TOFlow
+  Paper:
+  - https://arxiv.org/abs/1711.09078
+  README: configs/video_interpolators/tof/README.md
+Models:
+- Config: configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py
+  In Collection: TOFlow
+  Metadata:
+    Training Data: VIMEO90K
+  Name: tof_vfi_spynet_chair_nobn_1xb1_vimeo90k
+  Results:
+  - Dataset: VIMEO90K
+    Metrics:
+      Vimeo90k-triplet:
+        PSNR: 33.3294
+        SSIM: 0.9465
+    Task: Video_interpolators
+  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k_20220321-2fc9e258.pth
+- Config: configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py
+  In Collection: TOFlow
+  Metadata:
+    Training Data: VIMEO90K
+  Name: tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k
+  Results:
+  - Dataset: VIMEO90K
+    Metrics:
+      Vimeo90k-triplet:
+        PSNR: 33.3339
+        SSIM: 0.9466
+    Task: Video_interpolators
+  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k_20220321-3f7ca4cd.pth
+- Config: configs/video_interpolators/tof/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k.py
+  In Collection: TOFlow
+  Metadata:
+    Training Data: VIMEO90K
+  Name: tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k
+  Results:
+  - Dataset: VIMEO90K
+    Metrics:
+      Vimeo90k-triplet:
+        PSNR: 33.317
+        SSIM: 0.9464
+    Task: Video_interpolators
+  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k_20220321-6e52a6fd.pth
+- Config: configs/video_interpolators/tof/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k.py
+  In Collection: TOFlow
+  Metadata:
+    Training Data: VIMEO90K
+  Name: tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k
+  Results:
+  - Dataset: VIMEO90K
+    Metrics:
+      Vimeo90k-triplet:
+        PSNR: 33.3237
+        SSIM: 0.9465
+    Task: Video_interpolators
+  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k_20220321-8ab70dbb.pth
+- Config: configs/video_interpolators/tof/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k.py
+  In Collection: TOFlow
+  Metadata:
+    Training Data: VIMEO90K
+  Name: tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k
+  Results:
+  - Dataset: VIMEO90K
+    Metrics:
+      Vimeo90k-triplet:
+        PSNR: 33.3426
+        SSIM: 0.9467
+    Task: Video_interpolators
+  Weights: https://download.openmmlab.com/mmediting/video_interpolators/toflow/tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k_20220321-5f4b243e.pth
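`update_model_index.py` appears to build metafile entries from the README tables. As a result, a README link or name that was copied without being updated leads to duplicate `Name` or `Config` values. A small check like the one below could catch that. It assumes the metafile has already been loaded into a dict, for example with PyYAML's `yaml.safe_load`. `find_duplicates` is a hypothetical helper, and the two-entry dict is a hand-built stand-in:

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicates(metafile: Dict[str, Any], key: str = 'Name') -> List[str]:
    """Return the values of `key` that occur in more than one model entry."""
    counts = Counter(model[key] for model in metafile.get('Models', []))
    return sorted(value for value, n in counts.items() if n > 1)


# Hand-built stand-in for a loaded metafile.yml with a copied entry.
metafile = {
    'Models': [
        {'Name': 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k',
         'Config': 'configs/video_interpolators/tof/'
                   'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py'},
        {'Name': 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k',
         'Config': 'configs/video_interpolators/tof/'
                   'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py'},
    ]
}
print(find_duplicates(metafile))
print(find_duplicates(metafile, key='Config'))
```

A check like this could be run in CI after regenerating the model index, so a copy-paste slip fails loudly instead of silently shadowing one model with another.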
diff --git a/configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py b/configs/video_interpolators/tof/tof_vfi_spynet_chair_nobn_1xb1_vimeo90k.py
@@ -0,0 +1,118 @@
+exp_name = 'tof_vfi_spynet_chair_nobn_1xb1_vimeo90k'
+
+# pretrained SPyNet
+source = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
+spynet_file = 'pretrained_spynet_chair_20220321-4d82e91b.pth'
+load_pretrained_spynet = f'{source}/{spynet_file}'
+
+# model settings
+model = dict(
+    type='BasicInterpolator',
+    generator=dict(
+        type='TOFlowVFI',
+        rgb_mean=[0.485, 0.456, 0.406],
+        rgb_std=[0.229, 0.224, 0.225],
+        flow_cfg=dict(norm_cfg=None, pretrained=load_pretrained_spynet)),
+    pixel_loss=dict(type='CharbonnierLoss', loss_weight=1.0, reduction='mean'))
+# model training and testing settings
+train_cfg = None
+test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=0)
+
+# dataset settings
+train_dataset_type = 'VFIVimeo90KDataset'
+
+train_pipeline = [
+    dict(
+        type='LoadImageFromFileList',
+        io_backend='disk',
+        key='inputs',
+        channel_order='rgb',
+        backend='pillow'),
+    dict(
+        type='LoadImageFromFile',
+        io_backend='disk',
+        key='target',
+        channel_order='rgb',
+        backend='pillow'),
+    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
+    dict(type='FramesToTensor', keys=['inputs']),
+    dict(type='ImageToTensor', keys=['target']),
+    dict(
+        type='Collect',
+        keys=['inputs', 'target'],
+        meta_keys=['inputs_path', 'target_path', 'key'])
+]
+
+demo_pipeline = [
+    dict(
+        type='LoadImageFromFileList',
+        io_backend='disk',
+        key='inputs',
+        channel_order='rgb',
+        backend='pillow'),
+    dict(type='RescaleToZeroOne', keys=['inputs']),
+    dict(type='FramesToTensor', keys=['inputs']),
+    dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
+]
+
+root_dir = 'data/vimeo_triplet'
+data = dict(
+    workers_per_gpu=1,
+    train_dataloader=dict(samples_per_gpu=1, drop_last=True),  # 1 gpu
+    val_dataloader=dict(samples_per_gpu=1),
+    test_dataloader=dict(samples_per_gpu=1),
+
+    # train
+    train=dict(
+        type='RepeatDataset',
+        times=1000,
+        dataset=dict(
+            type=train_dataset_type,
+            folder=f'{root_dir}/sequences',
+            ann_file=f'{root_dir}/tri_trainlist.txt',
+            pipeline=train_pipeline,
+            test_mode=False)),
+    # val
+    val=dict(
+        type=train_dataset_type,
+        folder=f'{root_dir}/sequences',
+        ann_file=f'{root_dir}/tri_validlist.txt',
+        pipeline=train_pipeline,
+        test_mode=True),
+    # test
+    test=dict(
+        type=train_dataset_type,
+        folder=f'{root_dir}/sequences',
+        ann_file=f'{root_dir}/tri_testlist.txt',
+        pipeline=train_pipeline,
+        test_mode=True),
+)
+
+# optimizer
+optimizers = dict(
+    generator=dict(type='Adam', lr=5e-5, betas=(0.9, 0.99), weight_decay=1e-4))
+
+# learning policy
+total_iters = 1000000
+lr_config = dict(
+    policy='Step',
+    by_epoch=False,
+    gamma=0.5,
+    step=[200000, 400000, 600000, 800000])
+
+checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)
+# remove gpu_collect=True for non-distributed training
+evaluation = dict(interval=5000, save_image=True, gpu_collect=True)
+log_config = dict(
+    interval=100, hooks=[
+        dict(type='TextLoggerHook', by_epoch=False),
+    ])
+visual_config = None
+
+# runtime settings
+dist_params = dict(backend='nccl')
+log_level = 'INFO'
+work_dir = f'./work_dirs/{exp_name}'
+load_from = None
+resume_from = None
+workflow = [('train', 1)]
diff --git a/configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py b/configs/video_interpolators/tof/tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k.py
@@ -0,0 +1,118 @@
+exp_name = 'tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k'
+
+# pretrained SPyNet
+source = 'https://download.openmmlab.com/mmediting/video_interpolators/toflow'
+spynet_file = 'pretrained_spynet_kitti_20220321-dbcc1cc1.pth'
+load_pretrained_spynet = f'{source}/{spynet_file}'
+
+# model settings
+model = dict(
+    type='BasicInterpolator',
+    generator=dict(
+        type='TOFlowVFI',
+        rgb_mean=[0.485, 0.456, 0.406],
+        rgb_std=[0.229, 0.224, 0.225],
+        flow_cfg=dict(norm_cfg=None, pretrained=load_pretrained_spynet)),
+    pixel_loss=dict(type='CharbonnierLoss', loss_weight=1.0, reduction='mean'))
+# model training and testing settings
+train_cfg = None
+test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=0)
+
+# dataset settings
+train_dataset_type = 'VFIVimeo90KDataset'
+
+train_pipeline = [
+    dict(
+        type='LoadImageFromFileList',
+        io_backend='disk',
+        key='inputs',
+        channel_order='rgb',
+        backend='pillow'),
+    dict(
+        type='LoadImageFromFile',
+        io_backend='disk',
+        key='target',
+        channel_order='rgb',
+        backend='pillow'),
+    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
+    dict(type='FramesToTensor', keys=['inputs']),
+    dict(type='ImageToTensor', keys=['target']),
+    dict(
+        type='Collect',
+        keys=['inputs', 'target'],
+        meta_keys=['inputs_path', 'target_path', 'key'])
+]
+
+demo_pipeline = [
+    dict(
+        type='LoadImageFromFileList',
+        io_backend='disk',
+        key='inputs',
+        channel_order='rgb',
+        backend='pillow'),
+    dict(type='RescaleToZeroOne', keys=['inputs']),
+    dict(type='FramesToTensor', keys=['inputs']),
+    dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
+]
+
+root_dir = 'data/vimeo_triplet'
+data = dict(
+    workers_per_gpu=1,
+    train_dataloader=dict(samples_per_gpu=1, drop_last=True),  # 1 gpu
+    val_dataloader=dict(samples_per_gpu=1),
+    test_dataloader=dict(samples_per_gpu=1),
+
+    # train
+    train=dict(
+        type='RepeatDataset',
+        times=1000,
+        dataset=dict(
+            type=train_dataset_type,
+            folder=f'{root_dir}/sequences',
+            ann_file=f'{root_dir}/tri_trainlist.txt',
+            pipeline=train_pipeline,
+            test_mode=False)),
+    # val
+    val=dict(
+        type=train_dataset_type,
+        folder=f'{root_dir}/sequences',
+        ann_file=f'{root_dir}/tri_validlist.txt',
+        pipeline=train_pipeline,
+        test_mode=True),
+    # test
+    test=dict(
+        type=train_dataset_type,
+        folder=f'{root_dir}/sequences',
+        ann_file=f'{root_dir}/tri_testlist.txt',
+        pipeline=train_pipeline,
+        test_mode=True),
+)
+
+# optimizer
+optimizers = dict(
+    generator=dict(type='Adam', lr=5e-5, betas=(0.9, 0.99), weight_decay=1e-4))
+
+# learning policy
+total_iters = 1000000
+lr_config = dict(
+    policy='Step',
+    by_epoch=False,
+    gamma=0.5,
+    step=[200000, 400000, 600000, 800000])
+
+checkpoint_config = dict(interval=5000, save_optimizer=True, by_epoch=False)
+# remove gpu_collect=True for non-distributed training
+evaluation = dict(interval=5000, save_image=True, gpu_collect=True)
+log_config = dict(
+    interval=100, hooks=[
+        dict(type='TextLoggerHook', by_epoch=False),
+    ])
+visual_config = None
+
+# runtime settings
+dist_params = dict(backend='nccl')
+log_level = 'INFO'
+work_dir = f'./work_dirs/{exp_name}'
+load_from = None
+resume_from = None
+workflow = [('train', 1)]
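Both TOFlow configs train with `CharbonnierLoss` (`loss_weight=1.0`, `reduction='mean'`), a smooth, robust variant of the L1 loss. The sketch below is a minimal pure-Python version of the mean-reduced form. It assumes the common definition `sqrt((pred - target)**2 + eps)` with a small `eps`. The exact `eps` value and where it enters the square root are assumptions, not taken from MMEditing:

```python
import math
from typing import Sequence


def charbonnier_loss(pred: Sequence[float], target: Sequence[float],
                     eps: float = 1e-12, loss_weight: float = 1.0) -> float:
    """Mean Charbonnier loss: loss_weight * mean(sqrt((p - t)**2 + eps))."""
    if len(pred) != len(target) or not pred:
        raise ValueError('inputs must be non-empty and the same size')
    # Per-pixel smooth absolute error; eps keeps sqrt differentiable at 0.
    per_pixel = [math.sqrt((p - t) ** 2 + eps) for p, t in zip(pred, target)]
    return loss_weight * sum(per_pixel) / len(per_pixel)


print(charbonnier_loss([0.0, 3.0], [0.0, 0.0]))
```

For errors much larger than `sqrt(eps)` this behaves like L1. Unlike plain L1, its gradient stays defined at zero error.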