Commit bc07b8f

Yshuo-Li authored and wangruohui committed
[Feature] Add config file of FLAVR (open-mmlab#867)
* [Feature] Add config file of FLAVR
* Update
1 parent 0b6bad9 commit bc07b8f

4 files changed: +236 -0 lines changed

configs/video_interpolators/flavr/README.md
@@ -0,0 +1,39 @@
# FLAVR (arXiv'2020)

> [FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation](https://arxiv.org/pdf/2012.08512.pdf)

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Most modern frame interpolation approaches rely on explicit bidirectional optical flows between adjacent frames, thus are sensitive to the accuracy of underlying flow estimation in handling occlusions while additionally introducing computational bottlenecks unsuitable for efficient deployment. In this work, we propose a flow-free approach that is completely end-to-end trainable for multi-frame video interpolation. Our method, FLAVR, is designed to reason about non-linear motion trajectories and complex occlusions implicitly from unlabeled videos and greatly simplifies the process of training, testing and deploying frame interpolation models. Furthermore, FLAVR delivers up to 6× speed up compared to the current state-of-the-art methods for multi-frame interpolation while consistently demonstrating superior qualitative and quantitative results compared with prior methods on popular benchmarks including Vimeo-90K, Adobe-240FPS, and GoPro. Finally, we show that frame interpolation is a competitive self-supervised pre-training task for videos via demonstrating various novel applications of FLAVR including action recognition, optical flow estimation, motion magnification, and video object tracking. Code and trained models are provided in the supplementary material.

<!-- [IMAGE] -->

<div align=center >
  <img src="https://user-images.githubusercontent.com/56712176/169070212-52acdcea-d732-4441-9983-276e2e40b195.png" width="400"/>
</div >

## Results and models

Evaluated on RGB channels.
The metrics are `PSNR / SSIM`.

| Method | scale | Vimeo90k-triplet | Download |
| :---: | :---: | :---: | :---: |
| [flavr_in4out1_g8b4_vimeo90k_septuplet](/configs/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septuplet.py) | x2 | 36.3340 / 0.96015 | [model](https://download.openmmlab.com/mmediting/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septupli-c2468995.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septupli-c2468995.log.json) |

Note: FLAVR for the x8 VFI task will be supported in the future.

## Citation

```bibtex
@article{kalluri2020flavr,
  title={Flavr: Flow-agnostic video representations for fast frame interpolation},
  author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
  journal={arXiv preprint arXiv:2012.08512},
  year={2020}
}
```
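The README's results table reports `PSNR / SSIM` on RGB channels. As a reminder of what the first number measures, here is a minimal pure-Python sketch of PSNR. It assumes pixel values already rescaled to [0, 1] and stored as flat sequences. The function name `psnr` and its `max_value` parameter are illustrative only; this is not MMEditing's metric implementation.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    Illustrative helper (an assumption, not the library's API): inputs are
    flat sequences of floats in [0, max_value].
    """
    if len(pred) != len(target) or not pred:
        raise ValueError('inputs must be non-empty and have the same length')
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float('inf')  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy data: every pixel is off by 0.1, so MSE is 0.01 and PSNR is about 20 dB.
target = [0.5, 0.5, 0.5, 0.5]
pred = [0.6, 0.4, 0.6, 0.4]
score = psnr(pred, target)
```

On this scale, the reported 36.3340 dB corresponds to a mean squared error of roughly 2.3e-4 for images in [0, 1].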
configs/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septuplet.py
@@ -0,0 +1,174 @@
exp_name = 'flavr_in4out1_g8b4_vimeo90k_septuplet'

# model settings
model = dict(
    type='BasicInterpolator',
    generator=dict(
        type='FLAVRNet',
        num_input_frames=4,
        num_output_frames=1,
        mid_channels_list=[512, 256, 128, 64],
        encoder_layers_list=[2, 2, 2, 2],
        bias=False,
        norm_cfg=None,
        join_type='concat',
        up_mode='transpose'),
    pixel_loss=dict(type='L1Loss', loss_weight=1.0, reduction='mean'))
# model training and testing settings
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM', 'MAE'], crop_border=0)

# dataset settings
train_dataset_type = 'VFIVimeo90K7FramesDataset'
val_dataset_type = 'VFIVimeo90K7FramesDataset'

train_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='target',
        channel_order='rgb',
        backend='pillow'),
    dict(type='FixedCrop', keys=['inputs', 'target'], crop_size=(256, 256)),
    dict(
        type='Flip',
        keys=['inputs', 'target'],
        flip_ratio=0.5,
        direction='horizontal'),
    dict(
        type='Flip',
        keys=['inputs', 'target'],
        flip_ratio=0.5,
        direction='vertical'),
    dict(
        type='ColorJitter',
        keys=['inputs', 'target'],
        channel_order='rgb',
        brightness=0.05,
        contrast=0.05,
        saturation=0.05,
        hue=0.05),
    dict(type='TemporalReverse', keys=['inputs'], reverse_ratio=0.5),
    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
    dict(type='FramesToTensor', keys=['inputs', 'target']),
    dict(
        type='Collect',
        keys=['inputs', 'target'],
        meta_keys=['inputs_path', 'target_path', 'key'])
]

valid_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='target',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
    dict(type='FramesToTensor', keys=['inputs', 'target']),
    dict(
        type='Collect',
        keys=['inputs', 'target'],
        meta_keys=['inputs_path', 'target_path', 'key'])
]

demo_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs']),
    dict(type='FramesToTensor', keys=['inputs']),
    dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
]

root_dir = 'data/vimeo90k'
data = dict(
    workers_per_gpu=16,
    train_dataloader=dict(samples_per_gpu=4),  # 8 gpu
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),

    # train
    train=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/GT',
        ann_file=f'{root_dir}/sep_trainlist.txt',
        pipeline=train_pipeline,
        input_frames=[1, 3, 5, 7],
        target_frames=[4],
        test_mode=False),
    # val
    val=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/GT',
        ann_file=f'{root_dir}/sep_testlist.txt',
        pipeline=valid_pipeline,
        input_frames=[1, 3, 5, 7],
        target_frames=[4],
        test_mode=True),
    # test
    test=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/GT',
        ann_file=f'{root_dir}/sep_testlist.txt',
        pipeline=valid_pipeline,
        input_frames=[1, 3, 5, 7],
        target_frames=[4],
        test_mode=True),
)

# optimizer
optimizers = dict(generator=dict(type='Adam', lr=2e-4, betas=(0.9, 0.99)))

# learning policy
total_iters = 1000000  # >=200*64612/64
lr_config = dict(
    policy='Reduce',
    by_epoch=False,
    mode='max',
    val_metric='PSNR',
    epoch_base_valid=True,  # Support epoch base valid in iter base runner.
    factor=0.5,
    patience=10,
    cooldown=20,
    verbose=True)

checkpoint_config = dict(interval=2020, save_optimizer=True, by_epoch=False)

evaluation = dict(interval=2020, save_image=False, gpu_collect=True)
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(
            type='TensorboardLoggerHook',
            log_dir=f'work_dirs/{exp_name}/tb_log/',
            interval=100,
            ignore_last=False,
            reset_flag=False,
            by_epoch=False),
    ])
visual_config = None

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/{exp_name}'
load_from = None
resume_from = None
workflow = [('train', 1)]
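The schedule numbers in this config (`total_iters`, and the `interval=2020` used for both checkpointing and evaluation) are easier to read once converted to epochs. The sketch below redoes that arithmetic. It assumes 8 GPUs (suggested by the `g8b4` tag and the `# 8 gpu` comment) and 64612 training samples (suggested by the `200*64612/64` comment). Neither value is an explicit config field, so the result is an interpretation rather than a documented fact.

```python
import math

# Values taken directly from the config.
samples_per_gpu = 4
total_iters = 1_000_000
eval_interval = 2020  # also the checkpoint interval

# Assumptions inferred from the experiment name and inline comments.
num_gpus = 8
num_train_samples = 64612

# Effective batch size across all GPUs.
batch_size = samples_per_gpu * num_gpus
# Iterations needed to see every training sample once.
iters_per_epoch = math.ceil(num_train_samples / batch_size)
# Lower bound quoted in the config comment (200 epochs at batch size 64).
comment_lower_bound = 200 * num_train_samples / 64

print(batch_size)                        # 32
print(iters_per_epoch)                   # 2020
print(iters_per_epoch == eval_interval)  # True
print(comment_lower_bound)               # 201912.5
print(total_iters >= comment_lower_bound)  # True
```

Under these assumptions, checkpointing and evaluation happen about once per epoch. That would fit the `epoch_base_valid=True` setting and would mean `patience=10` and `cooldown=20` are counted in validation rounds of roughly one epoch each.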
configs/video_interpolators/flavr/metafile.yml
@@ -0,0 +1,22 @@
Collections:
- Metadata:
    Architecture:
    - FLAVR
  Name: FLAVR
  Paper:
  - https://arxiv.org/pdf/2012.08512.pdf
  README: configs/video_interpolators/flavr/README.md
Models:
- Config: configs/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septuplet.py
  In Collection: FLAVR
  Metadata:
    Training Data: VIMEO90K
  Name: flavr_in4out1_g8b4_vimeo90k_septuplet
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 36.334
        SSIM: 0.96015
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septupli-c2468995.pth
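The metafile stores the same scores that appear in the README's results table, just with different formatting (`36.334` versus `36.3340`). The sketch below shows how one could be rendered from the other. The `results` dict is a hand-written copy of the metafile entry and `format_cell` is a hypothetical helper; neither belongs to any MMEditing tooling.

```python
# Hand-written copy of the "Results" entry from the metafile (illustrative).
results = {
    'Dataset': 'VIMEO90K',
    'Metrics': {'Vimeo90k-triplet': {'PSNR': 36.334, 'SSIM': 0.96015}},
    'Task': 'Video_interpolators',
}


def format_cell(metrics, psnr_digits=4, ssim_digits=5):
    """Render a 'PSNR / SSIM' table cell with fixed decimal places."""
    psnr_text = f"{metrics['PSNR']:.{psnr_digits}f}"
    ssim_text = f"{metrics['SSIM']:.{ssim_digits}f}"
    return f'{psnr_text} / {ssim_text}'


cell = format_cell(results['Metrics']['Vimeo90k-triplet'])
print(cell)  # 36.3340 / 0.96015
```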

model-index.yml

+1
@@ -26,4 +26,5 @@ Import:
 - configs/synthesizers/cyclegan/metafile.yml
 - configs/synthesizers/pix2pix/metafile.yml
 - configs/video_interpolators/cain/metafile.yml
+- configs/video_interpolators/flavr/metafile.yml
 - configs/video_interpolators/tof/metafile.yml
