[Feature] Support SmoothNet (open-mmlab#1279)

* add smoothnet
* refactor smothnet model and filter
* Add unit test for SmoothNetFilter
* support root-index
* add docstring
* add md and update ReadMe
* update readme
* allow targets share filter
* remove devug code
* Update smoothnet_h36m.md
* fix lint
* fix unittest

Co-authored-by: ly015 <liyining0712@gmail.com>
wjkim81 · Apr 2, 2022 · bf98fea
1 parent 66e6fbe
Showing 16 changed files with 389 additions and 10 deletions.
diff --git a/.dev_scripts/github/update_model_index.py b/.dev_scripts/github/update_model_index.py
@@ -334,7 +334,10 @@ def update_model_index():
 
 if __name__ == '__main__':
 
-    file_list = [fn for fn in sys.argv[1:] if osp.basename(fn) != 'README.md']
+    file_list = [
+        fn for fn in sys.argv[1:]
+        if osp.basename(fn) != 'README.md' and '_base_' not in fn
+    ]
 
     if not file_list:
         sys.exit(0)
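
Note on the hunk above: the added condition is a plain substring test on the whole path, so any file whose path contains `_base_` is skipped, including the new SmoothNet markdown under `configs/_base_/filters/`. A minimal sketch of the filter in isolation; the helper name `select_index_files` and the sample paths are hypothetical and only serve the illustration:

```python
import os.path as osp


def select_index_files(paths):
    """Keep paths that are not README.md and do not contain '_base_'.

    Mirrors the list comprehension in the hunk above. The function name is
    illustrative; the script itself builds the list inline.
    """
    return [
        fn for fn in paths
        if osp.basename(fn) != 'README.md' and '_base_' not in fn
    ]


if __name__ == '__main__':
    # Hypothetical inputs, standing in for sys.argv[1:].
    sample = [
        'configs/some_task/README.md',
        'configs/_base_/filters/smoothnet_h36m.md',
        'configs/some_task/some_model.md',
    ]
    print(select_index_files(sample))
```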

diff --git a/README.md b/README.md
@@ -149,6 +149,7 @@ A summary can be found in the [Model Zoo](https://mmpose.readthedocs.io/en/lates
 * [x] [UDP](https://mmpose.readthedocs.io/en/latest/papers/techniques.html#udp-cvpr-2020) (CVPR'2020)
 * [x] [Albumentations](https://mmpose.readthedocs.io/en/latest/papers/techniques.html#albumentations-information-2020) (Information'2020)
 * [x] [SoftWingloss](https://mmpose.readthedocs.io/en/latest/papers/techniques.html#softwingloss-tip-2021) (TIP'2021)
+* [x] [SmoothNet](/configs/_base_/filters/smoothnet_h36m.md) (arXiv'2021)
 
 </details>
 

diff --git a/README_CN.md b/README_CN.md
@@ -148,6 +148,7 @@ MMPose 也提供了其他更详细的教程:
 * [x] [UDP](https://mmpose.readthedocs.io/zh_CN/latest/papers/techniques.html#udp-cvpr-2020) (CVPR'2020)
 * [x] [Albumentations](https://mmpose.readthedocs.io/zh_CN/latest/papers/techniques.html#albumentations-information-2020) (Information'2020)
 * [x] [SoftWingloss](https://mmpose.readthedocs.io/zh_CN/latest/papers/techniques.html#softwingloss-tip-2021) (TIP'2021)
+* [x] [SmoothNet](/configs/_base_/filters/smoothnet_h36m.md) (arXiv'2021)
 
 </details>
 

diff --git a/configs/_base_/filters/smoothnet_h36m.md b/configs/_base_/filters/smoothnet_h36m.md
@@ -0,0 +1,45 @@
+<!-- [OTHERS] -->
+
+<details>
+<summary align="right"><a href="https://arxiv.org/abs/2112.13715">SmoothNet (arXiv'2021)</a></summary>
+
+```bibtex
+@article{zeng2021smoothnet,
+  title={SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos},
+  author={Zeng, Ailing and Yang, Lei and Ju, Xuan and Li, Jiefeng and Wang, Jianyi and Xu, Qiang},
+  journal={arXiv preprint arXiv:2112.13715},
+  year={2021}
+}
+```
+
+</details>
+
+<!-- [DATASET] -->
+
+<details>
+<summary align="right"><a href="https://ieeexplore.ieee.org/abstract/document/6682899/">Human3.6M (TPAMI'2014)</a></summary>
+
+```bibtex
+@article{h36m_pami,
+  author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu,  Cristian},
+  title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
+  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  publisher = {IEEE Computer Society},
+  volume = {36},
+  number = {7},
+  pages = {1325-1339},
+  month = {jul},
+  year = {2014}
+}
+```
+
+</details>
+
+The following SmoothNet model checkpoints are available for pose smoothing. The table shows the performance of [SimpleBaseline3D](https://arxiv.org/abs/1705.03098) on the [Human3.6M](https://ieeexplore.ieee.org/abstract/document/6682899/) dataset without/with the SmoothNet plugin, and compares SmoothNet models with 4 different window sizes (8, 16, 32 and 64). The metrics are MPJPE (mm), P-MPJPE (mm) and Acceleration Error (mm/frame^2).
+
+| Arch | Window Size | MPJPE<sup>w/o</sup> | MPJPE<sup>w</sup> | P-MPJPE<sup>w/o</sup> | P-MPJPE<sup>w</sup> | Accel. Err<sup>w/o</sup> | Accel. Err<sup>w</sup> | ckpt |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| [smoothnet_ws8](/configs/_base_/filters/smoothnet_t8_h36m.py) | 8 | 54.48 | 53.15 | 42.20 | 41.32 | 19.18 | 1.87 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws8_h36m.pth) |
+| [smoothnet_ws16](/configs/_base_/filters/smoothnet_t16_h36m.py) | 16 | 54.48 | 52.74 | 42.20 | 41.20 | 19.18 | 1.22 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws16_h36m.pth) |
+| [smoothnet_ws32](/configs/_base_/filters/smoothnet_t32_h36m.py) | 32 | 54.48 | 52.47 | 42.20 | 40.84 | 19.18 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws32_h36m.pth) |
+| [smoothnet_ws64](/configs/_base_/filters/smoothnet_t64_h36m.py) | 64 | 54.48 | 53.37 | 42.20 | 40.77 | 19.18 | 0.92 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws64_h36m.pth) |
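
Note on the table above: read as relative changes, the acceleration error falls by more than 90% at every window size, while MPJPE improves by roughly 2 to 4%. The lowest MPJPE is at window size 32 and the lowest acceleration error at 64. The short script below recomputes those ratios from the table values; the helper names are illustrative only.

```python
# Values copied from the table above:
# (window size, MPJPE with SmoothNet, acceleration error with SmoothNet).
BASELINE_MPJPE = 54.48
BASELINE_ACCEL = 19.18
ROWS = [
    (8, 53.15, 1.87),
    (16, 52.74, 1.22),
    (32, 52.47, 0.99),
    (64, 53.37, 0.92),
]


def relative_reduction(before, after):
    """Return the fractional reduction from `before` to `after`."""
    return (before - after) / before


def summarize(rows):
    """Return (window size, MPJPE reduction, accel. error reduction) per row."""
    return [(ws, relative_reduction(BASELINE_MPJPE, mpjpe),
             relative_reduction(BASELINE_ACCEL, accel))
            for ws, mpjpe, accel in rows]


if __name__ == '__main__':
    for ws, d_mpjpe, d_accel in summarize(ROWS):
        print(f'ws={ws:>2}  MPJPE -{d_mpjpe:.1%}  accel. err -{d_accel:.1%}')
```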
diff --git a/configs/_base_/filters/smoothnet_t16_h36m.py b/configs/_base_/filters/smoothnet_t16_h36m.py
@@ -0,0 +1,13 @@
+# Config for SmoothNet filter trained on Human3.6M data with a window size of
+# 16. The model is trained using root-centered keypoint coordinates around the
+# pelvis (index:0), thus we set root_index=0 for the filter
+filter_cfg = dict(
+    type='SmoothNetFilter',
+    window_size=16,
+    output_size=16,
+    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
+    'smoothnet_ws16_h36m.pth',
+    hidden_size=512,
+    res_hidden_size=256,
+    num_blocks=3,
+    root_index=0)
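
Note on `root_index`: the config comment says the model was trained on keypoints centered on the pelvis. The `SmoothNetFilter` implementation is not shown in this part of the diff, so the following is only a sketch of what root-centering around a smoother could look like, assuming poses of shape `[T, K, C]`. The function names are hypothetical, and `moving_average` is a placeholder standing in for the trained network.

```python
import numpy as np


def smooth_root_centered(poses, smooth_fn, root_index=0):
    """Apply `smooth_fn` to root-centered keypoints, then restore the root.

    poses: array of shape [T, K, C] (frames, keypoints, coordinates).
    smooth_fn: callable mapping a [T, K, C] array to a [T, K, C] array;
        an assumption standing in for the trained SmoothNet model.
    """
    root = poses[:, root_index:root_index + 1]  # shape [T, 1, C]
    centered = poses - root                     # root joint becomes all zeros
    return smooth_fn(centered) + root


def moving_average(x, window=3):
    """Placeholder smoother: centered moving average over time (odd window)."""
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (0, 0), (0, 0)), mode='edge')
    return np.stack(
        [padded[t:t + window].mean(axis=0) for t in range(x.shape[0])])


if __name__ == '__main__':
    poses = np.random.RandomState(0).rand(6, 17, 3)
    smoothed = smooth_root_centered(poses, moving_average, root_index=0)
    print(smoothed.shape)
```

In this sketch the root trajectory itself passes through unchanged, because the centered root is all zeros before and after the placeholder smoother.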
diff --git a/configs/_base_/filters/smoothnet_t32_h36m.py b/configs/_base_/filters/smoothnet_t32_h36m.py
@@ -0,0 +1,13 @@
+# Config for SmoothNet filter trained on Human3.6M data with a window size of
+# 32. The model is trained using root-centered keypoint coordinates around the
+# pelvis (index:0), thus we set root_index=0 for the filter
+filter_cfg = dict(
+    type='SmoothNetFilter',
+    window_size=32,
+    output_size=32,
+    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
+    'smoothnet_ws32_h36m.pth',
+    hidden_size=512,
+    res_hidden_size=256,
+    num_blocks=3,
+    root_index=0)
diff --git a/configs/_base_/filters/smoothnet_t64_h36m.py b/configs/_base_/filters/smoothnet_t64_h36m.py
@@ -0,0 +1,13 @@
+# Config for SmoothNet filter trained on Human3.6M data with a window size of
+# 64. The model is trained using root-centered keypoint coordinates around the
+# pelvis (index:0), thus we set root_index=0 for the filter
+filter_cfg = dict(
+    type='SmoothNetFilter',
+    window_size=64,
+    output_size=64,
+    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
+    'smoothnet_ws64_h36m.pth',
+    hidden_size=512,
+    res_hidden_size=256,
+    num_blocks=3,
+    root_index=0)
diff --git a/configs/_base_/filters/smoothnet_t8_h36m.py b/configs/_base_/filters/smoothnet_t8_h36m.py
@@ -0,0 +1,13 @@
+# Config for SmoothNet filter trained on Human3.6M data with a window size of
+# 8. The model is trained using root-centered keypoint coordinates around the
+# pelvis (index:0), thus we set root_index=0 for the filter
+filter_cfg = dict(
+    type='SmoothNetFilter',
+    window_size=8,
+    output_size=8,
+    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
+    'smoothnet_ws8_h36m.pth',
+    hidden_size=512,
+    res_hidden_size=256,
+    num_blocks=3,
+    root_index=0)
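
Note on the four config files above: they differ only in `window_size`, `output_size`, and the checkpoint file name. The helper below is hypothetical (it is not part of the commit) and simply makes that pattern explicit by rebuilding the same dict from a window size.

```python
def make_smoothnet_cfg(window_size):
    """Return a filter config dict matching the pattern of the files above.

    Hypothetical helper for illustration; the commit ships four separate
    config files instead.
    """
    return dict(
        type='SmoothNetFilter',
        window_size=window_size,
        output_size=window_size,
        checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
        f'smoothnet_ws{window_size}_h36m.pth',
        hidden_size=512,
        res_hidden_size=256,
        num_blocks=3,
        root_index=0)


if __name__ == '__main__':
    for ws in (8, 16, 32, 64):
        print(ws, make_smoothnet_cfg(ws)['checkpoint'])
```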
diff --git a/demo/body3d_two_stage_video_demo.py b/demo/body3d_two_stage_video_demo.py
@@ -104,7 +104,7 @@ def main():
     parser.add_argument(
         '--out-video-root',
         type=str,
-        default=None,
+        default='vis_results',
         help='Root of the output video file. '
         'Default not saving the visualization video.')
     parser.add_argument(

diff --git a/docs/en/papers/techniques/smoothnet.md b/docs/en/papers/techniques/smoothnet.md
@@ -0,0 +1,29 @@
+# SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos
+
+<!-- [ALGORITHM] -->
+
+<details>
+<summary align="right"><a href="https://arxiv.org/abs/2112.13715">SmoothNet (arXiv'2021)</a></summary>
+
+```bibtex
+@article{zeng2021smoothnet,
+  title={SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos},
+  author={Zeng, Ailing and Yang, Lei and Ju, Xuan and Li, Jiefeng and Wang, Jianyi and Xu, Qiang},
+  journal={arXiv preprint arXiv:2112.13715},
+  year={2021}
+}
+```
+
+</details>
+
+## Abstract
+
+<!-- [ABSTRACT] -->
+
+When analyzing human motion videos, the output jitters from existing pose estimators are highly-unbalanced. Most frames only suffer from slight jitters, while significant jitters occur in those frames with occlusion or poor image quality. Such complex poses often persist in videos, leading to consecutive frames with poor estimation results and large jitters. Existing pose smoothing solutions based on temporal convolutional networks, recurrent neural networks, or low-pass filters cannot deal with such a long-term jitter problem without considering the significant and persistent errors within the jittering video segment. Motivated by the above observation, we propose a novel plug-and-play refinement network, namely SMOOTHNET, which can be attached to any existing pose estimators to improve its temporal smoothness and enhance its per-frame precision simultaneously. Especially, SMOOTHNET is a simple yet effective data-driven fully-connected network with large receptive fields, effectively mitigating the impact of long-term jitters with unreliable estimation results. We conduct extensive experiments on twelve backbone networks with seven datasets across 2D and 3D pose estimation, body recovery, and downstream tasks. Our results demonstrate that the proposed SMOOTHNET consistently outperforms existing solutions, especially on those clips with high errors and long-term jitters.
+
+<!-- [IMAGE] -->
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/15977946/161272519-0165c0e2-f0e8-45ad-88dd-ddb49fc81bda.png">
+</div>
diff --git a/mmpose/core/post_processing/smoother.py b/mmpose/core/post_processing/smoother.py
@@ -57,11 +57,20 @@ def __init__(self,
         if isinstance(filter_cfg, str):
             filter_cfg = Config.fromfile(filter_cfg).filter_cfg
         self.filter_cfg = filter_cfg
+        self._filter = build_filter(filter_cfg)
         self.keypoint_dim = keypoint_dim
         self.key = keypoint_key
-        self.padding_size = build_filter(filter_cfg).window_size - 1
+        self.padding_size = self._filter.window_size - 1
         self.history = {}
 
+    def _get_filter(self):
+        fltr = self._filter
+        if not fltr.shareable:
+            # If the filter is not shareable, build a new filter for the next
+            # request
+            self._filter = build_filter(self.filter_cfg)
+        return fltr
+
     def _collate_pose(self, results):
         """Collate the pose results to pose sequences.
 
@@ -193,7 +202,7 @@ def smooth(self, results):
                     pose = np.concatenate((pose_history, pose), axis=0)
             else:
                 # For new target, build a new filter
-                pose_filter = build_filter(self.filter_cfg)
+                pose_filter = self._get_filter()
 
             # Update the history information
             if self.padding_size > 0:
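
Note on `_get_filter`: a shareable filter is built once and handed to every new target, while a non-shareable one is handed out and immediately replaced by a fresh instance. The sketch below isolates that logic with stub classes; `FilterProvider`, `StatelessFilter`, and `StatefulFilter` are hypothetical names, and `factory` plays the role of `build_filter(self.filter_cfg)`.

```python
class StatelessFilter:
    """Stand-in for a filter with no per-target state (shareable)."""
    shareable = True


class StatefulFilter:
    """Stand-in for a filter that keeps one target's state (not shareable)."""
    shareable = False


class FilterProvider:
    """Hands out filters following the `_get_filter` logic in the hunk above."""

    def __init__(self, factory):
        self._factory = factory   # assumption: replaces build_filter(cfg)
        self._filter = factory()  # built eagerly, as in Smoother.__init__

    def get_filter(self):
        fltr = self._filter
        if not fltr.shareable:
            # Keep a fresh instance ready for the next new target.
            self._filter = self._factory()
        return fltr


if __name__ == '__main__':
    shared = FilterProvider(StatelessFilter)
    print(shared.get_filter() is shared.get_filter())

    private = FilterProvider(StatefulFilter)
    print(private.get_filter() is private.get_filter())
```

Presumably the point of sharing is to avoid rebuilding a filter for every target, which matters most when construction involves loading a checkpoint; the diff itself does not state the motivation beyond "allow targets share filter".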

diff --git a/mmpose/core/post_processing/temporal_filters/__init__.py b/mmpose/core/post_processing/temporal_filters/__init__.py
@@ -3,7 +3,9 @@
 from .gaussian_filter import GaussianFilter
 from .one_euro_filter import OneEuroFilter
 from .savizky_golay_filter import SavizkyGolayFilter
+from .smoothnet_filter import SmoothNetFilter
 
 __all__ = [
-    'build_filter', 'GaussianFilter', 'OneEuroFilter', 'SavizkyGolayFilter'
+    'build_filter', 'GaussianFilter', 'OneEuroFilter', 'SavizkyGolayFilter',
+    'SmoothNetFilter'
 ]
diff --git a/mmpose/core/post_processing/temporal_filters/filter.py b/mmpose/core/post_processing/temporal_filters/filter.py
@@ -11,13 +11,20 @@ class TemporalFilter(metaclass=ABCMeta):
         window_size (int): the size of the sliding window.
     """
 
+    # Whether the filter can be shared by multiple humans or targets
+    _shareable: bool = True
+
     def __init__(self, window_size=1):
         self._window_size = window_size
 
     @property
     def window_size(self):
         return self._window_size
 
+    @property
+    def shareable(self):
+        return self._shareable
+
     @abstractmethod
     def __call__(self, x):
         """Apply filter to a pose sequence.

diff --git a/mmpose/core/post_processing/temporal_filters/one_euro_filter.py b/mmpose/core/post_processing/temporal_filters/one_euro_filter.py
@@ -70,6 +70,9 @@ class OneEuroFilter(TemporalFilter):
             decreases speed lag.
     """
 
+    # Not shareable because the filter holds status of a specific target
+    _shareable: bool = False
+
     def __init__(self, min_cutoff=0.004, beta=0.7):
         # OneEuroFilter has Markov Property and maintains status variables
         # within the class, thus has a windows_size of 1
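
Note on the `_shareable` flag: it is a class attribute with a default of `True` on the base class, exposed through a read-only property, and a stateful subclass opts out by overriding it. The sketch below reproduces the base-class pattern from the hunks above; `IdentityFilter` and `RunningMeanFilter` are hypothetical subclasses added only to show both cases.

```python
from abc import ABCMeta, abstractmethod


class TemporalFilter(metaclass=ABCMeta):
    """Base class following the pattern in the diff above."""

    # Whether one instance can serve multiple targets.
    _shareable: bool = True

    def __init__(self, window_size=1):
        self._window_size = window_size

    @property
    def window_size(self):
        return self._window_size

    @property
    def shareable(self):
        return self._shareable

    @abstractmethod
    def __call__(self, x):
        """Apply the filter to an input."""


class IdentityFilter(TemporalFilter):
    """Hypothetical stateless filter: inherits the shareable default."""

    def __call__(self, x):
        return x


class RunningMeanFilter(TemporalFilter):
    """Hypothetical stateful filter: keeps a running mean for one target."""

    _shareable: bool = False

    def __init__(self):
        super().__init__(window_size=1)
        self._count = 0
        self._mean = 0.0

    def __call__(self, x):
        # Incremental mean update; the state belongs to a single target.
        self._count += 1
        self._mean += (x - self._mean) / self._count
        return self._mean


if __name__ == '__main__':
    print(IdentityFilter().shareable, RunningMeanFilter().shareable)
```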