[Feature] Support SmoothNet (open-mmlab#1279)
* add smoothnet
* refactor smothnet model and filter
* Add unit test for SmoothNetFilter
* support root-index
* add docstring
* add md and update ReadMe
* update readme
* allow targets share filter
* remove devug code
* Update smoothnet_h36m.md
* fix lint
* fix unittest

Co-authored-by: ly015 <liyining0712@gmail.com>
1 parent 66e6fbe, commit bf98fea
Showing 16 changed files with 389 additions and 10 deletions.
@@ -0,0 +1,45 @@
<!-- [OTHERS] -->

<details>
<summary align="right"><a href="https://arxiv.org/abs/2112.13715">SmoothNet (arXiv'2021)</a></summary>

```bibtex
@article{zeng2021smoothnet,
  title={SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos},
  author={Zeng, Ailing and Yang, Lei and Ju, Xuan and Li, Jiefeng and Wang, Jianyi and Xu, Qiang},
  journal={arXiv preprint arXiv:2112.13715},
  year={2021}
}
```

</details>

<!-- [DATASET] -->

<details>
<summary align="right"><a href="https://ieeexplore.ieee.org/abstract/document/6682899/">Human3.6M (TPAMI'2014)</a></summary>

```bibtex
@article{h36m_pami,
  author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
  title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  publisher = {IEEE Computer Society},
  volume = {36},
  number = {7},
  pages = {1325-1339},
  month = {jul},
  year = {2014}
}
```

</details>

The following SmoothNet model checkpoints are available for pose smoothing. The table shows the performance of [SimpleBaseline3D](https://arxiv.org/abs/1705.03098) on the [Human3.6M](https://ieeexplore.ieee.org/abstract/document/6682899/) dataset without and with the SmoothNet plugin (superscripts <sup>w/o</sup> and <sup>w</sup>, respectively), and compares SmoothNet models with four window sizes (8, 16, 32 and 64). The metrics are MPJPE (mm), P-MPJPE (mm) and Acceleration Error (mm/frame^2).

| Arch | Window Size | MPJPE<sup>w/o</sup> | MPJPE<sup>w</sup> | P-MPJPE<sup>w/o</sup> | P-MPJPE<sup>w</sup> | Accel. Err<sup>w/o</sup> | Accel. Err<sup>w</sup> | ckpt |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [smoothnet_ws8](/configs/_base_/filters/smoothnet_t8_h36m.py) | 8 | 54.48 | 53.15 | 42.20 | 41.32 | 19.18 | 1.87 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws8_h36m.pth) |
| [smoothnet_ws16](/configs/_base_/filters/smoothnet_t16_h36m.py) | 16 | 54.48 | 52.74 | 42.20 | 41.20 | 19.18 | 1.22 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws16_h36m.pth) |
| [smoothnet_ws32](/configs/_base_/filters/smoothnet_t32_h36m.py) | 32 | 54.48 | 52.47 | 42.20 | 40.84 | 19.18 | 0.99 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws32_h36m.pth) |
| [smoothnet_ws64](/configs/_base_/filters/smoothnet_t64_h36m.py) | 64 | 54.48 | 53.37 | 42.20 | 40.77 | 19.18 | 0.92 | [ckpt](https://download.openmmlab.com/mmpose/plugin/smoothnet/smoothnet_ws64_h36m.pth) |
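
For readers unfamiliar with the metrics in the table, the sketch below shows how MPJPE and acceleration error are commonly computed from a predicted and a ground-truth pose sequence. It assumes the usual definitions (mean Euclidean joint distance, and the second-order finite difference along time); the evaluation code behind the table may differ in detail. The function names are illustrative and not part of MMPose. P-MPJPE additionally applies a Procrustes alignment before measuring the error and is omitted here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape [T, K, 3] (frames, keypoints, xyz) in mm.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def acceleration(seq):
    """Second-order finite difference along time: x[t-1] - 2*x[t] + x[t+1]."""
    return seq[:-2] - 2.0 * seq[1:-1] + seq[2:]


def accel_error(pred, gt):
    """Mean norm of the acceleration difference, in mm/frame^2.

    Needs at least 3 frames, since each acceleration uses 3 neighbours.
    """
    diff = acceleration(pred) - acceleration(gt)
    return float(np.linalg.norm(diff, axis=-1).mean())
```

A constant offset between prediction and ground truth raises MPJPE but leaves the acceleration error at zero, which is why the two metrics are reported side by side: one measures per-frame precision, the other temporal smoothness.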
@@ -0,0 +1,13 @@
# Config for SmoothNet filter trained on Human3.6M data with a window size of
# 16. The model is trained using root-centered keypoint coordinates around the
# pelvis (index:0), thus we set root_index=0 for the filter
filter_cfg = dict(
    type='SmoothNetFilter',
    window_size=16,
    output_size=16,
    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
    'smoothnet_ws16_h36m.pth',
    hidden_size=512,
    res_hidden_size=256,
    num_blocks=3,
    root_index=0)
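
To illustrate what `window_size` and `root_index` mean for a window-based filter, here is a minimal NumPy sketch. It is not the `SmoothNetFilter` implementation: the function name, the window-mean placeholder standing in for the trained network, and the averaging of overlapping windows are all assumptions made for illustration. The only behaviour taken from the config comment is that keypoints are centered around the root joint before filtering.

```python
import numpy as np


def _window_mean(window):
    """Placeholder for the trained network: every frame becomes the window mean."""
    return np.broadcast_to(window.mean(axis=0, keepdims=True), window.shape)


def smooth_with_root_centering(keypoints, window_size, root_index=None,
                               window_fn=_window_mean):
    """Filter a pose sequence with a sliding window (hypothetical helper).

    keypoints: array [T, K, C]. window_fn maps a [window_size, K, C] window
    to a smoothed window of the same shape.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    num_frames = keypoints.shape[0]
    if num_frames < window_size:
        return keypoints.copy()  # too short to fill a single window

    if root_index is not None:
        # Root-centered coordinates: subtract the root joint in every frame.
        root = keypoints[:, root_index:root_index + 1]
        x = keypoints - root
    else:
        root = 0.0
        x = keypoints

    out = np.zeros_like(x)
    count = np.zeros((num_frames, 1, 1))
    for start in range(num_frames - window_size + 1):
        sl = slice(start, start + window_size)
        out[sl] += window_fn(x[sl])
        count[sl] += 1
    out /= count  # average the predictions of overlapping windows

    # Convert back to absolute coordinates.
    return out + root
```

With root centering, global translation of the body is removed before filtering, so only the pose relative to the pelvis is smoothed and the root trajectory is passed through unchanged.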
@@ -0,0 +1,13 @@
# Config for SmoothNet filter trained on Human3.6M data with a window size of
# 32. The model is trained using root-centered keypoint coordinates around the
# pelvis (index:0), thus we set root_index=0 for the filter
filter_cfg = dict(
    type='SmoothNetFilter',
    window_size=32,
    output_size=32,
    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
    'smoothnet_ws32_h36m.pth',
    hidden_size=512,
    res_hidden_size=256,
    num_blocks=3,
    root_index=0)
@@ -0,0 +1,13 @@
# Config for SmoothNet filter trained on Human3.6M data with a window size of
# 64. The model is trained using root-centered keypoint coordinates around the
# pelvis (index:0), thus we set root_index=0 for the filter
filter_cfg = dict(
    type='SmoothNetFilter',
    window_size=64,
    output_size=64,
    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
    'smoothnet_ws64_h36m.pth',
    hidden_size=512,
    res_hidden_size=256,
    num_blocks=3,
    root_index=0)
@@ -0,0 +1,13 @@
# Config for SmoothNet filter trained on Human3.6M data with a window size of
# 8. The model is trained using root-centered keypoint coordinates around the
# pelvis (index:0), thus we set root_index=0 for the filter
filter_cfg = dict(
    type='SmoothNetFilter',
    window_size=8,
    output_size=8,
    checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
    'smoothnet_ws8_h36m.pth',
    hidden_size=512,
    res_hidden_size=256,
    num_blocks=3,
    root_index=0)
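
The four filter configs in this commit differ only in `window_size`, `output_size` and the checkpoint file name. As a rough illustration of that pattern, the helper below builds the same dictionary for any of the released window sizes. `make_smoothnet_cfg` is a hypothetical name and is not part of this commit; the field values are copied from the configs above.

```python
def make_smoothnet_cfg(window_size):
    """Build a SmoothNet filter config dict (hypothetical helper).

    Only the window sizes with released Human3.6M checkpoints are accepted.
    """
    if window_size not in (8, 16, 32, 64):
        raise ValueError(f'no released checkpoint for window size {window_size}')
    return dict(
        type='SmoothNetFilter',
        window_size=window_size,
        output_size=window_size,  # the released configs output a full window
        checkpoint='https://download.openmmlab.com/mmpose/plugin/smoothnet/'
        f'smoothnet_ws{window_size}_h36m.pth',
        hidden_size=512,
        res_hidden_size=256,
        num_blocks=3,
        root_index=0)  # pelvis, as in the Human3.6M configs


filter_cfg = make_smoothnet_cfg(32)
```

Keeping one file per window size, as the commit does, makes each config directly referenceable from the results table; a factory like this is mainly useful in scripts that sweep over window sizes.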
@@ -0,0 +1,29 @@
# SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

<!-- [ALGORITHM] -->

<details>
<summary align="right"><a href="https://arxiv.org/abs/2112.13715">SmoothNet (arXiv'2021)</a></summary>

```bibtex
@article{zeng2021smoothnet,
  title={SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos},
  author={Zeng, Ailing and Yang, Lei and Ju, Xuan and Li, Jiefeng and Wang, Jianyi and Xu, Qiang},
  journal={arXiv preprint arXiv:2112.13715},
  year={2021}
}
```

</details>

## Abstract

<!-- [ABSTRACT] -->

When analyzing human motion videos, the output jitters from existing pose estimators are highly-unbalanced. Most frames only suffer from slight jitters, while significant jitters occur in those frames with occlusion or poor image quality. Such complex poses often persist in videos, leading to consecutive frames with poor estimation results and large jitters. Existing pose smoothing solutions based on temporal convolutional networks, recurrent neural networks, or low-pass filters cannot deal with such a long-term jitter problem without considering the significant and persistent errors within the jittering video segment. Motivated by the above observation, we propose a novel plug-and-play refinement network, namely SMOOTHNET, which can be attached to any existing pose estimators to improve its temporal smoothness and enhance its per-frame precision simultaneously. Especially, SMOOTHNET is a simple yet effective data-driven fully-connected network with large receptive fields, effectively mitigating the impact of long-term jitters with unreliable estimation results. We conduct extensive experiments on twelve backbone networks with seven datasets across 2D and 3D pose estimation, body recovery, and downstream tasks. Our results demonstrate that the proposed SMOOTHNET consistently outperforms existing solutions, especially on those clips with high errors and long-term jitters.

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/15977946/161272519-0165c0e2-f0e8-45ad-88dd-ddb49fc81bda.png">
</div>
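
The abstract describes SmoothNet as a fully-connected network with a large temporal receptive field. The NumPy sketch below gives a rough idea of what that can look like, assuming the layers act along the time axis of a fixed-length window and reusing the hyperparameter names from the filter configs in this commit (`hidden_size`, `res_hidden_size`, `num_blocks`). The class name, the random weights, the LeakyReLU slope and the exact residual layout are assumptions for illustration, not the released model.

```python
import numpy as np


def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)


class TemporalMLPSketch:
    """Fully-connected layers applied along the time axis of a window."""

    def __init__(self, window_size, output_size, hidden_size=512,
                 res_hidden_size=256, num_blocks=3, seed=0):
        rng = np.random.default_rng(seed)

        def linear(n_in, n_out):
            # Random untrained weights; a real model would load a checkpoint.
            return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

        self.encoder = linear(window_size, hidden_size)
        self.blocks = [(linear(hidden_size, res_hidden_size),
                        linear(res_hidden_size, hidden_size))
                       for _ in range(num_blocks)]
        self.decoder = linear(hidden_size, output_size)

    def __call__(self, x):
        """x: [N, C, T] with T == window_size. Returns [N, C, output_size]."""
        w, b = self.encoder
        h = leaky_relu(x @ w + b)
        for (w1, b1), (w2, b2) in self.blocks:
            r = leaky_relu(h @ w1 + b1)
            r = leaky_relu(r @ w2 + b2)
            h = h + r  # residual connection
        w, b = self.decoder
        return h @ w + b


net = TemporalMLPSketch(window_size=16, output_size=16)
window = np.zeros((1, 51, 16))  # e.g. 17 keypoints x 3 coordinates, 16 frames
smoothed = net(window)
```

Because every output frame is a function of all frames in the window, the receptive field equals the window size, which is the property the abstract credits for handling long-term jitter.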