This repository contains the source code for the paper *Attention! A Lightweight 2D Hand Pose Estimation Approach*.
Vision-based human pose estimation is a non-invasive technology for human-computer interaction (HCI). Using the hand directly as an input device offers an attractive interaction method with minimal need for specialized equipment such as exoskeletons or gloves: only a camera and a processing platform are required. Algorithms capable of estimating a hand's pose are exploited by a variety of applications, including the control of robotic systems, video games, and computer-generated imagery (CGI). In this letter, we present a novel Convolutional Neural Network architecture, reinforced with a Self-Attention module, that can be deployed on an embedded system thanks to its lightweight nature: it has just 1.9 million parameters.
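The Self-Attention module builds on scaled dot-product attention. The following is a minimal pure-Python sketch of a single attention head applied to a feature map whose H x W spatial positions have been flattened into N rows. It is not this repository's implementation: the helper names and weight matrices are hypothetical, and attention-augmented convolutions as proposed by Bello et al. additionally split channels into multiple heads and add relative position logits, both of which are omitted here.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: a list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention (hypothetical sketch).

    x   : N x C, one row per spatial position of a flattened feature map
    w_q : C x d_k query projection
    w_k : C x d_k key projection
    w_v : C x d_v value projection
    Returns N x d_v: each position becomes a weighted sum of all positions' values.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(w_q[0]))
    scores = matmul(q, transpose(k))  # N x N similarity between positions
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)
```

Because every position attends to every other position, a single attention layer has a global receptive field, which complements the local receptive field of a convolution.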
The presented architecture is based on the highly successful idea of DenseNets. In a DenseNet, each layer obtains additional inputs from all preceding layers and propagates its own feature maps to all subsequent layers through channel-wise concatenation.
*Figure: Dense block with growth rate k.*
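Dense connectivity can be sketched in a few lines. In this minimal sketch a feature map is modelled as a plain list of channels, and `make_toy_layer` is a hypothetical stand-in for a real convolutional layer that emits `growth_rate` (k) new channels. Following the DenseNet paper, the l-th layer of a block whose input has k0 channels receives k0 + k(l - 1) channels.

```python
from typing import Callable, List

# Hypothetical toy representation: a feature map is a list of channels,
# each channel a flat list of activations (spatial dimensions flattened).
Channel = List[float]
FeatureMap = List[Channel]
Layer = Callable[[FeatureMap], FeatureMap]


def make_toy_layer(growth_rate: int) -> Layer:
    """Hypothetical stand-in for BN-ReLU-Conv: emits `growth_rate` new channels,
    each the element-wise mean of all channels it receives."""
    def layer(inputs: FeatureMap) -> FeatureMap:
        mean = [sum(vals) / len(inputs) for vals in zip(*inputs)]
        return [list(mean) for _ in range(growth_rate)]
    return layer


def dense_block(x: FeatureMap, layers: List[Layer]) -> FeatureMap:
    """Each layer receives the channel-wise concatenation of the block input and
    every preceding layer's output; its own output is then appended in turn."""
    features = list(x)
    for layer in layers:
        new_maps = layer(features)
        features = features + new_maps  # channel-wise concatenation
    return features


def input_channels(k0: int, growth_rate: int, layer_index: int) -> int:
    """Channels seen by the l-th layer (1-based) of a dense block: k0 + k * (l - 1)."""
    return k0 + growth_rate * (layer_index - 1)
```

With a 3-channel input and four layers of growth rate 2, the block outputs 3 + 4 * 2 = 11 channels, and the original 3 input channels are passed through unchanged.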
We implement the inverted bottleneck block, enhanced by an Attention Augmented Convolutional layer whose output is added to the output of the Depthwise Separable Convolutional layer, as shown in the following figure.
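Much of the architecture's small footprint comes from depthwise separable convolutions. The sketch below compares weight counts for a regular convolution and a depthwise separable one; the channel sizes in the example are illustrative assumptions, not values taken from the paper. Because the two branch outputs are summed, they must share the same spatial size and channel count.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights of a regular kernel x kernel convolution mapping c_in -> c_out channels."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(kernel: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Depthwise kernel x kernel conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    biases = (c_in + c_out) if bias else 0
    return depthwise + pointwise + biases
```

For a 3 x 3 convolution mapping 32 to 32 channels (biases ignored), the regular layer needs 9,216 weights while the depthwise separable layer needs 1,312, a ratio of exactly 1/c_out + 1/k^2, roughly 0.14.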
| Method | AUC | EPE mean (px) | EPE median (px) |
|---|---|---|---|
| MPII+NZSL Dataset | | | |
| Zimm. et al. (ICCV 2017) | 0.17 | 59.4 | - |
| Bouk. et al. (CVPR 2019) | 0.50 | 18.95 | - |
| Ours | 0.55 | 16.1 | 11 |
| LSMV Dataset | | | |
| Gomez-Donoso et al. | - | 10 | - |
| Li et al. | - | 8 | - |
| Ours | 0.89 | 3.3 | 2.5 |
| Stereo Hand Pose Dataset | | | |
| Zimm. et al. (ICCV 2017) | 0.81 | 5 | 5.5 |
| Ours | 0.92 | 2.2 | 1.8 |
| FreiHand Dataset | | | |
| Ours | 0.87 | 4 | 3.1 |
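The table reports the end-point error (EPE), i.e. the Euclidean distance in pixels between each predicted and ground-truth 2D keypoint, together with the area under the PCK (percentage of correct keypoints) curve. The following is a minimal sketch of how such metrics are commonly computed. The threshold range used to produce the AUC values above is not stated in this README, so `t_min` and `t_max` are placeholders, and the repository's evaluation code may differ in its details.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def endpoint_errors(pred: Sequence[Point], gt: Sequence[Point]) -> List[float]:
    """Euclidean distance (px) between each predicted keypoint and its ground truth."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def epe_stats(errors: Sequence[float]) -> Tuple[float, float]:
    """Mean and median end-point error."""
    return statistics.mean(errors), statistics.median(errors)


def pck(errors: Sequence[float], threshold: float) -> float:
    """Fraction of keypoints whose error is within `threshold` pixels."""
    return sum(e <= threshold for e in errors) / len(errors)


def pck_auc(errors: Sequence[float], t_min: float, t_max: float, steps: int = 101) -> float:
    """Area under the PCK curve on [t_min, t_max] (trapezoidal rule), normalized to [0, 1]."""
    ts = [t_min + (t_max - t_min) * i / (steps - 1) for i in range(steps)]
    ys = [pck(errors, t) for t in ts]
    area = sum((ts[i + 1] - ts[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(steps - 1))
    return area / (t_max - t_min)
```

For three keypoints with errors of 0, 5 and 10 px, the mean and median EPE are both 5 px, and the normalized AUC over a 0 to 10 px range is close to 0.5.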
| Component | Arch 1 | Arch 2 | Arch 3 | Arch 4 | Arch 5 | Arch 6 | Arch 7 | Arch 8 | Arch 9 | Arch 10 | Arch 11 | Arch 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Attention module | * | - | - | * | * | - | * | - | - | * | - | * |
| Pooling method | Blur | Blur | Average | Average | Blur | Average | Average | Blur | Max | Max | Max | Max |
| Activation function | Mish | Mish | Mish | Mish | ReLU | ReLU | ReLU | ReLU | Mish | Mish | ReLU | ReLU |
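The twelve architectures cover every combination of attention (with or without), three pooling methods, and two activation functions. The less common options are sketched below in pure Python: the Mish activation, x * tanh(softplus(x)), and a 1-D illustration of blur pooling in the anti-aliasing sense of Zhang (2019), where the signal is low-pass filtered before being subsampled. The [1, 2, 1] / 4 kernel and replicate padding are assumptions for illustration; the blur kernel used in this repository is not specified here.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    return max(0.0, x)


def softplus(x: float) -> float:
    """ln(1 + e^x), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)). Smooth and non-monotonic, unlike ReLU."""
    return x * math.tanh(softplus(x))


def blur_pool_1d(signal: Sequence[float], stride: int = 2) -> List[float]:
    """Anti-aliased downsampling: low-pass filter with [1, 2, 1] / 4, then subsample.

    Edges are handled by replicating the first and last samples (an assumption).
    """
    kernel = (0.25, 0.5, 0.25)
    padded = [signal[0], *signal, signal[-1]]
    blurred = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]
    return blurred[::stride]
```

Naively subsampling `[0, 0, 0, 4, 0, 0]` with stride 2 returns `[0, 0, 0]`, losing the peak entirely after a one-sample shift, whereas `blur_pool_1d` returns `[0.0, 1.0, 1.0]` and keeps a trace of it. This reduced sensitivity to small shifts is the motivation for blur pooling.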
If you find this paper useful in your research, please consider citing:
@ARTICLE{9171866,
author={Santavas, Nicholas and Kansizoglou, Ioannis and Bampis, Loukas and Karakasis, Evangelos and Gasteratos, Antonios},
journal={IEEE Sensors Journal},
title={Attention! A Lightweight 2D Hand Pose Estimation Approach},
year={2021},
volume={21},
number={10},
pages={11488-11496},
doi={10.1109/JSEN.2020.3018172}}