Pose estimation finds the keypoints belonging to each person in an image. There are two main approaches:
- Bottom-up methods first find all keypoints and then group them into the individual people in the image. (Generally faster, but less accurate.)
- Top-down methods first detect the people in the image and then estimate the keypoints of each detected person. (Generally more computationally intensive, but more accurate.)
This repo only includes top-down pose estimation models.
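The top-down flow described above can be sketched in a few lines. This is only an illustrative outline, not the code used in this repo: `detect_people` and `estimate_keypoints` are hypothetical callables standing in for the detector and the pose model, and the sketch assumes the pose model returns keypoints normalized to the person box.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
Keypoint = Tuple[float, float]           # (x, y)


def top_down_pose(
    image,
    detect_people: Callable[[object], Sequence[Box]],
    estimate_keypoints: Callable[[object, Box], Sequence[Keypoint]],
) -> List[List[Keypoint]]:
    """Run a top-down pipeline: detect people first, then estimate keypoints per box.

    Assumption: `estimate_keypoints` returns coordinates normalized to [0, 1]
    relative to the person box, so they are mapped back to image pixels here.
    """
    poses = []
    for box in detect_people(image):
        x1, y1, x2, y2 = box
        width, height = x2 - x1, y2 - y1
        box_keypoints = estimate_keypoints(image, box)
        # Map box-relative keypoints back to full-image coordinates.
        poses.append([(x1 + kx * width, y1 + ky * height) for kx, ky in box_keypoints])
    return poses
```

Because the pose model runs once per detected person, the cost of this flow grows with the number of people in the frame, which is why top-down methods tend to be more computationally intensive.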
COCO-val with 56.4 Detector AP
Model | Backbone | Image Size | AP | AP50 | AP75 | Params (M) | FLOPs (B) | FPS | Weights |
---|---|---|---|---|---|---|---|---|---|
PoseHRNet | HRNet-w32 | 256x192 | 74.4 | 90.5 | 81.9 | 29 | 7 | 25 | download |
PoseHRNet | HRNet-w48 | 256x192 | 75.1 | 90.6 | 82.2 | 64 | 15 | 24 | download |
SimDR | HRNet-w32 | 256x192 | 75.3 | - | - | 31 | 7 | 25 | download |
SimDR | HRNet-w48 | 256x192 | 75.9 | 90.4 | 82.7 | 66 | 15 | 24 | download |
Note: FPS is measured on a GTX 1660 Ti with one person per frame and includes pre-processing, model inference and post-processing. Both the detection and pose models run in PyTorch FP32.
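An end-to-end FPS figure of this kind can be measured roughly as follows. This is a minimal sketch, not the benchmarking script behind the table: `process_frame` is a hypothetical callable that is assumed to wrap pre-processing, inference and post-processing for one frame, and the warm-up count is an arbitrary choice.

```python
import time


def measure_fps(process_frame, frames, warmup=3):
    """Return frames per second for `process_frame` over `frames`.

    `process_frame` is assumed to cover the whole per-frame pipeline
    (pre-processing, model inference, post-processing).
    """
    frames = list(frames)
    if not frames:
        raise ValueError("at least one frame is required")

    # Warm-up runs are excluded from timing (first calls are often slower).
    for frame in frames[:warmup]:
        process_frame(frame)

    # Note: when timing GPU work in PyTorch, call torch.cuda.synchronize()
    # inside process_frame so that queued kernels finish before the clock is read.
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```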
COCO-test with 60.9 Detector AP
Model | Backbone | Image Size | AP | AP50 | AP75 | Params (M) | FLOPs (B) | Weights |
---|---|---|---|---|---|---|---|---|
SimDR* | HRNet-w48 | 256x192 | 75.4 | 92.4 | 82.7 | 66 | 15 | download |
RLEPose | HRNet-w48 | 384x288 | 75.7 | 92.3 | 82.9 | - | - | - |
UDP+PSA | HRNet-w48 | 256x192 | 78.9 | 93.6 | 85.8 | 70 | 16 | - |
Download Backbone Models' Weights
Model | Weights |
---|---|
HRNet-w32 | download |
HRNet-w48 | download |
- torch >= 1.8.1
- torchvision >= 0.9.1
Other requirements can be installed with `pip install -r requirements.txt`.
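To check an installed version against the minimums listed above, a small comparison helper is enough. The sketch below is an illustration only; `version_tuple` and `meets_minimum` are hypothetical helper names, and the parsing simply ignores local build suffixes such as `+cu113`.

```python
import re


def version_tuple(version):
    """Convert a version string such as '1.10.0+cu113' into (1, 10, 0)."""
    core = version.split("+")[0]  # drop local build tags like '+cu113'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '1.9' to (1, 9, 0)
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


# Typical use, assuming torch and torchvision are installed:
#   import torch, torchvision
#   meets_minimum(torch.__version__, "1.8.1")
#   meets_minimum(torchvision.__version__, "0.9.1")
```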
Clone the repository recursively:
$ git clone --recursive https://github.com/sithu31296/pose-estimation.git
- Download a YOLOv5m model trained on the CrowdHuman dataset from here. (The weights are from deepakcrk/yolov5-crowdhuman.)
- Download a pose estimation model's weights from the tables.
- Run the following command.
$ python infer.py --source TEST_SOURCE --det-model DET_MODEL_PATH --pose-model POSE_MODEL_PATH --img-size 640
Arguments:
- `source`: Testing source
  - To test an image, set to the image file path. (For example, `assests/test.jpg`)
  - To test a folder containing images, set to the folder name. (For example, `assests/`)
  - To test a video, set to the video file path. (For example, `assests/video.mp4`)
  - To test with a webcam, set to `0`.
- `det-model`: YOLOv5 model's weights path
- `pose-model`: Pose estimation model's weights path
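The four kinds of `source` value listed above could be told apart along these lines. This is a hedged sketch and not how `infer.py` necessarily does it: `classify_source` is a hypothetical helper, and the extension sets are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed extension sets; the formats actually supported may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def classify_source(source):
    """Classify a --source string as 'webcam', 'folder', 'image' or 'video'."""
    if source.isdigit():
        return "webcam"  # e.g. "0" selects the first camera
    path = Path(source)
    if path.is_dir() or source.endswith("/"):
        return "folder"
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported source: {source!r}")
```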
Example inference results (image credit: [1, 2]):
- https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
- https://github.com/ultralytics/yolov5
@article{WangSCJDZLMTWLX19,
title={Deep High-Resolution Representation Learning for Visual Recognition},
author={Jingdong Wang and Ke Sun and Tianheng Cheng and
Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and
Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
journal = {TPAMI},
year={2019}
}
@misc{li20212d,
title={Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?},
author={Yanjie Li and Sen Yang and Shoukui Zhang and Zhicheng Wang and Wankou Yang and Shu-Tao Xia and Erjin Zhou},
year={2021},
eprint={2107.03332},
archivePrefix={arXiv},
primaryClass={cs.CV}
}