RTMO is a one-stage pose estimation model that achieves performance comparable to RTMPose. It has the following key advantages:
- Faster inference speed when multiple people are present - RTMO runs faster than RTMPose on images with more than 4 persons. This makes it well-suited for real-time multi-person pose estimation.
- No dependency on human detectors - Since RTMO is a one-stage model, it does not rely on an auxiliary human detector. This simplifies the pipeline and deployment.
👉🏼 TRY RTMO NOW
python demo/inferencer_demo.py $IMAGE --pose2d rtmo --vis-out-dir vis_results
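The CLI demo above can also be reproduced from Python via MMPose's `MMPoseInferencer`, which the demo script wraps. The snippet below is a minimal sketch; the image path and output directory are placeholders to replace with your own.

```python
from mmpose.apis import MMPoseInferencer

# Build an inferencer from the "rtmo" model alias (weights are downloaded automatically).
inferencer = MMPoseInferencer(pose2d='rtmo')

# The inferencer returns a generator; iterating it runs inference and saves visualizations.
result_generator = inferencer('path/to/your/image.jpg', vis_out_dir='vis_results')
result = next(result_generator)
print(result['predictions'])  # per-instance keypoints and scores
```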
rtmlib demo
rtmlib provides a simple and easy-to-use API for inference with RTMPose-series models (see the example after this list).
- Supports OpenCV / ONNXRuntime / OpenVINO inference backends and does not require PyTorch or MMCV.
- Super user-friendly API for inference and visualization.
- Supports both CPU and GPU inference.
- Automatically downloads ONNX models from the OpenMMLab model zoo.
- Supports the whole RTMPose model family (RTMPose, DWPose, RTMO, RTMW, etc.).
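As an illustration of the rtmlib API, here is a minimal sketch following the usage pattern from rtmlib's documentation. The `Body` solution class, `draw_skeleton` helper, and their argument names are taken from that documented pattern but should be verified against the rtmlib version you install; note that the default `Body` pipeline may load an RTMPose model rather than RTMO, so consult the rtmlib docs for selecting a specific model.

```python
import cv2
from rtmlib import Body, draw_skeleton  # class names assumed from rtmlib's documented usage

img = cv2.imread('demo.jpg')

# Choose a backend ('opencv', 'onnxruntime' or 'openvino') and a device ('cpu' or 'cuda').
body = Body(mode='balanced', backend='onnxruntime', device='cpu')

# Run inference: returns keypoint coordinates and per-keypoint scores for each person.
keypoints, scores = body(img)

# Draw the predicted skeletons and save the visualization.
img_show = draw_skeleton(img, keypoints, scores, kpt_thr=0.5)
cv2.imwrite('vis_result.jpg', img_show)
```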
Real-time multi-person pose estimation presents significant challenges in balancing speed and precision. While two-stage top-down methods slow down as the number of people in the image increases, existing one-stage methods often fail to simultaneously deliver high accuracy and real-time performance. This paper introduces RTMO, a one-stage pose estimation framework that seamlessly integrates coordinate classification by representing keypoints using dual 1-D heatmaps within the YOLO architecture, achieving accuracy comparable to top-down methods while maintaining high speed. We propose a dynamic coordinate classifier and a tailored loss function for heatmap learning, specifically designed to address the incompatibilities between coordinate classification and dense prediction models. RTMO outperforms state-of-the-art one-stage pose estimators, achieving 1.1% higher AP on COCO while operating about 9 times faster with the same backbone. Our largest model, RTMO-l, attains 74.8% AP on COCO val2017 and 141 FPS on a single V100 GPU, demonstrating its efficiency and accuracy.
Refer to our paper for more details.
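To make the coordinate-classification idea concrete, the sketch below shows the generic dual 1-D heatmap decoding scheme: each keypoint is represented by one heatmap over horizontal bins and one over vertical bins, and the coordinate is recovered as the expectation over bin centers. This is a schematic illustration of the general idea only, not RTMO's dynamic coordinate classifier; all names and sizes here are illustrative.

```python
import torch

def decode_dual_1d_heatmaps(hm_x: torch.Tensor, hm_y: torch.Tensor,
                            bins_x: torch.Tensor, bins_y: torch.Tensor) -> torch.Tensor:
    """Decode keypoint coordinates from dual 1-D heatmaps.

    hm_x: (num_keypoints, num_bins_x) logits over horizontal bins
    hm_y: (num_keypoints, num_bins_y) logits over vertical bins
    bins_x / bins_y: bin-center coordinates along each axis
    """
    prob_x = hm_x.softmax(dim=-1)       # classification over x bins
    prob_y = hm_y.softmax(dim=-1)       # classification over y bins
    x = (prob_x * bins_x).sum(dim=-1)   # expected x coordinate
    y = (prob_y * bins_y).sum(dim=-1)   # expected y coordinate
    return torch.stack([x, y], dim=-1)  # (num_keypoints, 2)

# Toy usage: 17 keypoints, 128 bins spanning a 640-pixel range.
bins = torch.linspace(0, 640, 128)
coords = decode_dual_1d_heatmaps(torch.randn(17, 128), torch.randn(17, 128), bins, bins)
print(coords.shape)  # torch.Size([17, 2])
```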
2023/12/13: The RTMO paper and models are released!
Results on COCO val2017:

| Model | Train Set | Latency (ms) | AP | AP50 | AP75 | AR | AR50 | Download |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| RTMO-s | COCO | 8.9 | 0.677 | 0.878 | 0.737 | 0.715 | 0.908 | ckpt |
| RTMO-m | COCO | 12.4 | 0.709 | 0.890 | 0.778 | 0.747 | 0.920 | ckpt |
| RTMO-l | COCO | 19.1 | 0.724 | 0.899 | 0.788 | 0.762 | 0.927 | ckpt |
| RTMO-t | body7 | - | 0.574 | 0.803 | 0.613 | 0.611 | 0.836 | ckpt \| onnx |
| RTMO-s | body7 | 8.9 | 0.686 | 0.879 | 0.744 | 0.723 | 0.908 | ckpt \| onnx |
| RTMO-m | body7 | 12.4 | 0.726 | 0.899 | 0.790 | 0.763 | 0.926 | ckpt \| onnx |
| RTMO-l | body7 | 19.1 | 0.748 | 0.911 | 0.813 | 0.786 | 0.939 | ckpt \| onnx |
- The latency is evaluated on a single V100 GPU with the ONNXRuntime backend.
- "body7" refers to a combined dataset composed of AI Challenger, COCO, CrowdPose, Halpe, MPII, PoseTrack18 and sub-JHMDB.
Results on CrowdPose test set:

| Model | Train Set | AP | AP50 | AP75 | AP (E) | AP (M) | AP (H) | Download |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| RTMO-s | CrowdPose | 0.673 | 0.882 | 0.729 | 0.737 | 0.682 | 0.591 | ckpt |
| RTMO-m | CrowdPose | 0.711 | 0.897 | 0.771 | 0.774 | 0.719 | 0.634 | ckpt |
| RTMO-l | CrowdPose | 0.732 | 0.907 | 0.793 | 0.792 | 0.741 | 0.653 | ckpt |
| RTMO-l | body7 | 0.838 | 0.947 | 0.893 | 0.888 | 0.847 | 0.772 | ckpt |

- AP (E), AP (M) and AP (H) denote AP on the easy, medium and hard subsets of CrowdPose.
Please follow these instructions to prepare the training and testing datasets.
Under the root directory of mmpose, run the following command to train models:
sh tools/dist_train.sh $CONFIG $NUM_GPUS --work-dir $WORK_DIR --amp
- The --amp flag enables Automatic Mixed Precision (AMP) training, which reduces GPU memory consumption.
Under the root directory of mmpose, run the following command to evaluate models:
sh tools/dist_test.sh $CONFIG $PATH_TO_CHECKPOINT $NUM_GPUS
See here for more training and evaluation details.
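After training, a checkpoint can be quickly sanity-checked from Python. The sketch below uses MMPose's `init_model` and `inference_bottomup` APIs (the bottom-up entry point fits RTMO since it is a one-stage model without a detector); the config and checkpoint paths are placeholders to replace with your own.

```python
from mmpose.apis import init_model, inference_bottomup

# Placeholder paths: use your trained RTMO config and checkpoint.
model = init_model('path/to/rtmo_config.py', 'path/to/checkpoint.pth', device='cuda:0')

# One-stage models are run directly on the image, without a separate human detector.
results = inference_bottomup(model, 'path/to/image.jpg')
pred = results[0].pred_instances
print(pred.keypoints.shape, pred.keypoint_scores.shape)
```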
MMDeploy provides tools for easy deployment of RTMO models. [Install Now]
⭕ Notice:
- PyTorch 1.12+ is required to export the ONNX model of RTMO!
- MMDeploy v1.3.1+ is required to deploy RTMO.
To convert the model to ONNX with the ONNXRuntime backend, run the following under the mmdeploy root directory:
python tools/deploy.py \
configs/mmpose/pose-detection_rtmo_onnxruntime_dynamic-640x640.py \
$RTMO_CONFIG $RTMO_CHECKPOINT \
$MMPOSE_ROOT/tests/data/coco/000000197388.jpg \
--work-dir $WORK_DIR --dump-info \
[--show] [--device $DEVICE]
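As a quick sanity check of the exported model, you can load it with ONNX Runtime and inspect its inputs and outputs. The sketch below assumes the default MMDeploy output filename end2end.onnx inside the work directory; adjust the path to your setup.

```python
import onnxruntime as ort

# MMDeploy writes the converted model as end2end.onnx in the work directory by default.
sess = ort.InferenceSession('work_dir/end2end.onnx', providers=['CPUExecutionProvider'])

for inp in sess.get_inputs():
    print('input :', inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print('output:', out.name, out.shape, out.type)
```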
For TensorRT deployment, install TensorRT and build the MMDeploy custom ops first.
Then, under the mmdeploy root directory, run:
python tools/deploy.py \
configs/mmpose/pose-detection_rtmo_tensorrt-fp16_dynamic-640x640.py \
$RTMO_CONFIG $RTMO_CHECKPOINT \
$MMPOSE_ROOT/tests/data/coco/000000197388.jpg \
--work-dir $WORK_DIR --dump-info \
--device cuda:0 [--show]
The conversion takes several minutes, and a GPU is required for TensorRT model export.
If this project benefits your work, please consider citing the original paper and MMPose:
@misc{lu2023rtmo,
title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
year={2023},
eprint={2312.07526},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{mmpose2020,
title={OpenMMLab Pose Estimation Toolbox and Benchmark},
author={MMPose Contributors},
howpublished = {\url{https://github.com/open-mmlab/mmpose}},
year={2020}
}