RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

RTMO is a one-stage pose estimation model that achieves performance comparable to RTMPose. It has the following key advantages:

  • Faster inference speed when multiple people are present - RTMO runs faster than RTMPose on images with more than 4 persons. This makes it well-suited for real-time multi-person pose estimation.
  • No dependency on human detectors - Since RTMO is a one-stage model, it does not rely on an auxiliary human detector. This simplifies the pipeline and deployment.

👉🏼 TRY RTMO NOW

python demo/inferencer_demo.py $IMAGE --pose2d rtmo --vis-out-dir vis_results
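
The same inference can also be run from Python. The snippet below is a minimal sketch using MMPose's MMPoseInferencer with the 'rtmo' model alias; the image path is a placeholder.

from mmpose.apis import MMPoseInferencer

# build the inferencer from the RTMO alias (weights are downloaded on first use)
inferencer = MMPoseInferencer(pose2d='rtmo')

# results are yielded lazily, so pull one item from the generator
result_generator = inferencer('path/to/image.jpg', vis_out_dir='vis_results')
result = next(result_generator)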

rtmlib demo

rtmlib provides a simple and easy-to-use API for inference with RTMPose-series models, including RTMO.

  • Supports OpenCV / ONNXRuntime / OpenVINO inference and does not require PyTorch or MMCV.
  • Provides a user-friendly API for inference and visualization.
  • Supports both CPU and GPU inference.
  • Automatically downloads ONNX models from the OpenMMLab model zoo.
  • Supports the whole RTMPose family (RTMPose, DWPose, RTMO, RTMW, etc.).
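
The snippet below is a minimal sketch of running RTMO through rtmlib. It assumes the Body solution accepts pose='rtmo' as in recent rtmlib releases; the image path is a placeholder.

import cv2
from rtmlib import Body, draw_skeleton

device = 'cpu'            # 'cpu' or 'cuda'
backend = 'onnxruntime'   # 'opencv', 'onnxruntime' or 'openvino'

# one-stage body pose solution backed by an RTMO model (assumed option)
body = Body(pose='rtmo', mode='balanced', backend=backend, device=device)

img = cv2.imread('demo.jpg')           # placeholder image path
keypoints, scores = body(img)          # per-person keypoints and scores

# draw the predicted skeletons on a copy of the image and save it
img_show = draw_skeleton(img.copy(), keypoints, scores, kpt_thr=0.5)
cv2.imwrite('demo_pose.jpg', img_show)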

📜 Introduction

Real-time multi-person pose estimation presents significant challenges in balancing speed and precision. While two-stage top-down methods slow down as the number of people in the image increases, existing one-stage methods often fail to simultaneously deliver high accuracy and real-time performance. This paper introduces RTMO, a one-stage pose estimation framework that seamlessly integrates coordinate classification by representing keypoints using dual 1-D heatmaps within the YOLO architecture, achieving accuracy comparable to top-down methods while maintaining high speed. We propose a dynamic coordinate classifier and a tailored loss function for heatmap learning, specifically designed to address the incompatibilities between coordinate classification and dense prediction models. RTMO outperforms state-of-the-art one-stage pose estimators, achieving 1.1% higher AP on COCO while operating about 9 times faster with the same backbone. Our largest model, RTMO-l, attains 74.8% AP on COCO val2017 and 141 FPS on a single V100 GPU, demonstrating its efficiency and accuracy.


Refer to our paper for more details.

🎉 News

  • 2023/12/13: The RTMO paper and models are released!

🗂️ Model Zoo

Results on COCO val2017 dataset

| Model  | Train Set | Latency (ms) | AP    | AP50  | AP75  | AR    | AR50  | Download     |
| ------ | --------- | ------------ | ----- | ----- | ----- | ----- | ----- | ------------ |
| RTMO-s | COCO      | 8.9          | 0.677 | 0.878 | 0.737 | 0.715 | 0.908 | ckpt         |
| RTMO-m | COCO      | 12.4         | 0.709 | 0.890 | 0.778 | 0.747 | 0.920 | ckpt         |
| RTMO-l | COCO      | 19.1         | 0.724 | 0.899 | 0.788 | 0.762 | 0.927 | ckpt         |
| RTMO-t | body7     | -            | 0.574 | 0.803 | 0.613 | 0.611 | 0.836 | ckpt \| onnx |
| RTMO-s | body7     | 8.9          | 0.686 | 0.879 | 0.744 | 0.723 | 0.908 | ckpt \| onnx |
| RTMO-m | body7     | 12.4         | 0.726 | 0.899 | 0.790 | 0.763 | 0.926 | ckpt \| onnx |
| RTMO-l | body7     | 19.1         | 0.748 | 0.911 | 0.813 | 0.786 | 0.939 | ckpt \| onnx |

Results on CrowdPose test dataset

| Model  | Train Set | AP    | AP50  | AP75  | AP (E) | AP (M) | AP (H) | Download |
| ------ | --------- | ----- | ----- | ----- | ------ | ------ | ------ | -------- |
| RTMO-s | CrowdPose | 0.673 | 0.882 | 0.729 | 0.737  | 0.682  | 0.591  | ckpt     |
| RTMO-m | CrowdPose | 0.711 | 0.897 | 0.771 | 0.774  | 0.719  | 0.634  | ckpt     |
| RTMO-l | CrowdPose | 0.732 | 0.907 | 0.793 | 0.792  | 0.741  | 0.653  | ckpt     |
| RTMO-l | body7     | 0.838 | 0.947 | 0.893 | 0.888  | 0.847  | 0.772  | ckpt     |

🖥️ Train and Evaluation

Dataset Preparation

Please follow these instructions to prepare the training and testing datasets.

Train

Under the root directory of mmpose, run the following command to train models:

sh tools/dist_train.sh $CONFIG $NUM_GPUS --work-dir $WORK_DIR --amp
  • The --amp flag enables Automatic Mixed Precision (AMP), which reduces GPU memory consumption during training.
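
For example, to train RTMO-s on COCO with 8 GPUs (the config path below is illustrative; check configs/body_2d_keypoint/rtmo/ for the exact file names):

sh tools/dist_train.sh configs/body_2d_keypoint/rtmo/coco/rtmo-s_8xb32-600e_coco-640x640.py 8 --work-dir work_dirs/rtmo-s_coco --amp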

Evaluation

Under the root directory of mmpose, run the following command to evaluate models:

sh tools/dist_test.sh $CONFIG $PATH_TO_CHECKPOINT $NUM_GPUS
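
For example, to evaluate a trained RTMO-s model on COCO val2017 with a single GPU (the config and checkpoint paths below are illustrative):

sh tools/dist_test.sh configs/body_2d_keypoint/rtmo/coco/rtmo-s_8xb32-600e_coco-640x640.py work_dirs/rtmo-s_coco/epoch_600.pth 1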

See here for more training and evaluation details.

🛞 Deployment

MMDeploy provides tools for easy deployment of RTMO models. [Install Now]

⭕ Notice:

  • PyTorch 1.12+ is required to export the ONNX model of RTMO!

  • MMDeploy v1.3.1+ is required to deploy RTMO.

ONNX Model Export

Under mmdeploy root, run:

python tools/deploy.py \
    configs/mmpose/pose-detection_rtmo_onnxruntime_dynamic-640x640.py \
    $RTMO_CONFIG $RTMO_CHECKPOINT \
    $MMPOSE_ROOT/tests/data/coco/000000197388.jpg \
    --work-dir $WORK_DIR --dump-info \
    [--show] [--device $DEVICE]
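
After the conversion finishes, you can sanity-check the exported model with onnxruntime. The snippet below is a minimal sketch; it assumes the exported file is named end2end.onnx inside the work directory, which is MMDeploy's usual output name.

import onnxruntime as ort

# load the exported model on CPU and print its input/output signature
sess = ort.InferenceSession('work_dir/end2end.onnx', providers=['CPUExecutionProvider'])
for inp in sess.get_inputs():
    print('input :', inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print('output:', out.name, out.shape, out.type)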

TensorRT Model Export

Install TensorRT and build custom ops first.

Then under mmdeploy root, run:

python tools/deploy.py \
    configs/mmpose/pose-detection_rtmo_tensorrt-fp16_dynamic-640x640.py \
    $RTMO_CONFIG $RTMO_CHECKPOINT \
    $MMPOSE_ROOT/tests/data/coco/000000197388.jpg \
    --work-dir $WORK_DIR --dump-info \
    --device cuda:0 [--show]

This conversion takes several minutes. A GPU is required for TensorRT model export.

⭐ Citation

If this project benefits your work, please consider citing the original paper and MMPose:

@misc{lu2023rtmo,
      title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
      author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
      year={2023},
      eprint={2312.07526},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{mmpose2020,
    title={OpenMMLab Pose Estimation Toolbox and Benchmark},
    author={MMPose Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmpose}},
    year={2020}
}