Description
Feature type?
Algorithm request
A proposal draft (if any)
The proposal is to extend the quantization support in MMRAZOR to integrate MMPose quantization. This will involve expanding TensorRT and NNCF quantization support to MMPose models. The key challenge is that certain components, such as the RTMOHead used in real-time multi-person pose estimation (RTMO) models, cannot be directly converted into an FX graph, which is required for quantization. Addressing this would require modifying the model architecture or implementing custom layers that are compatible with FX-graph-based quantization frameworks.
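For context, FX-graph-based post-training quantization in PyTorch (which MMRAZOR's workflow builds on) follows a prepare/calibrate/convert flow, and it only succeeds on models that trace cleanly. A minimal sketch on a toy, fully traceable module; `TinyBackbone` is an illustrative stand-in, not an actual MMPose component:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Toy, fully FX-traceable model standing in for an MMPose backbone.
class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = TinyBackbone().eval()
example = torch.randn(1, 3, 16, 16)

# Prepare inserts observers, a calibration pass collects activation
# statistics, and convert produces the INT8 model.
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs=(example,))
prepared(example)  # calibration (normally over a representative dataset)
quantized = convert_fx(prepared)
out = quantized(example)
```

A model with untraceable components like RTMOHead fails at the `prepare_fx` step, which is why the architecture changes described above are needed first.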
Inspiration can be drawn from a similar effort to support quantization in MMDET using MMRAZOR, as seen here. This integration will ensure that MMPose models can be efficiently deployed on edge devices with lower precision (e.g., INT8), reducing inference latency and memory footprint while maintaining high accuracy.
To ensure the quantization process yields tangible benefits, performance benchmarks need to be established to demonstrate gains in model size, inference speed, and accuracy post-quantization.
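As a starting point for the size portion of those benchmarks, the expected storage reduction from FP32 to INT8 weights can be sketched as below; the parameter count is a made-up placeholder, not taken from a real MMPose model:

```python
def weight_storage_mb(num_params: int, bits: int) -> float:
    """Storage for the weights alone, in megabytes."""
    return num_params * bits / 8 / 1e6

num_params = 10_000_000  # placeholder parameter count
fp32 = weight_storage_mb(num_params, 32)
int8 = weight_storage_mb(num_params, 8)
print(f"FP32: {fp32:.1f} MB, INT8: {int8:.1f} MB, "
      f"reduction: {fp32 / int8:.0f}x")
```

This 4x bound covers weights only; measured on-disk and runtime gains will differ once activations, non-quantized layers, and engine-specific serialization are accounted for, which is exactly what the proposed benchmarks should capture.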
The quantized models must be deployable on NNCF (CPU) and TensorRT (GPU) engines.
Key issues:
- Some components, like RTMOHead, cannot be directly converted into an FX graph.
- Model-specific layers or operations may need custom handling for quantization.
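To illustrate the first issue, here is a minimal sketch of why a head with data-dependent control flow breaks `torch.fx` symbolic tracing, and one possible custom-handling workaround: treating the untraceable module as a leaf so the rest of the model still becomes an FX graph (the leaf can then be quantized separately or left in floating point). `DynamicHead` and `ToyPoseModel` are hypothetical stand-ins, not the real RTMOHead:

```python
import torch
import torch.nn as nn
import torch.fx as fx

# Hypothetical stand-in for a head like RTMOHead: branching on a tensor
# value is data-dependent control flow, which symbolic tracing rejects.
class DynamicHead(nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x

class ToyPoseModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3, padding=1)
        self.head = DynamicHead()

    def forward(self, x):
        return self.head(self.backbone(x))

model = ToyPoseModel()

# Plain symbolic_trace fails on the dynamic head.
try:
    fx.symbolic_trace(model)
    traced_ok = True
except Exception:
    traced_ok = False

# Workaround sketch: mark the untraceable head as a leaf module so the
# tracer records it as a single call_module node instead of tracing
# through its forward().
class LeafTracer(fx.Tracer):
    def is_leaf_module(self, m, module_qualified_name):
        return isinstance(m, DynamicHead) or super().is_leaf_module(
            m, module_qualified_name
        )

graph = LeafTracer().trace(model)
gm = fx.GraphModule(model, graph)
out = gm(torch.randn(1, 3, 8, 8))
```

The alternative mentioned above, rewriting the head so its forward pass is branch-free, keeps the whole model quantizable but requires architecture changes in MMPose itself.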
Additional context
Quantization support for MMPose is crucial for applications that require real-time pose estimation on low-power devices, such as mobile apps, robotics, and AR/VR systems. The goal is to enhance the usability of MMPose models in these scenarios by providing lightweight, high-performance models through MMRAZOR's quantization workflow. Benchmarking will include a comparison of model accuracy, size, and speed on various edge devices, showing the performance gains of the quantized models.