
Conversation


@CuriousDolphin CuriousDolphin commented Jul 10, 2025

  • update dependencies
  • refactor NN
  • add Darknet Backbone
  • update backbone test
  • modelling YoloXPose
  • modelling RTMO
  • define keypoint output
  • add Task.KEYPOINT
  • add yoloxpose processor
    • preprocess
    • postprocess
    • export postprocess
  • add RTMO processor
    • preprocess
    • postprocess
    • export postprocess
  • add YoloXPose to registry
  • rename loss names for both models for coherence with wandb
  • check TORCHSCRIPT export
  • handle skeleton on keypoint annotator
  • add RTMO (s) to registry
  • test yoloxpose training on COCO -> loss goes dead with no apparent cause
  • find a fix for flip_map (maybe avoid flip augmentations in keypoint models)
  • take the skeleton from the dataset to model_config
  • test rtmo train on coco
  • test rtmo on generic keypoint dataset
  • test rtmo export and inference
  • update visualizer
  • upload RTMO pretrained weights on public repo
  • add default keypoint augmentation
  • update documentation and examples

Key Changes

✨ Introduce keypoint models

  • add RTMO-S/M/L-COCO pretrained keypoint models
    example:
from focoos import ModelManager
from PIL import Image

im = "https://public.focoos.ai/samples/federer.jpg"
model = ModelManager.get("rtmo-s-coco")
detections = model.infer(im, annotate=True, threshold=0.5)
Image.fromarray(detections.image) # visualise or save annotated image

📷 Unified Inference API

Standardize infer Method Signatures

  • consistent infer() method across FocoosModel, InferModel, and RemoteModel with unified parameters: infer(image, threshold=0.5, annotate=False); a unified image loader is used for all infer methods (with remote image support as well)
  • set the default threshold to 0.5
  • remove dependency on external annotate_image() function calls
  • streamlined workflow: get detections and visual annotations in a single call

example with a torch model and an exported model:

from focoos import ModelManager, RuntimeType
from PIL import Image


im = "https://public.focoos.ai/samples/motogp.jpg" # remote image, can also be local path, numpy array, or PIL image
model = ModelManager.get("fai-detr-l-obj365")
detections = model.infer(im, annotate=True, threshold=0.5) # annotate=True returns the annotated image
# Image.fromarray(detections.image) # visualise or save annotated image

# export model
model = model.export(RuntimeType.ONNX_CUDA32)
res = model.infer(im, annotate=True, threshold=0.5)

Image.fromarray(res.image) # visualise or save the annotated image from the exported model

example with remote inference:

from focoos import FocoosHUB
from PIL import Image

hub = FocoosHUB()
model_ref = "fai-detr-l-obj365" # use any pretrained model on app.focoos.ai or your own model reference
remote_model = hub.get_remote_model(model_ref)

im = "https://public.focoos.ai/samples/federer.jpg"

detections = remote_model.infer(im, annotate=True, threshold=0.5)

Image.fromarray(detections.image) # visualise or save annotated image

Enhanced FocoosDetections Structure

  • add new image field: stores annotated results as a base64 string or numpy array
  • migrated from Pydantic to pure Python dataclasses for better performance, improved serialization, and lower memory usage
  • add new keypoints field
  • add pprint and print_infer methods to unify detection printing (a minimal sketch of the resulting structure follows this list)
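
Purely as an illustration, a minimal sketch of what the dataclass-based structure might look like; every field name other than image and keypoints (e.g. bbox, conf, label) is an assumption, not taken from this PR:

from dataclasses import dataclass, field
from typing import Optional, Union

import numpy as np


@dataclass
class FocoosDet:
    # Hypothetical per-detection fields; only `keypoints` is named in this PR.
    bbox: Optional[list[float]] = None              # assumed: [x1, y1, x2, y2]
    conf: Optional[float] = None                    # assumed: detection confidence
    label: Optional[str] = None                     # assumed: class name
    keypoints: Optional[list[list[float]]] = None   # new field: [[x, y, score], ...]


@dataclass
class FocoosDetections:
    detections: list[FocoosDet] = field(default_factory=list)
    # New field from this PR: annotated image, stored as a numpy array or base64 string.
    image: Optional[Union[np.ndarray, str]] = None

    def pprint(self) -> None:
        # Simplified stand-in for the unified print helpers added in this PR.
        for i, det in enumerate(self.detections):
            print(f"[{i}] label={det.label} conf={det.conf} has_keypoints={det.keypoints is not None}")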

⌨️ CLI

  • add new CLI command: focoos gradio to launch a Gradio interface for image and video inference using Focoos pretrained models.

🕹️ Trainer

  • fix missing model preprocessing when amp=True (Automatic Mixed Precision) is enabled
  • add quadratic warmup to the COSINE scheduler (see the sketch after this list)
  • add KeypointEvaluator
  • enhance logging with additional info
  • Update Visualizer (preview hook) to save RGB images instead of BGR
  • Restore TensorBoard Hook
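
A minimal sketch of a cosine schedule with quadratic warmup, written as a per-step LR multiplier; the actual Focoos implementation and its parameter names are not shown in this PR, so treat this only as an illustration of the idea:

import math

import torch


def cosine_with_quadratic_warmup(optimizer: torch.optim.Optimizer, warmup_steps: int, total_steps: int):
    # Multiplier ramps up as (step / warmup_steps)**2 during warmup, then decays to 0 on a cosine curve.
    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return (step / max(1, warmup_steps)) ** 2
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)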

📖 ModelRegistry

  • the model registry now supports automatically loading JSON configs from the registry folder instead of declaring model configs manually
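
As a purely illustrative sketch of the idea (the folder name, file layout, and config keys are assumptions, not the actual ModelRegistry code), automatic loading could look like this:

import json
from pathlib import Path


def load_registry_configs(registry_dir: str = "registry") -> dict:
    # Collect every *.json model config found in the registry folder,
    # keyed by file stem (e.g. "rtmo-s-coco" -> its config dict).
    configs = {}
    for path in Path(registry_dir).glob("*.json"):
        with path.open() as f:
            configs[path.stem] = json.load(f)
    return configs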

🏞️ Processor

  • pass image_size to __init__ instead of the preprocess methods
  • improve image loader performance
  • add non-blocking image transfer (see the sketch after this list)
  • optimize preprocessor speed
  • add focoos palette to annotators
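
A minimal sketch of the non-blocking transfer pattern, using the standard PyTorch idiom of pinned host memory plus non_blocking=True (this is not the exact Focoos processor code):

import torch


def to_device_non_blocking(batch: torch.Tensor, device: torch.device) -> torch.Tensor:
    # pin_memory() lets the host-to-GPU copy overlap with other host work,
    # and non_blocking=True makes the .to() call asynchronous on CUDA devices.
    if device.type == "cuda" and not batch.is_pinned():
        batch = batch.pin_memory()
    return batch.to(device, non_blocking=True)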

📖 Docs

  • add RTMO docs
  • update README, docs, and notebooks to use from focoos import x for all exported classes and functions instead of absolute import paths

- Renamed the internal activation function utility from `_get_activation_fn` to `get_activation_fn` for clarity and consistency.
- Updated all references to the renamed function across the codebase to ensure proper functionality.
- Enhanced the `get_activation_fn` function to accept a default activation parameter, improving flexibility in activation function selection.
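
Based only on the description above, a hedged sketch of the renamed helper with a default activation parameter; the exact set of supported activation names is an assumption:

import torch.nn.functional as F


def get_activation_fn(activation: str = "relu"):
    # Return the activation function matching the given name (default: relu).
    activations = {"relu": F.relu, "gelu": F.gelu, "silu": F.silu}
    if activation not in activations:
        raise ValueError(f"Unsupported activation: {activation}")
    return activations[activation]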

github-actions bot commented Jul 10, 2025

Coverage

Tests: 228 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 20.063s ⏱️

andry2327 and others added 2 commits July 10, 2025 14:56
- Introduced a new DarkNet backbone with configurable sizes and activation functions.
- Implemented Bottleneck and C2f modules for efficient feature extraction.
- Updated test cases to include DarkNet configurations, ensuring comprehensive testing across all backbone types.
- Enhanced the backbone build function to support the new DarkNet model.
- Updated numpy to version 2.2.6
- Updated torch to version 2.7.1
- Updated onnx-related dependencies to versions 1.18.0 and 0.3.1
- Added DarkNet and DarkNetConfig to BackboneManager and ConfigBackboneManager respectively
- Introduced tensorrt-cu12 and tensorrt-cu12-libs dependencies for enhanced GPU support

These changes impact the model management system by expanding the available backbone options, thus improving the overall functionality and user experience.
- Replaced private attributes (_out_features, _out_feature_strides, _out_feature_channels) with public counterparts (out_features, out_feature_strides, out_feature_channels) across various backbone implementations.
- Updated the output_shape method to utilize the new public attributes for consistency.
- Introduced a new SPPF and C2f layer in the block module for enhanced feature processing.
- Added a test case to validate the output_shape property across backbone types, ensuring correct shape specifications.
- Added YOLOXPoseConfig class for model configuration, including parameters for backbone, keypoints, and normalization.
- Introduced KeypointCriterion class to handle various loss functions specific to keypoint detection, including Binary Cross Entropy, IoU, and OKS losses.
- Developed YOLOXPose class for the model architecture, integrating backbone, pixel decoder, and head for keypoint prediction.
- Created supporting data classes for keypoint targets and model outputs to streamline data handling.
- Implemented utility functions for bounding box operations and non-maximum suppression to enhance model performance (a generic sketch follows this commit summary).
- Established a comprehensive structure for the YOLOXPose model, improving modularity and maintainability.
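
The utility functions themselves are not shown in this PR body; as a generic illustration, the score filtering plus class-agnostic NMS step can be sketched with torchvision (the thresholds below are hypothetical, not the model defaults):

import torch
from torchvision.ops import nms


def filter_predictions(boxes: torch.Tensor, scores: torch.Tensor,
                       score_thr: float = 0.1, iou_thr: float = 0.65) -> torch.Tensor:
    # Return indices of boxes kept after score filtering and NMS.
    idx = (scores > score_thr).nonzero(as_tuple=True)[0]   # drop low-confidence predictions first
    kept = nms(boxes[idx], scores[idx], iou_thr)            # class-agnostic NMS on the survivors
    return idx[kept]
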
@CuriousDolphin CuriousDolphin changed the title Introduce Pose Estimation Model: RTMO Introduce Pose Estimation Models: YoloXPose & RTMO Jul 11, 2025
@CuriousDolphin CuriousDolphin linked an issue Jul 14, 2025 that may be closed by this pull request
@CuriousDolphin CuriousDolphin changed the title Introduce Pose Estimation Models: YoloXPose & RTMO Introduce Keypoint Models: YoloXPose & RTMO Jul 14, 2025
andry2327 and others added 17 commits July 16, 2025 09:13
- Added KEYPOINT detection type to the Task enum in ports.py.
- Introduced keypoints attribute in FocoosDet class for keypoint detection outputs.
- Updated Instances class to include areas for better instance representation.
- Created RTMOConfig and RTMO classes for the new RTMO model architecture, integrating keypoint detection features.
- Implemented utility classes and methods for handling keypoint targets and outputs, improving data management.
- Enhanced YOLOXPose model with additional configurations for keypoint processing, ensuring compatibility with the new RTMO model.
Improve user experience by integrating keypoint detection capabilities into the dataset management system. This allows users to work with keypoint tasks seamlessly, enhancing the overall functionality of the framework.

Key changes include:
- Added `DatasetSplitType` to the `__init__.py` for better dataset organization.
- Introduced `YOLOXPOSE` as a new model type in the `ModelFamily` enum.
- Enhanced `DatasetMetadata` to support keypoint tasks by validating `thing_classes`.
- Implemented `KeypointDatasetMapper` for handling keypoint data transformations.
- Updated the dataset catalog to include a new dataset for COCO keypoints.
- Modified dataset loading logic to accommodate keypoint tasks.

These changes impact the dataset loading and processing pipeline, ensuring that keypoint annotations are handled correctly and efficiently.
CuriousDolphin and others added 13 commits August 6, 2025 14:32
Improve the efficiency of image tensor conversion by introducing a new method `get_torch_batch` that handles various input formats and allows for optional resizing. This change enhances memory management and ensures consistent tensor shapes for processing.

Key changes:
- Replace `get_tensors` with `get_torch_batch` across multiple processor classes.
- Implement optional resizing of images to a target size for better memory efficiency.

Impact:
These changes affect all processor classes that handle image inputs, ensuring they can now efficiently process images of varying sizes while maintaining performance.

Technical details:
The new `get_torch_batch` method standardizes input handling and includes resizing capabilities. It uses `torch.nn.functional.interpolate` for resizing, which optimizes memory usage during batch processing.
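
A hedged sketch of what such a method might look like, assuming PIL, numpy, and tensor inputs in HWC layout and square resizing; the real get_torch_batch signature and defaults are not shown in this PR:

from typing import Optional

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image


def get_torch_batch(images: list, target_size: Optional[int] = None) -> torch.Tensor:
    # Stack mixed inputs (PIL / numpy / tensor) into a single NCHW float batch,
    # optionally resizing every image to the same target size.
    tensors = []
    for im in images:
        if isinstance(im, Image.Image):
            im = np.array(im)
        if isinstance(im, np.ndarray):
            im = torch.from_numpy(im).permute(2, 0, 1)   # HWC -> CHW
        t = im.float().unsqueeze(0)                       # add batch dimension
        if target_size is not None:
            t = F.interpolate(t, size=(target_size, target_size), mode="bilinear", align_corners=False)
        tensors.append(t)
    return torch.cat(tensors, dim=0)
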
- Added support for the CSP backbone in the RTMO model, replacing the previous C2fDarkNet configuration.
- Updated model configuration to include new parameters for transformer and CSP layers.
- Introduced a new RTMOProcessor for improved image processing and tensor handling.
- Added comprehensive tests for the RTMO model to ensure functionality across various configurations.

These changes enhance the model's performance and flexibility, allowing for better feature extraction and processing capabilities.
…rations

- Removed C2fDarkNet and related configurations from the BackboneManager and ConfigBackboneManager.
- Updated keypoint training augmentations to enable cropping and scaling.
- Added a new weights URI for the RTMO model configuration.
- Deleted obsolete RTMO model files and refactored the model structure for improved clarity and maintainability.

These changes streamline the model architecture and enhance the configuration management for better performance and usability.
…plementation

- Updated CSP references in BackboneManager and ConfigBackboneManager to use "csp_darknet" for consistency.
- Introduced a new CSPDarknet class with comprehensive architecture and configuration settings.
- Adjusted imports in the RTMO model files to reflect the new CSPDarknet structure.
- Updated test configurations to include the new CSPDarknet model.

These changes enhance clarity and maintainability of the backbone architecture, ensuring a more coherent integration of the CSPDarknet model.
- Enhanced the model registry with new RTMO configurations for large, medium, and small models, including updated metrics and weights URIs.
…ations

- Adjusted the NMS threshold from 0.7 to 0.65 in the large model configuration.
- Updated the score threshold from 0.1 to 0.01 in both medium and small model configurations.
- Enhanced the decoder.py to ensure consistent tensor concatenation using `dim=1` instead of `axis=1`.
- Added a new method in the decoder for export mode, improving deployment capabilities.
- Refined the RTMOHead and DCC classes with clearer documentation and improved parameter handling.

These changes aim to make the model exportable in ONNX format.
- Enhanced the RTMO model documentation to reflect the new HybridEncoder design.
- Updated model configurations to replace the default backbone with CSPConfig and added new parameters for transformer layers.
- Expanded the available RTMO models, including updated metrics and FPS for small, medium, and large configurations.
- Improved example usage in documentation for clarity on model inference and configuration.

These changes aim to enhance the model's performance and usability in multi-person pose estimation tasks.
…urations

- Added detailed latency metrics for RTMO models, including FPS, execution engines, and performance statistics for different configurations.
- Introduced a new `keypoints_threshold` parameter in the inference method to filter keypoints based on confidence scores (a generic sketch follows this commit summary).
- Updated the `.gitignore` file to exclude additional debug and IDE files.
- Removed YOLOXPOSE from the ModelFamily enum.
- Added task and model family validation in the new_model method, with warnings for unsupported models.
- Updated the FocoosTrainer to handle cases where model creation fails, ensuring the sync to hub is disabled if the model is not created.
- Expanded the test suite to include additional RTMO model configurations.

These changes improve the robustness of model handling in the Focoos platform and enhance user feedback during model creation.
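
As a hedged sketch only (the real inference code and its default threshold are not shown here), filtering keypoints by confidence could look like this, assuming a (num_instances, num_keypoints, 3) array of (x, y, score):

import numpy as np


def filter_keypoints(keypoints: np.ndarray, keypoints_threshold: float = 0.3) -> np.ndarray:
    # Mark keypoints whose confidence falls below the threshold so annotators can skip them.
    filtered = keypoints.astype(float)
    low_conf = filtered[..., 2] < keypoints_threshold
    filtered[low_conf, :2] = np.nan
    return filtered
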
- Deleted the SPPF, C2f, and Bottleneck classes from block.py to streamline the architecture.
- Removed the ConvNormLayerDarknet class from conv.py, simplifying the convolutional layer structure.

These changes aim to enhance code clarity and maintainability by eliminating unused components.
@andry2327 andry2327 merged commit 3c8f337 into main Aug 25, 2025
10 checks passed
@andry2327 andry2327 deleted the feat/implement-rtmo branch August 25, 2025 14:46