
Conversation


@CuriousDolphin CuriousDolphin commented Jul 10, 2025

  • update dependencies
  • refactor NN
  • add Darknet Backbone
  • update backbone test
  • modelling YoloXPose
  • modelling RTMO
  • define keypoint output
  • add Task.KEYPOINT
  • add yoloxpose processor
    • preprocess
    • postprocess
    • export postprocess
  • add RTMO processor
    • preprocess
    • postprocess
    • export postprocess
  • add YoloXPose to registry
  • rename loss names for both models for coherence with wandb
  • check TORCHSCRIPT export
  • handle skeleton on keypoint annotator
  • add RTMO (s) to registry
  • test yoloxpose training on COCO -> loss goes dead with no apparent cause
  • find a fix for flip_map (maybe avoid flip augmentations in keypoint models)
  • take the skeleton from the dataset to model_config
  • test rtmo train on coco
  • test rtmo on generic keypoint dataset
  • test rtmo export and inference
  • update visualizer
  • upload RTMO pretrained weights on public repo
  • add default keypoint augmentation
  • update documentation and examples

Key Changes

✨ Introduce keypoint models

  • add RTMO-S/M/L-COCO pretrained keypoint models
    example:
from focoos import ModelManager
from PIL import Image

im = "https://public.focoos.ai/samples/federer.jpg"
model = ModelManager.get("rtmo-s-coco")
detections = model.infer(im, annotate=True, threshold=0.5)
Image.fromarray(detections.image) # visualise or save annotated image

📷 Unified Inference API

Standardize infer Method Signatures

  • consistent infer() method across FocoosModel, InferModel, and RemoteModel with unified parameters: infer(image, threshold=0.5, annotate=False); a unified image loader is used for all infer methods (with remote image support as well)
  • set the default threshold to 0.5
  • remove dependency on external annotate_image() function calls
  • streamlined workflow: get detections and visual annotations in a single call

example with a torch model and an exported model:

from focoos import ModelManager, RuntimeType
from PIL import Image


im = "https://public.focoos.ai/samples/motogp.jpg" # remote image, can also be local path, numpy array, or PIL image
model = ModelManager.get("fai-detr-l-obj365")
detections = model.infer(im, annotate=True, threshold=0.5) # annotate=True returns the annotated image
# Image.fromarray(detections.image) # visualise or save annotated image

# export model
model = model.export(RuntimeType.ONNX_CUDA32)
res = model.infer(im, annotate=True, threshold=0.5)

Image.fromarray(res.image) # visualise or save the annotated image from the exported model

example with remote inference:

from focoos import FocoosHUB
from PIL import Image

hub = FocoosHUB()
model_ref = "fai-detr-l-obj365" # use any pretrained model on app.focoos.ai or your own model reference
remote_model = hub.get_remote_model(model_ref)

im = "https://public.focoos.ai/samples/federer.jpg"

detections = remote_model.infer(im, annotate=True, threshold=0.5)

Image.fromarray(detections.image) # visualise or save annotated image

Enhanced FocoosDetections Structure

  • add new image field: stores annotated results as a base64 string or numpy array
  • migrated from Pydantic to pure Python dataclasses for better performance, improved serialization, and lower memory usage
  • add new keypoints field
  • add pprint and print_infer methods to unify detection printing (a minimal sketch of the resulting structure follows this list)
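
Purely as an illustration, a minimal sketch of what the dataclass-based structure might look like; every field name other than image and keypoints (e.g. bbox, conf, label) is an assumption, not taken from this PR:

from dataclasses import dataclass, field
from typing import Optional, Union

import numpy as np


@dataclass
class FocoosDet:
    # Hypothetical per-detection fields; only `keypoints` is named in this PR.
    bbox: Optional[list[float]] = None              # assumed: [x1, y1, x2, y2]
    conf: Optional[float] = None                    # assumed: detection confidence
    label: Optional[str] = None                     # assumed: class name
    keypoints: Optional[list[list[float]]] = None   # new field: [[x, y, score], ...]


@dataclass
class FocoosDetections:
    detections: list[FocoosDet] = field(default_factory=list)
    # New field from this PR: annotated image, stored as a numpy array or base64 string.
    image: Optional[Union[np.ndarray, str]] = None

    def pprint(self) -> None:
        # Simplified stand-in for the unified print helpers added in this PR.
        for i, det in enumerate(self.detections):
            print(f"[{i}] label={det.label} conf={det.conf} has_keypoints={det.keypoints is not None}")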

⌨️ CLI

  • add new CLI command: focoos gradio to launch a Gradio interface for image and video inference using Focoos pretrained models.

🕹️ Trainer

  • fix missing model preprocessing when amp=True (Automatic Mixed Precision) is enabled
  • add quadratic warmup to the COSINE scheduler (see the sketch after this list)
  • add KeypointEvaluator
  • enhance logging with additional info
  • Update Visualizer (preview hook) to save RGB images instead of BGR
  • Restore TensorBoard Hook
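
A minimal sketch of a cosine schedule with quadratic warmup, written as a per-step LR multiplier; the actual Focoos implementation and its parameter names are not shown in this PR, so treat this only as an illustration of the idea:

import math

import torch


def cosine_with_quadratic_warmup(optimizer: torch.optim.Optimizer, warmup_steps: int, total_steps: int):
    # Multiplier ramps up as (step / warmup_steps)**2 during warmup, then decays to 0 on a cosine curve.
    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return (step / max(1, warmup_steps)) ** 2
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)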

📖 ModelRegistry

  • the model registry now supports automatically loading JSON configs from the registry folder instead of declaring model configs manually
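
As a purely illustrative sketch of the idea (the folder name, file layout, and config keys are assumptions, not the actual ModelRegistry code), automatic loading could look like this:

import json
from pathlib import Path


def load_registry_configs(registry_dir: str = "registry") -> dict:
    # Collect every *.json model config found in the registry folder,
    # keyed by file stem (e.g. "rtmo-s-coco" -> its config dict).
    configs = {}
    for path in Path(registry_dir).glob("*.json"):
        with path.open() as f:
            configs[path.stem] = json.load(f)
    return configs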

🏞️ Processor

  • pass image_size to __init__ instead of the preprocess methods
  • improve image loader performance
  • add non-blocking image transfer (see the sketch after this list)
  • optimize preprocessor speed
  • add focoos palette to annotators
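
A minimal sketch of the non-blocking transfer pattern, using the standard PyTorch idiom of pinned host memory plus non_blocking=True (this is not the exact Focoos processor code):

import torch


def to_device_non_blocking(batch: torch.Tensor, device: torch.device) -> torch.Tensor:
    # pin_memory() lets the host-to-GPU copy overlap with other host work,
    # and non_blocking=True makes the .to() call asynchronous on CUDA devices.
    if device.type == "cuda" and not batch.is_pinned():
        batch = batch.pin_memory()
    return batch.to(device, non_blocking=True)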

📖 Docs

  • add RTMO docs
  • update README, docs, and notebooks to use from focoos import x for all exported classes and functions instead of absolute import paths

- Renamed the internal activation function utility from `_get_activation_fn` to `get_activation_fn` for clarity and consistency.
- Updated all references to the renamed function across the codebase to ensure proper functionality.
- Enhanced the `get_activation_fn` function to accept a default activation parameter, improving flexibility in activation function selection.
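
Based only on the description above, a hedged sketch of the renamed helper with a default activation parameter; the exact set of supported activation names is an assumption:

import torch.nn.functional as F


def get_activation_fn(activation: str = "relu"):
    # Return the activation function matching the given name (default: relu).
    activations = {"relu": F.relu, "gelu": F.gelu, "silu": F.silu}
    if activation not in activations:
        raise ValueError(f"Unsupported activation: {activation}")
    return activations[activation]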

github-actions bot commented Jul 10, 2025

Coverage

Tests: 228 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 20.063s ⏱️

andry2327 and others added 2 commits July 10, 2025 14:56
- Introduced a new DarkNet backbone with configurable sizes and activation functions.
- Implemented Bottleneck and C2f modules for efficient feature extraction.
- Updated test cases to include DarkNet configurations, ensuring comprehensive testing across all backbone types.
- Enhanced the backbone build function to support the new DarkNet model.
- Updated numpy to version 2.2.6
- Updated torch to version 2.7.1
- Updated onnx-related dependencies to versions 1.18.0 and 0.3.1
- Added DarkNet and DarkNetConfig to BackboneManager and ConfigBackboneManager respectively
- Introduced tensorrt-cu12 and tensorrt-cu12-libs dependencies for enhanced GPU support

These changes impact the model management system by expanding the available backbone options, thus improving the overall functionality and user experience.
- Replaced private attributes (_out_features, _out_feature_strides, _out_feature_channels) with public counterparts (out_features, out_feature_strides, out_feature_channels) across various backbone implementations.
- Updated the output_shape method to utilize the new public attributes for consistency.
- Introduced a new SPPF and C2f layer in the block module for enhanced feature processing.
- Added a test case to validate the output_shape property across backbone types, ensuring correct shape specifications.
- Added YOLOXPoseConfig class for model configuration, including parameters for backbone, keypoints, and normalization.
- Introduced KeypointCriterion class to handle various loss functions specific to keypoint detection, including Binary Cross Entropy, IoU, and OKS losses.
- Developed YOLOXPose class for the model architecture, integrating backbone, pixel decoder, and head for keypoint prediction.
- Created supporting data classes for keypoint targets and model outputs to streamline data handling.
- Implemented utility functions for bounding box operations and non-maximum suppression to enhance model performance (a generic sketch follows this commit summary).
- Established a comprehensive structure for the YOLOXPose model, improving modularity and maintainability.
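
The utility functions themselves are not shown in this PR body; as a generic illustration, the score filtering plus class-agnostic NMS step can be sketched with torchvision (the thresholds below are hypothetical, not the model defaults):

import torch
from torchvision.ops import nms


def filter_predictions(boxes: torch.Tensor, scores: torch.Tensor,
                       score_thr: float = 0.1, iou_thr: float = 0.65) -> torch.Tensor:
    # Return indices of boxes kept after score filtering and NMS.
    idx = (scores > score_thr).nonzero(as_tuple=True)[0]   # drop low-confidence predictions first
    kept = nms(boxes[idx], scores[idx], iou_thr)            # class-agnostic NMS on the survivors
    return idx[kept]
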
@CuriousDolphin CuriousDolphin changed the title Introduce Pose Estimation Model: RTMO Introduce Pose Estimation Models: YoloXPose & RTMO Jul 11, 2025
@CuriousDolphin CuriousDolphin linked an issue Jul 14, 2025 that may be closed by this pull request
@CuriousDolphin CuriousDolphin changed the title Introduce Pose Estimation Models: YoloXPose & RTMO Introduce Keypoint Models: YoloXPose & RTMO Jul 14, 2025
andry2327 and others added 17 commits July 16, 2025 09:13
- Added KEYPOINT detection type to the Task enum in ports.py.
- Introduced keypoints attribute in FocoosDet class for keypoint detection outputs.
- Updated Instances class to include areas for better instance representation.
- Created RTMOConfig and RTMO classes for the new RTMO model architecture, integrating keypoint detection features.
- Implemented utility classes and methods for handling keypoint targets and outputs, improving data management.
- Enhanced YOLOXPose model with additional configurations for keypoint processing, ensuring compatibility with the new RTMO model.
Improve user experience by integrating keypoint detection capabilities into the dataset management system. This allows users to work with keypoint tasks seamlessly, enhancing the overall functionality of the framework.

Key changes include:
- Added `DatasetSplitType` to the `__init__.py` for better dataset organization.
- Introduced `YOLOXPOSE` as a new model type in the `ModelFamily` enum.
- Enhanced `DatasetMetadata` to support keypoint tasks by validating `thing_classes`.
- Implemented `KeypointDatasetMapper` for handling keypoint data transformations.
- Updated the dataset catalog to include a new dataset for COCO keypoints.
- Modified dataset loading logic to accommodate keypoint tasks.

These changes impact the dataset loading and processing pipeline, ensuring that keypoint annotations are handled correctly and efficiently.
CuriousDolphin and others added 13 commits August 6, 2025 14:32
Improve the efficiency of image tensor conversion by introducing a new method `get_torch_batch` that handles various input formats and allows for optional resizing. This change enhances memory management and ensures consistent tensor shapes for processing.

Key changes:
- Replace `get_tensors` with `get_torch_batch` across multiple processor classes.
- Implement optional resizing of images to a target size for better memory efficiency.

Impact:
These changes affect all processor classes that handle image inputs, ensuring they can now efficiently process images of varying sizes while maintaining performance.

Technical details:
The new `get_torch_batch` method standardizes input handling and includes resizing capabilities. It uses `torch.nn.functional.interpolate` for resizing, which optimizes memory usage during batch processing.
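
A hedged sketch of what such a method might look like, assuming PIL, numpy, and tensor inputs in HWC layout and square resizing; the real get_torch_batch signature and defaults are not shown in this PR:

from typing import Optional

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image


def get_torch_batch(images: list, target_size: Optional[int] = None) -> torch.Tensor:
    # Stack mixed inputs (PIL / numpy / tensor) into a single NCHW float batch,
    # optionally resizing every image to the same target size.
    tensors = []
    for im in images:
        if isinstance(im, Image.Image):
            im = np.array(im)
        if isinstance(im, np.ndarray):
            im = torch.from_numpy(im).permute(2, 0, 1)   # HWC -> CHW
        t = im.float().unsqueeze(0)                       # add batch dimension
        if target_size is not None:
            t = F.interpolate(t, size=(target_size, target_size), mode="bilinear", align_corners=False)
        tensors.append(t)
    return torch.cat(tensors, dim=0)
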
- Added support for the CSP backbone in the RTMO model, replacing the previous C2fDarkNet configuration.
- Updated model configuration to include new parameters for transformer and CSP layers.
- Introduced a new RTMOProcessor for improved image processing and tensor handling.
- Added comprehensive tests for the RTMO model to ensure functionality across various configurations.

These changes enhance the model's performance and flexibility, allowing for better feature extraction and processing capabilities.
…rations

- Removed C2fDarkNet and related configurations from the BackboneManager and ConfigBackboneManager.
- Updated keypoint training augmentations to enable cropping and scaling.
- Added a new weights URI for the RTMO model configuration.
- Deleted obsolete RTMO model files and refactored the model structure for improved clarity and maintainability.

These changes streamline the model architecture and enhance the configuration management for better performance and usability.
…plementation

- Updated CSP references in BackboneManager and ConfigBackboneManager to use "csp_darknet" for consistency.
- Introduced a new CSPDarknet class with comprehensive architecture and configuration settings.
- Adjusted imports in the RTMO model files to reflect the new CSPDarknet structure.
- Updated test configurations to include the new CSPDarknet model.

These changes enhance clarity and maintainability of the backbone architecture, ensuring a more coherent integration of the CSPDarknet model.
- Enhanced the model registry with new RTMO configurations for large, medium, and small models, including updated metrics and weights URIs.
…ations

- Adjusted the NMS threshold from 0.7 to 0.65 in the large model configuration.
- Updated the score threshold from 0.1 to 0.01 in both medium and small model configurations.
- Enhanced the decoder.py to ensure consistent tensor concatenation using `dim=1` instead of `axis=1`.
- Added a new method in the decoder for export mode, improving deployment capabilities.
- Refined the RTMOHead and DCC classes with clearer documentation and improved parameter handling.

These changes aim to make the model exportable in ONNX format.
- Enhanced the RTMO model documentation to reflect the new HybridEncoder design.
- Updated model configurations to replace the default backbone with CSPConfig and added new parameters for transformer layers.
- Expanded the available RTMO models, including updated metrics and FPS for small, medium, and large configurations.
- Improved example usage in documentation for clarity on model inference and configuration.

These changes aim to enhance the model's performance and usability in multi-person pose estimation tasks.
…urations

- Added detailed latency metrics for RTMO models, including FPS, execution engines, and performance statistics for different configurations.
- Introduced a new `keypoints_threshold` parameter in the inference method to filter keypoints based on confidence scores (a generic sketch follows this commit summary).
- Updated the `.gitignore` file to exclude additional debug and IDE files.
- Removed YOLOXPOSE from the ModelFamily enum.
- Added task and model family validation in the new_model method, with warnings for unsupported models.
- Updated the FocoosTrainer to handle cases where model creation fails, ensuring the sync to hub is disabled if the model is not created.
- Expanded the test suite to include additional RTMO model configurations.

These changes improve the robustness of model handling in the Focoos platform and enhance user feedback during model creation.
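
As a hedged sketch only (the real inference code and its default threshold are not shown here), filtering keypoints by confidence could look like this, assuming a (num_instances, num_keypoints, 3) array of (x, y, score):

import numpy as np


def filter_keypoints(keypoints: np.ndarray, keypoints_threshold: float = 0.3) -> np.ndarray:
    # Mark keypoints whose confidence falls below the threshold so annotators can skip them.
    filtered = keypoints.astype(float)
    low_conf = filtered[..., 2] < keypoints_threshold
    filtered[low_conf, :2] = np.nan
    return filtered
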
- Deleted the SPPF, C2f, and Bottleneck classes from block.py to streamline the architecture.
- Removed the ConvNormLayerDarknet class from conv.py, simplifying the convolutional layer structure.

These changes aim to enhance code clarity and maintainability by eliminating unused components.
@andry2327 andry2327 merged commit 3c8f337 into main Aug 25, 2025
10 checks passed
@andry2327 andry2327 deleted the feat/implement-rtmo branch August 25, 2025 14:46