Conversation

@glenn-jocher (Member) commented Oct 25, 2021

Fix for #5160 (comment)

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Improved multi-GPU training support by ensuring model parameter scaling accounts for the wrapped model.

πŸ“Š Key Changes

  • Modified the retrieval of nl (the number of detection layers) to go through the de_parallel function, so the underlying model is accessed correctly even when it is wrapped for Distributed Data Parallel (DDP) training.
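
The access pattern behind this change can be sketched as follows. Note this is a minimal, torch-free illustration: `DDPWrapper`, `Detect`, and `Model` here are hypothetical stand-ins for PyTorch's DDP wrapper and YOLOv5's model classes, which expose the underlying network via a `.module` attribute.

```python
class Detect:
    nl = 3  # number of detection layers (e.g. P3, P4, P5 heads)

class Model:
    def __init__(self):
        # In YOLOv5 the Detect head is the last module in model.model
        self.model = [Detect()]

class DDPWrapper:
    """Stand-in for nn.parallel.DistributedDataParallel, which wraps
    the original model and exposes it as the .module attribute."""
    def __init__(self, module):
        self.module = module

def de_parallel(model):
    # Return the underlying single-GPU model whether or not it is wrapped.
    # YOLOv5's real implementation checks isinstance against
    # nn.DataParallel / nn.parallel.DistributedDataParallel.
    return model.module if hasattr(model, "module") else model

# Works identically for wrapped and unwrapped models:
nl = de_parallel(DDPWrapper(Model())).model[-1].nl  # 3, not an AttributeError
hyp = {"box": 0.05}
hyp["box"] *= 3 / nl  # loss gain scaled by layer count, as in train.py
```

Without `de_parallel`, reading `model.model[-1].nl` on a DDP-wrapped model would fail (or return the wrong object), so the hyperparameter scaling would silently diverge between single- and multi-GPU runs.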

🎯 Purpose & Impact

  • Purpose: The change ensures that when a model is being used across multiple GPUs, the detection layers count is correctly retrieved even when the model is wrapped for parallel processing.
  • Impact: This improvement leads to more accurate scaling of hyperparameters (hyp) during multi-GPU training, which enhances model performance and training stability. Users employing DDP will benefit from accurate hyperparameter adjustments irrespective of the number of GPUs used. πŸš€

Successfully merging this pull request may close these issues.

Docker Multi-GPU DDP training hang on destroy_process_group() with wandb option 3