
Significant Variations in Training Results with Same Dataset and Parameters #13341

Open

timiil opened this issue Oct 3, 2024 · 2 comments

Labels: question (Further information is requested)

Comments

timiil commented Oct 3, 2024

Search before asking

Question

Hi everyone,

We’ve encountered a noticeable discrepancy in the performance metrics when training the same model (yolov8n.pt) on the same dataset but with different hardware and similar training parameters. The results, specifically the mAP (50-95), vary significantly across different setups.

Base Model: yolov8n.pt

Training Parameters:

No. | Hardware            | Epochs | Batch Size | mAP (50-95)
----|---------------------|--------|------------|------------
1   | A6000 (48 GB VRAM)  | 100    | 16         | 0.961
2   | 4090 (24 GB VRAM)   | 100    | 12         | 0.93
3   | 4090 (24 GB VRAM)   | 150    | 12         | 0.92
4   | L20 (48 GB VRAM)    | 100    | 16         | 0.976
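
For reference, every run in the table was launched with essentially the same call. The sketch below assumes the Ultralytics YOLOv8 Python API, and data.yaml is a placeholder for our actual dataset config, not the real path.

from ultralytics import YOLO

# Sketch of the training call behind the runs above; only epochs and batch
# varied between hardware setups.
model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",  # identical dataset in every run (placeholder path)
    epochs=100,        # 150 in run No. 3
    batch=16,          # 12 on the 24 GB 4090 cards
    imgsz=640,         # assumed default image size
)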

We’ve also tried enabling and disabling cos_lr (the cosine learning-rate scheduler), but it has little to no effect on the outcome.

Could anyone shed light on what might be causing this inconsistency? Additionally, what strategies could we adopt to achieve better performance on more limited hardware setups?

Thank you in advance for your help!

Additional

No response


UltralyticsAssistant (Member) commented Oct 3, 2024

👋 Hello @timiil, thank you for bringing this to our attention! 🚀 This is an automated response to help guide you, and one of our Ultralytics engineers will assist you soon.

Please ensure you are following our ⭐️ Tutorials for accurate setup, including checking out our Custom Data Training and Tips for Best Training Results.

Since you are experiencing variations in results, could you provide a minimum reproducible example to help us better understand and debug the issue? A consistent setup between different hardware is crucial, and there might be nuances that the example could highlight.
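
As a starting point, a minimal reproducible example could look like the sketch below (assuming the Ultralytics YOLOv8 Python API; data.yaml is a placeholder for your dataset config). Pinning seed and deterministic makes runs on different machines easier to compare.

from ultralytics import YOLO

# Minimal reproducible training sketch: fix the seed and keep deterministic
# ops so results on different GPUs can be compared more fairly.
model = YOLO("yolov8n.pt")
results = model.train(
    data="data.yaml",    # placeholder dataset config
    epochs=100,
    batch=16,
    seed=0,              # fixed seed for reproducibility
    deterministic=True,  # prefer deterministic kernels where available
)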

For verifying the setup, please ensure:

Requirements

Python>=3.8.0 with all requirements.txt installed, including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
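
It also helps to record the exact software stack on each machine. A quick check like the sketch below (plain Python and PyTorch, no extra dependencies) can be pasted alongside your results.

import sys
import torch

# Print the versions that most often differ between machines.
print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
print("gpu    :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")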

Environments

YOLOv5 can be run in various environments with all dependencies preinstalled.

For more insights on your question, additional details like dataset image examples and training logs would be helpful.

We also invite you to explore our latest model - YOLOv8 🚀, which might offer enhanced performance for your tasks.

Thank you for your patience and contribution! 😊

pderrenger (Member) commented

@timiil variations in training results can often be attributed to differences in hardware architecture, which may affect computation precision and optimization. To mitigate these discrepancies, ensure consistent software environments across setups, including CUDA and PyTorch versions. Additionally, consider using mixed precision training with --amp to optimize performance on limited hardware.
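
As a sketch of that suggestion for the 24 GB card (assuming the Ultralytics YOLOv8 Python API rather than the YOLOv5 CLI; data.yaml is a placeholder), keeping AMP enabled and fixing the seed makes memory-limited runs both lighter and more comparable:

from ultralytics import YOLO

# Sketch: memory-friendly, more reproducible training on the 24 GB 4090.
model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",    # placeholder dataset config
    epochs=100,
    batch=12,            # smaller batch to fit 24 GB of VRAM
    amp=True,            # mixed precision, as suggested above
    seed=0,              # same seed on every machine
    deterministic=True,  # reduce run-to-run nondeterminism
)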
