Why is yolov5 training slow? #10254

Closed
1 task done
daigang896 opened this issue Nov 22, 2022 · 15 comments
Labels
question (Further information is requested), Stale (Stale and schedule for closing soon)

Comments

@daigang896

Search before asking

Question

Why is YOLOv5 training slow? I am using the yolov5m6 pretrained model. Does anyone have the same problem?

Additional

No response

@daigang896 added the question label (Further information is requested) on Nov 22, 2022
@glenn-jocher
Member

glenn-jocher commented Nov 22, 2022

👋 Hello! Thanks for asking about training speed issues. YOLOv5 🚀 can be trained on CPU (slowest), single-GPU, or multi-GPU (fastest). If you would like to increase your training speed some options are:

  • Increase --batch-size
  • Reduce --img-size
  • Reduce model size, i.e. from YOLOv5x -> YOLOv5l -> YOLOv5m -> YOLOv5s
  • Train with multi-GPU DDP at larger --batch-size
  • Train on cached data: python train.py --cache (RAM caching) or --cache disk (disk caching)
  • Train on faster GPUs, i.e.: P100 -> V100 -> A100
  • Train on free GPU backends with up to 16GB of CUDA memory: Colab or Kaggle

Good luck 🍀 and let us know if you have any other questions!
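
As an illustration of how the options above combine on the command line, here is a minimal sketch that assembles a `train.py` invocation. The helper `build_train_command` and its defaults (including the `coco128.yaml` dataset) are hypothetical and not part of YOLOv5; the flags are the ones named in this thread plus `--weights`, `--data`, and `--device`.

```python
import shlex


def build_train_command(weights="yolov5m6.pt", data="coco128.yaml",
                        img_size=640, batch_size=16, workers=8,
                        cache=None, n_gpus=1):
    """Return the argv list for a YOLOv5 training run (illustrative helper)."""
    if n_gpus > 1:
        # Multi-GPU DDP launch through torch.distributed.run
        cmd = ["python", "-m", "torch.distributed.run",
               "--nproc_per_node", str(n_gpus), "train.py"]
        cmd += ["--device", ",".join(str(i) for i in range(n_gpus))]
    else:
        cmd = ["python", "train.py"]
    cmd += ["--weights", weights, "--data", data,
            "--img-size", str(img_size),
            "--batch-size", str(batch_size),
            "--workers", str(workers)]
    # Only pass --cache when a caching mode was requested
    if cache in ("ram", "disk"):
        cmd += ["--cache", cache]
    return cmd


# Print a copy-pasteable command for a single-GPU run with disk caching
print(shlex.join(build_train_command(batch_size=32, cache="disk")))
```

Building the command as a list keeps each flag and its value separate, so it can be passed to `subprocess.run` unchanged if desired.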

@daigang896
Author

Hello.
I am training on an NVIDIA RTX A6000 (48 GB). When I increase the batch size, each iteration becomes slower. GPU memory is sufficient, yet the speed does not improve. What is the bottleneck?

@Laughing-q
Member

@daigang896 It seems the bottleneck is data loading now that you've increased the batch size. Using more workers may help.

yolov5/train.py

Line 457 in 7398d2d

parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
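
One way to check whether data loading really is the bottleneck is to time how long the loop waits for each batch versus how long the training step takes. The sketch below is generic Python, not YOLOv5 code; `time_data_vs_compute` and the toy loader are hypothetical names used only for illustration.

```python
import time


def time_data_vs_compute(loader, step_fn, max_batches=50):
    """Return (seconds spent waiting for data, seconds spent in step_fn)."""
    data_s = compute_s = 0.0
    it = iter(loader)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks until the loader delivers a batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        # On a GPU, step_fn should synchronize before returning
        # (e.g. torch.cuda.synchronize()), otherwise asynchronous
        # kernels make the compute time look too small.
        step_fn(batch)
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    return data_s, compute_s


def toy_loader(n=5, delay=0.01):
    """Stand-in for a slow dataloader: sleeps before yielding each batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


data_s, compute_s = time_data_vs_compute(toy_loader(), lambda batch: None)
print(f"waiting for data: {data_s:.3f}s, compute: {compute_s:.3f}s")
```

If most of the time is spent waiting for data, more workers or caching may help; if most is spent in the step itself, the loader is not the limiting factor.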

@glenn-jocher
Member

@daigang896 also try --cache ram or --cache disk to reduce dataloading bottlenecks.

@daigang896
Author

Thanks, I'll try it.

@daigang896
Author

@glenn-jocher @Laughing-q
Hello,
Increasing --workers did not help. With --workers and --batch-size both set to 16, each iteration was no faster. CPU utilization is low. I don't know what the problem is.
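
One possible reason a larger `--workers` value changes nothing is that the dataloader caps the requested number. The sketch below assumes a cap equal to the minimum of CPU cores per GPU, the batch size, and the requested workers; this is an assumption about how such a cap could work, so check the dataloader code in your own checkout before relying on it. The function name `effective_workers` is hypothetical.

```python
import os


def effective_workers(requested, batch_size, n_devices=1, cpu_count=None):
    """Workers actually used under an assumed min(cores per GPU, batch, requested) cap."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    cores_per_device = cpu_count // max(n_devices, 1)
    # A batch size of 1 leaves nothing to parallelize, so use no workers
    batch_cap = batch_size if batch_size > 1 else 0
    return min(cores_per_device, batch_cap, requested)


# On an 8-core machine, asking for 16 workers would still yield 8
print(effective_workers(requested=16, batch_size=16, cpu_count=8))
```

Under this assumption, raising `--workers` beyond the number of available cores has no effect.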

@daigang896
Author

@glenn-jocher
I tried --cache ram but ran out of memory. I tried --cache disk and saw no improvement in training speed.
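
Whether `--cache ram` fits in memory can be estimated before launching. The sketch below is a rough upper bound that assumes every cached image is stored as a square, 3-channel, 8-bit array at the training image size; real usage may be lower for non-square images. The function name and the dataset size in the example are hypothetical.

```python
def estimate_ram_cache_gib(n_images, img_size, channels=3, bytes_per_value=1):
    """Upper-bound RAM (GiB) needed to cache n_images at img_size x img_size."""
    total_bytes = n_images * img_size * img_size * channels * bytes_per_value
    return total_bytes / 2**30


# Hypothetical dataset of 100,000 images cached at two image sizes
for size in (640, 1280):
    print(f"{size}px: {estimate_ram_cache_gib(100_000, size):.1f} GiB")
```

Because the estimate grows with the square of the image size, doubling the image size quadruples the memory needed for the cache.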

@David-19940718

David-19940718 commented Nov 25, 2022

@glenn-jocher I tried --cache ram but ran out of memory. I tried --cache disk and saw no improvement in training speed.

In most situations, following the author's advice will resolve the problem.

In your case, I think it may be caused by your machine. If possible, try another machine and repeat the run.

Note that loading the data into memory is very important, so don't forget to add --cache ram.

@daigang896
Author

Hello.
Training speed has now improved significantly. The problem was solved by updating the GPU driver, CUDA, cuDNN, and PyTorch, and by using the YOLOv5 v6.2 code.
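
Since the fix here came from updating the software stack, it can help to record the versions in use before and after an upgrade. The sketch below is a minimal report using standard PyTorch attributes; the function name `environment_report` is hypothetical, and the GPU driver version is not covered (it is shown by `nvidia-smi`).

```python
import platform


def environment_report():
    """Collect Python, PyTorch, CUDA, and cuDNN versions into a dict."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    report["cudnn"] = torch.backends.cudnn.version()
    report["gpu_available"] = torch.cuda.is_available()
    if report["gpu_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```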

@glenn-jocher
Member

@daigang896 great!!

@github-actions
Contributor

github-actions bot commented Dec 31, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions bot added the Stale label (Stale and schedule for closing soon) on Dec 31, 2022
@Robotatron

Interestingly, I saw no difference in training time with or without --cache when using an SSD.
Also, training with a smaller image size (e.g. 160 with a batch size of 960) was SLOWER than with a bigger image size (e.g. 240 with a batch size of 221).
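
When both image size and batch size change, per-iteration times are not directly comparable, because each iteration processes a different amount of data. The sketch below, assuming square inputs, works out the pixels per batch for the two configurations mentioned above and gives a helper for comparing throughput in images per second. Both function names are hypothetical.

```python
def pixels_per_batch(img_size, batch_size):
    """Number of pixels (per channel) processed in one iteration."""
    return img_size * img_size * batch_size


def images_per_second(batch_size, seconds_per_iteration):
    """Throughput measure that is comparable across batch sizes."""
    return batch_size / seconds_per_iteration


small = pixels_per_batch(160, 960)
large = pixels_per_batch(240, 221)
print(small, large, round(small / large, 2))
```

In this case the 160-pixel configuration pushes roughly twice as many pixels through each iteration, so a slower iteration does not by itself mean lower throughput. Comparing images per second or time per epoch gives a fairer picture.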

@github-actions bot removed the Stale label (Stale and schedule for closing soon) on Jan 9, 2023
@bartlomiejgadzicki-digica

Hi there @glenn-jocher, do symlinks affect training speed? I keep multiple versions of my datasets and this way I can avoid storing the same images multiple times. Do you think it can be harmful in any way?

@github-actions
Contributor

github-actions bot commented Mar 5, 2023

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions bot added the Stale label (Stale and schedule for closing soon) on Mar 5, 2023
@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 16, 2023
@glenn-jocher
Member

@bartlomiejgadzicki-digica symlinks generally do not affect training speed significantly, as they are simply pointers to the original data. They can introduce a slight overhead during data loading, but the impact on training speed is likely negligible. Maintaining multiple dataset versions through symlinking is a smart storage solution. As long as your data loading and training procedures are not impacted, feel free to continue using symlinks to efficiently manage your datasets.
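
A practical risk with symlinked datasets is a link whose target has been moved or deleted. The sketch below uses only the Python standard library to count symlinks under a dataset directory and list any that no longer resolve. The function name `audit_symlinks` and the example path are hypothetical.

```python
from pathlib import Path


def audit_symlinks(root):
    """Return (total entries, symlink count, list of broken symlinks) under root."""
    total = links = 0
    broken = []
    for path in Path(root).rglob("*"):
        if path.is_symlink():
            total += 1
            links += 1
            # exists() follows the link, so False means the target is missing
            if not path.exists():
                broken.append(path)
        elif path.is_file():
            total += 1
    return total, links, broken


# Example call (hypothetical path):
# total, links, broken = audit_symlinks("datasets/my_dataset/images")
```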

6 participants