
Add timing cache to accelerate consequent .engine export #13386

Merged
merged 3 commits into from
Nov 8, 2024


@imyhxy (Contributor) commented Oct 25, 2024

Exporting a model to the .engine format with the --half option is time-consuming. TensorRT can use a timing cache to accelerate subsequent exports, and this PR adds that option to export.py.
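A minimal sketch of the mechanism, assuming the TensorRT Python builder-config API (`IBuilderConfig.create_timing_cache`, `set_timing_cache`, `get_timing_cache`, and `ITimingCache.serialize`). The helper names `load_timing_cache` and `save_timing_cache` are hypothetical, and this is not the code merged in this PR. `config` is assumed to be a `tensorrt.IBuilderConfig`; any object with the same methods works, which keeps the helpers testable without a GPU.

```python
from pathlib import Path


def load_timing_cache(config, cache_path):
    """Attach a timing cache to a TensorRT builder config (sketch).

    `config` is assumed to behave like tensorrt.IBuilderConfig. Passing an empty
    byte string to create_timing_cache() starts a fresh, empty cache (first run).
    """
    path = Path(cache_path)
    data = path.read_bytes() if path.exists() else b""
    timing_cache = config.create_timing_cache(data)
    # Second argument is ignore_mismatch=False: do not silently accept a cache
    # that was recorded under different device properties.
    config.set_timing_cache(timing_cache, False)
    return timing_cache


def save_timing_cache(config, cache_path):
    """Write the (possibly updated) timing cache to disk after the engine is built."""
    path = Path(cache_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    serialized = config.get_timing_cache().serialize()
    path.write_bytes(serialized)  # accepts bytes or any buffer-protocol object
    return path
```

In a build script, the idea would be to call `load_timing_cache(config, "runs/timing.cache")` before building the engine (for example with `builder.build_serialized_network(network, config)`) and `save_timing_cache(config, "runs/timing.cache")` afterwards, so the next export starts from the accumulated timings.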

Exporting yolov5m.pt without a timing cache takes 565.4s:

python export.py --weights weights/yolov5m.pt --include engine --opset 17 --half --device 0
[10/25/2024-14:58:27] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 5 MiB, GPU 213 MiB
[10/25/2024-14:58:27] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3283 MiB
TensorRT: export success ✅ 565.4s, saved as weights/yolov5m.engine (42.3 MB)

Export complete (566.4s)

With the timing cache (on the second run), the export shrinks to 11.5s:

python export.py --weights weights/yolov5m.pt --include engine --opset 17 --half --device 0 --cache runs/timing.cache
[10/25/2024-15:08:19] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 5 MiB, GPU 44 MiB
[10/25/2024-15:08:19] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 4374 MiB
[10/25/2024-15:08:19] [TRT] [I] Serialized 27 bytes of code generator cache.
[10/25/2024-15:08:19] [TRT] [I] Serialized 14672 timing cache entries
TensorRT: export success ✅ 11.5s, saved as weights/yolov5m.engine (42.3 MB)

Export complete (12.3s)
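For scale, dividing the reported times gives the speedup on the second (cached) run. This is plain arithmetic on the two logs above, not an additional measurement; the first run with --cache presumably still pays the full profiling cost, since that is when the cache gets populated.

```latex
\text{speedup}_{\text{TensorRT step}} = \frac{565.4\,\text{s}}{11.5\,\text{s}} \approx 49.2\times,
\qquad
\text{speedup}_{\text{total}} = \frac{566.4\,\text{s}}{12.3\,\text{s}} \approx 46.0\times
```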

Reference for timing cache

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

This PR adds a timing cache option to YOLOv5's TensorRT export, which speeds up repeated engine exports by reusing timing data from earlier builds.

📊 Key Changes

  • Introduced a cache parameter to the export_engine function, allowing users to specify a file path for storing the TensorRT timing cache.
  • Adjusted the run function to pass the cache parameter during TensorRT export.
  • Updated the CLI to include a --cache argument that specifies the timing cache file path.
  • Minor code formatting improvements in train configuration comments.
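The key changes above amount to threading one string parameter from the CLI through run() into export_engine(). The following is a hypothetical, heavily simplified sketch of that plumbing; the function bodies, the other parameters, and the empty-string default for --cache are assumptions for illustration, not the actual export.py code.

```python
import argparse


def export_engine(onnx_path, half=False, cache=""):
    """Hypothetical stand-in for export.py's export_engine(); only the cache plumbing is shown."""
    # An empty string is treated as "no timing cache" (assumed convention).
    timing_cache = cache or None
    return {"onnx": onnx_path, "half": half, "cache": timing_cache}


def run(weights="yolov5s.pt", half=False, cache=""):
    """Hypothetical stand-in for export.py's run(); forwards `cache` to the TensorRT export."""
    onnx_path = str(weights).rsplit(".", 1)[0] + ".onnx"  # engine export starts from an ONNX file
    return export_engine(onnx_path, half=half, cache=cache)


def parse_opt(argv=None):
    """Parse the subset of CLI flags relevant here, including the new --cache option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", type=str, default="yolov5s.pt")
    parser.add_argument("--half", action="store_true", help="FP16 export")
    parser.add_argument("--cache", type=str, default="", help="TensorRT timing cache file path")
    return parser.parse_args(argv)
```

For example, `run(**vars(parse_opt(["--half", "--cache", "runs/timing.cache"])))` passes the cache path all the way down, while omitting --cache leaves the cache disabled.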

🎯 Purpose & Impact

  • Efficiency Boost: The timing cache lets TensorRT reuse kernel (tactic) selection data from earlier builds instead of re-profiling every layer, which can greatly shorten subsequent exports (565.4s down to 11.5s for yolov5m in the example above).
  • User Control: Users can store timing data in a file of their choice and reuse it, which is particularly useful in deployment workflows that export a model multiple times.
  • Improved Documentation: Clearer comments in the train configuration make the settings easier for developers and maintainers to understand.
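The "reusing kernel choice decisions" idea can be illustrated with a toy model: a dictionary from a layer signature to the fastest tactic found so far, where benchmarking only happens on a cache miss. This is a deliberately simplified analogy, not TensorRT's internal algorithm or on-disk format; `pick_tactic`, the signature tuple, and the tactic names are all made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Toy model of a timing cache: NOT TensorRT's implementation or file format.
# A "layer signature" stands in for whatever TensorRT actually keys its entries on.
LayerSig = Tuple[str, Tuple[int, ...], str]  # (op type, input shape, precision)


def pick_tactic(
    sig: LayerSig,
    tactics: List[str],
    benchmark: Callable[[LayerSig, str], float],
    cache: Dict[LayerSig, str],
) -> str:
    """Return the fastest tactic for a layer, benchmarking only on a cache miss."""
    if sig in cache:
        return cache[sig]  # cache hit: reuse the earlier decision, no profiling
    best = min(tactics, key=lambda t: benchmark(sig, t))  # cache miss: time every tactic
    cache[sig] = best
    return best
```

In this analogy, the second export in the logs is fast because almost every layer hits the cache. Since the real cache stores measured timings, it is generally safest to reuse it only on the same GPU model and TensorRT version that produced it.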

@UltralyticsAssistant added the documentation, enhancement, and python labels on Oct 25, 2024
@imyhxy (Contributor, Author) commented Oct 25, 2024

I have read the CLA Document and I sign the CLA

@UltralyticsAssistant (Member) commented:

👋 Hello @imyhxy, thank you for submitting an ultralytics/yolov5 🚀 pull request! Your contribution to enhancing the export functionality with TensorRT timing cache is appreciated and has the potential to benefit many users with improved performance.

To ensure a seamless integration of your work, please review the following checklist:

  • Define a Purpose: You've clearly explained the purpose of your fix in the PR description. If there are any relevant issues you can link, please do so for additional context.
  • Synchronize with Source: Confirm your PR is synchronized with the ultralytics/yolov5 main branch. If it's behind, update it by clicking the 'Update branch' button or by running git pull and git merge main locally.
  • Ensure CI Checks Pass: Verify all Ultralytics Continuous Integration (CI) checks are passing. If any checks fail, please address the issues.
  • Update Documentation: Update the relevant documentation for this new feature to ensure users know how to use it effectively.
  • Add Tests: Include or update tests to cover your changes, and confirm that all tests are passing.
  • Sign the CLA: Please ensure you have signed our Contributor License Agreement if this is your first Ultralytics PR by writing "I have read the CLA Document and I sign the CLA" in a new message.
  • Minimize Changes: Limit your changes to the minimum necessary for your feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee

For more guidance, please refer to our Contributing Guide. Don’t hesitate to leave a comment if you have any questions. An Ultralytics engineer will review your PR and provide further assistance soon. Thank you for contributing to Ultralytics! 🚀😊

@glenn-jocher (Member) commented:

@imyhxy wow this is super fast! Can you apply this upgrade to https://github.com/ultralytics/ultralytics too please? Thanks!

@glenn-jocher merged commit 1435a8e into ultralytics:master on Nov 8, 2024 (8 checks passed)
@glenn-jocher (Member) commented:

@imyhxy PR merged!

3 participants