Add timing cache to accelerate consequent `.engine` export #13386

imyhxy · 2024-10-25T07:43:23Z

Exporting a model to the .engine format with the --half option is time-consuming. TensorRT provides a timing cache that can accelerate subsequent exports. This PR adds that option to export.py.

Exporting yolov5m.pt without a timing cache takes 565.4s:

python export.py --weights weights/yolov5m.pt --include engine --opset 17 --half --device 0

[10/25/2024-14:58:27] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 5 MiB, GPU 213 MiB
[10/25/2024-14:58:27] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3283 MiB
TensorRT: export success ✅ 565.4s, saved as weights/yolov5m.engine (42.3 MB)

Export complete (566.4s)

With a timing cache, the second run shrinks to 11.5s, roughly 49 times faster:

python export.py --weights weights/yolov5m.pt --include engine --opset 17 --half --device 0 --cache runs/timing.cache

[10/25/2024-15:08:19] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 5 MiB, GPU 44 MiB
[10/25/2024-15:08:19] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 4374 MiB
[10/25/2024-15:08:19] [TRT] [I] Serialized 27 bytes of code generator cache.
[10/25/2024-15:08:19] [TRT] [I] Serialized 14672 timing cache entries
TensorRT: export success ✅ 11.5s, saved as weights/yolov5m.engine (42.3 MB)

Export complete (12.3s)

Reference for timing cache
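
For readers unfamiliar with the mechanism, here is a minimal sketch of how a timing cache can be loaded before a build and written back afterwards. It is not the code from this PR's diff. The helper names `attach_timing_cache` and `save_timing_cache` are hypothetical, and `config` is assumed to be a TensorRT `IBuilderConfig` (TensorRT 8 or later), whose `create_timing_cache`, `set_timing_cache`, and `get_timing_cache` methods are the only TensorRT calls used.

```python
from pathlib import Path


def attach_timing_cache(config, cache_path, ignore_mismatch=False):
    """Seed a builder config with a timing cache read from cache_path.

    `config` is assumed to be a tensorrt.IBuilderConfig. If the file does not
    exist yet, an empty buffer is passed so TensorRT starts a fresh cache.
    """
    path = Path(cache_path)
    blob = path.read_bytes() if path.exists() else b""
    cache = config.create_timing_cache(blob)
    # ignore_mismatch=True would let TensorRT accept a cache recorded on a
    # different device or version; whether that is wanted is a judgment call.
    config.set_timing_cache(cache, ignore_mismatch=ignore_mismatch)
    return cache


def save_timing_cache(config, cache_path):
    """Write the config's (possibly updated) timing cache back to disk."""
    path = Path(cache_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        # serialize() returns a buffer-like object that can be written directly.
        f.write(config.get_timing_cache().serialize())
```

Under these assumptions, `attach_timing_cache` would be called after creating the builder config and before building the engine, and `save_timing_cache` after the build succeeds. The first run populates the file and later runs reuse it, which matches the "second run" timing reported above.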

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhancements to YOLOv5's TensorRT export functionality include adding a timing cache feature, improving export efficiency and performance.

📊 Key Changes

Introduced a cache parameter to the export_engine function, allowing users to specify a file path for storing the TensorRT timing cache.
Adjusted the run function to pass the cache parameter during TensorRT export.
Updated the CLI to include a --cache argument that specifies the timing cache file path.
Minor code formatting improvements in train configuration comments.
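
As an illustration of the CLI change listed above, the following sketch shows how a `--cache` argument could be declared with `argparse` and read back. The `parse_opt` function here is a simplified stand-in, not the actual parser in export.py; the defaults and help strings are assumptions, and only the flags that appear in the commands in this PR are included.

```python
import argparse


def parse_opt(argv=None):
    """Parse a reduced, illustrative subset of export options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", type=str, default="yolov5s.pt", help="model weights path")
    parser.add_argument("--include", nargs="+", default=["torchscript"], help="export formats")
    parser.add_argument("--half", action="store_true", help="FP16 half-precision export")
    parser.add_argument("--device", default="cpu", help="cuda device, e.g. 0, or cpu")
    # Empty string means "no timing cache"; this default is an assumption.
    parser.add_argument("--cache", type=str, default="", help="TensorRT timing cache file path")
    return parser.parse_args(argv)


if __name__ == "__main__":
    opt = parse_opt()
    print(f"timing cache: {opt.cache or 'disabled'}")
```

With this shape, the value of `opt.cache` would be passed from the run function down to the engine export step, as the Key Changes describe.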

🎯 Purpose & Impact

Efficiency Boost: The timing cache provides performance improvements during TensorRT export by reusing kernel choice decisions, which can result in faster export times and potential runtime optimization.
User Control: Users gain more control over the export process by storing and utilizing timing data, particularly beneficial for model deployment scenarios requiring multiple runs.
Improved Documentation: Clearer code comments enhance understanding for developers and maintainers, fostering better configuration management.

imyhxy · 2024-10-25T07:43:55Z

I have read the CLA Document and I sign the CLA

UltralyticsAssistant · 2024-10-25T07:44:04Z

👋 Hello @imyhxy, thank you for submitting an ultralytics/yolov5 🚀 pull request! Your contribution to enhancing the export functionality with TensorRT timing cache is appreciated and has the potential to benefit many users with improved performance.

To ensure a seamless integration of your work, please review the following checklist:

✅ Define a Purpose: You've clearly explained the purpose of your fix in the PR description. If there are any relevant issues you can link, please do so for additional context.
✅ Synchronize with Source: Confirm your PR is synchronized with the ultralytics/yolov5 main branch. If it's behind, update it by clicking the 'Update branch' button or by running git pull and git merge main locally.
✅ Ensure CI Checks Pass: Verify all Ultralytics Continuous Integration (CI) checks are passing. If any checks fail, please address the issues.
✅ Update Documentation: Update the relevant documentation for this new feature to ensure users know how to use it effectively.
✅ Add Tests: Include or update tests to cover your changes, and confirm that all tests are passing.
✅ Sign the CLA: Please ensure you have signed our Contributor License Agreement if this is your first Ultralytics PR by writing "I have read the CLA Document and I sign the CLA" in a new message.
✅ Minimize Changes: Limit your changes to the minimum necessary for your feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." — Bruce Lee

For more guidance, please refer to our Contributing Guide. Don’t hesitate to leave a comment if you have any questions. An Ultralytics engineer will review your PR and provide further assistance soon. Thank you for contributing to Ultralytics! 🚀😊

glenn-jocher · 2024-11-08T22:22:52Z

@imyhxy wow this is super fast! Can you apply this upgrade to https://github.com/ultralytics/ultralytics too please? Thanks!

glenn-jocher · 2024-11-08T22:24:35Z

@imyhxy PR merged!

imyhxy added 2 commits October 24, 2024 14:49

fix: typos

8ee1670

feat: enable timing cache for engine export

10bf52d

UltralyticsAssistant added the documentation, enhancement, and python labels Oct 25, 2024

Auto-format by https://ultralytics.com/actions

c7e819c

UltralyticsAssistant removed the python label Oct 27, 2024

glenn-jocher merged commit 1435a8e into ultralytics:master Nov 8, 2024
8 checks passed
