Benchmarks: Micro benchmark - add ncu profile support in cublaslt-gemm #740
abuccts left a comment
pls fix errors in unit tests
Pull Request Overview
This PR adds NCU (NVIDIA Nsight Compute) profiling support to the cublaslt-gemm micro benchmark, enabling detailed kernel analysis including DRAM throughput, compute throughput, and launch arguments.
- Adds two new command-line arguments: --enable_ncu_profiling and --profiling_metrics
- Modifies command execution to use NCU when profiling is enabled
- Updates result parsing to handle both standard and NCU profiled output formats
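The command-wrapping step in the list above could look roughly like the sketch below. It is not the PR's code: the helper name build_command, the base command string, and the metric names are hypothetical placeholders, and the only assumption about the NCU CLI is that ncu accepts --csv and --metrics.

```python
import shlex


def build_command(base_cmd, enable_ncu_profiling=False, profiling_metrics=None):
    """Return base_cmd unchanged, or prefixed with an ncu invocation.

    Hypothetical helper for illustration; the real benchmark may build
    its command differently.
    """
    if not enable_ncu_profiling:
        return base_cmd
    # Assumed ncu usage: CSV output plus an optional comma-separated metric list.
    ncu_parts = ['ncu', '--csv']
    if profiling_metrics:
        ncu_parts += ['--metrics', ','.join(profiling_metrics)]
    prefix = ' '.join(shlex.quote(part) for part in ncu_parts)
    return prefix + ' ' + base_cmd


# Hypothetical benchmark command and metric names, for illustration only.
plain = build_command('cublaslt_gemm -m 1024 -n 1024 -k 1024')
profiled = build_command(
    'cublaslt_gemm -m 1024 -n 1024 -k 1024',
    enable_ncu_profiling=True,
    profiling_metrics=['dram__throughput.avg.pct_of_peak_sustained_elapsed'],
)
print(plain)
print(profiled)
```

Leaving the unprofiled path untouched means existing runs and their output parsing behave as before when the flag is off.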
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| superbench/benchmarks/micro_benchmarks/cublaslt_function.py | Adds NCU profiling arguments, command wrapping, and CSV output parsing |
| tests/benchmarks/micro_benchmarks/test_cublaslt_function.py | Updates test cases to include new profiling arguments and adds NCU output parsing test |
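As a rough illustration of the CSV output parsing mentioned for cublaslt_function.py, the sketch below reads NCU-style CSV text into a dictionary. The function name parse_ncu_csv, the sample text, and the column names ("ID", "Kernel Name", "Metric Name", "Metric Value") are assumptions made for this example; the PR's actual parser and the exact NCU column set may differ.

```python
import csv
import io


def parse_ncu_csv(raw_output):
    """Map (kernel name, metric name) to a float value.

    Lines before the CSV header (for example profiler status messages)
    are skipped. Column names are assumed, not taken from the PR.
    """
    lines = raw_output.splitlines()
    start = next((i for i, line in enumerate(lines) if line.startswith('"ID"')), None)
    if start is None:
        return {}
    reader = csv.DictReader(io.StringIO('\n'.join(lines[start:])))
    metrics = {}
    for row in reader:
        # Values may contain thousands separators such as "1,024".
        value = float(row['Metric Value'].replace(',', ''))
        metrics[(row['Kernel Name'], row['Metric Name'])] = value
    return metrics


# Made-up sample output, for illustration only.
sample = '\n'.join([
    '==PROF== Connected to process 1234',
    '"ID","Kernel Name","Metric Name","Metric Unit","Metric Value"',
    '"0","gemm_kernel","dram__throughput.avg.pct_of_peak_sustained_elapsed","%","42.5"',
    '"0","gemm_kernel","launch__grid_size","","1,024"',
])
result = parse_ncu_csv(sample)
print(result)
```

Returning an empty dictionary when no header is found lets a caller fall back to the standard (non-NCU) output parser.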
Codecov Report
❌ Patch coverage is
Additional details and impacted files
Coverage Diff
| | main | #740 | +/- |
|---|---|---|---|
| Coverage | 85.74% | 85.71% | -0.04% |
| Files | 102 | 102 | |
| Lines | 7640 | 7678 | +38 |
| Hits | 6551 | 6581 | +30 |
| Misses | 1089 | 1097 | +8 |
Description
This PR adds NCU (NVIDIA Nsight Compute) profiling support to the cublaslt-gemm micro benchmark, enabling detailed kernel analysis including DRAM throughput, compute throughput, and launch arguments.
Major Revision