@yukirora (Contributor) commented Sep 16, 2025

Description
This PR adds NCU (NVIDIA Nsight Compute) profiling support to the cublaslt-gemm micro benchmark, enabling detailed kernel analysis including DRAM throughput, compute throughput, and launch arguments.

Major Revision

  • Adds the --enable_ncu_profiling and --profiling_metrics arguments for NCU profiling
  • Modifies command execution to use NCU when profiling is enabled
  • Updates result parsing to handle both standard and NCU profiled output formats
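The command-wrapping step above can be sketched as follows. This is not the PR's implementation. The helper build_command and the DEFAULT_NCU_METRICS list are hypothetical, and the metric names are standard Nsight Compute metrics chosen to match the "DRAM throughput, compute throughput, and launch arguments" described above. The sketch relies only on the ncu options --csv (machine-readable output) and --metrics (a comma-separated list of metrics to collect).

```python
import shlex

# Hypothetical default metric set (assumption, not taken from the PR):
# DRAM throughput, SM (compute) throughput, and launch configuration.
DEFAULT_NCU_METRICS = [
    'dram__throughput.avg.pct_of_peak_sustained_elapsed',
    'sm__throughput.avg.pct_of_peak_sustained_elapsed',
    'launch__grid_size',
    'launch__block_size',
]


def build_command(bench_cmd, enable_ncu_profiling=False, profiling_metrics=None):
    """Return the command to execute, prefixed with ncu when profiling is on.

    Hypothetical helper mirroring the described behaviour of
    --enable_ncu_profiling and --profiling_metrics.
    """
    if not enable_ncu_profiling:
        # Standard run: execute the benchmark binary unchanged.
        return bench_cmd
    metrics = profiling_metrics or DEFAULT_NCU_METRICS
    # --csv makes ncu emit CSV rows; --metrics selects what to collect per kernel.
    ncu_prefix = ['ncu', '--csv', '--metrics', ','.join(metrics)]
    return ' '.join(shlex.quote(part) for part in ncu_prefix) + ' ' + bench_cmd
```

For example, build_command('cublaslt_gemm -m 4096', True, ['launch__grid_size']) returns 'ncu --csv --metrics launch__grid_size cublaslt_gemm -m 4096', while leaving enable_ncu_profiling off returns the original command unchanged.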

@yukirora requested a review from a team as a code owner September 16, 2025 09:34
@yukirora added the benchmarks SuperBench Benchmarks label Sep 16, 2025
@cp5555 added the micro-benchmarks Micro Benchmark Test for SuperBench Benchmarks label Sep 18, 2025
@abuccts (Member) left a comment

Please fix the errors in the unit tests.

@abuccts requested a review from Copilot September 22, 2025 07:26

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds NCU (NVIDIA Nsight Compute) profiling support to the cublaslt-gemm micro benchmark, enabling detailed kernel analysis including DRAM throughput, compute throughput, and launch arguments.

  • Adds two new command-line arguments: --enable_ncu_profiling and --profiling_metrics
  • Modifies command execution to use NCU when profiling is enabled
  • Updates result parsing to handle both standard and NCU profiled output formats

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Files reviewed:
  • superbench/benchmarks/micro_benchmarks/cublaslt_function.py: adds NCU profiling arguments, command wrapping, and CSV output parsing
  • tests/benchmarks/micro_benchmarks/test_cublaslt_function.py: updates test cases to include the new profiling arguments and adds an NCU output parsing test
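The CSV output parsing mentioned for cublaslt_function.py could look roughly like this sketch. It is not the PR's code: the function name parse_ncu_csv is hypothetical, and it assumes that ncu's --csv output has a header row with 'Kernel Name', 'Metric Name' and 'Metric Value' columns, and that non-CSV lines (ncu's ==PROF== messages, the benchmark's own output) can be skipped because they do not start with a double quote.

```python
import csv
import io


def parse_ncu_csv(raw_output):
    """Collect {kernel_name: {metric_name: value}} from `ncu --csv` output.

    Hypothetical parser; column names are assumptions about ncu's CSV layout.
    If the same kernel is launched several times, the last launch wins.
    """
    # Keep only quoted CSV rows (header + data); drop log lines.
    csv_lines = [line for line in raw_output.splitlines() if line.startswith('"')]
    results = {}
    for row in csv.DictReader(io.StringIO('\n'.join(csv_lines))):
        raw = row['Metric Value']
        try:
            # ncu may format large numbers with thousands separators, e.g. "1,024".
            value = float(raw.replace(',', ''))
        except ValueError:
            value = raw  # keep non-numeric values unchanged
        results.setdefault(row['Kernel Name'], {})[row['Metric Name']] = value
    return results
```

Keying results by kernel name keeps the sketch short; a fuller version would probably key by the launch ID column so that repeated GEMM launches are not overwritten.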

codecov bot commented Sep 28, 2025

Codecov Report

❌ Patch coverage is 81.81818% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.71%. Comparing base (fe23426) to head (638ba7f).
⚠️ Report is 1 commits behind head on main.

Files with missing lines:
  • ...h/benchmarks/micro_benchmarks/cublaslt_function.py: 81.81% patch coverage, 8 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #740      +/-   ##
==========================================
- Coverage   85.74%   85.71%   -0.04%     
==========================================
  Files         102      102              
  Lines        7640     7678      +38     
==========================================
+ Hits         6551     6581      +30     
- Misses       1089     1097       +8     
Flag Coverage Δ
cpu-python3.10-unit-test 70.94% <81.81%> (+0.04%) ⬆️
cpu-python3.12-unit-test 70.94% <81.81%> (+0.04%) ⬆️
cpu-python3.7-unit-test 70.39% <81.81%> (+0.04%) ⬆️
cuda-unit-test 83.61% <81.81%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

@guoshzhao mentioned this pull request Oct 2, 2025
@yukirora merged commit f6e65a9 into main Oct 23, 2025
26 of 27 checks passed
@yukirora deleted the yutji/cublaslt-profile branch October 23, 2025 06:38

Labels: benchmarks (SuperBench Benchmarks), micro-benchmarks (Micro Benchmark Test for SuperBench Benchmarks)

5 participants