@joseph-isaacs
Contributor

@joseph-isaacs joseph-isaacs commented Oct 6, 2025

Looks like we need to JIT.

Benchmark runs:

gpu_decompress_kernel_only/u32/1GB
                        time:   [5.6860 ms 5.6878 ms 5.6892 ms]
                        thrpt:  [175.77 GiB/s 175.81 GiB/s 175.87 GiB/s]

gpu_decompress_kernel_only/u32/10GB
                        time:   [56.376 ms 56.384 ms 56.392 ms]
                        thrpt:  [177.33 GiB/s 177.35 GiB/s 177.38 GiB/s]
          
gpu_for_decompress_kernel_only/u32/1GB
                        time:   [19.657 ms 19.674 ms 19.694 ms]
                        thrpt:  [50.778 GiB/s 50.827 GiB/s 50.871 GiB/s]

gpu_for_decompress_kernel_only/u32/10GB
                        time:   [203.45 ms 203.81 ms 204.21 ms]
                        thrpt:  [48.969 GiB/s 49.064 GiB/s 49.151 GiB/s]
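
The throughput figures follow directly from the timings. The sketch below is illustrative arithmetic only, not code from this PR; it assumes the "1GB" label denotes 1 GiB of data, which is consistent with the reported numbers. The function names are hypothetical.

```rust
/// Bytes in one GiB; the benchmark output above uses binary units (GiB/s).
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Throughput in GiB/s for `bytes` processed in `millis` milliseconds.
/// (Hypothetical helper, named here for illustration only.)
fn throughput_gib_per_s(bytes: f64, millis: f64) -> f64 {
    (bytes / GIB) / (millis / 1000.0)
}

/// How many times slower a run taking `slow_ms` is than one taking `fast_ms`.
fn slowdown(fast_ms: f64, slow_ms: f64) -> f64 {
    slow_ms / fast_ms
}
```

Calling `throughput_gib_per_s(GIB, 5.6878)` gives roughly 175.8 GiB/s, and `slowdown(5.6878, 19.674)` gives roughly 3.46, so on the 1GB input the gpu_for_decompress run takes about 3.5x as long as the gpu_decompress run.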

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@codecov

codecov bot commented Oct 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.63%. Comparing base (e604d22) to head (1f66e5e).
⚠️ Report is 15 commits behind head on develop.


Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs added the changelog/feature ("A new feature") label Oct 6, 2025
u
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@robert3005
Contributor

You can add launch.record_kernel_launch(sys::CUevent_flags::CU_EVENT_DEFAULT) to the launch arguments; the launch then returns a tuple of before and after events.

@joseph-isaacs
Contributor Author

Yep, but I want to know the total time to run both kernels.
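
One way to picture the difference between per-launch events and a measurement spanning both kernels is the GPU-free sketch below. It is not the code in this PR and does not call any CUDA API: `LaunchEvents` and both functions are hypothetical stand-ins that model each launch's before/after events as millisecond timestamps on one stream.

```rust
/// Hypothetical stand-in for the (before, after) event pair of one kernel
/// launch, expressed as millisecond timestamps on a shared stream timeline.
#[derive(Clone, Copy)]
struct LaunchEvents {
    before_ms: f64,
    after_ms: f64,
}

/// Sum of the individual kernel durations; gaps between launches are excluded.
fn sum_of_kernels_ms(launches: &[LaunchEvents]) -> f64 {
    launches.iter().map(|l| l.after_ms - l.before_ms).sum()
}

/// End-to-end time from the first launch's "before" event to the last
/// launch's "after" event; returns None when there are no launches.
fn pipeline_span_ms(launches: &[LaunchEvents]) -> Option<f64> {
    let first = launches.first()?;
    let last = launches.last()?;
    Some(last.after_ms - first.before_ms)
}
```

Under this model, the end-to-end span includes any idle time between the two kernels, while the per-kernel sum does not; which of the two is wanted depends on what the benchmark is meant to report.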

u
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs changed the title from "wip[gpu]: for kernel and framework for running pipeline" to "feat[gpu]: for kernel and framework for running pipeline" Oct 7, 2025
@joseph-isaacs joseph-isaacs marked this pull request as ready for review October 7, 2025 13:35
@joseph-isaacs joseph-isaacs enabled auto-merge (squash) October 7, 2025 13:36
@codspeed-hq

codspeed-hq bot commented Oct 7, 2025

CodSpeed Performance Report

Merging #4857 will not alter performance

Comparing ji/bench-gpu-scan (1f66e5e) with develop (61ef9b6) [1]

Summary

✅ 1172 untouched

Footnotes

  1. No successful run was found on develop (159c8a1) during the generation of this report, so 61ef9b6 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Contributor

@robert3005 robert3005 left a comment

Some suggestions to simplify time measurement

@joseph-isaacs
Contributor Author

It's not clear how the simplification works when there are multiple launches.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs merged commit 1e6f8a1 into develop Oct 7, 2025
41 checks passed
@joseph-isaacs joseph-isaacs deleted the ji/bench-gpu-scan branch October 7, 2025 16:19

4 participants