@joseph-isaacs joseph-isaacs commented Oct 7, 2025

So far I have implemented only a proof-of-concept fused FoR-BP (frame-of-reference plus bit-packing) kernel. I don't want to implement every variant by hand, since that would mean a lot of duplication; I think we will likely need to compile these kernels at runtime.
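For readers unfamiliar with the idea, here is a minimal CPU-side sketch in Rust of what "fused" means: the bit-unpack and the frame-of-reference add happen in one pass, so the unpacked deltas are never written to memory and read back. This is not the PR's kernel. The function names, the contiguous LSB-first bit layout, and the fixed `u32` element type are all assumptions made for illustration; the real GPU kernels use a different (FastLanes-style) layout.

```rust
/// Hypothetical helper: FoR-encode `values` against `reference`, then pack each
/// delta into `width` bits, LSB-first and contiguous across `u32` words.
fn for_bp_compress(values: &[u32], width: u32, reference: u32) -> Vec<u32> {
    assert!((1..=32).contains(&width));
    let total_bits = values.len() as u64 * width as u64;
    let mut packed = vec![0u32; ((total_bits + 31) / 32) as usize];
    for (i, &v) in values.iter().enumerate() {
        let delta = v.wrapping_sub(reference) as u64;
        assert!(delta >> width == 0, "delta does not fit in `width` bits");
        let bit = i as u64 * width as u64;
        let word = (bit / 32) as usize;
        let shift = bit % 32;
        let shifted = delta << shift;
        packed[word] |= shifted as u32;
        // Spill the high bits into the next word when the value straddles a boundary.
        if shift + width as u64 > 32 {
            packed[word + 1] |= (shifted >> 32) as u32;
        }
    }
    packed
}

/// Fused decode: unpack each `width`-bit delta and add `reference` in the same loop.
fn for_bp_fused_decompress(packed: &[u32], width: u32, reference: u32, n: usize) -> Vec<u32> {
    assert!((1..=32).contains(&width));
    let mask: u64 = (1u64 << width) - 1;
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let bit = i as u64 * width as u64;
        let word = (bit / 32) as usize;
        let shift = bit % 32;
        // Read two adjacent words so values that straddle a word boundary decode correctly.
        let lo = packed[word] as u64;
        let hi = if word + 1 < packed.len() { packed[word + 1] as u64 } else { 0 };
        let delta = (((hi << 32) | lo) >> shift) & mask;
        out.push((delta as u32).wrapping_add(reference));
    }
    out
}

/// Unfused baseline, pass 1: bit-unpack only (writes the intermediate deltas).
fn bit_unpack(packed: &[u32], width: u32, n: usize) -> Vec<u32> {
    for_bp_fused_decompress(packed, width, 0, n)
}

/// Unfused baseline, pass 2: re-read the deltas and add the reference.
fn for_decode(values: &mut [u32], reference: u32) {
    for v in values.iter_mut() {
        *v = v.wrapping_add(reference);
    }
}
```

Both paths produce the same output; the fused one simply skips the intermediate buffer. Each (element type, bit width) pair needs its own specialised kernel, which is where the duplication mentioned above comes from.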

I have also fixed up the kernel build system.

The fused kernel is fast:

gpu_for_bp_fused_decompress_kernel_only/u32/1GB
                        time:   [5.5376 ms 5.5410 ms 5.5443 ms]
                        thrpt:  [180.37 GiB/s 180.47 GiB/s 180.58 GiB/s]
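As a sanity check on these figures, the throughput follows directly from the timing if the "1GB" input is taken to be 2^30 bytes (an assumption; the benchmark name does not say whether it means GB or GiB). The helper name below is hypothetical.

```rust
/// Throughput in GiB/s for `bytes` processed in `millis` milliseconds.
fn throughput_gib_per_s(bytes: u64, millis: f64) -> f64 {
    let gib = bytes as f64 / (1u64 << 30) as f64;
    gib / (millis / 1000.0)
}
```

With the median time of 5.5410 ms this gives roughly 180.47 GiB/s, matching the reported median. The lower and upper throughput bounds correspond to the upper and lower time bounds respectively.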

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

@joseph-isaacs joseph-isaacs changed the title from "wip[gpu]: for kernel and framework for running pipeline" to "wip[gpu]: fused AoT for and bitpacking kernel" Oct 7, 2025
codecov bot commented Oct 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.63%. Comparing base (b537c67) to head (7dc9fe8).
⚠️ Report is 3 commits behind head on develop.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
# Conflicts:
#	fls-gpu-kernel-gen/src/bit_unpack.rs
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs marked this pull request as ready for review October 8, 2025 12:37
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs added the changelog/performance A performance improvement label Oct 8, 2025
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
@joseph-isaacs joseph-isaacs changed the title from "wip[gpu]: fused AoT for and bitpacking kernel" to "perf[gpu]: fused AoT for and bitpacking kernel" Oct 8, 2025
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
launch.arg(&self.packed);
launch.arg(&self.unpacked);
launch.arg(&self.reference);
launch.record_kernel_launch(CU_EVENT_DEFAULT);
you don't need this

@joseph-isaacs joseph-isaacs merged commit 680e107 into develop Oct 8, 2025
42 checks passed
@joseph-isaacs joseph-isaacs deleted the ji/aot-fused branch October 8, 2025 17:47