Add cutlass support for blackwell fp8 blockwise gemm #14383
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
I'm curious to compare it to the blockwise kernels I wrote in the blackwell-rebase-feb20 branch of deepinfra/vllm, since we had to change the scale factor to be fp8 |
Force-pushed from 2f5591d to 546b495.
Thank you @wenscarl! Please correct the |
Force-pushed from 3a34259 to 07e8083.
@tylertitsworth Can you please take a look? |
Thanks for the contribution! A couple of comments but looks good overall
Review threads (outdated, resolved):
- csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8.cu
- csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh (two threads)
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 287cfb3 to ca3a3e2.
Force-pushed from ca3a3e2 to 87d109d.
Force-pushed from a06d3da to 3dfa546.
Looks good to me now, thank you! Please merge in latest main to fix the |
Review threads (outdated, resolved):
- csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh (two threads)
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 703ec2c to 1d17dd1.
Force-pushed from 1d17dd1 to 92d6da8.
Force-pushed from 92d6da8 to 86d58fd.
Force-pushed from 7ca702c to f207fec.
LGTM now, thanks!
Head branch was pushed to by a user without write access
Force-pushed from f207fec to cc21ba5.
Signed-off-by: Shu Wang <shuw@nvidia.com>
Force-pushed from cc21ba5 to b6783db.
This PR adds support for CUTLASS Blackwell (SM100) blockwise GEMM for FP8.
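For readers unfamiliar with blockwise FP8 GEMM, the following is a minimal NumPy sketch of what such a kernel computes, not the PR's CUDA/CUTLASS implementation. Assumptions for illustration only: square 4x4 scale tiles (real deployments typically use 128x128 weight blocks and 1x128 activation groups), plain float arrays standing in for float8_e4m3 storage, and hypothetical helper names (`quantize_blockwise`, `blockwise_gemm`) that do not appear in the PR.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # max magnitude representable in float8_e4m3


def quantize_blockwise(w: np.ndarray, block: int = 4):
    """Quantize a 2-D matrix with one float32 scale per (block x block) tile.

    The "quantized" payload is kept as float32 here; a real kernel would
    store it as float8_e4m3 and the scales separately.
    """
    m, n = w.shape
    assert m % block == 0 and n % block == 0
    scales = np.empty((m // block, n // block), dtype=np.float32)
    q = np.empty_like(w, dtype=np.float32)
    for i in range(0, m, block):
        for j in range(0, n, block):
            tile = w[i:i + block, j:j + block]
            # Scale so the tile's largest magnitude maps to the fp8 max.
            s = max(np.abs(tile).max() / FP8_E4M3_MAX, 1e-12)
            scales[i // block, j // block] = s
            q[i:i + block, j:j + block] = np.clip(
                tile / s, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scales


def blockwise_gemm(a_q, a_s, b_q, b_s, block: int = 4):
    """Reference GEMM: dequantize each tile with its scale, then matmul.

    A fused kernel would instead apply the scales inside the main loop,
    per tile, while accumulating in higher precision.
    """
    def dequant(q, s):
        out = np.empty_like(q)
        m, n = q.shape
        for i in range(0, m, block):
            for j in range(0, n, block):
                out[i:i + block, j:j + block] = (
                    q[i:i + block, j:j + block] * s[i // block, j // block])
        return out

    return dequant(a_q, a_s) @ dequant(b_q, b_s)
```

Because each tile carries its own scale, a single outlier only degrades the precision of its own block rather than the whole matrix, which is the motivation for blockwise (as opposed to per-tensor) FP8 scaling.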