[FEAT] Add support for AITER bpreshuffle block scale gemm #717
base: llama_fp8_03122025
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
    B: torch.Tensor,
    As: torch.Tensor,
    Bs: torch.Tensor,
    block_size: list[int],
Why do we have this parameter if block_size is not used?
    B: torch.Tensor,
    As: torch.Tensor,
    Bs: torch.Tensor,
    block_size: list[int],
Same here: the block_size parameter is not used.
    self.use_aiter_and_is_supported: int = int(
        current_platform.is_rocm() and envs.VLLM_ROCM_USE_AITER
        and envs.VLLM_ROCM_USE_AITER_LINEAR
        and current_platform.is_fp8_fnuz())
What about MI355?
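To make the question concrete, here is a minimal, self-contained sketch of the gating condition from the diff above. `PlatformInfo` and `aiter_block_gemm_enabled` are hypothetical names used only for illustration; the four booleans stand in for `current_platform.is_rocm()`, `envs.VLLM_ROCM_USE_AITER`, `envs.VLLM_ROCM_USE_AITER_LINEAR`, and `current_platform.is_fp8_fnuz()`. The sketch returns a `bool` rather than the `int` used in the diff. It assumes nothing about what any particular GPU reports; it only shows that a ROCm platform for which the FNUZ check is false would not take the AITER path under this condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformInfo:
    """Hypothetical stand-in for the platform and env checks in the diff."""
    is_rocm: bool            # stands in for current_platform.is_rocm()
    use_aiter: bool          # stands in for envs.VLLM_ROCM_USE_AITER
    use_aiter_linear: bool   # stands in for envs.VLLM_ROCM_USE_AITER_LINEAR
    is_fp8_fnuz: bool        # stands in for current_platform.is_fp8_fnuz()


def aiter_block_gemm_enabled(p: PlatformInfo) -> bool:
    """All four conditions must hold, mirroring the `and` chain in the diff."""
    return p.is_rocm and p.use_aiter and p.use_aiter_linear and p.is_fp8_fnuz


# A ROCm platform with AITER enabled and FNUZ FP8: the AITER path is taken.
fnuz_platform = PlatformInfo(True, True, True, True)

# Same settings, but the FNUZ check is false: the AITER path is skipped.
non_fnuz_platform = PlatformInfo(True, True, True, False)

print(aiter_block_gemm_enabled(fnuz_platform))
print(aiter_block_gemm_enabled(non_fnuz_platform))
```

If the AITER kernels are meant to run on hardware that does not report FNUZ FP8, the last term of the condition is the one that would need to change.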
Purpose
This PR adds the use of aiter.gemm_a8w8_blockscale_bpreshuffle if shuffle is enabled.
It also removes the direct_register_custom_op overhead from aiter.gemm_a8w8_blockscale.
aiter.gemm_a8w8_blockscale_bpreshuffle is used if and only if use_swizzle is True.
How to Tune
https://github.com/ROCm/aiter/tree/main/csrc/ck_gemm_a8w8_blockscale_bpreshuffle
https://github.com/ROCm/aiter/tree/main/csrc/ck_gemm_a8w8_blockscale
Alternative guide: https://github.com/EmbeddedLLM/vllmtests/tree/main/kernels/blockscalegemm
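Which of the two tuning directories applies depends on which kernel is dispatched. The selection rule stated under Purpose (bpreshuffle if and only if use_swizzle is True) can be sketched as follows. `select_block_scale_gemm` is a hypothetical helper that only returns the kernel name; it does not call into AITER, and the actual call signatures are not shown in this PR page, so none are assumed here.

```python
def select_block_scale_gemm(use_swizzle: bool) -> str:
    """Return the name of the AITER kernel that would be used.

    Hypothetical helper for illustration: the bpreshuffle variant is chosen
    if and only if use_swizzle is True, otherwise the plain block-scale GEMM.
    """
    if use_swizzle:
        return "aiter.gemm_a8w8_blockscale_bpreshuffle"
    return "aiter.gemm_a8w8_blockscale"


# Map each kernel name to the tuning directory linked above.
TUNING_DIRS = {
    "aiter.gemm_a8w8_blockscale_bpreshuffle":
        "csrc/ck_gemm_a8w8_blockscale_bpreshuffle",
    "aiter.gemm_a8w8_blockscale":
        "csrc/ck_gemm_a8w8_blockscale",
}

for flag in (True, False):
    kernel = select_block_scale_gemm(flag)
    print(f"use_swizzle={flag}: {kernel} (tune under {TUNING_DIRS[kernel]})")
```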
Test Plan
Evaluate the lm_eval score of Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 before and after.
Evaluate the benchmark performance.
Test Result
lm_eval score
Performance
There are a few cases being evaluated