
Commit 8d549e4

BoyuanFeng, 0xrushi
authored and committed
use combo kernel to fuse qk-norm and qk-rope (vllm-project#26682)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
1 parent 744064d commit 8d549e4

1 file changed (+10, -0 lines changed)


vllm/config/compilation.py

Lines changed: 10 additions & 0 deletions
```diff
@@ -513,6 +513,16 @@ def __post_init__(self) -> None:
         if isinstance(self.pass_config, dict):
             self.pass_config = PassConfig(**self.pass_config)
 
+        if (
+            is_torch_equal_or_newer("2.9.0.dev")
+            and "combo_kernels" not in self.inductor_compile_config
+            and "benchmark_combo_kernel" not in self.inductor_compile_config
+        ):
+            # use horizontal fusion, which is useful for fusing qk-norm and
+            # qk-rope when query and key have different shapes.
+            self.inductor_compile_config["combo_kernels"] = True
+            self.inductor_compile_config["benchmark_combo_kernel"] = True
+
         # migrate the deprecated flags
         if not self.use_cudagraph:
             logger.warning(
```
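The defaulting rule in the hunk above can be re-expressed as a small standalone function, which makes the three cases easy to check: new torch with no user setting, an explicit user setting, and an older torch. This is a minimal sketch, not vLLM code: `apply_combo_kernel_defaults` is a hypothetical helper, and the boolean `torch_is_2_9_or_newer` stands in for the real `is_torch_equal_or_newer("2.9.0.dev")` check so the example runs without torch installed.

```python
from typing import Any


def apply_combo_kernel_defaults(
    inductor_compile_config: dict[str, Any],
    torch_is_2_9_or_newer: bool,
) -> dict[str, Any]:
    """Hypothetical helper re-expressing the commit's defaulting rule.

    Inductor combo kernels (horizontal fusion) and their benchmarking are
    enabled only when torch is new enough AND the user has set neither flag.
    If the user set either flag, both are left exactly as the user wrote them.
    """
    # Copy instead of mutating in place (the real code mutates
    # self.inductor_compile_config directly).
    config = dict(inductor_compile_config)
    if (
        torch_is_2_9_or_newer
        and "combo_kernels" not in config
        and "benchmark_combo_kernel" not in config
    ):
        config["combo_kernels"] = True
        config["benchmark_combo_kernel"] = True
    return config


# New torch, nothing set by the user: both flags are turned on.
print(apply_combo_kernel_defaults({}, torch_is_2_9_or_newer=True))
# The user set one flag explicitly: their choice wins and nothing is added.
print(apply_combo_kernel_defaults({"combo_kernels": False}, torch_is_2_9_or_newer=True))
# Older torch: the config passes through unchanged.
print(apply_combo_kernel_defaults({}, torch_is_2_9_or_newer=False))
```

Because the guard checks both keys, setting either `combo_kernels` or `benchmark_combo_kernel` in `inductor_compile_config` is enough to opt out of the new default entirely.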
