Describe the bug
Hi Triton team,

I have been trying to integrate the matmul_ogs kernel into vLLM. But I noticed that in the reference implementation matmul_ogs_torch (triton/bench/triton_bench/matmul_ogs.py, lines 615 to 618 at e341f7f), after the matmul between the activation and the expert weight, all expert outputs for a given token are simply added together instead of being weighted by rdata.gate_scal. Since matmul_ogs and matmul_ogs_torch are supposed to be equivalent, I think the same applies to matmul_ogs as well? Is that expected behavior, or am I missing something?

Thanks in advance.
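For concreteness, here is a minimal numpy sketch of the two behaviors I mean. The names and shapes are hypothetical (not taken from matmul_ogs itself): I assume gate_scal holds per-expert routing weights for a token, and compare a plain sum of expert outputs against the gate-weighted combine I would expect from an MoE layer.

```python
import numpy as np

# Hypothetical setup: one token routed to 2 experts, hidden dim 2.
x = np.array([1.0, 2.0])                   # token activation
w = np.stack([np.eye(2), 2.0 * np.eye(2)]) # two expert weight matrices
gate_scal = np.array([0.25, 0.75])         # assumed per-expert routing weights

# Per-expert matmul outputs for this token.
per_expert = np.stack([x @ w[e] for e in range(2)])

# What the reference code appears to do: sum the expert outputs.
unscaled = per_expert.sum(axis=0)                       # -> [3.0, 6.0]

# What I would expect: weight each expert's output by its gate scale.
scaled = (gate_scal[:, None] * per_expert).sum(axis=0)  # -> [1.75, 3.5]

print(unscaled, scaled)
```

The two results differ whenever the gate scales are not all 1, which is why I am unsure whether the scaling is meant to happen elsewhere (e.g. fused into the activation or applied by the caller).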
Environment details
Triton 3.3
CPU: Intel Xeon Gold 6126 @ 2.60GHz
GPU: 8x RTX A6000, driver version 560.35.05, CUDA 12.6