
Triton matmul_ogs Kernel missing multiplying expert weight #6527

Closed
@zyongye

Description


Describe the bug

Hi Triton team,
I have been trying to integrate the matmul_ogs kernel into vLLM, but I noticed that in the reference implementation matmul_ogs_torch,

for i, (lo, hi) in enumerate(offs):
    dst_idx = scatter_indx.dst_indx[lo:hi] // n_expts_act
    msk = dst_idx != -1
    out[dst_idx[msk], :] += y[lo:hi, :][msk, :].float()

after the matmul between the activations and the expert weights, all expert outputs for a given token are simply summed, instead of being scaled by the routing weights in rdata.gate_scal. Since matmul_ogs and matmul_ogs_torch are supposed to be equivalent, I assume the same applies to matmul_ogs? Is that intended, or am I missing something?
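To make the difference concrete, here is a small NumPy sketch of the two accumulation variants. All shapes and values are made up for illustration; the gate_scal layout (one routing weight per scattered row) is an assumption, not taken from the Triton source.

```python
import numpy as np

# Toy setup (hypothetical): 2 tokens, each routed to n_expts_act = 2 experts.
n_tokens, n_expts_act, d = 2, 2, 3
# y: per-(token, expert) matmul outputs, laid out in expert order.
y = np.arange(n_tokens * n_expts_act * d, dtype=np.float32).reshape(-1, d)
# dst_indx maps each scattered row back to a (token, slot) index; -1 = padding.
dst_indx = np.array([0, 2, 1, 3])          # rows 0,2 -> token 0; rows 1,3 -> token 1
# Assumed per-row routing weights (what rdata.gate_scal would hold).
gate_scal = np.array([0.7, 0.6, 0.3, 0.4], dtype=np.float32)

dst_idx = dst_indx // n_expts_act          # scattered row -> destination token
msk = dst_idx != -1                        # drop padded rows

# Reference-style accumulation: expert outputs are summed per token, unscaled.
out_unweighted = np.zeros((n_tokens, d), dtype=np.float32)
np.add.at(out_unweighted, dst_idx[msk], y[msk])

# Gate-weighted variant: scale each expert's output by its routing weight first.
out_weighted = np.zeros((n_tokens, d), dtype=np.float32)
np.add.at(out_weighted, dst_idx[msk], gate_scal[msk, None] * y[msk])

print(out_unweighted)
print(out_weighted)
```

The two results differ whenever the gate weights are not all 1, which is why the plain sum in the reference loop looks like it is missing the gate_scal multiplication.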

Thanks in advance

Environment details

Triton 3.3
CPU: Intel Xeon Gold 6126 CPU @ 2.60GHz
GPU: 8x RTX A6000, driver version 560.35.05, CUDA 12.6
