
feat(vit_cuda_kernels): add norm quant and some fused ops #886

Merged

Conversation

theNiemand

Kernel optimizations for ViT FP8 W8A8 quantized inference

New kernels (a PyTorch sketch of their reference semantics follows the list):

  1. `rmsnorm_bf16`: BF16 RMSNorm, substantially faster than the PyTorch implementation
  2. `pre_tp_norm`: fuses the pre-communication stage of tp_norm
  3. `post_tp_norm`: fuses the post-communication stage of tp_norm
  4. `pre_token_quant`: per-token FP8 quantization; far faster than vLLM's quant kernel and faster than sgl's
  5. `gelu_per_token_quant`: fuses the GELU activation with per-token FP8 quantization
  6. `add_norm_quant`: fuses the add + norm + quant sequence between the attention and MLP modules
  7. `cutlass_scaled_mm_bias_ls`: fuses the quantized matmul with dequantization and optional bias and ls-weight application
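
For readers unfamiliar with these fusions, here is a minimal PyTorch sketch of the numerics each kernel is expected to match. It is a reference for semantics only, not the CUDA implementation in this PR; the dtype choice (e4m3 FP8), eps values, the tp_norm split point, and the scale layouts are assumptions.

```python
# Minimal PyTorch sketch of the reference semantics the fused kernels
# target -- not the CUDA implementation. Function names mirror the list
# above; dtypes, eps, and the tp_norm split point are assumptions.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn


def rmsnorm_bf16(x, weight, eps=1e-6):
    # BF16 in/out RMSNorm with fp32 accumulation.
    h = x.float()
    h = h * torch.rsqrt(h.pow(2).mean(-1, keepdim=True) + eps)
    return (h * weight.float()).to(torch.bfloat16)


def pre_token_quant(x):
    # Per-token FP8 quantization: one scale per row (token).
    amax = x.abs().amax(dim=-1, keepdim=True).float().clamp(min=1e-12)
    scale = amax / FP8_MAX
    q = (x.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale


def gelu_per_token_quant(x):
    # GELU activation fused with per-token FP8 quantization.
    return pre_token_quant(torch.nn.functional.gelu(x.float()).to(x.dtype))


def add_norm_quant(x, residual, weight, eps=1e-6):
    # Residual add -> RMSNorm -> per-token FP8 quant between attention and MLP.
    h = x + residual
    q, scale = pre_token_quant(rmsnorm_bf16(h, weight, eps))
    return q, scale, h  # h continues as the residual stream


def pre_tp_norm(x):
    # Pre-communication half of tp_norm: local per-token sum of squares,
    # to be all-reduced across tensor-parallel ranks (split point assumed).
    return x.float().pow(2).sum(-1)


def post_tp_norm(x, weight, global_sumsq, global_hidden_dim, eps=1e-6):
    # Post-communication half: normalize with the all-reduced statistic.
    rms = torch.rsqrt(global_sumsq / global_hidden_dim + eps).unsqueeze(-1)
    return (x.float() * rms * weight.float()).to(torch.bfloat16)


def cutlass_scaled_mm_bias_ls(a_q, b_q, a_scale, b_scale, bias=None, ls=None):
    # FP8 GEMM + dequant, then optional bias add and layer-scale multiply.
    # a_scale: (M, 1) per-token; b_scale: (N,) per-channel (assumed layout).
    out = (a_q.float() @ b_q.float()) * a_scale * b_scale
    if bias is not None:
        out = out + bias.float()
    if ls is not None:
        out = out * ls.float()
    return out.to(torch.bfloat16)
```

The common design motive behind these fusions is that each one removes at least one full read/write of the activation tensor in global memory (e.g. the GELU output never round-trips to HBM before quantization), which is the usual source of speedups for memory-bound ops like norms and per-token quantization.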

SangChengC closed this May 9, 2025
SangChengC reopened this May 9, 2025
SangChengC merged commit 8b5f18b into ModelTC:add-lightllm-kernel May 9, 2025
theNiemand deleted the vit_add_cuda_kernels branch May 9, 2025 12:32