
feat(vit_cuda_kernels): add norm quant and some fused ops #886

Merged

Conversation

theNiemand

Kernel optimizations for ViT FP8 W8A8 quantized inference

New kernels (a PyTorch sketch of their reference semantics follows the list):

  1. `rmsnorm_bf16`: BF16 RMSNorm, substantially faster than the PyTorch implementation
  2. `pre_tp_norm`: fuses the pre-communication stage of tp_norm
  3. `post_tp_norm`: fuses the post-communication stage of tp_norm
  4. `pre_token_quant`: per-token FP8 quantization; far faster than vLLM's quant kernel and faster than sgl's
  5. `gelu_per_token_quant`: fuses the GELU activation with per-token FP8 quantization
  6. `add_norm_quant`: fuses the add + norm + quant sequence between the attention and MLP modules
  7. `cutlass_scaled_mm_bias_ls`: fuses the quantized matmul with dequantization and optional bias and ls-weight application
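
For readers unfamiliar with these fusions, here is a minimal PyTorch sketch of the numerics each kernel is expected to match. It is a reference for semantics only, not the CUDA implementation in this PR; the dtype choice (e4m3 FP8), eps values, the tp_norm split point, and the scale layouts are assumptions.

```python
# Minimal PyTorch sketch of the reference semantics the fused kernels
# target -- not the CUDA implementation. Function names mirror the list
# above; dtypes, eps, and the tp_norm split point are assumptions.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn


def rmsnorm_bf16(x, weight, eps=1e-6):
    # BF16 in/out RMSNorm with fp32 accumulation.
    h = x.float()
    h = h * torch.rsqrt(h.pow(2).mean(-1, keepdim=True) + eps)
    return (h * weight.float()).to(torch.bfloat16)


def pre_token_quant(x):
    # Per-token FP8 quantization: one scale per row (token).
    amax = x.abs().amax(dim=-1, keepdim=True).float().clamp(min=1e-12)
    scale = amax / FP8_MAX
    q = (x.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale


def gelu_per_token_quant(x):
    # GELU activation fused with per-token FP8 quantization.
    return pre_token_quant(torch.nn.functional.gelu(x.float()).to(x.dtype))


def add_norm_quant(x, residual, weight, eps=1e-6):
    # Residual add -> RMSNorm -> per-token FP8 quant between attention and MLP.
    h = x + residual
    q, scale = pre_token_quant(rmsnorm_bf16(h, weight, eps))
    return q, scale, h  # h continues as the residual stream


def pre_tp_norm(x):
    # Pre-communication half of tp_norm: local per-token sum of squares,
    # to be all-reduced across tensor-parallel ranks (split point assumed).
    return x.float().pow(2).sum(-1)


def post_tp_norm(x, weight, global_sumsq, global_hidden_dim, eps=1e-6):
    # Post-communication half: normalize with the all-reduced statistic.
    rms = torch.rsqrt(global_sumsq / global_hidden_dim + eps).unsqueeze(-1)
    return (x.float() * rms * weight.float()).to(torch.bfloat16)


def cutlass_scaled_mm_bias_ls(a_q, b_q, a_scale, b_scale, bias=None, ls=None):
    # FP8 GEMM + dequant, then optional bias add and layer-scale multiply.
    # a_scale: (M, 1) per-token; b_scale: (N,) per-channel (assumed layout).
    out = (a_q.float() @ b_q.float()) * a_scale * b_scale
    if bias is not None:
        out = out + bias.float()
    if ls is not None:
        out = out * ls.float()
    return out.to(torch.bfloat16)
```

The common design motive behind these fusions is that each one removes at least one full read/write of the activation tensor in global memory (e.g. the GELU output never round-trips to HBM before quantization), which is the usual source of speedups for memory-bound ops like norms and per-token quantization.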

SangChengC closed this May 9, 2025
SangChengC reopened this May 9, 2025
SangChengC merged commit 8b5f18b into ModelTC:add-lightllm-kernel May 9, 2025
theNiemand deleted the vit_add_cuda_kernels branch May 9, 2025 12:32