@WanRui37

RFC: INT8*INT8 support for MoE GroupGEMM in FastDeploy

@paddle-bot

paddle-bot bot commented Oct 16, 2025

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

- Currently, no `MoE GroupGEMM` implementation in the industry supports `INT8*INT8`.

# 4. Design Approach and Implementation Plan
1. Some reference code paths

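For a quick implementation, you can refer to FD's existing wfp8afp8 Triton operator, and also look at how vLLM and TensorRT-LLM implement this. Either a CUDA or a Triton implementation is acceptable. Once the operator itself is done, further operator fusion can be added on top of it (for example, fusing the shared-expert layer into the MoE for GLM4.5-AIR).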
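
For reference, the arithmetic an INT8*INT8 grouped GEMM has to reproduce can be sketched in plain Python. This is only an illustration of the semantics, not the FD, vLLM, or TensorRT-LLM implementation: the function names (`quantize_rows`, `grouped_gemm_int8`), the symmetric per-token and per-output-channel scaling, and the top-1 routing are all assumptions made for the example.

```python
def quantize_rows(mat):
    """Symmetric per-row INT8 quantization (assumed scheme).

    Returns (quantized integer rows, per-row float scales).
    """
    q_rows, scales = [], []
    for row in mat:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric INT8 range.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def grouped_gemm_int8(x, expert_ids, weights):
    """Reference grouped GEMM with INT8 activations and INT8 weights.

    x:          [tokens][K] float activations
    expert_ids: [tokens] expert index per token (top-1 routing, assumed)
    weights:    [experts][N][K] float weights, one matrix per expert
    Returns     [tokens][N] float outputs.
    """
    xq, x_scales = quantize_rows(x)                    # per-token scales
    quantized = [quantize_rows(w) for w in weights]    # per-output-channel scales
    out = []
    for t, e in enumerate(expert_ids):
        wq, w_scales = quantized[e]
        row = []
        for n in range(len(wq)):
            # Integer dot product; a GPU kernel would accumulate this in INT32.
            acc = sum(a * b for a, b in zip(xq[t], wq[n]))
            # Dequantize with the product of activation and weight scales.
            row.append(acc * x_scales[t] * w_scales[n])
        out.append(row)
    return out


x = [[1.0, -2.0], [0.5, 0.25]]
expert_ids = [0, 1]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
    [[2.0, 2.0], [1.0, -1.0]],  # expert 1
]
out = grouped_gemm_int8(x, expert_ids, weights)
```

A sketch like this can serve as the numerical baseline when checking a CUDA or Triton kernel: the kernel output should match it up to the quantization error of the chosen scaling scheme.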

Author

Thanks a lot!
