We're training Mixtral on multiple GPUs directly with the Hugging Face Transformers `Trainer` and ran into this error. Has anyone seen it before? It looks as if the MoE expert routing is causing the communication volume to differ across GPUs?
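The load-imbalance hypothesis can be illustrated with a small sketch (hypothetical, not Mixtral's actual implementation): with top-2 routing over 8 experts, the number of tokens assigned to each expert depends on the router scores of that particular batch, so GPUs hosting different experts end up processing, and communicating, different amounts of data.

```python
# Hypothetical sketch of top-k MoE routing, assuming 8 experts and top-2
# selection per token (the values Mixtral uses); random scores stand in
# for the learned router logits.
import random

random.seed(0)
NUM_EXPERTS = 8
TOP_K = 2
NUM_TOKENS = 1000

counts = [0] * NUM_EXPERTS
for _ in range(NUM_TOKENS):
    # Stand-in for router logits: one score per expert for this token.
    scores = [random.random() for _ in range(NUM_EXPERTS)]
    # Pick the top-k experts by score, as top-k gating does.
    chosen = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]
    for e in chosen:
        counts[e] += 1

# Per-expert token counts are uneven and change batch to batch.
print(counts)
```

If experts are sharded across devices, this per-expert variance translates into per-GPU differences in compute and all-to-all traffic, which is consistent with the symptom described above.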