Fix conflict between Tutel and top-2 gate in MoE layer #2053
Thank you for the PR @yetiansh :) It looks good to me. Alex added the Tutel support, so let me tag him and ask for a quick review. @alexandremuzio - can you please review this?
@yetiansh - can you please follow the guide here and update your PR? I see it's failing the format checks. https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md
Looks good to me. Thanks!
Thanks @alexandremuzio @awan-10. I've run the
Hi, is this PR still active? @awan-10 @alexandremuzio
Sorry for the delay in getting back to you, @yetiansh. I approved this PR so the tests can run. I will merge it as soon as the tests pass. Thank you!
if self.use_tutel:
    logger.info('Using Tutel optimizations.')
elif use_tutel and not TUTEL_INSTALLED:
    logger.warning("Tutel optimization requested but not installed. "
                   "Proceeding without Tutel.")
elif use_tutel and TUTEL_INSTALLED and gate.k != 1:
    logger.warning(
Can we wrap this in an `if torch.distributed.get_rank() == 0:`?
Yes, that is possible. But I wonder whether we should also wrap the other warnings and info messages, for example L480 and L482-483?
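For reference, the rank-0 guard discussed in this thread could be factored into a small helper so that every warning and info message goes through the same check. This is only a sketch: `current_rank` and `warn_on_rank_zero` are hypothetical names, not existing DeepSpeed functions, and the fallback to rank 0 outside a distributed run is an assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger("moe_sketch")  # hypothetical logger name


def current_rank() -> int:
    """Best-effort rank lookup; returns 0 when not running distributed."""
    try:
        import torch.distributed as dist
    except ImportError:
        # Assumption: without torch we treat the process as rank 0.
        return 0
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return 0


def warn_on_rank_zero(message: str, rank: Optional[int] = None) -> bool:
    """Log `message` as a warning on rank 0 only.

    Returns True if the message was logged, False if it was suppressed.
    """
    if rank is None:
        rank = current_rank()
    if rank == 0:
        logger.warning(message)
        return True
    return False
```

With a helper like this, each `logger.warning(...)` call in the constructor would become `warn_on_rank_zero(...)`, so the messages are emitted once per job rather than once per process.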
Previously, using both Tutel optimization and top-2 gating in MoE model training would fail. If we enable both Tutel and top-2, `MoELayer` would try to unpack the top-2 gate's output here, which fails because the top-2 gate does not produce that number of outputs. Fix by checking the gate's type when constructing `MoELayer`.
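The check described above can be sketched as a standalone function. This is a minimal illustration based on the diff fragment in this PR, not the actual `MoELayer` code: `resolve_use_tutel` is a hypothetical name, the arguments stand in for `use_tutel`, `TUTEL_INSTALLED`, and `gate.k`, and the text of the last warning is a placeholder because the original message is cut off in the diff.

```python
import logging

logger = logging.getLogger("moe_sketch")  # hypothetical logger name


def resolve_use_tutel(requested: bool, installed: bool, gate_k: int) -> bool:
    """Decide whether the Tutel path should be taken.

    Tutel is enabled only if it was requested, is installed, and the
    gate is top-1 (k == 1); otherwise fall back to the default path.
    """
    enabled = requested and installed and gate_k == 1
    if enabled:
        logger.info("Using Tutel optimizations.")
    elif requested and not installed:
        logger.warning("Tutel optimization requested but not installed. "
                       "Proceeding without Tutel.")
    elif requested and installed and gate_k != 1:
        # Placeholder wording; the real message is truncated in the diff.
        logger.warning("Tutel requested with a gate where k != 1. "
                       "Proceeding without Tutel.")
    return enabled
```

Because the decision is made once at construction time, the forward pass only takes the Tutel unpacking branch when the gate is known to return the outputs that branch expects.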