Moe bf16 ep #4144
base: main
Conversation
```python
# we don't need to read this, it would be passed to ray workers
# If Ray is launched from outside, it may fail to access the environment variables.
os.getenv('DEEPEP_MAX_BATCH_SIZE', None)
os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', None)
```
Do we need to set those envs manually?
DLBlas reads these env vars when building its buffer:
https://github.com/DeepLink-org/DLBlas/blob/1710a860f654ddf50907251ec51670910368ee45/dlblas/layers/moe/token_dispatcher.py#L43
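For illustration, a minimal sketch of how a dispatcher could pick up these variables with fallback defaults (the helper name and default values are assumptions, not from DLBlas):

```python
import os

# Hypothetical helper: read the DeepEP buffer-sizing env vars, falling back
# to illustrative defaults when they are unset. Ray workers would only see
# these values if the variables are exported in their environment.
def read_deepep_env(default_batch: int = 128, default_tokens: int = 4096):
    max_batch = int(os.getenv('DEEPEP_MAX_BATCH_SIZE', default_batch))
    max_tokens = int(os.getenv('DEEPEP_MAX_TOKENS_PER_RANK', default_tokens))
    return max_batch, max_tokens
```

This is why merely calling `os.getenv(...)` in the launcher is not enough for externally launched Ray workers: each worker process must actually inherit or be passed the variables.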
```python
hidden_dim: int,
top_k: int,
layer_idx: int = 0,
chunk_size: Optional[int] = 32 * 1024,
```
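For context, a hedged sketch of a MoE module constructor taking the parameters shown in this hunk (the class name and attribute assignments are assumptions for illustration only):

```python
from typing import Optional

class FusedMoESketch:
    """Hypothetical MoE layer wrapper mirroring the diff's constructor signature."""

    def __init__(self,
                 hidden_dim: int,
                 top_k: int,
                 layer_idx: int = 0,
                 chunk_size: Optional[int] = 32 * 1024):
        self.hidden_dim = hidden_dim   # model hidden size
        self.top_k = top_k             # experts selected per token
        self.layer_idx = layer_idx     # position of this layer in the model
        # chunk_size would cap how many tokens are processed per step;
        # per the review below, the normal (non-chunked) path ignores it.
        self.chunk_size = chunk_size
```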
chunk_size is not used by FusedMoENormal
backends/moe.py and nn/moe.py have been refactored.
Reuse token dispatcher in DLBlas