Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MLP/lm_head tp grain size setting. #6828

Merged
merged 8 commits into from
Dec 16, 2024

Conversation

Yejing-Lai
Copy link
Contributor

This PR aims to add MLP/lm_head tp size granularity setting to deepspeed.init_inference() API. It will be more flexible to set the MLP/lm_head sharding grain size.

DNN library favors tensor size in granularity of power of 2, we pick 64 as a default size.

We aim to be able to set the MLP/lm_head tp grain size flexibly. This is a preliminary solution. If there is a better solution, we can discuss it together. Thanks~

@Yejing-Lai Yejing-Lai requested a review from awan-10 as a code owner December 6, 2024 08:49
@Yejing-Lai
Copy link
Contributor Author

@delock Please review, thanks~

@delock
Copy link
Collaborator

delock commented Dec 7, 2024

Hi @tjruwase @awan-10 , this PR from @Yejing-Lai adds a new parameter to init_inference to set the MLP/lm_head granularity. The reason we need this parameter is different library will need different granularity to perform optimal. i.e. oneDNN needs granularity of 64. In quantization, granularity better set to the size of quantization group. Other library may need different granularity setting.

We know it extends init_inference where we think it's the right place to set autotp inference havior. Want's to hear your thoughts on this, thanks!

@tjruwase tjruwase requested review from loadams and removed request for awan-10 December 10, 2024 20:03
@tjruwase
Copy link
Contributor

@delock, @Yejing-Lai, thanks for the PR. I agree that updating init_inference is right choice. However, I am curious whether it is better to make grain_size a sub-field of the existing tensor_parallel field.

tensor_parallel: DeepSpeedTPConfig = Field({}, alias="tp")
"""
Configuration for tensor parallelism used to split the model across several
GPUs. Expects a dictionary containing values for :any:`DeepSpeedTPConfig`.
"""

Did you guys consider this choice? Either way, I have approved the PR.

@Yejing-Lai
Copy link
Contributor Author

@delock, @Yejing-Lai, thanks for the PR. I agree that updating init_inference is right choice. However, I am curious whether it is better to make grain_size a sub-field of the existing tensor_parallel field.

tensor_parallel: DeepSpeedTPConfig = Field({}, alias="tp")
"""
Configuration for tensor parallelism used to split the model across several
GPUs. Expects a dictionary containing values for :any:`DeepSpeedTPConfig`.
"""

Did you guys consider this choice? Either way, I have approved the PR.

Thank you for your advice. I updated the code. Please review~

@tjruwase
Copy link
Contributor

Thank you for your advice. I updated the code. Please review~

@Yejing-Lai, thanks for updating the code. LGTM.

@loadams loadams merged commit da771ed into microsoft:master Dec 16, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants