Delay importing deepspeed comm for perf #810
Conversation
@jychen-habana could you review this change, and also run the tests that will be impacted by it? I tried to run them but got an error saying an npz file is missing (required for quantization). Please validate that the deepspeed part is working ok. Also, does this mixtral model always use deepspeed for multi-card runs?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM, good catch!
I'll wait for @jychen-habana if there are additional tests to carry out.
Also, does this mixtral model always use deepspeed for multi-card runs?
For inference yes, or are you talking about training?
I tried with a setup where deepspeed is installed but not used. That incurs the error "DeepSpeed backend not set", so it is better to add an is_initialized condition.
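The guard described above can be sketched as follows. This is a hypothetical illustration (the function name `get_rank_safe` and the fallback value are not from the PR): calling into `deepspeed.comm` when the backend was never initialized raises "DeepSpeed backend not set", so the code checks `is_initialized()` first and falls back to single-process behavior.

```python
# Hypothetical sketch: guard deepspeed collective calls so a model still
# runs when deepspeed is installed but its backend is not initialized.
def get_rank_safe() -> int:
    try:
        # Delayed import: only attempted when this path is actually taken.
        import deepspeed.comm as dist
    except ImportError:
        return 0  # deepspeed not installed: single-process fallback
    # Without this check, dist.get_rank() raises
    # "DeepSpeed backend not set" when deepspeed is installed but unused.
    if not dist.is_initialized():
        return 0
    return dist.get_rank()
```

With this condition, single-card runs and environments where deepspeed merely happens to be installed both take the fallback path instead of erroring out.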
Looks good to me. I added a small change for the case where deepspeed is installed but not used. For FP8 quantization, the "npz file missing" error means that in quantization mode it cannot find the measured bf16 data in "./hqt_output"; so for single-card quantization, just use a single-card measurement. For Mixtral multi-card inference, yes, it is always deepspeed TP.
LGTM
Traceback (most recent call last):
What does this PR do?
We are seeing 2-3 seconds of additional total run time in most of the MPI multi-card tests when deepspeed is installed, compared to runs without it. We found that this code is executed for every single model run, which causes the slowdown:
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/mixtral/modeling_mixtral.py#L65
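The fix named in the PR title can be sketched with the standard lazy-import pattern. This is a simplified illustration, not the actual diff; the function `init_expert_parallel` is hypothetical. Moving the deepspeed import from module level into the function that needs it means models that never touch deepspeed no longer pay the import cost on every run.

```python
import sys

# Before the fix, the import sat at module level and was paid on every
# model run:
#   import deepspeed.comm
#
# After the fix, the import is delayed until deepspeed is actually used.
def init_expert_parallel():
    # Delayed import: the cost is only paid when this path is taken,
    # and deepspeed is assumed to be installed on this path.
    import deepspeed.comm as dist
    return dist

# Merely defining the function does not import deepspeed:
assert "deepspeed" not in sys.modules
```

Because Python caches modules in `sys.modules`, the import inside the function is cheap after the first call, so the pattern costs nothing on the deepspeed path while removing the per-run overhead everywhere else.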