
Delay importing deepspeed comm due for perf #810

Merged · 3 commits · Mar 17, 2024
Conversation

@jiminha (Collaborator) commented Mar 16, 2024
What does this PR do?

We are seeing a 2-3 second increase in total run time for most of the MPI multi-card tests when DeepSpeed is installed, compared to runs without it. We found that this code path is executed for every single model run, which causes the slowdown:
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/mixtral/modeling_mixtral.py#L65
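The fix moves the `deepspeed` import from module scope into the function that needs it, so the import cost is paid only when DeepSpeed is actually used. Below is a minimal, illustrative sketch of that pattern; the function name and fallback value are hypothetical and do not reproduce the actual change in modeling_mixtral.py:

```python
def get_world_size_safe() -> int:
    """Return the distributed world size, importing deepspeed lazily.

    Illustrative only: the real PR applies this pattern inside
    modeling_mixtral.py rather than in a helper like this one.
    """
    try:
        # Deferred import: avoids paying the deepspeed import cost on
        # every model run when DeepSpeed is not actually used.
        import deepspeed.comm as dist
    except ImportError:
        return 1
    # Guard against "DeepSpeed backend not set": deepspeed may be
    # installed but its communication backend never initialized.
    if dist.is_initialized():
        return dist.get_world_size()
    return 1
```

The `is_initialized()` guard matters because, as noted later in this thread, a setup where DeepSpeed is installed but unused would otherwise raise "DeepSpeed backend not set".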

@jiminha jiminha requested a review from regisss as a code owner March 16, 2024 02:19
@jiminha jiminha requested review from jychen21 and removed request for regisss March 16, 2024 02:19
@jiminha (Collaborator, Author) commented Mar 16, 2024

@jychen-habana could you review this change, and also run the tests that will be impacted by it? I tried to run them but got an error saying an npz file is missing (it is required for quantization). Please validate that the DeepSpeed part works correctly.

Also, does this Mixtral model always use DeepSpeed for multi-card runs?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss (Collaborator) left a comment


LGTM, good catch!

I'll wait for @jychen-habana if there are additional tests to carry out.

Also, does this Mixtral model always use DeepSpeed for multi-card runs?

For inference yes, or are you talking about training?

I tried a setup where DeepSpeed is installed but not used. That triggers the error: DeepSpeed backend not set.
So it is better to add an is_initialized condition.
@jychen21 (Collaborator) commented
@jiminha @regisss

Looks good to me. I added a small change for the case where DeepSpeed is installed but not used.

For FP8 quantization, the error "npz file missing" means that in quantization mode it cannot find the measured bf16 data in "./hqt_output". So for single-card quantization, just use single-card measurements.

For multi-card Mixtral inference, yes, it uses DeepSpeed tensor parallelism (TP).

@jychen21 (Collaborator) left a comment


LGTM

@regisss regisss merged commit c7a5498 into main Mar 17, 2024
9 checks passed
@regisss regisss deleted the jha/perfwithoutds branch March 17, 2024 11:38
@SushantGautam commented:
Traceback (most recent call last):
  File "/optimum-habana/examples/text-generation/run_generation.py", line 626, in
    main()
  File "/optimum-habana/examples/text-generation/run_generation.py", line 278, in main
    model, tokenizer, generation_config = initialize_model(args, logger)
  File "/optimum-habana/examples/text-generation/utils.py", line 379, in initialize_model
    setup_model(args, model_dtype, model_kwargs, logger)
  File "/optimum-habana/examples/text-generation/utils.py", line 168, in setup_model
    habana_quantization_toolkit.prep_model(model)
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/prepare_quant/prepare_model.py", line 12, in prep_model
    prepare_model(model)  # registers hooks
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/__init__.py", line 45, in prepare_model
    return quantize_hooks(model, mod_list)
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/quantize.py", line 63, in quantize_hooks
    measurement = load_measurements(config['measure_file'])
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/measure.py", line 125, in load_measurements
    d = load_file(fname_np, np.ndarray, fail_on_file_not_exist=config['scale_method'] not in [ScaleMethod.WITHOUT_SCALE, ScaleMethod.UNIT_SCALE])
  File "/usr/local/lib/python3.10/dist-packages/habana_quantization_toolkit/_hook_method/common.py", line 109, in load_file
    raise FileNotFoundError(f"Failed to load file {fname}")
FileNotFoundError: Failed to load file ./hqt_output/measure_hooks_maxabs.npz
