Delay importing deepspeed comm for perf #810
Conversation
@jychen-habana could you review this change, and also run the tests that will be impacted by it? I tried to run them but got an error saying an npz file is missing (required for quantization). Please validate that the deepspeed part is working ok. Also, does this mixtral model always use deepspeed for multi-card runs?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM, good catch!
I'll wait for @jychen-habana if there are additional tests to carry out.
Also, does this mixtral model always use deepspeed for multi-card runs?
For inference yes, or are you talking about training?
I tried with a setup where deepspeed is installed but not used. That incurs the error "DeepSpeed backend not set", so it is better to add an is_initialized condition.
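The guard described above can be sketched as follows. This is a hypothetical illustration (the function name `get_rank_safe` and the fallback value are not from the PR): calling into `deepspeed.comm` when the backend was never initialized raises "DeepSpeed backend not set", so the code checks `is_initialized()` first and falls back to single-process behavior.

```python
# Hypothetical sketch: guard deepspeed collective calls so a model still
# runs when deepspeed is installed but its backend is not initialized.
def get_rank_safe() -> int:
    try:
        # Delayed import: only attempted when this path is actually taken.
        import deepspeed.comm as dist
    except ImportError:
        return 0  # deepspeed not installed: single-process fallback
    # Without this check, dist.get_rank() raises
    # "DeepSpeed backend not set" when deepspeed is installed but unused.
    if not dist.is_initialized():
        return 0
    return dist.get_rank()
```

With this condition, single-card runs and environments where deepspeed merely happens to be installed both take the fallback path instead of erroring out.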
Looks good to me. I added a small change for the case where deepspeed is installed but not used. For FP8 quantization, the "npz file missing" error means that in quantization mode it cannot find the measured bf16 data in "./hqt_output"; so for single-card quantization, just use a single-card measurement. For Mixtral multi-card inference, yes, it is always deepspeed TP.
LGTM
Traceback (most recent call last):
What does this PR do?
We are seeing 2-3 seconds of additional total run time in most of the MPI multi-card tests when deepspeed is installed, compared to runs without it. We found that this code is executed for every single model run, which causes the slowdown:
https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/models/mixtral/modeling_mixtral.py#L65
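The fix named in the PR title can be sketched with the standard lazy-import pattern. This is a simplified illustration, not the actual diff; the function `init_expert_parallel` is hypothetical. Moving the deepspeed import from module level into the function that needs it means models that never touch deepspeed no longer pay the import cost on every run.

```python
import sys

# Before the fix, the import sat at module level and was paid on every
# model run:
#   import deepspeed.comm
#
# After the fix, the import is delayed until deepspeed is actually used.
def init_expert_parallel():
    # Delayed import: the cost is only paid when this path is taken,
    # and deepspeed is assumed to be installed on this path.
    import deepspeed.comm as dist
    return dist

# Merely defining the function does not import deepspeed:
assert "deepspeed" not in sys.modules
```

Because Python caches modules in `sys.modules`, the import inside the function is cheap after the first call, so the pattern costs nothing on the deepspeed path while removing the per-run overhead everywhere else.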