
deciLM support #1133

Merged
merged 6 commits into main from deciLM on Aug 2, 2024
Conversation

sywangyi
Collaborator

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

@yao-matrix

@sywangyi
Collaborator Author

Hi @regisss, do we have a plan to optimize remote-code models like Deci on Habana? If yes, could it be done this way?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

sywangyi commented Jul 17, 2024

| Throughput (tokens/s) | A100  | Gaudi2 |
|-----------------------|-------|--------|
| BF16                  | 43.5  | 121.7  |
| FP32                  | 42.67 | 54.59  |

```bash
python run_generation.py \
  --model_name_or_path Deci/DeciLM-7B \
  --use_kv_cache \
  --max_new_tokens 100 \
  --batch_size 1 \
  --bf16 \
  --use_hpu_graphs \
  --prompt "DeepSpeed is a machine learning framework"
```

@imangohari1
Contributor

Hi @sywangyi
I have the following suggestions for this PR.
Let me know if you have any questions.

@sywangyi
Collaborator Author

@imangohari1 My question is just to align on whether we can support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestions.

@imangohari1
Contributor

imangohari1 commented Jul 17, 2024

> @imangohari1 My question is just to align on whether we can support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestions.

I am not sure. Let's get @regisss's input.
If the implementation here is unclear, I suggest changing this to a draft @sywangyi

@sywangyi
Collaborator Author

> > @imangohari1 My question is just to align on whether we can support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestions.
>
> I am not sure. Let's get @regisss's input. If the implementation here is unclear, I suggest changing this to a draft @sywangyi

I've opened the PR to draw everyone's attention to it, which should accelerate the decision making. Thanks for your feedback.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@regisss
Collaborator

regisss commented Jul 18, 2024

Yes, it's fine to do it like that. Please specify the exact commit hash your code is based on.
I guess we still need to set trust_remote_code to True when instantiating the model, right?

@sywangyi
Collaborator Author

sywangyi commented Jul 18, 2024

> trust_remote_code

trust_remote_code is not needed any more with this approach; the model implementation in optimum-habana will be used, because I registered it in adapt_transformers_to_gaudi:

```python
transformers.AutoConfig.register("deci", DeciLMConfig)
transformers.AutoModelForCausalLM.register(DeciLMConfig, DeciLMForCausalLM)
```
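For illustration, a minimal sketch of how this kind of registration makes the Auto classes resolve to a local implementation without trust_remote_code. The Demo* classes below are hypothetical stand-ins, not the actual DeciLM port:

```python
from transformers import AutoConfig, AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

# Hypothetical stand-ins for the ported DeciLMConfig / DeciLMForCausalLM.
class DemoDeciConfig(LlamaConfig):
    model_type = "deci"  # must match the key passed to AutoConfig.register

class DemoDeciForCausalLM(LlamaForCausalLM):
    config_class = DemoDeciConfig

# Register the config under its model_type and map it to the model class.
AutoConfig.register("deci", DemoDeciConfig)
AutoModelForCausalLM.register(DemoDeciConfig, DemoDeciForCausalLM)

# Any checkpoint whose config.json declares "model_type": "deci" now dispatches
# to the registered local classes, e.g.:
#   model = AutoModelForCausalLM.from_pretrained("Deci/DeciLM-7B")
```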

@imangohari1 left a comment
Contributor

Hi @sywangyi
I have tested these changes: they work with DeciLM-7B, but they do not work with DeciLM-6b (details below).
Please:

Thank you.

Tests

DeciLM-7B, 1028 new tokens, bs 16

```
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B/snapshots/c3c9f4226801dc0433f32aebffe0aac68ee2f051/ --use_kv_cache --max_new_tokens 1028 --batch_size 16 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Stats:
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 440.5303867789949 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 17.44 GB
Max memory allocated                = 17.7 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 131.36527838598704 seconds
--------------------------------------------------------------------------------------------------------------
```

DeciLM-7B, 512 new tokens, bs 32

```
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B/snapshots/c3c9f4226801dc0433f32aebffe0aac68ee2f051/ --use_kv_cache --max_new_tokens 512 --batch_size 32 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Stats:
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 859.8175255470775 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 16.68 GB
Max memory allocated                = 17.21 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 78.38171231899469 seconds
--------------------------------------------------------------------------------------------------------------
```

DeciLM-7B-instruct, 256 new tokens, bs 64

```
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B-instruct/snapshots/4adc7aa9efe61b47b0a98b2cc94527d9c45c3b4f/ --use_kv_cache --max_new_tokens 256 --batch_size 64 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 1724.740007723637 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 16.74 GB
Max memory allocated                = 17.8 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 45.174995980996755 seconds
--------------------------------------------------------------------------------------------------------------
```

DeciLM-6b, 100 new tokens, bs 1

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 666, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 309, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 509, in initialize_model
    setup_model(args, model_dtype, model_kwargs, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 214, in setup_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 523, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
    raise ValueError(
ValueError: Loading Deci/DeciLM-6b requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
```

DeciLM-6b-instruct, 100 new tokens, bs 1

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b-instruct --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 666, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 309, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 509, in initialize_model
    setup_model(args, model_dtype, model_kwargs, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 214, in setup_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 523, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
    raise ValueError(
ValueError: Loading Deci/DeciLM-6b-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
```

@sywangyi
Collaborator Author

sywangyi commented Jul 19, 2024

Hi @imangohari1, thanks for the benchmark. Deci/DeciLM-6b-instruct and DeciLM-6b do not contain the model type "deci" in https://huggingface.co/Deci/DeciLM-6b/blob/main/config.json, so they do not support local-code mode; see the logic in transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L973-L974
Meanwhile, the static-shape optimization is also determined by the model type; see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/generation/utils.py#L634. Do we need to hack the code to support it? The Deci 6B downloads are far behind the Deci 7B series.
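For context, the linked transformers logic is roughly of this shape; a paraphrased, illustrative sketch, not the exact source:

```python
from transformers.dynamic_module_utils import resolve_trust_remote_code
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# Illustrative config_dict as read from a config.json that (like DeciLM-6b's at
# the time of this PR) carries an auto_map but no "model_type".
config_dict = {"auto_map": {"AutoConfig": "configuration_decilm.DeciLMConfig"}}

has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]
has_local_code = "model_type" in config_dict and config_dict["model_type"] in CONFIG_MAPPING

# With has_local_code=False and no explicit trust_remote_code, transformers
# raises the ValueError shown in the tracebacks above.
resolve_trust_remote_code(None, "Deci/DeciLM-6b", has_local_code, has_remote_code)
```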

@imangohari1
Contributor

> Hi @imangohari1, thanks for the benchmark. Deci/DeciLM-6b-instruct and DeciLM-6b do not contain the model type "deci" in https://huggingface.co/Deci/DeciLM-6b/blob/main/config.json, so they do not support local-code mode; see the logic in transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L973-L974. Meanwhile, the static-shape optimization is also determined by the model type; see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/generation/utils.py#L634. Do we need to hack the code to support it? The Deci 6B downloads are far behind the Deci 7B series.

@sywangyi
Thanks. I was able to get DeciLM-6b running by adding "model_type": "deci" to the config.json file (runs below).
I am not sure what hack you were thinking of, but I think having an option to run that model would be useful.
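A minimal sketch of that edit, assuming a locally downloaded copy of the model (the path below is a hypothetical placeholder):

```python
import json

# Hypothetical path to the locally cached config.json; adjust to your setup.
config_path = "/path/to/models--Deci--DeciLM-6b/snapshots/<hash>/config.json"

with open(config_path) as f:
    config = json.load(f)

# Declare the model type so AutoConfig/AutoModelForCausalLM can dispatch to the
# classes registered in adapt_transformers_to_gaudi instead of remote code.
config["model_type"] = "deci"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```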

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 1024 --batch_size 16 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"

Stats:
---------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 483.99941399787946 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 14.61 GB
Max memory allocated                = 14.88 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 120.89111913500528 seconds
---------------------------------------------------------------------------------------------------------------
```

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"

Stats:
---------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 140.92129844475383 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 10.74 GB
Max memory allocated                = 10.75 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 7.28320421501121 seconds
---------------------------------------------------------------------------------------------------------------
```

@sywangyi
Collaborator Author

@imangohari1 The hack is like the following:

[screenshot of the proposed code change, not preserved in this transcript]

It's only for the single-card generation case; it would need to be applied to the multi-card and PEFT generation cases as well, and also to the language-modeling example if you want to enable fine-tuning for Deci-6B. The problem is that "model_type" is missing in config.json, so the local class cannot be dispatched automatically from AutoModelForCausalLM and has to be called manually.
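Since the screenshot is not preserved, here is a hedged guess at the shape of such a hack: bypass the Auto classes and call the ported classes directly when "model_type" is missing. The import path is an assumption, not confirmed from the PR:

```python
# Assumed import path for the classes added by this PR; adjust if the actual
# module layout differs.
from optimum.habana.transformers.models import DeciLMConfig, DeciLMForCausalLM

# Deci/DeciLM-6b's config.json lacks "model_type", so AutoModelForCausalLM cannot
# dispatch to the registered classes; loading the concrete classes sidesteps that.
config = DeciLMConfig.from_pretrained("Deci/DeciLM-6b")
model = DeciLMForCausalLM.from_pretrained("Deci/DeciLM-6b", config=config)
```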

sywangyi added 2 commits July 21, 2024 19:32
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

sywangyi commented Jul 23, 2024

So, I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT?
I opened a discussion, see https://huggingface.co/Deci/DeciLM-6b/discussions/6

@imangohari1
Contributor

> So, I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT? I opened a discussion, see https://huggingface.co/Deci/DeciLM-6b/discussions/6

@sywangyi I agree. Let's see what response we get from the model owners.

@imangohari1
Contributor

imangohari1 commented Jul 29, 2024

> > So, I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT? I opened a discussion, see https://huggingface.co/Deci/DeciLM-6b/discussions/6
>
> @sywangyi I agree. Let's see what response we get from the model owners.

@sywangyi
The model type was added for Deci-6b.
I was able to run it after re-downloading the config.json file.
Could you please confirm the same on your end? Thanks.

@sywangyi
Collaborator Author

sywangyi commented Jul 30, 2024

Yes, Deci-6b worked on my side as well.
I opened a similar discussion for DeciLM-6b-instruct: https://huggingface.co/Deci/DeciLM-6b-instruct/discussions/7

@imangohari1 left a comment
Contributor

LGTM!
@regisss for your review when you have a chance.

@libinta added the run-test (Run CI for PRs from external contributors) and synapse1.17 (PR that should be available along with Synapse 1.17 but has no dependency on Synapse 1.17 content) labels and removed the review and wip labels on Aug 1, 2024
mounikamandava added a commit to emascarenhas/optimum-habana that referenced this pull request Aug 2, 2024
@regisss left a comment
Collaborator

LGTM

@regisss merged commit 7738595 into main on Aug 2, 2024
7 checks passed
@regisss deleted the deciLM branch on August 2, 2024 08:41
@pi314ever mentioned this pull request on Sep 19, 2024
Labels
run-test, synapse1.17

5 participants