
deciLM support #1133

Merged
merged 6 commits into main from deciLM on Aug 2, 2024
Conversation

sywangyi
Collaborator

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

@yao-matrix

@sywangyi
Collaborator Author

Hi @regisss, do we have a plan to optimize remote-code models like Deci on Habana? If yes, could it be done this way?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

sywangyi commented Jul 17, 2024

| Throughput (tokens/s) | A100  | Gaudi2 |
|-----------------------|-------|--------|
| BF16                  | 43.5  | 121.7  |
| FP32                  | 42.67 | 54.59  |

```bash
python run_generation.py \
  --model_name_or_path Deci/DeciLM-7B \
  --use_kv_cache \
  --max_new_tokens 100 \
  --batch_size 1 \
  --bf16 \
  --use_hpu_graphs \
  --prompt "DeepSpeed is a machine learning framework"
```

@imangohari1
Contributor

Hi @sywangyi
I have the following suggestions for this PR.
Let me know if you have any questions.

@sywangyi
Collaborator Author

@imangohari1 My question is just to align on whether we can support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestions.

@imangohari1
Contributor

imangohari1 commented Jul 17, 2024

> @imangohari1 My question is just to align on whether we can support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestions.

I am not sure. Let's get @regisss's input.
If the implementation here is unclear, I suggest changing this to a draft @sywangyi

@sywangyi
Collaborator Author

> > @imangohari1 My question is just to align on whether we can support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestions.
>
> I am not sure. Let's get @regisss's input. If the implementation here is unclear, I suggest changing this to a draft @sywangyi

I've opened the PR to draw everyone's attention to it, which should accelerate the decision making. Thanks for your feedback.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@regisss
Collaborator

regisss commented Jul 18, 2024

Yes, it's fine to do it like that. Please specify the exact commit hash your code is based on.
I guess we still need to set trust_remote_code to True when instantiating the model, right?

@sywangyi
Collaborator Author

sywangyi commented Jul 18, 2024

> trust_remote_code

trust_remote_code is not needed any more with this approach; the model implementation in optimum-habana will be used, because I registered it in adapt_transformers_to_gaudi:

```python
transformers.AutoConfig.register("deci", DeciLMConfig)
transformers.AutoModelForCausalLM.register(DeciLMConfig, DeciLMForCausalLM)
```
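For illustration, a minimal sketch of how this kind of registration makes the Auto classes resolve to a local implementation without trust_remote_code. The Demo* classes below are hypothetical stand-ins, not the actual DeciLM port:

```python
from transformers import AutoConfig, AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

# Hypothetical stand-ins for the ported DeciLMConfig / DeciLMForCausalLM.
class DemoDeciConfig(LlamaConfig):
    model_type = "deci"  # must match the key passed to AutoConfig.register

class DemoDeciForCausalLM(LlamaForCausalLM):
    config_class = DemoDeciConfig

# Register the config under its model_type and map it to the model class.
AutoConfig.register("deci", DemoDeciConfig)
AutoModelForCausalLM.register(DemoDeciConfig, DemoDeciForCausalLM)

# Any checkpoint whose config.json declares "model_type": "deci" now dispatches
# to the registered local classes, e.g.:
#   model = AutoModelForCausalLM.from_pretrained("Deci/DeciLM-7B")
```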

@imangohari1 left a comment
Contributor

Hi @sywangyi
I have tested these changes: they work with DeciLM-7B, but they do not work with DeciLM-6b (details below).
Please:

Thank you.

Tests

DeciLM-7B, 1028 new tokens, bs 16

```
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B/snapshots/c3c9f4226801dc0433f32aebffe0aac68ee2f051/ --use_kv_cache --max_new_tokens 1028 --batch_size 16 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Stats:
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 440.5303867789949 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 17.44 GB
Max memory allocated                = 17.7 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 131.36527838598704 seconds
--------------------------------------------------------------------------------------------------------------
```

DeciLM-7B, 512 new tokens, bs 32

```
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B/snapshots/c3c9f4226801dc0433f32aebffe0aac68ee2f051/ --use_kv_cache --max_new_tokens 512 --batch_size 32 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Stats:
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 859.8175255470775 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 16.68 GB
Max memory allocated                = 17.21 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 78.38171231899469 seconds
--------------------------------------------------------------------------------------------------------------
```

DeciLM-7B-instruct, 256 new tokens, bs 64

```
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B-instruct/snapshots/4adc7aa9efe61b47b0a98b2cc94527d9c45c3b4f/ --use_kv_cache --max_new_tokens 256 --batch_size 64 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 1724.740007723637 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 16.74 GB
Max memory allocated                = 17.8 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 45.174995980996755 seconds
--------------------------------------------------------------------------------------------------------------
```

DeciLM-6b, 100 new tokens, bs 1

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 666, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 309, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 509, in initialize_model
    setup_model(args, model_dtype, model_kwargs, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 214, in setup_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 523, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
    raise ValueError(
ValueError: Loading Deci/DeciLM-6b requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
```

DeciLM-6b-instruct, 100 new tokens, bs 1

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b-instruct --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Traceback (most recent call last):
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 666, in <module>
    main()
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 309, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 509, in initialize_model
    setup_model(args, model_dtype, model_kwargs, logger)
  File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 214, in setup_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 523, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
  File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
    raise ValueError(
ValueError: Loading Deci/DeciLM-6b-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
```

@sywangyi
Collaborator Author

sywangyi commented Jul 19, 2024

Hi @imangohari1, thanks for the benchmark. Deci/DeciLM-6b-instruct and DeciLM-6b do not contain the model type "deci" in https://huggingface.co/Deci/DeciLM-6b/blob/main/config.json, so they do not support local-code mode; see the logic in transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L973-L974
Meanwhile, the static-shape optimization is also determined by the model type; see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/generation/utils.py#L634. Do we need to hack the code to support it? The Deci 6B downloads are far behind the Deci 7B series.
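For context, the linked transformers logic is roughly of this shape; a paraphrased, illustrative sketch, not the exact source:

```python
from transformers.dynamic_module_utils import resolve_trust_remote_code
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# Illustrative config_dict as read from a config.json that (like DeciLM-6b's at
# the time of this PR) carries an auto_map but no "model_type".
config_dict = {"auto_map": {"AutoConfig": "configuration_decilm.DeciLMConfig"}}

has_remote_code = "auto_map" in config_dict and "AutoConfig" in config_dict["auto_map"]
has_local_code = "model_type" in config_dict and config_dict["model_type"] in CONFIG_MAPPING

# With has_local_code=False and no explicit trust_remote_code, transformers
# raises the ValueError shown in the tracebacks above.
resolve_trust_remote_code(None, "Deci/DeciLM-6b", has_local_code, has_remote_code)
```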

@imangohari1
Contributor

> Hi @imangohari1, thanks for the benchmark. Deci/DeciLM-6b-instruct and DeciLM-6b do not contain the model type "deci" in https://huggingface.co/Deci/DeciLM-6b/blob/main/config.json, so they do not support local-code mode; see the logic in transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L973-L974. Meanwhile, the static-shape optimization is also determined by the model type; see https://github.com/huggingface/optimum-habana/blob/main/optimum/habana/transformers/generation/utils.py#L634. Do we need to hack the code to support it? The Deci 6B downloads are far behind the Deci 7B series.

@sywangyi
Thanks. I was able to get DeciLM-6b running by adding "model_type": "deci" to the config.json file (runs below).
I am not sure what hack you were thinking of, but I think having an option to run that model would be useful.
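A minimal sketch of that edit, assuming a locally downloaded copy of the model (the path below is a hypothetical placeholder):

```python
import json

# Hypothetical path to the locally cached config.json; adjust to your setup.
config_path = "/path/to/models--Deci--DeciLM-6b/snapshots/<hash>/config.json"

with open(config_path) as f:
    config = json.load(f)

# Declare the model type so AutoConfig/AutoModelForCausalLM can dispatch to the
# classes registered in adapt_transformers_to_gaudi instead of remote code.
config["model_type"] = "deci"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```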

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 1024 --batch_size 16 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"

Stats:
---------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 483.99941399787946 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 14.61 GB
Max memory allocated                = 14.88 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 120.89111913500528 seconds
---------------------------------------------------------------------------------------------------------------
```

```
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"

Stats:
---------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 140.92129844475383 tokens/second
Number of HPU graphs                = 14
Memory allocated                    = 10.74 GB
Max memory allocated                = 10.75 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 7.28320421501121 seconds
---------------------------------------------------------------------------------------------------------------
```

@sywangyi
Collaborator Author

@imangohari1 The hack is like the following:

[screenshot of the proposed code change, not preserved in this transcript]

It's only for the single-card generation case; it would need to be applied to the multi-card and PEFT generation cases as well, and also to the language-modeling example if you want to enable fine-tuning for Deci-6B. The problem is that "model_type" is missing in config.json, so the local class cannot be dispatched automatically from AutoModelForCausalLM and has to be called manually.
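Since the screenshot is not preserved, here is a hedged guess at the shape of such a hack: bypass the Auto classes and call the ported classes directly when "model_type" is missing. The import path is an assumption, not confirmed from the PR:

```python
# Assumed import path for the classes added by this PR; adjust if the actual
# module layout differs.
from optimum.habana.transformers.models import DeciLMConfig, DeciLMForCausalLM

# Deci/DeciLM-6b's config.json lacks "model_type", so AutoModelForCausalLM cannot
# dispatch to the registered classes; loading the concrete classes sidesteps that.
config = DeciLMConfig.from_pretrained("Deci/DeciLM-6b")
model = DeciLMForCausalLM.from_pretrained("Deci/DeciLM-6b", config=config)
```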

sywangyi added 2 commits July 21, 2024 19:32
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
@sywangyi
Collaborator Author

sywangyi commented Jul 23, 2024

So, I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT?
I opened a discussion, see https://huggingface.co/Deci/DeciLM-6b/discussions/6

@imangohari1
Contributor

> So, I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT? I opened a discussion, see https://huggingface.co/Deci/DeciLM-6b/discussions/6

@sywangyi I agree. Let's see what response we get from the model owners.

@imangohari1
Contributor

imangohari1 commented Jul 29, 2024

> > So, I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT? I opened a discussion, see https://huggingface.co/Deci/DeciLM-6b/discussions/6
>
> @sywangyi I agree. Let's see what response we get from the model owners.

@sywangyi
The model type was added for Deci-6b.
I was able to run it after re-downloading the config.json file.
Could you please confirm the same on your end? Thanks.

@sywangyi
Collaborator Author

sywangyi commented Jul 30, 2024

Yes, Deci-6b worked on my side as well.
I opened a similar discussion for DeciLM-6b-instruct: https://huggingface.co/Deci/DeciLM-6b-instruct/discussions/7

@imangohari1 left a comment
Contributor

LGTM!
@regisss for your review when you have a chance.

@libinta added the run-test (Run CI for PRs from external contributors) and synapse1.17 (PR that should be available along with Synapse 1.17 but has no dependency on Synapse 1.17 content) labels and removed the review and wip labels on Aug 1, 2024
mounikamandava added a commit to emascarenhas/optimum-habana that referenced this pull request Aug 2, 2024
@regisss left a comment
Collaborator

LGTM

@regisss merged commit 7738595 into main on Aug 2, 2024
7 checks passed
@regisss deleted the deciLM branch on August 2, 2024 08:41
@pi314ever mentioned this pull request on Sep 19, 2024
Labels
run-test, synapse1.17

5 participants