deciLM support #1133
Conversation
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Hi @regisss, do we have a plan to optimize remote-code models like Deci on Habana? If yes, could it be done this way?
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Hi @sywangyi
python run_generation.py
@imangohari1 My question is just to align on whether we could support remote-code models this way; there is no remote-code model support in optimum-habana right now. If yes, I will follow up on your suggestion.
I am not sure. Let's get @regisss's input.
I've opened the PR to draw everyone's attention to it, which should accelerate the decision making. Thanks for your feedback.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Yes, it's fine to do it like that. Please specify the exact commit hash your code is based on.
trust_remote_code is no longer needed this way: it will use the model implementation in optimum-habana, because I registered it in adapt_transformers_to_gaudi.
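For context, registering a custom model type with Transformers' Auto classes generally looks like the sketch below. The class names `DeciLMConfig` and `DeciLMForCausalLM` and the import path are stand-ins for whatever this PR actually registers in `adapt_transformers_to_gaudi`, so treat this as an illustration rather than the PR's exact code:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical import path/names standing in for the classes this PR adds.
from optimum.habana.transformers.models import DeciLMConfig, DeciLMForCausalLM

# Once the "deci" model type is registered, AutoConfig/AutoModelForCausalLM
# resolve it locally and trust_remote_code is no longer required.
AutoConfig.register("deci", DeciLMConfig)
AutoModelForCausalLM.register(DeciLMConfig, DeciLMForCausalLM)
```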
Hi @sywangyi,
I have tested these changes and they work with DeciLM-7B, but they do not work with DeciLM-6b (details below).
Please:
- Revisit/test the DeciLM-6b and DeciLM-6b-instruct models: https://huggingface.co/Deci/DeciLM-6b
- Sync/rebase on top of OH main.
Thank you.
Tests
DeciLM-7B, 1028 new tokens, batch size 16
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B/snapshots/c3c9f4226801dc0433f32aebffe0aac68ee2f051/ --use_kv_cache --max_new_tokens 1028 --batch_size 16 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Stats:
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 440.5303867789949 tokens/second
Number of HPU graphs = 14
Memory allocated = 17.44 GB
Max memory allocated = 17.7 GB
Total memory available = 94.62 GB
Graph compilation duration = 131.36527838598704 seconds
--------------------------------------------------------------------------------------------------------------
DeciLM-7B, 512 new tokens, batch size 32
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B/snapshots/c3c9f4226801dc0433f32aebffe0aac68ee2f051/ --use_kv_cache --max_new_tokens 512 --batch_size 32 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Stats:
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 859.8175255470775 tokens/second
Number of HPU graphs = 14
Memory allocated = 16.68 GB
Max memory allocated = 17.21 GB
Total memory available = 94.62 GB
Graph compilation duration = 78.38171231899469 seconds
--------------------------------------------------------------------------------------------------------------
DeciLM-7B-instruct, 256 new tokens, batch size 64
python run_generation.py --model_name_or_path /datasets/huggingface/hub/models--Deci--DeciLM-7B-instruct/snapshots/4adc7aa9efe61b47b0a98b2cc94527d9c45c3b4f/ --use_kv_cache --max_new_tokens 256 --batch_size 64 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 1724.740007723637 tokens/second
Number of HPU graphs = 14
Memory allocated = 16.74 GB
Max memory allocated = 17.8 GB
Total memory available = 94.62 GB
Graph compilation duration = 45.174995980996755 seconds
--------------------------------------------------------------------------------------------------------------
DeciLM-6b, 100 new tokens, batch size 1
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Traceback (most recent call last):
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 666, in <module>
main()
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 309, in main
model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 509, in initialize_model
setup_model(args, model_dtype, model_kwargs, logger)
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 214, in setup_model
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 523, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
trust_remote_code = resolve_trust_remote_code(
File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
raise ValueError(
ValueError: Loading Deci/DeciLM-6b requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
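For reference, the opt-in that this error message asks for is the standard Transformers one; a minimal sketch, to be used only after reviewing the modeling code in the Hub repo:

```python
from transformers import AutoModelForCausalLM

# Executes the Python code shipped inside the Hub repo; review it first.
model = AutoModelForCausalLM.from_pretrained("Deci/DeciLM-6b", trust_remote_code=True)
```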
DeciLM-6b-instruct, 100 new tokens, batch size 1
python run_generation.py --model_name_or_path Deci/DeciLM-6b-instruct --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
Traceback (most recent call last):
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 666, in <module>
main()
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/run_generation.py", line 309, in main
model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 509, in initialize_model
setup_model(args, model_dtype, model_kwargs, logger)
File "/devops/sgohari/tests/codes/pr-reviews/pr-1133/optimum-habana/examples/text-generation/utils.py", line 214, in setup_model
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 523, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 931, in from_pretrained
trust_remote_code = resolve_trust_remote_code(
File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 627, in resolve_trust_remote_code
raise ValueError(
ValueError: Loading Deci/DeciLM-6b-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
Hi @imangohari1, thanks for the benchmark. Deci/DeciLM-6b-instruct and Deci/DeciLM-6b do not contain the model type "deci" in https://huggingface.co/Deci/DeciLM-6b/blob/main/config.json, so they do not support local-code mode. See the logic in Transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L973-L974
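Roughly, the linked check boils down to the following paraphrase (not a verbatim copy of the Transformers source):

```python
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# A config only counts as "local code" when its model_type is registered in
# CONFIG_MAPPING. DeciLM-6b's config.json carries no model_type at all, so
# the remote-code path is the only one left and trust_remote_code is forced.
config_dict = {"auto_map": {"AutoConfig": "configuration_decilm.DeciLMConfig"}}  # no "model_type"
has_local_code = "model_type" in config_dict and config_dict["model_type"] in CONFIG_MAPPING
print(has_local_code)  # False
```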
@sywangyi
python run_generation.py --model_name_or_path Deci/DeciLM-6b --use_kv_cache --max_new_tokens 100 --batch_size 1 --bf16 --use_hpu_graphs --prompt "DeepSpeed is a machine learning framework"
@imangohari1 The hack is like the following.
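The exact snippet is not preserved here; a plausible sketch, assuming the hack amounts to stamping the missing `model_type` into a local copy of the config so that AutoConfig dispatches to the implementation registered by optimum-habana:

```python
import json

# Illustrative workaround only: patch a local snapshot of Deci/DeciLM-6b
# by adding the "model_type" field its config.json is missing.
path = "./DeciLM-6b/config.json"  # hypothetical local snapshot path
with open(path) as f:
    cfg = json.load(f)
cfg["model_type"] = "deci"
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```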
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
So I think the correct solution is to port a change similar to https://huggingface.co/Deci/DeciLM-7B/commit/0be2d64c57344399a148a5f9e9129b7d6a07aac0 to Deci 6B. @imangohari1 WDYT?
@sywangyi I agree. Let's see what response we get from the model owners.
@sywangyi Yes, DeciLM-6b worked on my side as well.
LGTM!
@regisss for your review when you have a chance.
LGTM