fix of use of unquantized weights in cohere GQA loading, also enable the model in intel platform #2291

sywangyi · 2024-07-24T06:15:20Z

Fix the following crash when loading a Cohere model with unquantized weights:

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 782, in get_model
    return FlashCausalLM(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 900, in __init__
    model = model_class(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 495, in __init__
    self.model = FlashCohereModel(prefix, config, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 426, in __init__
    [
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 427, in <listcomp>
    FlashCohereLayer(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 366, in __init__
    self.self_attn = FlashCohereAttention(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 223, in __init__
    self.query_key_value = load_attention(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 149, in load_attention
    return _load_gqa(config, prefix, weights)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 170, in _load_gqa
    weight = weight.to(dtype=weights.dtype).to(device=weights.device)
AttributeError: 'UnquantizedWeight' object has no attribute 'to'
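For readers unfamiliar with this class of error, here is a minimal, self-contained sketch of what goes wrong and one way it could be handled. It does not reproduce the actual diff in this PR. `FakeTensor`, `WrappedWeight`, and `move_weight` are hypothetical names made up for illustration, and the `"xpu"` device string is only an example value. The assumption is that the loader returns a wrapper object holding a tensor, so the conversion has to be applied to the wrapped tensor rather than to the wrapper.

```python
from dataclasses import dataclass


class FakeTensor:
    """Hypothetical stand-in for a tensor; it only tracks dtype and device."""

    def __init__(self, dtype="float32", device="cpu"):
        self.dtype = dtype
        self.device = device

    def to(self, dtype=None, device=None):
        # Return a converted copy, similar in spirit to a tensor's .to().
        return FakeTensor(dtype or self.dtype, device or self.device)


@dataclass
class WrappedWeight:
    """Hypothetical wrapper around a tensor; it has no .to() of its own."""

    weight: FakeTensor


def move_weight(weight, dtype, device):
    """Convert either a bare tensor or a wrapper around one."""
    if isinstance(weight, WrappedWeight):
        # Convert the wrapped tensor, not the wrapper itself.
        weight.weight = weight.weight.to(dtype=dtype).to(device=device)
        return weight
    return weight.to(dtype=dtype).to(device=device)


wrapped = WrappedWeight(FakeTensor())

# Calling .to() on the wrapper fails in the same way as the traceback above.
try:
    wrapped.to(dtype="float16")
except AttributeError as exc:
    error_message = str(exc)

# Going through the helper converts the inner tensor instead.
moved = move_weight(wrapped, dtype="float16", device="xpu")
```

The `isinstance` check keeps the bare-tensor path unchanged, so callers that still pass a plain tensor are unaffected in this sketch.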

Also enable the model on the Intel platform.

@OlivierDehaene OR @Narsil

danieldk

Looks great, thanks!

fix of use of unquantized weights in cohere GQA loading, also enable the model in intel platform

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

0c651ac

danieldk approved these changes Jul 24, 2024

danieldk merged commit 8642250 into huggingface:main Jul 24, 2024
