The llava example in python/llm/example/GPU/PyTorch-Models/Model/llava does not work correctly when the environment variable BIGDL_QUANTIZE_KV_CACHE is set to 1.
Running generate.py after completing all the steps in README.md produces a model with the following structure:
The script crashes after reading input from the terminal, when model.generate is called. The final part of the traceback is:
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1068, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 323, in llama_decoder_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 1539, in llama_attention_forward_4_38
return forward_function(
^^^^^^^^^^^^^^^^^
File "C:\Users\arda\miniforge3\envs\llava=test\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 1743, in llama_attention_forward_4_38_quantized
attn_output = xe_addons.sdp_fp8(query_states, key_states, value_states, new_attn_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected scalar type Byte but found Float
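For reference, a minimal sketch of how the failing configuration is enabled (this is an illustration, not the example's actual code; model loading and the generate call are omitted):

```python
import os

# Set before the model is loaded: ipex-llm reads this environment variable
# to decide whether to route attention through the quantized (FP8) KV-cache
# path, which is where the "expected scalar type Byte but found Float"
# error above is raised.
os.environ["BIGDL_QUANTIZE_KV_CACHE"] = "1"

# The crash then occurs inside model.generate(...) in generate.py
# (see the example's README for the full loading steps).
```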