
llama_int8 does not support do_sample=True #430

Open
@markluofd

Description

Describe the bug

Running the demo run_llama_int8.py with generate_kwargs["do_sample"] set to True fails with the error below:

command:
python run_llama_int8.py -m ${MODEL_ID} --quantized-model-path "/workspace/saved_results/best_model.pt" --benchmark --jit --int8-bf16-mixed --num-iter 5 --prompt "hello"
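
For reference, sampling was enabled roughly like this (a minimal sketch; the exact kwargs in run_llama_int8.py may differ, and temperature/top_p here are illustrative values, not ones from the script):

import torch

generate_kwargs = dict(do_sample=True, temperature=0.9, top_p=0.95)
with torch.inference_mode():
    # user_model and input_ids as set up by the demo script
    output = user_model.generate(input_ids, max_new_tokens=32, **generate_kwargs)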

error log:
/opt/conda/lib/python3.9/site-packages/transformers/generation/utils.py:1405: UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cpu, whereas the model is on meta. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('meta') before running .generate().
  warnings.warn(
Traceback (most recent call last):
  File "/lzw/run_llama_int8.py", line 378, in <module>
    output = user_model.generate(
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/opt/conda/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/models.py", line 624, in LlamaForCausalLM_forward
    outputs = self.model(
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1522, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1531, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/models.py", line 283, in LlamaModel_forward
    attention_mask = self._prepare_decoder_attention_mask(
  File "/opt/conda/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py", line 65, in _prepare_decoder_attention_mask
    combined_attention_mask = _make_causal_mask(
  File "/opt/conda/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py", line 18, in _make_causal_mask
    mask = torch.full(
NotImplementedError: Could not run 'aten::_local_scalar_dense' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_local_scalar_dense' is only available for these backends: [CPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
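
The eager model is on the meta device (as the UserWarning above notes), and the failure appears to come from materializing a scalar there: converting a 0-dim tensor to a Python number dispatches aten::_local_scalar_dense, which has no meta implementation. A minimal sketch that reproduces the same error, assuming _make_causal_mask passes the fill value to torch.full as a 0-dim tensor (as older transformers versions did):

import torch

# A plain Python float fill value works on the meta device:
torch.full((4, 4), torch.finfo(torch.float32).min, device="meta")

# But a 0-dim tensor fill value must be converted via .item(), which
# dispatches aten::_local_scalar_dense and fails on meta (assumption:
# this matches the pattern inside _make_causal_mask in this version):
fill = torch.tensor(torch.finfo(torch.float32).min, device="meta")
torch.full((4, 4), fill, device="meta")  # NotImplementedError: aten::_local_scalar_dense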

do_sample is an important feature for me.

Versions

[pip3] intel-extension-for-pytorch==2.1.0.dev0+cpu.llm
[pip3] numpy==1.24.1
[pip3] torch==2.1.0.dev20230711+cpu
[pip3] torchaudio==2.1.0.dev20230711+cpu
[pip3] torchvision==0.16.0.dev20230711+cpu
[conda] intel-extension-for-pytorch 2.1.0.dev0+cpu.llm pypi_0 pypi
[conda] numpy 1.24.1 pypi_0 pypi
[conda] torch 2.1.0.dev20230711+cpu pypi_0 pypi
[conda] torchaudio 2.1.0.dev20230711+cpu pypi_0 pypi
[conda] torchvision 0.16.0.dev20230711+cpu pypi_0 pypi

Labels

CPU, Crash, LLM
