### Describe the bug

Running the IPEX int8 / weight-only quantization example scripts with `--jit` on GPT-NeoX-20B and BLOOM-7B1 produces no generated text; the run only prints `not implemented` repeatedly. Steps to reproduce:
```shell
# GPT-NeoX-20B, int8
MODEL_ID="/models/models--EleutherAI--gpt-neox-20b"
mkdir saved_results_gpt_neox
python run_gpt-neox_int8.py --ipex-weight-only-quantization --output-dir "saved_results_gpt_neox" --jit -m ${MODEL_ID} --int8

# BLOOM-7B1, int8 with bf16 mixed precision
MODEL_ID="/models/models--bigscience--bloom-7b1"
mkdir saved_results_bloom
python run_bloom_int8.py --ipex-weight-only-quantization --output-dir "saved_results_bloom" --jit -m ${MODEL_ID} --int8-bf16-mixed
```
Log from the BLOOM run:

```
Loading checkpoint shards: 100%|██████████| 2/2 [00:39<00:00, 19.74s/it]
Some weights of BloomForCausalLM were not initialized from the model checkpoint at /models/models--bigscience--bloom-7b1 and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Data type of the model: torch.float32
/opt/conda/envs/llm/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py:105: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  base = torch.tensor(
/opt/conda/envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py:143: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[1] + past_key_values_length != attention_mask.shape[1]:
/opt/conda/envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_length > 1:
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
```
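The TracerWarnings above come from data-dependent Python control flow being evaluated once and baked into the trace as a constant. A minimal standalone sketch of that effect (not code from the IPEX scripts, just an illustration of the warning's meaning):

```python
import torch

def f(x):
    # Converting a tensor comparison to a Python bool here makes
    # torch.jit.trace evaluate the condition once at trace time and
    # hard-code the chosen branch into the graph.
    if x.sum() > 0:
        return x + 1
    return x - 1

# Traced with a positive example input, so the "+ 1" branch is recorded
# (torch.jit.trace emits a TracerWarning at this point).
traced = torch.jit.trace(f, torch.ones(2))

# A negative input still takes the recorded "+ 1" branch:
# -1 + 1 = 0 for each element, instead of the eager result -2.
print(traced(-torch.ones(2)))
```

This is why the warnings note that "the trace might not generalize to other inputs": only the branch taken for the example input survives in the traced graph.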
### Versions

IPEX self-compiled from the latest `llm_feature_branch`:
```shell
git clone --branch llm_feature_branch https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git submodule sync && git submodule update --init --recursive
export DNNL_GRAPH_BUILD_COMPILER_BACKEND=1
export CXXFLAGS="${CXXFLAGS} -D__STDC_FORMAT_MACROS"
python setup.py install
cd ../
```
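For completeness, the remaining environment details (PyTorch version, OS, compiler, Python) can be dumped with PyTorch's built-in collector, assuming the build above completed and the `llm` conda env is active:

```shell
# Print PyTorch / OS / compiler details to attach to this report.
python -m torch.utils.collect_env
```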