
Failed to do quantization for models like EleutherAI/gpt-neox-20b and bigscience/bloom-7b1 #438

@RenyanDiao

Description


Describe the bug

MODEL_ID="/models/models--EleutherAI--gpt-neox-20b"
mkdir saved_results_gpt_neox
python run_gpt-neox_int8.py --ipex-weight-only-quantization --output-dir "saved_results_gpt_neox" --jit -m ${MODEL_ID} --int8

MODEL_ID="/models/models--bigscience--bloom-7b1"
mkdir saved_results_bloom
python run_bloom_int8.py --ipex-weight-only-quantization --output-dir "saved_results_bloom" --jit -m ${MODEL_ID} --int8-bf16-mixed

Loading checkpoint shards: 100%|██████████| 2/2 [00:39<00:00, 19.74s/it]
Some weights of BloomForCausalLM were not initialized from the model checkpoint at /models/models--bigscience--bloom-7b1 and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Data type of the model: torch.float32
/opt/conda/envs/llm/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py:105: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
base = torch.tensor(
/opt/conda/envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py:143: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[1] + past_key_values_length != attention_mask.shape[1]:
/opt/conda/envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_length > 1:
not implemented
not implemented
not implemented
("not implemented" is printed repeatedly; no traceback or further output follows)
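For reference, below is a minimal sketch of the load-and-trace flow that the run_*_int8.py scripts go through before the IPEX weight-only INT8 recipe is applied. It is illustrative only: the prompt text and keyword arguments are assumptions, not taken from the actual scripts.

# Minimal sketch (not the actual run_*_int8.py scripts) of the load-and-trace
# flow those scripts follow before applying the IPEX weight-only INT8 recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "/models/models--bigscience--bloom-7b1"  # local snapshot from the repro above

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float32, torchscript=True
)
model.eval()
print("Data type of the model:", next(model.parameters()).dtype)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
example = tokenizer("Intel Extension for PyTorch", return_tensors="pt")

# The TracerWarnings in the log above come from data-dependent Python branches
# in the HF/IPEX attention code that are hit during a trace like this one.
with torch.no_grad():
    traced = torch.jit.trace(model, (example["input_ids"],), strict=False)
    traced = torch.jit.freeze(traced)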

Versions

intel-extension-for-pytorch, self-compiled from the latest llm_feature_branch:
git clone --branch llm_feature_branch https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git submodule sync && git submodule update --init --recursive
export DNNL_GRAPH_BUILD_COMPILER_BACKEND=1
export CXXFLAGS="${CXXFLAGS} -D__STDC_FORMAT_MACROS"
python setup.py install
cd ../
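
After the build, an optional sanity check confirms the self-compiled package is the one being imported:

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"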
