### Describe the bug

Running the IPEX int8 / weight-only quantization example scripts with `--jit` on GPT-NeoX-20B and BLOOM-7B1 produces no generated text; the run only prints `not implemented` repeatedly. Steps to reproduce:
```shell
# GPT-NeoX-20B, int8
MODEL_ID="/models/models--EleutherAI--gpt-neox-20b"
mkdir saved_results_gpt_neox
python run_gpt-neox_int8.py --ipex-weight-only-quantization --output-dir "saved_results_gpt_neox" --jit -m ${MODEL_ID} --int8

# BLOOM-7B1, int8 with bf16 mixed precision
MODEL_ID="/models/models--bigscience--bloom-7b1"
mkdir saved_results_bloom
python run_bloom_int8.py --ipex-weight-only-quantization --output-dir "saved_results_bloom" --jit -m ${MODEL_ID} --int8-bf16-mixed
```
Log from the BLOOM run:

```
Loading checkpoint shards: 100%|██████████| 2/2 [00:39<00:00, 19.74s/it]
Some weights of BloomForCausalLM were not initialized from the model checkpoint at /models/models--bigscience--bloom-7b1 and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Data type of the model: torch.float32
/opt/conda/envs/llm/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py:105: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  base = torch.tensor(
/opt/conda/envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py:143: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[1] + past_key_values_length != attention_mask.shape[1]:
/opt/conda/envs/llm/lib/python3.9/site-packages/intel_extension_for_pytorch/cpu/transformers/attentions.py:153: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_length > 1:
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
not implemented
```
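The TracerWarnings above come from data-dependent Python control flow being evaluated once and baked into the trace as a constant. A minimal standalone sketch of that effect (not code from the IPEX scripts, just an illustration of the warning's meaning):

```python
import torch

def f(x):
    # Converting a tensor comparison to a Python bool here makes
    # torch.jit.trace evaluate the condition once at trace time and
    # hard-code the chosen branch into the graph.
    if x.sum() > 0:
        return x + 1
    return x - 1

# Traced with a positive example input, so the "+ 1" branch is recorded
# (torch.jit.trace emits a TracerWarning at this point).
traced = torch.jit.trace(f, torch.ones(2))

# A negative input still takes the recorded "+ 1" branch:
# -1 + 1 = 0 for each element, instead of the eager result -2.
print(traced(-torch.ones(2)))
```

This is why the warnings note that "the trace might not generalize to other inputs": only the branch taken for the example input survives in the traced graph.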
### Versions

IPEX self-compiled from the latest `llm_feature_branch`:
```shell
git clone --branch llm_feature_branch https://github.com/intel/intel-extension-for-pytorch.git
cd intel-extension-for-pytorch
git submodule sync && git submodule update --init --recursive
export DNNL_GRAPH_BUILD_COMPILER_BACKEND=1
export CXXFLAGS="${CXXFLAGS} -D__STDC_FORMAT_MACROS"
python setup.py install
cd ../
```
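For completeness, the remaining environment details (PyTorch version, OS, compiler, Python) can be dumped with PyTorch's built-in collector, assuming the build above completed and the `llm` conda env is active:

```shell
# Print PyTorch / OS / compiler details to attach to this report.
python -m torch.utils.collect_env
```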