The Yuan2.0-M32 development team analyzed the current mainstream quantization schemes in depth, weighed model compression against accuracy loss, and ultimately adopted the GPTQ quantization method, with AutoGPTQ as the quantization framework.
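For context, GPTQ quantization with AutoGPTQ generally follows the pattern sketched below. This is only a rough sketch under assumed settings (the paths, calibration text, and 4-bit / group-size-128 choice are placeholders picked to match the `gptq_model-4bit-128g` shard naming); it is not the Yuan2.0-M32 team's exact recipe.

```python
# Rough sketch of GPTQ quantization with AutoGPTQ; paths, calibration text and
# settings are assumptions, not the Yuan2.0-M32 team's exact recipe.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "/path/to/Yuan2-M32-HF"        # hypothetical FP16 source checkpoint
quantized_dir = "/path/to/Yuan2-M32-GPTQ-int4"  # hypothetical output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, trust_remote_code=True)

# 4-bit weights with group size 128 (matches the gptq_model-4bit-128g shard naming).
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config, trust_remote_code=True)

# A small set of tokenized calibration samples drives the GPTQ weight updates.
examples = [tokenizer("GPTQ calibrates 4-bit weights against a handful of sample texts.")]
model.quantize(examples)

model.save_quantized(quantized_dir, use_safetensors=True)
```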
Model: Yuan2-M32-HF-INT4 https://blog.csdn.net/2401_82700030/article/details/141469514
Container: intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1

Test steps:

Log into the container:
docker exec -ti arc_vllm-new-2 bash
cd /benchmark/all-in-one/
vim config.yaml

config.yaml configuration:
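The exact config.yaml used for this run is not shown here. As a rough illustration only, the ipex-llm all-in-one benchmark config typically contains fields like the following; every value below (repo_id, local_model_hub, test_api, etc.) is a placeholder assumption, not the reporter's actual configuration.

```yaml
# Illustrative sketch of an ipex-llm all-in-one benchmark config.yaml;
# all values are placeholder assumptions, not the reporter's actual settings.
repo_id:
  - 'Yuan2-M32-HF-INT4'       # placeholder; point this at the INT4 checkpoint under local_model_hub
local_model_hub: '/mnt/models' # assumed path to the local model directory
warm_up: 1
num_trials: 3
num_beams: 1                   # greedy search
low_bit: 'sym_int4'
batch_size: 1
in_out_pairs:
  - '32-32'
  - '1024-128'
test_api:
  - 'transformer_int4_gpu'     # assumed GPU (Arc/XPU) test API
cpu_embedding: False
```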
Then run run-arc.sh. It fails with an error; the resulting log is below.

Results Log:
I tried to reproduce it and hit the same issue. Here is what I found, following the inference example from the Yuan2.0-M32 GPTQ docs:
```python
# https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/b403a2beb2746c0c923b4eb936fe1e2560c83b19/docs/README_GPTQ_CN.md#3-gptq%E9%87%8F%E5%8C%96%E6%A8%A1%E5%9E%8B%E7%9A%84%E6%8E%A8%E7%90%86
from transformers import LlamaTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Directory holding the GPTQ checkpoint shards (`gptq_model-4bit-128g.safetensors 0-2`)
quantized_model_dir = "/mnt/beegfs2/Yuan2-M32-GPTQ-int4"

tokenizer = LlamaTokenizer.from_pretrained('/mnt/beegfs2/Yuan2-M32-GPTQ-int4', add_eos_token=False, add_bos_token=False, eos_token='<eod>')
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", trust_remote_code=True)
```
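As a quick sanity check after loading, a short generation could be run like the sketch below; the prompt and generation arguments are placeholders and were not part of the original comment.

```python
# Hypothetical smoke test with the quantized model loaded above; the prompt and
# max_new_tokens are placeholders, not values from the original report.
inputs = tokenizer("Write a short introduction to the Yuan2.0-M32 model.", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```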