Error in quantize vicuna-7b model from fp16 to int8 #20867
Labels
ep:CUDA, quantization, stale
Describe the issue
Using shape_inference.quant_pre_process to preprocess the model raises an error, even with skip_optimization=True.
If I skip preprocessing and run quantize_dynamic directly, it quantizes the model to int8 successfully, but loading the quantized model back fails.
To reproduce
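A minimal sketch of the steps described above. The model paths (e.g. vicuna-7b-fp16.onnx) are placeholders, since the original report does not include them:

```python
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic
from onnxruntime.quantization.shape_inference import quant_pre_process

# Step 1: preprocessing fails even with optimization skipped.
quant_pre_process(
    "vicuna-7b-fp16.onnx",          # placeholder path to the fp16 model
    "vicuna-7b-preprocessed.onnx",
    skip_optimization=True,
)

# Step 2: dynamic quantization on the unpreprocessed model completes...
quantize_dynamic(
    "vicuna-7b-fp16.onnx",
    "vicuna-7b-int8.onnx",
    weight_type=QuantType.QInt8,
)

# Step 3: ...but loading the quantized model back fails here.
sess = ort.InferenceSession(
    "vicuna-7b-int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```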
Urgency
Urgent, a paper delivery deadline is approaching!
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8