Error in quantize vicuna-7b model from fp16 to int8 #20867
Labels
ep:CUDA, quantization, stale
Describe the issue
Using shape_inference.quant_pre_process to preprocess the model raises an error, even with skip_optimization=True.
If I skip preprocessing and run quantize_dynamic directly, it quantizes the model to int8 successfully, but loading the quantized model back fails.
To reproduce
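A minimal sketch of the steps described above. The model paths (e.g. vicuna-7b-fp16.onnx) are placeholders, since the original report does not include them:

```python
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic
from onnxruntime.quantization.shape_inference import quant_pre_process

# Step 1: preprocessing fails even with optimization skipped.
quant_pre_process(
    "vicuna-7b-fp16.onnx",          # placeholder path to the fp16 model
    "vicuna-7b-preprocessed.onnx",
    skip_optimization=True,
)

# Step 2: dynamic quantization on the unpreprocessed model completes...
quantize_dynamic(
    "vicuna-7b-fp16.onnx",
    "vicuna-7b-int8.onnx",
    weight_type=QuantType.QInt8,
)

# Step 3: ...but loading the quantized model back fails here.
sess = ort.InferenceSession(
    "vicuna-7b-int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```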
Urgency
Urgent, a paper delivery deadline is approaching!
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8