[Frontend][PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization #16651
Conversation
@jiangjiajun I have added support for the PaddleSlim quantization model, but TVM seems to have issues with it.

Test in PaddlePaddle

I used a MobileNetV1_QAT model trained with PaddleSlim for testing, and the test code is:

import paddle
import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np

log_file = "tune.json"

if __name__ == "__main__":
    input_shape = [1, 3, 224, 224]
    input_name = "inputs"
    paddle.enable_static()
    prefix = "MobileNetV1_QAT/inference"
    params_file_path = prefix + ".pdiparams"
    exe = paddle.static.Executor(paddle.CPUPlace())
    prog, feed_target_names, fetch_targets = paddle.static.load_inference_model(prefix, exe)
    # build
    mod, params = relay.frontend.from_paddle(prog, shape_dict={input_name: input_shape})
    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()
    # paddle inference
    paddle_output, = exe.run(prog, feed={feed_target_names[0]: input_data}, fetch_list=fetch_targets)
    print(np.argmax(tvm_output[0]), np.argmax(paddle_output[0]))
    np.testing.assert_allclose(tvm_output[0], paddle_output[0], rtol=1e-5, atol=1e-5)

I found that the test failed with the following error:
Test in ONNX

To verify whether the problem is caused by the different inference mechanisms of TVM and Paddle, I ran an additional test through ONNX. The input model is the same model exported with Paddle2ONNX, and the test code is as follows:

import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np
import onnx
import onnxruntime as rt

onnx_model_path = "MobileNetV1_QAT/inference.onnx"
log_file = "tune.json"

if __name__ == "__main__":
    input_shape = [1, 3, 224, 224]
    input_name = "inputs"
    # build
    onnx_model = onnx.load_model(onnx_model_path)
    mod, params = relay.frontend.from_onnx(onnx_model, shape={input_name: input_shape})
    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()
    sess = rt.InferenceSession(onnx_model_path, None)
    input_name = sess.get_inputs()[0].name
    out_name = sess.get_outputs()[0].name
    onnx_output = sess.run([out_name], {input_name: input_data})[0]
    print(np.max(tvm_output[0] - onnx_output[0]))
    print(np.argmax(tvm_output[0] - onnx_output[0]))
    np.testing.assert_allclose(tvm_output[0], onnx_output[0], rtol=1e-5, atol=1e-5)

I found that I still couldn't pass the test, and the error was very large.
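Editorial side note: for a QAT model, an element-wise tolerance of 1e-5 is usually too strict, and a looser agreement check is often more informative. Below is a minimal numpy-only sketch of such a check; the `compare_quantized_outputs` helper and the dummy arrays are illustrative stand-ins for the real model outputs, not part of the original test.

```python
import numpy as np

def compare_quantized_outputs(tvm_output, reference_output, cos_threshold=0.99):
    # Check top-1 agreement and overall direction of the logits instead of
    # demanding element-wise closeness at 1e-5.
    a = tvm_output.ravel().astype(np.float64)
    b = reference_output.ravel().astype(np.float64)
    top1_match = int(np.argmax(a)) == int(np.argmax(b))
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    max_abs_diff = float(np.max(np.abs(a - b)))
    print(f"top-1 match: {top1_match}, cosine: {cosine:.6f}, max |diff|: {max_abs_diff:.6f}")
    return top1_match and cosine >= cos_threshold

if __name__ == "__main__":
    # Dummy data just to show the call; replace with the real tvm/onnx outputs.
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((1, 1000)).astype(np.float32)
    noisy = ref + rng.normal(scale=1e-2, size=ref.shape).astype(np.float32)
    assert compare_quantized_outputs(noisy[0], ref[0])
```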
@jiangjiajun Is this level of error acceptable for the PaddleInference model? I tested a single convolution operator, and in most cases it meets the requirement of relative and absolute errors within 1e-5. My current guess is that the accumulated error across many convolution operators changes the final output of the model.
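To make the accumulation argument concrete, here is a small numpy-only sketch (not from the PR): the matmul + ReLU layers are a stand-in for convolutions, and symmetric per-tensor fake quantization is an assumption. Even when each fake-quantized layer is individually close to its float reference, the deviation grows with depth, so an end-to-end tolerance of 1e-5 is usually unrealistic.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # Symmetric per-tensor weight quantization, similar to what QAT simulates.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x_fp, x_q = x.copy(), x.copy()
for depth in range(1, 11):
    w = rng.standard_normal((256, 256)) / np.sqrt(256)
    x_fp = np.maximum(w @ x_fp, 0.0)               # float layer + ReLU
    x_q = np.maximum(fake_quantize(w) @ x_q, 0.0)  # fake-quantized layer + ReLU
    rel_err = np.max(np.abs(x_fp - x_q)) / (np.max(np.abs(x_fp)) + 1e-12)
    print(f"layer {depth:2d}: max relative error = {rel_err:.2e}")
```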
@tvm-bot rerun
@lhutton1 Thanks for helping me rerun the CI. However, the error in CI [unity/pr-head] remains unresolved, and the tests still fail.
What about the difference between the quantized Paddle model and the quantized ONNX model?
@jiangjiajun I integrated the inference code for TVM, PaddlePaddle, and ONNX. The code is as follows:

import paddle
import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np
import onnx
import onnxruntime as rt

# Model Attr
input_shape = [1, 3, 224, 224]
input_name = "inputs"

def infer_by_paddlepaddle(temp_prefix, temp_input_data):
    paddle.enable_static()
    exe = paddle.static.Executor(paddle.CPUPlace())
    temp_prog, feed_target_names, fetch_targets = paddle.static.load_inference_model(temp_prefix, exe)
    temp_output, = exe.run(temp_prog, feed={feed_target_names[0]: temp_input_data}, fetch_list=fetch_targets)
    return temp_prog, temp_output

def infer_by_onnx(temp_model_path, temp_input_data):
    sess = rt.InferenceSession(temp_model_path, None)
    temp_input_name = sess.get_inputs()[0].name
    out_name = sess.get_outputs()[0].name
    temp_onnx_output = sess.run([out_name], {temp_input_name: temp_input_data})[0]
    temp_onnx_model = onnx.load_model(temp_model_path)
    return temp_onnx_model, temp_onnx_output

def infer_by_tvm(temp_model, temp_input_data):
    if isinstance(temp_model, paddle.static.Program):
        # model is loaded by `paddle.static.load_inference_model`
        mod, params = relay.frontend.from_paddle(temp_model, shape_dict={input_name: input_shape})
    else:
        mod, params = relay.frontend.from_onnx(temp_model, shape={input_name: input_shape})
    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, temp_input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()
    return tvm_output

log_file = "tune.json"

if __name__ == "__main__":
    np.random.seed(520)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    paddle_prefix = "MobileNetV1_QAT/inference"
    paddle_model, paddle_output = infer_by_paddlepaddle(paddle_prefix, input_data)
    onnx_model_path = "MobileNetV1_QAT/inference.onnx"
    onnx_model, onnx_output = infer_by_onnx(onnx_model_path, input_data)
    # Compare the outputs of the Paddle model and the ONNX model (passes)
    np.testing.assert_allclose(paddle_output[0], onnx_output[0], rtol=1e-5, atol=1e-5)
    # Compare the outputs of the TVM (from Paddle) model and the TVM (from ONNX) model (passes)
    tvm_paddle_result = infer_by_tvm(paddle_model, input_data)
    tvm_onnx_result = infer_by_tvm(onnx_model, input_data)
    np.testing.assert_allclose(tvm_paddle_result[0], tvm_onnx_result[0], rtol=1e-5, atol=1e-5)
    # Compare the outputs of the Paddle model and the TVM (from Paddle) model
    # np.testing.assert_allclose(tvm_paddle_result[0], paddle_output[0], rtol=1e-5, atol=1e-5)
    # Compare the outputs of the ONNX model and the TVM (from ONNX) model
    np.testing.assert_allclose(tvm_onnx_result[0], onnx_output[0], rtol=1e-5, atol=1e-5)

I found that, when fed the same data, the outputs of the Paddle model and the ONNX model are consistent. The differences between TVM and Paddle are as follows:
The differences between TVM and ONNX are as follows:
Therefore, my earlier statement was incorrect: with the same input data, the Paddle model and the ONNX model show the same symptoms.
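Editorial sketch (not something run in this thread): one way to narrow this down further is to inspect what the frontend actually produced for the quantized model. If the converted module contains qnn.quantize / qnn.dequantize calls (versus plain float ops), that tells you whether the divergence can come from the QNN conversion path. The model path, input name, and shape are taken from the scripts above.

```python
import re
import onnx
from tvm import relay

onnx_model_path = "MobileNetV1_QAT/inference.onnx"
input_name = "inputs"
input_shape = [1, 3, 224, 224]

onnx_model = onnx.load_model(onnx_model_path)
mod, params = relay.frontend.from_onnx(onnx_model, shape={input_name: input_shape})

# Count calls to the ops of interest in the printed Relay module.
text = mod.astext(show_meta_data=False)
for op_name in ("qnn.quantize", "qnn.dequantize", "qnn.conv2d", "nn.conv2d"):
    count = len(re.findall(r"(?<![\w.])" + re.escape(op_name) + r"\(", text))
    print(f"{op_name}: {count} call(s)")
```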
@tvm-bot rerun tvm-wasm
Hello @Hzfengsy, I noticed you have submitted PRs related to tvm-bot. I'd like to ask: is there a way to rerun only the failed unit tests instead of rerunning all CI tests with the "@tvm-bot rerun" command? This would help speed up the process of merging PRs.
@tvm-bot rerun
@tvm-bot rerun
@Zheng-Bicheng Good question. We do not have such a mechanism, because it is unsafe to rerun only the failed tests. For example, a change might fix test A, which failed last time, but introduce a new failure in test B. It is a good approach when debugging locally, but not suitable for CI.
@tvm-bot rerun
I understand what you mean, but I've found that CI currently hits some unknown errors. For example, CI [lint/pr-head] (Lint 1 of 2) log: log.txt. Each rerun may produce a different CI failure, and I haven't figured out what is causing it. It seems unrelated to the code I've submitted.
@tvm-bot rerun
@tvm-bot rerun
@tvm-bot rerun
@tvm-bot rerun
@tvm-bot rerun
LGTM
…t supports quantization (apache#16651)
* support conv2d when data_format is NHWC
* modify the annotation
* Do not convert input data when processing quantization conv_2d nodes
* Fix code formatting issues
* fixed error code format
* update dequantize and quantize
* fixed bug when model is fp32 model
* update dequantize and quantize
* update for paddle quantize model when format is NCHW
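For readers unfamiliar with the quantize/dequantize handling mentioned in the commit list above, here is a generic sketch of the affine scheme Relay's QNN dialect uses: q = round(x / scale) + zero_point, and x ≈ (q - zero_point) * scale. This is an illustrative round trip only, not the code added in this PR; the scale, zero point, and shapes are arbitrary.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Build a tiny Relay function: quantize a float tensor to int8, then dequantize it.
data = relay.var("data", shape=(1, 3, 4, 4), dtype="float32")
scale = relay.const(0.05, "float32")
zero_point = relay.const(0, "int32")

q = relay.qnn.op.quantize(data, scale, zero_point, out_dtype="int8")
dq = relay.qnn.op.dequantize(q, scale, zero_point)
mod = tvm.IRModule.from_expr(relay.Function([data], dq))

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")

dev = tvm.cpu()
m = graph_executor.GraphModule(lib["default"](dev))
x = np.random.uniform(-1, 1, (1, 3, 4, 4)).astype("float32")
m.set_input("data", x)
m.run()
# The round trip keeps values within half a quantization step (scale / 2 = 0.025).
print(np.max(np.abs(m.get_output(0).asnumpy() - x)))
```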