[QNN][TFLite] Parsing TFLite quantized models. #3900
Conversation
Force-pushed from 0f75650 to a0e7be2.
python/tvm/relay/frontend/tflite.py
Outdated
@@ -1030,4 +1153,6 @@ def from_tflite(model, shape_dict, dtype_dict):
     outputs = [exp_tab.get_expr(get_tensor_name(subgraph, i)) for i in model_outputs]
     outputs = outputs[0] if len(outputs) == 1 else _expr.Tuple(outputs)
     func = _expr.Function(analysis.free_vars(outputs), outputs)
-    return _module.Module.from_expr(func), params
+    mod = _module.Module.from_expr(func)
+    mod = tvm.relay.qnn.transform.CanonicalizeOps()(mod)
@@ -848,6 +848,25 @@ def test_forward_inception_v4_net():
    tvm.testing.assert_allclose(np.squeeze(tvm_output[0]), np.squeeze(tflite_output[0]),
                                rtol=1e-5, atol=1e-5)

def test_forward_quantized_inception_v1_net():
Can we rename it to test_forward_qnn_inception_v1_net?
Force-pushed from a0e7be2 to dc7b5b8.
Force-pushed from 1e47074 to ab3b78d.
Looks like the latest version of this PR is dependent on #4033 for the renaming of qnn.quantized_dense to qnn.dense.
Force-pushed from ab3b78d to 6ef06b3.
@jackwish @FrozenGene @apivovarov @zhiics @shoubhik Ping to review please. This has been parked for some time. I think this will benefit users who want to read pre-quantized models.
This patch is a big step forward; the latest version works for a number of networks, so it would be good to get it merged sooner rather than later. There are a few common failure cases that we see, not sure if these are expected or unexpected at the moment:
TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.requantize
TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.dequantize
@anijain2305 I am enjoying the National Day holiday and plan to come back Oct 11. Would you mind waiting for my review until then? Thanks.
@mshawcroft your issue should not be related to this PR, but it should be resolved. I think registering the ops as injective is enough.
Thanks for checking it on many networks. You are seeing this error because you have to call the QNN lowering pass (relay.qnn.transform.CanonicalizeOps) explicitly. See the changes made in this PR in the file tests/python/frontend/tflite/test_forward.py. Pre-quantized models go through a Relay dialect called QNN. There are a couple of QNN passes that convert all QNN ops to a sequence of already existing Relay ops. Currently, these QNN passes are not a part of the relay.build module and have to be called explicitly. We are working on simplifying this in another PR (#3971). Both PRs have been sitting idle for some time. Sorry for the inconvenience. We will try to get them in soon.
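For readers hitting the same error, the explicit sequence looks roughly like this (a sketch assuming the API names used at the time of this PR, e.g. relay.qnn.transform.CanonicalizeOps; later TVM versions fold this into relay.build):

```python
from tvm import relay

# tflite_model, shape_dict and dtype_dict are assumed to be prepared by the
# caller, as in tests/python/frontend/tflite/test_forward.py.
mod, params = relay.frontend.from_tflite(tflite_model,
                                         shape_dict=shape_dict,
                                         dtype_dict=dtype_dict)

# Lower QNN ops (qnn.requantize, qnn.dequantize, ...) into core Relay ops.
# Skipping this step leads to the "Attribute TOpPattern has not been
# registered" errors, since the QNN dialect ops have no compute/schedule.
mod = relay.qnn.transform.CanonicalizeOps()(mod)

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target="llvm", params=params)
```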
Force-pushed from 6ef06b3 to ad29bc6.
Otherwise LGTM.
python/tvm/relay/frontend/tflite.py
Outdated
assert len(output_tensors) == 1, "output tensors should be 1"
output_tensor = output_tensors[0]
assert self.has_same_qnn_params(input_tensor, output_tensor), \
    "qnn.op.reshape requires input and output qnn params to be same"
If this is a requirement of TFLite quantized outputs, then the error message should be:
"tflite.reshape requires input and output scale and zero_points to be the same"
Also, for the else: branch, should this not be NotImplementedError(...)?
Thanks. Will update the message.
python/tvm/relay/frontend/tflite.py
Outdated
    out = _op.nn.avg_pool2d(in_expr, **params)
else:
    assert self.has_same_qnn_params(input_tensor, output_tensor), \
        "qnn.op.avg_pool2d requires input and output qnn params to be same"
Same comment as the other one?
Yes, I will change the message.
python/tvm/relay/frontend/tflite.py
Outdated
@@ -153,13 +155,24 @@ def get_tensors(self, tensors_idx_list):
     return_list = list()
     for tensor_idx in tensors_idx_list:
         if tensor_idx < 0:
-            return_list.append(TensorWrapper(tensor_idx, 0, 0))
+            return_list.append(TensorWrapper(tensor_idx, 0, 0, None))
The default value is None, so we don't need to change this line of code.
python/tvm/relay/frontend/tflite.py
Outdated
if qnn_params['scale'] == 0 and qnn_params['zero_point'] == 0:
    qnn_params = None
return_list.append(TensorWrapper(tensor_idx, tensor, buffer, qnn_params))
A better way is to check if qnn_params['scale'] != 0 or qnn_params['zero_point'] != 0:, because the default value of qnn_params is None in line 166.
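A minimal sketch of that suggestion (make_qnn_params is a hypothetical helper name, not code from the PR):

```python
def make_qnn_params(scale, zero_point):
    # In the TFLite flatbuffer, an unquantized tensor reads back the default
    # quantization values scale == 0 and zero_point == 0, so only attach
    # qnn_params when at least one of them is non-zero.
    if scale != 0 or zero_point != 0:
        return {'scale': scale, 'zero_point': zero_point}
    return None
```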
python/tvm/relay/frontend/tflite.py
Outdated
def has_same_qnn_params(self, tensor_a, tensor_b):
    return tensor_a.qnn_params['scale'] == tensor_b.qnn_params['scale'] and \
        tensor_a.qnn_params['zero_point'] == tensor_b.qnn_params['zero_point']
Minor code style suggestion: rename tensor_a to lhs_tensor and tensor_b to rhs_tensor, like we did in convert_elemwise.
python/tvm/relay/frontend/tflite.py
Outdated
# If the tensors are quantized, ensure that input/output qnn params are same.
if input_tensor.qnn_params is not None:
Could be simplified to if input_tensor.qnn_params:
python/tvm/relay/frontend/tflite.py
Outdated
# If the tensors are quantized, ensure that input/output qnn params are same.
if input_tensor.qnn_params is not None:
    output_tensors = self.get_output_tensors(op)
    assert len(output_tensors) == 1, "output tensors should be 1"
"output tensors should be 1" -> "The length of output tensors should be 1"
python/tvm/relay/frontend/tflite.py
Outdated
# Softmax does not go well with Int8. So Dequantize back to FP32.
Better to leave a TODO. Softmax's int8 computation is complex, but we could implement it when we have spare time. The same applies to operators like sigmoid / mean.
python/tvm/relay/frontend/tflite.py
Outdated
@@ -738,7 +819,15 @@ def convert_conv(self, op, conv_type):
         raise tvm.error.OpAttributeUnImplemented(
             'Padding format {} is not supported for operator Conv.'.format(padding))

-        out = _op.nn.conv2d(data=in_expr, weight=weight_expr, **params)
+        # Check if the inputs are quantized. If yes, call qnn conv2d.
+        if input_tensor.qnn_params is None:
Same comment as before.
python/tvm/relay/frontend/tflite.py
Outdated
            .format('qnn.op.dense'))

# Finally if the dense is quantized. Add a requantize at the end.
if output_tensor.qnn_params is not None:
Same comment as before.
python/tvm/relay/frontend/tflite.py
Outdated
@@ -581,7 +640,23 @@ def convert_fully_connected(self, op): | |||
|
|||
# If we have fused activations | |||
if fused_activation_fn != ActivationFunctionType.NONE: | |||
out = self.convert_fused_activation_function(out, fused_activation_fn) | |||
if output_tensor.qnn_params is None: |
Same comment as before.
@@ -1029,3 +1048,6 @@ def test_forward_ssd_mobilenet_v1():
    test_forward_inception_v3_net()
    test_forward_inception_v4_net()
    test_forward_ssd_mobilenet_v1()

    # End to End quantized
    test_forward_qnn_inception_v1_net()
Let us add the MobileNet V2 quant model if this PR can run it, because it is very popular and often compared across different frameworks.
Makes sense. Until now I have been focusing on the server side, and thus I used Inception. Can this PR go in first, with a separate PR for MobileNet V2 to follow?
I have no problem with that. However, one question: do we have any TODO for the MobileNet V2 quant model? To my knowledge, this PR should work for the MobileNet V2 quant model and we would just need to add test code for it.
I don't think there is. I want to do a full 50k-image test before adding MobileNet V2. I have done that for Inception V1 and ensured that its accuracy is the same as that reported by TFLite. I will set up the pre-processing for MobileNet next week, check end-to-end accuracy, and send a separate PR. What do you say?
I don't think we need to do what you mention. Because we are parsing a TFLite quant model, the TFLite framework is the baseline. We could treat the TFLite quant model like a TFLite FP32 model and compare the result between us and the TFLite framework; there is no difference. I think we only need end-to-end accuracy testing when we do quantization inside TVM, or when we have a strong reason to implement ops without keeping compatibility with the TFLite framework.
Added MobileNet V1. MobilenetV2 fails with a segfault in LLVM. Will debug.
What is the detailed information of the LLVM report? Have you tried the latest LLVM?
This is the whole stack trace. It happens during graph runtime creation; the Relay build has already passed.
0x00007fff78dd1e64 in llvm::EVT::getExtendedVectorNumElements() const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
(gdb) bt
#0 0x00007fff78dd1e64 in llvm::EVT::getExtendedVectorNumElements() const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#1 0x00007fff78ac1090 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#2 0x00007fff78ac0318 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#3 0x00007fff78abe55f in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#4 0x00007fff789c7b65 in (anonymous namespace)::DAGCombiner::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#5 0x00007fff789a4339 in (anonymous namespace)::DAGCombiner::visit(llvm::SDNode*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#6 0x00007fff7897670c in (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#7 0x00007fff78975cd3 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#8 0x00007fff78aaa7b2 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#9 0x00007fff78aa9939 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#10 0x00007fff78aa6c06 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#11 0x00007fff787c2ffe in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#12 0x00007fff78cb0ff4 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#13 0x00007fff7976c4ba in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#14 0x00007fff7976c843 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#15 0x00007fff7976cdbf in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#16 0x00007fff78bdee38 in llvm::MCJIT::emitObject(llvm::Module*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#17 0x00007fff78bdf094 in llvm::MCJIT::generateCodeForModule(llvm::Module*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#18 0x00007fff78be038e in llvm::MCJIT::findSymbol(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#19 0x00007fff78bdfe58 in llvm::MCJIT::getSymbolAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#20 0x00007fff78be04ba in llvm::MCJIT::getGlobalValueAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#21 0x00007fff77dcb794 in tvm::codegen::LLVMModuleNode::LazyInitJIT() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#22 0x00007fff77dca02e in tvm::codegen::LLVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<tvm::runtime::ModuleNode> const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#23 0x00007fff77e0db68 in tvm::runtime::GraphRuntime::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor, std::allocator<DLTensor> > const&, unsigned long) ()
from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#24 0x00007fff77e0afe4 in tvm::runtime::GraphRuntime::SetupOpExecs() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#25 0x00007fff77e09aad in tvm::runtime::GraphRuntime::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLContext, std::allocator<DLContext> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#26 0x00007fff77e0f10b in tvm::runtime::GraphRuntimeCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLContext, std::allocator<DLContext> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#27 0x00007fff77e10f8b in std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::$_12>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#28 0x00007fff77de5577 in TVMFuncCall () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#29 0x00007fff7e54ee20 in ffi_call_unix64 () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#30 0x00007fff7e54e88b in ffi_call () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#31 0x00007fff7e54901a in _ctypes_callproc () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#32 0x00007fff7e53cfcb in ?? () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#33 0x00000000005c3bd7 in PyObject_Call ()
#34 0x00000000005354af in PyEval_EvalFrameEx ()
#35 0x000000000053a81b in PyEval_EvalCodeEx ()
#36 0x00000000004e3423 in ?? ()
#37 0x00000000005c3bd7 in PyObject_Call ()
#38 0x00000000004f08be in ?? ()
#39 0x00000000005c3bd7 in PyObject_Call ()
Force-pushed from ad29bc6 to 9703fd5.
@FrozenGene Thanks! Can you please review again? I incorporated your comments. I would suggest a separate PR for MobileNet; I can take care of that in the next couple of weeks.
Force-pushed from 9703fd5 to 9fcd02f.
LGTM. For MobileNet V2, I think we can send a separate PR because it looks to me that it wouldn't change the structure of this PR (much). @FrozenGene what do you think?
def test_forward_qnn_mobilenet_v1_net():
    """Test the Quantized TFLite Mobilenet V1 model."""
    # MobilenetV2
s/MobilenetV2/MobilenetV1
Add a todo for MobilenetV2?
def test_forward_qnn_mobilenet_v1_net():
    """Test the Quantized TFLite Mobilenet V1 model."""
    # MobilenetV2
MobilenetV1
tvm_output = run_tvm_graph(tflite_model_buf, data, 'input')
tvm_predictions = np.squeeze(tvm_output)
tvm_sorted_labels = tvm_predictions.argsort()[-3:][::-1]
tvm.testing.assert_allclose(tvm_sorted_labels, tflite_sorted_labels)
Could you explain more why we cannot compare tflite_out with tvm_out directly, like we do in FP32? I think we would get the same result if we kept compatibility with TFLite; if not, why don't we keep compatibility?
The only difference between TFLite and QNN is rounding. The TFLite reference implementation is reverse-engineered from ARM assembly instructions (which perform 2 roundings in general), while QNN follows the more formal rounding described here. If you are interested, I would encourage you to read the inlined comments in the code - https://github.com/dmlc/tvm/blob/068c148f01fcc81c72004512de320ca1cee24dc6/src/relay/qnn/util.cc#L79-L133
There was also a fair amount of discussion which has been summarized here - #3591 (comment). This should also be helpful.
This rounding difference can cause small variations in the output, because of which we cannot do an exact check.
I have measured accuracy on TFLite quantized Inception V1-V3 on 50K images using QNN, and observed accuracy parity with native TFLite execution.
Thanks for the information, @anijain2305! However, if I remember correctly, when I reviewed the code of requantize, I mentioned that we could use round_away_from_zero, which is what TFLite uses. Did we change it later?
If we did change the default value later, in my opinion we should pass round_away_from_zero to the requantize operator when parsing TFLite models. The reasons are:
- Keeping compatibility with TFLite should be the first priority. Because we are parsing a TFLite model, the quantization accuracy is controlled by TensorFlow quantization-aware training and TFLite.
- When we run a new TFLite quantized model, we cannot always get test suites. For example, customers or the algorithm team send us a TFLite model and require us to boost its performance. We should make sure the result is the same as TFLite's. The simplest way is to compare the result with TFLite, not to do end-to-end accuracy testing; sometimes we cannot even do that.
- For our own development, we could also do a simple comparison with TFLite like we do for FP32, rather than end-to-end accuracy testing. If one op needs to requantize internally, we could compare it with TFLite, without having to reason mathematically about whether a differing result is correct.
TFLite rounding is not exactly formal round_away_from_zero. You might already know this, but these are the two functions involved in TFLite rounding:
using gemmlowp::RoundingDivideByPOT;
using gemmlowp::SaturatingRoundingDoublingHighMul;
The key idea is that the above two functions map to ARM assembly instructions. But, in the process, they perform approximations that make the result a little different from formal GNU C rounding. The reasoning behind doing that is presumably performance.
In QNN, we decided to follow GNU C rounding, as it is more formal and can work across more frameworks (like MXNet and PyTorch, not just TFLite).
However, if one wants, one can implement a TFLite rounding. The code is modular enough to add one more rounding option. I think you have good familiarity with TFLite, so I would encourage you to write the TFLite rounding if you get time :)
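For intuition only, here is a numpy sketch of requantize with round-half-away-from-zero (a float reference with a hypothetical name, not TVM's fixed-point implementation):

```python
import numpy as np

def requantize_ref(q_in, in_scale, in_zp, out_scale, out_zp, dtype=np.uint8):
    # Dequantize to real values, rescale to the output quantization, then
    # round half away from zero and clamp to the output dtype range.
    real = in_scale * (np.asarray(q_in, dtype=np.int64) - in_zp)
    scaled = real / out_scale
    # np.round uses banker's rounding, so round half away from zero manually.
    rounded = np.sign(scaled) * np.floor(np.abs(scaled) + 0.5)
    info = np.iinfo(dtype)
    return np.clip(rounded + out_zp, info.min, info.max).astype(dtype)
```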
I fully understand your point about supporting more frameworks. My previous point was to emphasize that we could provide an option in our TFLite frontend parser so that we keep compatibility with TFLite; we don't need to change any default value in our existing code, which can still be used for MXNet, PyTorch and so on. Otherwise, users will be confused about why our result is not the same as TFLite's when they use TVM. Keeping compatibility with TFLite should be the first priority when we parse a TFLite model, so I think we shouldn't leave this for TVM users to implement.
However, as you said, it doesn't affect the code structure. If you agree with the compatibility point, I think we could merge this PR first and open an issue reminding us to do this. If you don't have spare time, I could help manage it next.
Yes, we can open an issue to track the TFLite rounding. It would not affect the code structure. We can set the value of the rounding in the TFLite parser, hiding it from TVM users.
OK. Could you help open an issue to track it? You could assign the task to me.
Force-pushed from 9fcd02f to b6be6bb.
@u99127 I have incorporated your comments. Please review again. Also, please approve explicitly; it helps committers :)
LGTM generally, some code style comments and model specific concerns only. Thank you for the ping, and sorry for the delayed feedback. I was on holiday and busy with some other projects.
One thing I'd like to comment: as the QNN operator implementation is different from TFLite's (I mean it does not try to emulate floating point with integers in operators like quantize/dequantize/requantize), making the QNN dialect output bit-exact with TFLite element-wise seems not to be a must (TFLite becomes a reference, not a mirror) and is not the target of these QNN operators. The evaluation of the design should be model accuracy, I think.
                         input_scale=input_tensor.qnn_params['scale'],
                         input_zero_point=input_tensor.qnn_params['zero_point'])

out = _op.nn.softmax(in_expr, **params)
This seems to generate float output for softmax. Would it be better to quantize it back to uint8, since the semantics of a uint8 model require a uint8 output?
That would indeed be better. Doing that.
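As a sketch of what "quantize it back" means (numpy, hypothetical helper name; TVM would express this with qnn.quantize rather than this float math):

```python
import numpy as np

def qnn_softmax_sketch(q_in, in_scale, in_zp, out_scale, out_zp):
    # Dequantize to float, run softmax, then quantize the result back to
    # uint8 so the model keeps its uint8 output semantics.
    x = in_scale * (q_in.astype(np.float32) - in_zp)
    e = np.exp(x - x.max())
    s = e / e.sum()
    q_out = np.round(s / out_scale) + out_zp
    return np.clip(q_out, 0, 255).astype(np.uint8)
```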
raise tvm.error.OpNotImplemented(
    'Operator {} with fused activation is not supported yet.'
    .format('qnn.op.concatenate'))
Not sure if we can have a clamp here to support it...
We also need to handle different types of activations like Relu/Relu6. I have a draft implementation of activation handling. I will send a separate PR for that once I test it with a few cases.
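For reference, one way a fused quantized ReLU6 can reduce to a clamp (a sketch; quantized_relu6_bounds is a hypothetical helper, not code from this PR):

```python
def quantized_relu6_bounds(scale, zero_point, qmin=0, qmax=255):
    # A fused ReLU6 on a quantized uint8 tensor is a clamp between the
    # quantized representations of 0.0 and 6.0 in the integer domain.
    lo = max(qmin, int(round(0.0 / scale)) + zero_point)
    hi = min(qmax, int(round(6.0 / scale)) + zero_point)
    return lo, hi
```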
python/tvm/relay/frontend/tflite.py
Outdated
@@ -584,7 +639,14 @@ def convert_fully_connected(self, op):
    weight_value = self.get_tensor_value(weight_tensor)
    weight_expr = self.exp_tab.new_const(weight_value, dtype=weight_tensor_type_str)

    # Check if the inputs are quantized. If yes, call qnn dense.
Pretty easy-to-read logic, making the comment unnecessary to me :)
:)
python/tvm/relay/frontend/tflite.py
Outdated
if not input_tensor.qnn_params:
    out = _op.nn.dense(in_expr, weight_expr)
else:
    out = _qnn.op.dense(in_expr, weight_expr,
                        input_zero_point=input_tensor.qnn_params['zero_point'],
                        kernel_zero_point=weight_tensor.qnn_params['zero_point'],
                        out_dtype='int32')
Generally, I suggest writing such logic as below, because positive conditions are easier for humans to understand than negated ones:
if input_tensor.qnn_params:
    # path 1
else:
    # path 2
You are right. Readability is very important. Will change the style.
raise tvm.error.OpNotImplemented(
    'Operator {} with fused activation is not supported yet.'
    .format('qnn.op.conv2d'))
That's interesting. AFAIK, the TFLite model converter always fuses the ReLU-family activations into Conv/FC operators. Are the activations of the Inception model you are using not fused?
Yeah, TFLite hosted models do not have any fused activation. There is a fused activation in Object Detection models, but we are far away from supporting object detection (basically dependent on control flow support).
if output_tensor.qnn_params:
    input_scale = input_tensor.qnn_params['scale'] * weight_tensor.qnn_params['scale']
    input_zero_point = 0
    out = _qnn.op.requantize(out,
                             input_scale=input_scale,
                             input_zero_point=input_zero_point,
                             output_scale=output_tensor.qnn_params['scale'],
                             output_zero_point=output_tensor.qnn_params['zero_point'],
                             out_dtype=output_tensor_type_str)
Maybe we can have a small function, requantize_output(out, output_tensor, input_tensor), to wrap this :)
Good suggestion, but I will skip this for now. Currently there are only 2 uses of requantize, both of which are quite specific: we have to multiply the input scales to get the new input scale. As we add more operators, and thus more requantize calls, I will revisit this.
assert self.has_same_qnn_params(input_tensor, output_tensor), \
    "TFLite avg_pool2dreshape requires input and output scale" \
    "and zero points to be equal"
out = _op.cast(in_expr, dtype="int32")
Style consistency of the " / ' quoting :)
Force-pushed from b6be6bb to 133e49d.
Thanks @anijain2305 @jackwish @FrozenGene @mshawcroft @u99127 for the hard work on this PR. Since we now have approvals from several organizations and the parser is in good shape, I am going to merge it. @anijain2305 please be aware of some further work on it, like MobileNet V2 and activation handling.
Thanks @zhiics. Yes, I will follow up on both MobileNet V2 and activation handling. I will also open a discuss-forum thread to address @FrozenGene's concerns regarding exact TFLite output matching.
As of now, the QNN dialect has enough QNN ops for the Inception model. This PR takes a TFLite pre-quantized model and parses it into a handful of QNN operators.
@jackwish @FrozenGene @apivovarov @zhiics @shoubhik