Skip to content

[Bug] The Onnx Frontend incorrectly mapping the Operator implementation version #18698

@locnd182644

Description

@locnd182644

Description

The Onnx Frontend incorrectly mapping the Operator implementation version. Specifically, frontend has selected implement an operator with a higher version than the model's opset number.

Steps to Reproduce

  • Graph: ReduceMean Node
Image
  • Using tvm.relax.frontend.onnx.from_onnx with ignore pass opset, Relax module wrong output shape.

Expected behavior

Mean Relax have axis = 2 and output shape = (1, 68, 18)

. . .

lv90: R.Tensor((1, 516, 4, 18), dtype="float32") = R.concat((lv86, lv87, lv88, lv89), axis=1)
lv91: R.Tensor((1, 68, 4, 18), dtype="float32") = R.nn.conv2d(lv90, metadata["relax.expr.Constant"][30], strides=[1, 1], padding=[0, 0, 0, 0], dilation=[1, 1], groups=1, data_layout="NCHW", kernel_layout="OIHW", out_layout="NCHW", out_dtype="void")
lv92: R.Tensor((1, 68, 1, 1), dtype="float32") = R.reshape(metadata["relax.expr.Constant"][31], R.shape([1, 68, 1, 1]))
lv93: R.Tensor((1, 68, 4, 18), dtype="float32") = R.add(lv91, lv92)
gv: R.Tensor((1, 68, 18), dtype="float32") = R.mean(lv93, axis=[2], keepdims=False)
R.output(gv)

Actual behavior

Mean Relax have axis = None and output shape = ( )

. . .

lv90: R.Tensor((1, 516, 4, 18), dtype="float32") = R.concat((lv86, lv87, lv88, lv89), axis=1)
lv91: R.Tensor((1, 68, 4, 18), dtype="float32") = R.nn.conv2d(lv90, metadata["relax.expr.Constant"][30], strides=[1, 1], padding=[0, 0, 0, 0], dilation=[1, 1], groups=1, data_layout="NCHW", kernel_layout="OIHW", out_layout="NCHW", out_dtype="void")
lv92: R.Tensor((1, 68, 1, 1), dtype="float32") = R.reshape(metadata["relax.expr.Constant"][31], R.shape([1, 68, 1, 1]))
lv93: R.Tensor((1, 68, 4, 18), dtype="float32") = R.add(lv91, lv92)
gv: R.Tensor((), dtype="float32") = R.mean(lv93, axis=None, keepdims=False)
R.output(gv)

Debug

Image
  • In method get_converter of OnnxOpConverter, the impl_v mapping algorithm takes the highest version among all versions <= opset. If there is no implementation version <= opset, the index of the versions will be a negative number.
  • Rotate to take the latest Implement of Op (Negative indexing starts at -1 for the last element) -> Wrong Implement Operator

Go back example: ReduceMean Operator in Onnx

Possible fix method

  • Suggestion 1: Using onnx.defs.get_schema
    • Using "from onnx import defs" to look up the Operator schema based on the model's opset. Then get Operator Version Exactly to Implement. Result as ai.onnx.v1 in Image.
    • Todo: Implement all Version of ONNX Operator.
Image
  • Suggestion 2: Add check empty impl, raise error
    • In method get_converter of OnnxOpConverter, filter Impl Version <= Opset, then get max this list versions to take Impl Version of Op. If this list filter versions are Empty, raise "Not found implement of operator compatible with the model's opset".
    • Todo: Implement version Op with different schemas (input, atrr)

Triage

  • needs-triage
  • type: bug
  • frontend:onnx

What do you think about this issue? I would be happy to hear different opinions from everyone.

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs-triagePRs or issues that need to be investigated by maintainers to find the right assignees to address ittype: bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions