
[Torch] Add decomposition for 1d torch.nonzero #3876

Merged: 9 commits, Dec 19, 2024

Conversation

AmosLewis (Collaborator) commented on Nov 15, 2024

Target model:
migraphx_onnx-model-zoo__gpt2-10
%350 = torch.operator "onnx.NonZero"(%349) : (!torch.vtensor<[?],si64>) -> !torch.vtensor<[1,?],si64>

module {
  func.func @main_graph(%arg0: !torch.vtensor<[?],i1>) -> !torch.vtensor<[1,?],si64>  attributes {torch.onnx_meta.ir_version = 9 : si64, torch.onnx_meta.opset_version = 20 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.6.0"} {
    %0 = torch.operator "onnx.NonZero"(%arg0) : (!torch.vtensor<[?],i1>) -> !torch.vtensor<[1,?],si64> 
    return %0 : !torch.vtensor<[1,?],si64> 
  }
}
After decomposition, the onnx.NonZero op lowers to torch.aten.nonzero followed by a transpose:
%176 = torch.aten.nonzero %175 : !torch.vtensor<[?],si64> -> !torch.vtensor<[1,1],si64>
%177 = torch.aten.transpose.int %176, %int0, %int1 : !torch.vtensor<[1,1],si64>, !torch.int, !torch.int -> !torch.vtensor<[1,1],si64>
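The transpose is needed because torch.aten.nonzero returns indices with shape [N, rank] while onnx.NonZero produces [rank, N]; a quick eager-mode check (illustrative only):

    import torch

    x = torch.tensor([0, 2, 0, 5])
    torch.nonzero(x)      # shape [N, 1]: tensor([[1], [3]])
    torch.nonzero(x).t()  # shape [1, N]: tensor([[1, 3]]), the onnx.NonZero layout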

The Python reference implementation is in nonzero.py, along with the decomposition e2e test; see the sketch below. This fixes the e2e test error in Xida's previous draft #3721.
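
A minimal sketch of the cumsum-based decomposition, assuming it mirrors the approach in nonzero.py (the names and exact op sequence here are illustrative, not the committed implementation):

    import torch

    def nonzero_1d(t: torch.Tensor) -> torch.Tensor:
        mask = (t != 0).to(torch.int64)        # 1 where t is nonzero
        dest = torch.cumsum(mask, 0) - 1       # output slot for each nonzero
        dest = torch.clamp(dest, min=0)        # keep scatter indices in bounds
        iota = torch.arange(t.size(0)) * mask  # element index, zeroed where t == 0
        out = torch.zeros_like(iota)
        out = out.scatter_add(0, dest, iota)   # compact the nonzero indices
        n = int(mask.sum())
        return out[:n].unsqueeze(0)            # [1, N], matching onnx.NonZero

    nonzero_1d(torch.tensor([0, 0, 0, 1, 0, 0]))  # tensor([[3]])

Every step is data-parallel (compare, cumsum, arange, scatter), which is why the torch.aten.arange op discussed below shows up in the lowering.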

Here is the bug and reproducer mlir: https://gist.github.com/AmosLewis/92717dbe4847649afefc915425629124

Running AtenNonzero1DModule_one_nonzero...
mismatched size for broadcast
./build_tools/ci/test_posix.sh: line 12: 3770074 Aborted                 (core dumped) python -m e2e_testing.main --config=onnx -v --filter AtenNonzero1DModule_one_nonzero

Related IREE issue to be fixed: iree-org/iree#19481

AmosLewis (Collaborator, Author) commented on Nov 20, 2024

CI failed at MaskedScatterStaticBasic_basic_nonzerofailed.mlir, which lowers to onnx.NonZero:
%2 = torch.operator "onnx.NonZero"(%1) : (!torch.vtensor<[4,4],i1>) -> !torch.vtensor<[2,?],si64>

Running MaskedScatterStaticBasic_basic...
ERROR: Runtime op verification failed
"memref.store"(%589, %550, %552, %554) <{nontemporal = false}> : (i64, memref<2x2xi64>, index, index) -> ()
^ out-of-bounds access

AmosLewis (Collaborator, Author) commented on Nov 20, 2024

The issue probably arises from end = %int-1 in the torch.aten.arange op for dynamic input. Need to figure out a way to fix it.

%8 = torch.aten.arange.start_step %int0, %int-1, %int1, %none, %none, %none, %none : !torch.int, !torch.int, !torch.int, !torch.none, !torch.none, !torch.none, !torch.none -> !torch.vtensor<[?],si64>

    // For a dynamic dim, getSizes()[0] returns the unknown-size sentinel (-1),
    // which becomes the constant end = %int-1 seen in the arange above.
    Value rangeTensor = rewriter.create<AtenArangeStartStepOp>(
        loc, cumulativeSumType, c(0),
        rewriter.create<ConstantIntOp>(
            loc, rewriter.getI64IntegerAttr(flattenedInputType.getSizes()[0])),
        one, noneCst, noneCst, noneCst, noneCst);
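
A quick illustration of the failure mode, assuming the dynamic dim comes through as the -1 sentinel:

    import torch

    torch.arange(0, -1, 1)  # tensor([], dtype=torch.int64): an empty range
    torch.arange(0, 6, 1)   # tensor([0, 1, 2, 3, 4, 5]), what a [6] input needs

The empty range tensor then feeds the rest of the decomposition, which would explain the mismatched-broadcast and out-of-bounds scatter errors above.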

AmosLewis (Collaborator, Author) commented on Nov 20, 2024

After adding dynamic support for AtenArangeStartStepOp, the IREE linalg bug moves forward to a later compilation phase:
input > abi > preprocessing > global-optimization > dispatch > flow > stream > executable-sources > executable-config > BUG
See error_after_fix_dynamic_end.mlir.

iree-compile --iree-hal-target-backends=llvm-cpu model.linalg.mlir -o model.vmfb --dump-compilation-phases-to=./tmp/
failed to translate executables
model.linalg.mlir:21:10: error: 'memref.alloca' op expected no unbounded stack allocations
    %1 = tensor.empty(%dim) : tensor<?xi64>
         ^
model.linalg.mlir:10:3: note: called from
  func.func @main_graph(%arg0: tensor<?xi1>) -> tensor<1x1xi64> {
  ^
model.linalg.mlir:21:10: note: see current operation: %14 = "memref.alloca"(%11) <{alignment = 64 : i64, operandSegmentSizes = array<i32: 1, 0>}> : (index) -> memref<?xi64>
    %1 = tensor.empty(%dim) : tensor<?xi64>
         ^
model.linalg.mlir:32:12: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 16 : i64, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
    %7:2 = tm_tensor.scan dimension(0) inclusive(true) ins(%2 : tensor<?xi64>) outs(%4, %6 : tensor<?xi64>, tensor<i64>) {

Testing with torch-mlir only:

python -m e2e_testing.main --config=onnx -v --filter AtenNonzero1DModule_one_nonzero

****** Failed tests - 1 tests
    FAIL - "AtenNonzero1DModule_one_nonzero"
        @ trace item #0 - call to "forward"
        @ output of call to "forward"
        ERROR: value (Tensor with shape=[1, 1], dtype=torch.int64, min=+0.0, max=+0.0, mean=+0.0) is not close to golden value (Tensor with shape=[1, 1], dtype=torch.int64, min=+2.0, max=+2.0, mean=+2.0)


Summary:
    Failed: 1

AmosLewis force-pushed the nonzero branch 4 times, most recently from e0312f9 to 1ac257b on December 5, 2024.
AmosLewis (Collaborator, Author) commented on Dec 6, 2024

Even with a static input signature ([6], torch.bool, True), module.forward(torch.tensor([0, 0, 0, 1, 0, 0], dtype=torch.int)) also failed:

****** Failed tests - 1 tests
    FAIL - "AtenNonzero1DModule_one_nonzero"
        @ trace item #0 - call to "forward"
        @ output of call to "forward"
        ERROR: value (Tensor with shape=[1, 1], dtype=torch.int64, min=+0.0, max=+0.0, mean=+0.0) is not close to golden value (Tensor with shape=[1, 1], dtype=torch.int64, min=+3.0, max=+3.0, mean=+3.0)
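
For reference, eager PyTorch gives the golden value the test expects (the index of the single nonzero element):

    import torch

    mask = torch.tensor([0, 0, 0, 1, 0, 0], dtype=torch.int)
    torch.nonzero(mask)  # tensor([[3]]), the golden value; the lowering returned 0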

jinchen62 (Collaborator) left a comment:


LGTM, just need to clean up the code.

AmosLewis merged commit 51da49c into llvm:main on Dec 19, 2024 (3 checks passed).
AmosLewis deleted the nonzero branch on December 19, 2024 at 21:40.
rahuls-cerebras added a commit that referenced this pull request Jan 3, 2025

Successfully merging this pull request may close these issues: 'memref.alloca' op expected no unbounded stack allocations