Resolve CI testing failure for Lazy Tensor Core #1088

Merged: 5 commits, Jul 25, 2022

@henrytwo
Member

henrytwo commented Jul 20, 2022

  • xfails e2e tests for unsupported ops, which should allow CI tests to pass
  • removes passing tests from the xfail set
  • adds dynamic_ir.cpp to the source list, which resolved the free() error in CI
  • registers FuncDialect, which resolved the MLIR error in CI
  • enables e2e LTC tests for macOS and source builds

cc: @ke1337 @antoniojkim

@henrytwo henrytwo self-assigned this Jul 20, 2022
@antoniojkim
Collaborator

We're still seeing

free(): invalid pointer

which is causing a whole bunch of tests to fail. We're unable to reproduce this locally.

@silvasean @powderluv How should we proceed? What can be done to try and root cause this issue?

@silvasean
Contributor

> We're still seeing
>
> free(): invalid pointer
>
> which is causing a whole bunch of tests to fail. We're unable to reproduce this locally.
>
> @silvasean @powderluv How should we proceed? What can be done to try and root cause this issue?

Did this ever work? If so, can we bisect to find where it failed?

@henrytwo
Member Author

> > We're still seeing
> >
> > free(): invalid pointer
> >
> > which is causing a whole bunch of tests to fail. We're unable to reproduce this locally.
> >
> > @silvasean @powderluv How should we proceed? What can be done to try and root cause this issue?
>
> Did this ever work? If so, can we bisect to find where it failed?

We've never had it run successfully through CI before.

@silvasean
Contributor

Does passing -s (single process) on the CI avoid the issue? This does sound really painful to debug... thinking...

@henrytwo
Member Author

> Does passing -s (single process) on the CI avoid the issue? This does sound really painful to debug... thinking...

erm now it dies without running the test?

Run cd $GITHUB_WORKSPACE
free(): invalid pointer
/home/runner/work/_temp/c3431cb3-81e1-42fe-8eef-24f84e991a68.sh: line 3: 17199 Aborted                 (core dumped) python -m e2e_testing.torchscript.main --config=lazy_tensor_core -v -s
Error: Process completed with exit code 134.

@silvasean
Contributor

> > Does passing -s (single process) on the CI avoid the issue? This does sound really painful to debug... thinking...
>
> erm now it dies without running the test?
>
> Run cd $GITHUB_WORKSPACE
> free(): invalid pointer
> /home/runner/work/_temp/c3431cb3-81e1-42fe-8eef-24f84e991a68.sh: line 3: 17199 Aborted                 (core dumped) python -m e2e_testing.torchscript.main --config=lazy_tensor_core -v -s
> Error: Process completed with exit code 134.

Oh, that is good and expected. That means that the multiprocessing was swallowing the crash by restarting the worker process. I think from here you can run the testing process under gdb/lldb if it is in the VM image, and have it break on free and then backtrace.
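
As a side note on the exit code 134 in the log: shells report a process killed by signal N as 128 + N, and SIGABRT is signal 6 on Linux, so 134 is what an abort() looks like once nothing restarts the worker. Below is a minimal Python sketch of that arithmetic. It uses os.abort() in a child process as a stand-in for the native crash, and the helper names are hypothetical, not part of the e2e test suite.

```python
import signal
import subprocess
import sys


def run_aborting_child() -> int:
    """Run a child Python process that calls abort() and return its returncode."""
    proc = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


def as_shell_exit_code(returncode: int) -> int:
    """Map subprocess's negative-signal convention to the shell's 128 + N."""
    return 128 - returncode if returncode < 0 else returncode


if __name__ == "__main__":
    rc = run_aborting_child()
    # On POSIX, subprocess reports death-by-signal as a negative signal number.
    print("subprocess returncode:", rc)
    print("shell-style exit code:", as_shell_exit_code(rc))
```

With a worker pool, the parent only sees that a worker went away, which is consistent with the crash being hidden until -s was passed.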

This appears to be a memory corruption issue, so likely running it locally with AddressSanitizer would flag it, even if locally there are no symptoms on a normal build.

btw, have you verified that locally you are doing the same Release+Asserts build as CI? This is the command line:

cd $GITHUB_WORKSPACE
  mkdir build
  cd build
  cmake $GITHUB_WORKSPACE/externals/llvm-project/llvm -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_LINKER=lld \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
    -DPython3_EXECUTABLE=$(which python) \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects" \
    -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$GITHUB_WORKSPACE" \
    -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR="${GITHUB_WORKSPACE}/external/llvm-external-projects/torch-mlir-dialects" \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DLLVM_TARGETS_TO_BUILD=host
  ninja check-torch-mlir-all

@antoniojkim
Collaborator

> btw, have you verified that locally you are doing the same Release+Asserts build as CI?

When I try to run the same cmake command as CI, I get the following error:

torch-mlir-dialects: command not found

And when I run ninja check-torch-mlir-all using the build that we already had, it all passes without any problems

@henrytwo
Member Author

For those following along, we got a stack trace: https://github.com/llvm/torch-mlir/runs/7452705521?check_suite_focus=true

free(): invalid pointer
Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a0b859 in __GI_abort () at abort.c:79
#2  0x00007ffff7a7626e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ba0298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7a7e2fc in malloc_printerr (str=str@entry=0x7ffff7b9e4c1 "free(): invalid pointer") at malloc.c:5347
#4  0x00007ffff7a7fb2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
#5  0x00007fffde851738 in c10::impl::BoxedKernelWrapper<at::Tensor (at::Tensor const&, at::Tensor const&), void>::call(void (*)(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*), c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007fffdeb0d841 in at::_ops::mm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#7  0x00007fffe009907a in torch::autograd::VariableType::(anonymous namespace)::mm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007fffe0099eb3 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::mm>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007fffdeb514d6 in at::_ops::mm::call(at::Tensor const&, at::Tensor const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007ffff6484650 in torch::autograd::THPVariable_mm(_object*, _object*, _object*) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#11 0x00007ffff7d6fdc2 in cfunction_call (func=func@entry=0x7fffdc44f590, args=args@entry=0x7fffcb03b080, kwargs=kwargs@entry=0x0) at Objects/methodobject.c:543
#12 0x00007ffff7d48420 in _PyObject_MakeTpCall (tstate=0x55555555c6b0, callable=0x7fffdc44f590, args=0x7fffcb31ba90, nargs=2, keywords=0x0) at Objects/call.c:191
#13 0x00007ffff7daf618 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb31ba90, callable=0x7fffdc44f590, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#14 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb31ba90, callable=0x7fffdc44f590, tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#15 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb31ba90, callable=0x7fffdc44f590) at ./Include/cpython/abstract.h:127
#16 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#17 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3489
#18 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x7fffcb31b900, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#19 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=3, globals=<optimized out>) at Objects/call.c:330
#20 0x00007ffff7d4a874 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, tstate=<optimized out>) at ./Include/cpython/abstract.h:118
#21 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/classobject.c:61
#22 0x00007ffff7d498b3 in PyVectorcall_Call (callable=0x7fffcdfbe6c0, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
#23 0x00007ffff7daed7b in do_call_core (kwdict=0x0, callargs=0x7fffcad35840, func=0x7fffcdfbe6c0, tstate=<optimized out>) at Python/ceval.c:5125
#24 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3582
#25 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x7fffcad39440, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#26 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=3, globals=<optimized out>) at Objects/call.c:330
#27 0x00007ffff7daaeab in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555558929410, callable=0x7fffce221040, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#28 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555558929410, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#29 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#30 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3506
#31 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x555558929270, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#32 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
#33 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb597e18, callable=0x7fffd5abfa60, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#34 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fffcb597e18, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#35 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#36 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#37 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x7fffcb597c80, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#38 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x55555863c8b0, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7ffff724e280, name=0x7ffff7507030, qualname=0x7ffff729ff80) at Python/ceval.c:4329
#39 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#40 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x55555863c8a8, callable=0x7fffcad33ca0, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#41 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x55555863c8a8, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#42 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#43 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#44 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x55555863c6c0, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#45 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7fffdc2f7f28, kwcount=<optimized out>, kwstep=1, defs=0x7ffff729ef58, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff72c4bf0, qualname=0x7ffff72c4bf0) at Python/ceval.c:4329
#46 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#47 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffdc2f7f10, callable=0x7fffd5abfb80, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#48 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fffdc2f7f10, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#49 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#50 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#51 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x7fffdc2f7d60, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#52 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7ffff747fa70, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7ffff748e770, qualname=0x7ffff748e770) at Python/ceval.c:4329
#53 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#54 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff747fa70, callable=0x7fffcad339d0, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#55 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff747fa70, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#56 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#57 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#58 0x00007ffff7da9178 in _PyEval_EvalFrame (throwflag=0, f=0x7ffff747f900, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#59 _PyEval_EvalCode (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>, tstate=0x55555555c6b0) at Python/ceval.c:4329
#60 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#61 0x00007ffff7da8ec7 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4377
#62 0x00007ffff7e2f71f in PyEval_EvalCode (co=co@entry=0x7ffff7266030, globals=globals@entry=0x7ffff7488280, locals=locals@entry=0x7ffff7488280) at Python/ceval.c:828
#63 0x00007ffff7e2e2b1 in builtin_exec_impl (module=<optimized out>, locals=0x7ffff7488280, globals=0x7ffff7488280, source=0x7ffff7266030) at Python/bltinmodule.c:1026
#64 builtin_exec (module=<optimized out>, args=args@entry=0x555555621f90, nargs=<optimized out>) at Python/clinic/bltinmodule.c.h:396
#65 0x00007ffff7d6fcf8 in cfunction_vectorcall_FASTCALL (func=0x7ffff74dfd60, args=0x555555621f90, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:430
#66 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555621f90, callable=0x7ffff74dfd60, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#67 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555621f90, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#68 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#69 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#70 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x555555621dd0, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#71 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555555fabf0, kwcount=<optimized out>, kwstep=1, defs=0x7ffff73c03c8, defcount=5, kwdefs=0x0, closure=0x0, name=0x7ffff73becf0, qualname=0x7ffff73becf0) at Python/ceval.c:4329
#72 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#73 0x00007ffff7daabd9 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555fabc8, callable=0x7ffff73b4b80, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#74 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555fabc8, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#75 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#76 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3520
#77 0x00007ffff7da9c60 in _PyEval_EvalFrame (throwflag=0, f=0x5555555faa20, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#78 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x7ffff73be768, kwcount=<optimized out>, kwstep=1, defs=0x7ffff744ac58, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff73c0170, qualname=0x7ffff73c0170) at Python/ceval.c:4329
#79 0x00007ffff7d48b0b in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:396
#80 0x00007ffff7d498b3 in PyVectorcall_Call (callable=0x7ffff72d09d0, tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
#81 0x00007ffff7e4aed7 in pymain_run_module (modname=<optimized out>, set_argv0=<optimized out>) at Modules/main.c:291
#82 0x00007ffff7e4abde in pymain_run_python (exitcode=0x7fffffffd760) at Modules/main.c:592
#83 Py_RunMain () at Modules/main.c:677
#84 0x00007ffff7e4a6dd in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#85 0x00007ffff7a0d083 in __libc_start_main (main=0x555555555060 <main>, argc=6, argv=0x7fffffffd968, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd958) at ../csu/libc-start.c:308
#86 0x000055555555509e in _start ()
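
The native backtrace shows the abort happening under at::_ops::mm, but the remaining frames are CPython interpreter internals, so it does not say which Python line made the call. One possible complement, assuming the standard-library faulthandler module can be enabled in the test process (for example with python -X faulthandler), is to have Python dump its own traceback when SIGABRT arrives. The sketch below uses os.abort() in a child process as a stand-in for the crash; crashing_op and run_child_with_faulthandler are hypothetical names.

```python
import signal
import subprocess
import sys
import textwrap

# Source of the child process. os.abort() stands in for the native abort.
CHILD_SOURCE = textwrap.dedent(
    """
    import faulthandler
    import os

    faulthandler.enable()  # dump the Python traceback on SIGABRT, SIGSEGV, ...

    def crashing_op():
        os.abort()

    crashing_op()
    """
)


def run_child_with_faulthandler() -> subprocess.CompletedProcess:
    """Run the crashing child and capture what faulthandler writes to stderr."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_child_with_faulthandler()
    # stderr should contain a Python-level traceback naming crashing_op.
    print(result.stderr)
```

This only reports Python frames, so it would narrow down the calling test rather than replace the gdb backtrace.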

@henrytwo henrytwo changed the title from "XFail unsupported ops" to "Resolve CI testing failure for Lazy Tensor Core" Jul 21, 2022
@silvasean
Contributor

Awesome, that's a good first step. It sounds like ASan would catch this bug and give a very clear diagnosis. ASan is probably pretty hard to set up for the full Python/etc. e2e test, but a small .cpp file with just a main() and a few calls into libtorch + LTC might be doable. Let me know how your debugging goes and I can get more hands-on (vs armchair debugging this) if you folks need.

@silvasean
Contributor

silvasean commented Jul 21, 2022

And by ASan I mean a local ASan. I suspect that the issue exists in the local build but is somehow not causing any symptoms locally (e.g. your local libc/malloc doesn't do checking that is as strict). ASan will see it though.

@powderluv
Collaborator

Thanks for continuing debugging this. Eventually we can add an ASAN builder (at least for the C++ parts)

@henrytwo henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from ca6d89e to 29922a2 Compare July 21, 2022 20:49
@henrytwo
Member Author

I made a new PR for experimenting with CI so everyone doesn't get spammed with emails: #1095

Once a solution is found, I'll bring it back over here

@silvasean
Contributor

> > btw, have you verified that locally you are doing the same Release+Asserts build as CI?
>
> When I try to run the same cmake command as CI, I get the following error:
>
> torch-mlir-dialects: command not found
>
> And when I run ninja check-torch-mlir-all using the build that we already had, it all passes without any problems

Is your -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR set correctly? Where is that error coming from?

@antoniojkim
Collaborator

> Is your -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR set correctly? Where is that error coming from?

Yes, it was set to

-DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR="${GITHUB_WORKSPACE}/externals/llvm-external-projects/torch-mlir-dialects"

Honestly not sure where that error is coming from. It's not something that's encountered when running cmake via the setup.py

@antoniojkim antoniojkim force-pushed the torch_mlir_ltc_backend branch from 6698bb0 to 005748b Compare July 22, 2022 13:41
@henrytwo henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from 29922a2 to 23e9bb8 Compare July 22, 2022 18:37
@henrytwo
Member Author

Here are some updates on the situation. It looks like we forgot to add dynamic_ir.cpp to the CMake file, so some TS backend classes were used instead of ours, which likely resulted in the free() error. With that fixed, there's a new problem related to MLIR:
https://github.com/llvm/torch-mlir/runs/7475488042?check_suite_focus=true

graph(%p0 : Float(1, 5)):
  %1 : Float(1, 5) = aten::tanh(%p0)
  return (%p0, %1)
LLVM ERROR: func.func created with unregistered dialect. If this is intended, please call allowUnregisteredDialects() on the MLIRContext, or use -allow-unregistered-dialect with the MLIR tool used.
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Thread 1 "python" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a0b859 in __GI_abort () at abort.c:79
#2  0x00007fffcefdd710 in llvm::report_fatal_error(llvm::Twine const&, bool) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#3  0x00007fffcf0e52ec in mlir::Operation::Operation(mlir::Location, mlir::OperationName, unsigned int, unsigned int, unsigned int, mlir::DictionaryAttr, bool) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#4  0x00007fffcf0e4bac in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, unsigned int) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#5  0x00007fffcf0e4892 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, mlir::RegionRange) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#6  0x00007fffcf0e4826 in mlir::Operation::create(mlir::OperationState const&) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#7  0x00007fffcef12f2f in mlirOperationCreate () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/libTorchMLIRAggregateCAPI.so
#8  0x00007fffcdc9eaf9 in torch_mlir::importJitFunctionAsFuncOp(MlirContext, torch::jit::Function*, std::function<MlirAttribute (int)>, torch_mlir::ImportOptions const&) () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/_mlir_libs/_jit_ir_importer.cpython-39-x86_64-linux-gnu.so
#9  0x00007fffcdb87738 in torch::lazy::TorchMlirLoweringContext::Build() () from /home/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/lib_mlir_ltc.so
#10 0x00007fffe0f218a8 in torch::lazy::LazyGraphExecutor::Compile(std::vector<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> >, std::allocator<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> > > > const&, c10::ArrayRef<std::string>, torch::lazy::LazyGraphExecutor::SyncTensorCollection const&, torch::lazy::LazyGraphExecutor::PostOrderData*) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007fffe0f25129 in torch::lazy::LazyGraphExecutor::SyncTensorsGraphInternal(std::vector<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> >, std::allocator<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> > > >*, c10::ArrayRef<std::string>, torch::lazy::LazyGraphExecutor::SyncTensorsConfig const&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007fffe0f25a91 in torch::lazy::LazyGraphExecutor::SyncTensorsGraph(std::vector<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> >, std::allocator<c10::intrusive_ptr<torch::lazy::LazyTensor, c10::detail::intrusive_target_default_null_type<torch::lazy::LazyTensor> > > >*, c10::ArrayRef<std::string>, bool, bool) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#13 0x00007fffe0f2604a in torch::lazy::LazyGraphExecutor::SyncLiveTensorsGraph(torch::lazy::BackendDevice const*, c10::ArrayRef<std::string>, bool) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#14 0x00007ffff69b3057 in pybind11::cpp_function::initialize<torch::lazy::initLazyBindings(_object*)::{lambda(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool)#1}, void, std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg_v, pybind11::arg, pybind11::arg_v>(torch::lazy::initLazyBindings(_object*)::{lambda(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool)#1}&&, void (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, bool), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg_v const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#15 0x00007ffff635818d in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#16 0x00007ffff7d6fdc2 in cfunction_call (func=0x7fffdc30df90, args=<optimized out>, kwargs=<optimized out>) at Objects/methodobject.c:543
#17 0x00007ffff7d484b8 in _PyObject_MakeTpCall (tstate=0x55555555c6b0, callable=0x7fffdc30df90, args=0x7fffd4bf8580, nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:191
#18 0x00007ffff7daf7fa in _PyObject_VectorcallTstate (kwnames=0x7ffff744a580, nargsf=<optimized out>, args=0x7fffd4bf8580, callable=0x7fffdc30df90, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#19 _PyObject_VectorcallTstate (kwnames=0x7ffff744a580, nargsf=<optimized out>, args=0x7fffd4bf8580, callable=0x7fffdc30df90, tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#20 PyObject_Vectorcall (kwnames=0x7ffff744a580, nargsf=<optimized out>, args=<optimized out>, callable=0x7fffdc30df90) at ./Include/cpython/abstract.h:127
#21 call_function (kwnames=0x7ffff744a580, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at Python/ceval.c:5077
#22 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3537
#23 0x00007ffff7d48d33 in _PyEval_EvalFrame (throwflag=0, f=0x7fffd4bf8400, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#24 function_code_fastcall (tstate=0x55555555c6b0, co=<optimized out>, args=<optimized out>, nargs=2, globals=<optimized out>) at Objects/call.c:330
#25 0x00007ffff7daf10e in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b9c48, callable=0x7fffd488ea60, tstate=0x55555555c6b0) at ./Include/cpython/abstract.h:118
#26 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b9c48, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#27 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55555555c6b0) at Python/ceval.c:5077
#28 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3489
#29 0x00007ffff7da9178 in _PyEval_EvalFrame (throwflag=0, f=0x5555555b9ad0, tstate=0x55555555c6b0) at ./Include/internal/pycore_ceval.h:40
#30 _PyEval_EvalCode (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>, tstate=0x55555555c6b0) at Python/ceval.c:4329
#31 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#32 0x00007ffff7da8ec7 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4377
#33 0x00007ffff7e2f71f in PyEval_EvalCode (co=co@entry=0x7ffff73b3190, globals=globals@entry=0x7ffff74881c0, locals=locals@entry=0x7ffff74881c0) at Python/ceval.c:828
#34 0x00007ffff7e4230d in run_eval_code_obj (tstate=0x55555555c6b0, co=0x7ffff73b3190, globals=0x7ffff74881c0, locals=0x7ffff74881c0) at Python/pythonrun.c:1221
#35 0x00007ffff7e4229b in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff74881c0, locals=0x7ffff74881c0, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1242
#36 0x00007ffff7ce7338 in pyrun_file (fp=fp@entry=0x555555559340, filename=filename@entry=0x7ffff73ad630, start=start@entry=257, globals=globals@entry=0x7ffff74881c0, locals=locals@entry=0x7ffff74881c0, closeit=closeit@entry=1, flags=0x7fffffffd788) at Python/pythonrun.c:1140
#37 0x00007ffff7ce70c4 in pyrun_simple_file (flags=0x7fffffffd788, closeit=1, filename=0x7ffff73ad630, fp=0x555555559340) at Python/pythonrun.c:450
#38 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd788) at Python/pythonrun.c:483
#39 0x00007ffff7ce7fb3 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559340, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fffffffd788) at Python/pythonrun.c:92
#40 0x00007ffff7e4ab61 in pymain_run_file (cf=0x7fffffffd788, config=0x55555555cf50) at Modules/main.c:373
#41 pymain_run_python (exitcode=0x7fffffffd780) at Modules/main.c:598
#42 Py_RunMain () at Modules/main.c:677
#43 0x00007ffff7e4a6dd in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#44 0x00007ffff7a0d083 in __libc_start_main (main=0x555555555060 <main>, argc=2, argv=0x7fffffffd988, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd978) at ../csu/libc-start.c:308
#45 0x000055555555509e in _start ()

The behaviour is the same when running e2e tests, but in this case I'm running a small example model to help with isolating the source.

@henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from 23e9bb8 to c5e8448 on July 25, 2022 19:23
@silvasean
Contributor

You probably need to call torchMlirRegisterAllDialects:

@henrytwo
Member Author

You probably need to call torchMlirRegisterAllDialects:

Hmm the strange thing is that it is called: https://github.com/llvm/torch-mlir/blob/torch_mlir_ltc_backend/python/torch_mlir/csrc/base_lazy_backend/mlir_lowering_context.cpp#L279

@silvasean
Contributor

I think we might also need the equivalent of torchMlirRegisterRequiredDialects: https://github.com/llvm/torch-mlir/pull/1084/files

@ashay I recall that patch got reverted -- is it okay for henry to just copy that function for now?

@henrytwo
Member Author

Oh now that I think of it, mlir::func::FuncDialect is probably the only dialect I need to register. I'll try registering it directly and see what happens

@henrytwo force-pushed the henrytu/xfail_unsupported_ops branch from c5e8448 to bfe533f on July 25, 2022 21:18
@henrytwo requested a review from silvasean on July 25, 2022 21:19
@henrytwo
Member Author

From my testing on another branch, this PR should enable e2e CI tests to run successfully; however, we still run into some issues during Build out-of-tree

@silvasean
Contributor

From my testing on another branch, this PR should enable e2e CI tests to run successfully; however, we still run into some issues during Build out-of-tree

We were talking at the developer hour today about possibly removing the out-of-tree build. @powderluv, this sounds like another data point.

@henrytwo
Member Author

@silvasean assuming the other 2 tests pass, are there any other changes you want on this PR, or would we be good to merge into torch_mlir_ltc_backend?

@silvasean
Contributor

@silvasean assuming the other 2 tests pass, are there any other changes you want on this PR, or would we be good to merge into torch_mlir_ltc_backend?

LGTM

@silvasean
Contributor

@silvasean assuming the other 2 tests pass, are there any other changes you want on this PR, or would we be good to merge into torch_mlir_ltc_backend?

And btw, thanks for debugging this. I've been on the other end of these "debug iteration goes through GitHub Actions" things and it is incredibly painful. I can't imagine debugging a memory corruption issue like you did here!

@ashay
Collaborator

ashay commented Jul 25, 2022

I recall that patch got reverted -- is it okay for henry to just copy that function for now?

Sorry for being late to the discussion. Henry, feel free to copy that function. Now that macOS builds run in CI, I should be able to fix any redundancy in a subsequent patch.

@henrytwo
Member Author

I'm going to merge this into the LTC branch now. The failures during the source and macOS builds seem to be outside the scope of this PR, so they'll be addressed separately.

@henrytwo merged commit 4106a7d into torch_mlir_ltc_backend on Jul 25, 2022
@henrytwo deleted the henrytu/xfail_unsupported_ops branch on July 26, 2022 00:16
henrytwo added a commit that referenced this pull request Jul 29, 2022
* Xfail unsupported ops

* Register FuncDialect

* Include dynamic_ir in build

* Code reformat

* Enable LTC tests for macOS and Source Build
henrytwo added a commit that referenced this pull request Jul 30, 2022
* Xfail unsupported ops

* Register FuncDialect

* Include dynamic_ir in build

* Code reformat

* Enable LTC tests for macOS and Source Build
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
Signed-off-by: Gong Su <gong_su@hotmail.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
* Add check-onnx-backend to Mac CI. (llvm#1069)

* Add check-onnx-backend to Mac CI.

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Additional Docker help and split README for easier reading (llvm#1084)

* initial docker documentation

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* split README with no redundant place for info

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* update

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* respond to suggestions

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* specify that onnx-mlir.py script generates only code suitable to be exec in Linux and/or Docker env

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* fix checkdocs

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* responded to review suggestion on onnx-mlir --help

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

* use ONNX-MLIR everywhere

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add verify for concat

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* check all inputs

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Support filtering out lit tests based on targets (llvm#1087)

Currently we ignore what targets llvm was built for in the lit tests, but recent changes to onnx-mlir explicitly initialize the available targets.
This makes the corresponding change to the lit configuration, so that we can filter out the lit tests based on the available targets.

Signed-off-by: Stella Stamenova <stilis@microsoft.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Switch URLs to use main instead of master (llvm#1094)

Signed-off-by: Charles Volzka <cjvolzka@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Fix MacOS build badge (llvm#1092)

Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* onnx-mlir.py warning about binary output (.so and .jar) (llvm#1090)

not directly usable if host is not Linux

Signed-off-by: Gong Su <gong_su@hotmail.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Make the doc example obey ONNX_MLIR_BUILD_TESTS (llvm#1083)

* Make the doc example obey ONNX_MLIR_BUILD_TESTS

Currently, ONNX_MLIR_BUILD_TESTS controls EXCLUDE_FROM_ALL, however, the targets added through add_executable will always build. We follow the llvm pattern and explicitly set EXCLUDE_FROM_ALL in the add_onnx_mlir_executable function if it is set for the directory, so that add_executable targets don't always build.

Signed-off-by: Stella Stamenova <stilis@microsoft.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Explicitly install into lib on all systems (llvm#1088)

Signed-off-by: Gong Su <gong_su@hotmail.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add check (llvm#1098)

Signed-off-by: Tong Chen <chentong@us.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* fix typos and add ssh-client to dockerfile (llvm#1096)

* fix typos and add ssh-client to dockerfile

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* sync doc and script

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Emit print statement only when the verbose option is in effect. (llvm#1097)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* format & refine code by request

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Support older versions 6, 11, 12 for Clip Op (llvm#1100)

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* using front to get first input

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add 3 lit test for concat  verifier

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* add newline

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Update document (llvm#1077)

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* delete HowTOAddAnOperation.md

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* modify testing

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* delete HowTOAddAnOperation.md

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* modify testing

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* fix

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* add comment

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* delete HowTOAddAnOperation.md

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* modify testing

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* fix

Signed-off-by: Tong Chen <chentong@us.ibm.com>

* create

Signed-off-by: Tong Chen <chentong@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Update LLVM level (llvm#1095)

* Update LLVM level to 700997a

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Pass a type converter to all ONNX operations. (llvm#1102)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Nuke KrnlDummyCastOp now that we use MLIR's UnrealizedConversionCastOp (llvm#1103)

* Nuke KrnlDummyCastOp now that we use MLIR's UnrealizedConversionCastOp

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>

* Remove a dependency in src/Dialect/Krnl/CMakeList.txt.  Regenerate docs via 'ninja onnx-mlir-docs'.

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Add --emitObj option to onnx-mlir (llvm#1104)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* fix warnings (llvm#1093)

Signed-off-by: Ian Bearman <ianb@microsoft.com>

Co-authored-by: Stella Stamenova <stilis@microsoft.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Add -march option to onnx-mlir (llvm#1107)

Signed-off-by: Ettore Tiotto <etiotto@ca.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Fix Doc spelling and broken links, removed warnings about using main (llvm#1106)

* removed warning about main vs master in CONTRIBUTING, fixed links and spelling mistakes

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

* Update BuildONNX.md

Signed-off-by: Ethan Wang <ywan2928@uwo.ca>

Co-authored-by: Ettore Tiotto <etiotto@ca.ibm.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Co-authored-by: Stella Stamenova <stilis@microsoft.com>
Co-authored-by: Charles Volzka <42243335+cjvolzka@users.noreply.github.com>
Co-authored-by: gongsu832 <gong_su@hotmail.com>
Co-authored-by: chentong319 <chentong@us.ibm.com>
Co-authored-by: Tung D. Le <tung@jp.ibm.com>
Co-authored-by: Ian Bearman <ian.bearman@live.com>