[ci] Segfaults cause pytest internal errors #12776

Closed
driazati opened this issue Sep 14, 2022 · 0 comments
Labels
dev:test-infra, type:ci (relates to TVM CI infrastructure)

Comments

driazati (Member) commented Sep 14, 2022

See https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/aarch64_frontend_tests/3/pipeline

[2022-09-10T11:36:45.046Z] tests/python/frontend/pytorch/qnn_test.py::test_serialized_modules LLVM ERROR: out of memory
[2022-09-10T11:36:45.046Z] Fatal Python error: Aborted
[2022-09-10T11:36:45.046Z] 
[2022-09-10T11:36:45.046Z] Thread 0x0000ffffa6c241f0 (most recent call first):
[2022-09-10T11:36:45.046Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 400 in read
[2022-09-10T11:36:45.046Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 432 in from_io
[2022-09-10T11:36:45.046Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 967 in _thread_receiver
[2022-09-10T11:36:45.046Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 220 in run
[2022-09-10T11:36:45.046Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 285 in _perform_spawn
[2022-09-10T11:36:45.046Z] 
[2022-09-10T11:36:45.046Z] Current thread 0x0000ffffa72ab010 (most recent call first):
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/torch/jit/_serialization.py", line 162 in load
[2022-09-10T11:36:45.047Z]   File "/workspace/tests/python/frontend/pytorch/qnn_test.py", line 535 in test_serialized_modules
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/python.py", line 192 in pytest_pyfunc_call
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/python.py", line 1761 in runtest
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 166 in pytest_runtest_call
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 259 in <lambda>
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 338 in from_call
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 259 in call_runtest_hook
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 219 in call_and_report
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/runner.py", line 130 in runtestprotocol
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pytest_rerunfailures.py", line 497 in pytest_runtest_protocol
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/xdist/remote.py", line 110 in run_one_test
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/xdist/remote.py", line 91 in pytest_runtestloop
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 322 in _main
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 268 in wrap_session
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 315 in pytest_cmdline_main
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39 in _multicall
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80 in _hookexec
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265 in __call__
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/xdist/remote.py", line 291 in <module>
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 1084 in executetask
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 220 in run
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 285 in _perform_spawn
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 267 in integrate_as_primary_thread
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 1060 in serve
[2022-09-10T11:36:45.047Z]   File "/usr/local/lib/python3.7/dist-packages/execnet/gateway_base.py", line 1554 in serve
[2022-09-10T11:36:45.047Z]   File "<string>", line 8 in <module>
[2022-09-10T11:36:45.047Z]   File "<string>", line 1 in <module>
[2022-09-10T11:36:45.047Z] 
[2022-09-10T11:36:45.047Z] [gw0] node down: Not properly terminated
[2022-09-10T11:36:45.047Z] INTERNALERROR> Traceback (most recent call last):
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 268, in wrap_session
[2022-09-10T11:36:45.047Z] INTERNALERROR>     session.exitstatus = doit(config, session) or 0
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/_pytest/main.py", line 322, in _main
[2022-09-10T11:36:45.047Z] INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265, in __call__
[2022-09-10T11:36:45.047Z] INTERNALERROR>     return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80, in _hookexec
[2022-09-10T11:36:45.047Z] INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 60, in _multicall
[2022-09-10T11:36:45.047Z] INTERNALERROR>     return outcome.get_result()
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_result.py", line 60, in get_result
[2022-09-10T11:36:45.047Z] INTERNALERROR>     raise ex[1].with_traceback(ex[2])
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39, in _multicall
[2022-09-10T11:36:45.047Z] INTERNALERROR>     res = hook_impl.function(*args)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/xdist/dsession.py", line 117, in pytest_runtestloop
[2022-09-10T11:36:45.047Z] INTERNALERROR>     self.loop_once()
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/xdist/dsession.py", line 140, in loop_once
[2022-09-10T11:36:45.047Z] INTERNALERROR>     call(**kwargs)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/xdist/dsession.py", line 209, in worker_errordown
[2022-09-10T11:36:45.047Z] INTERNALERROR>     self.handle_crashitem(crashitem, node)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/xdist/dsession.py", line 358, in handle_crashitem
[2022-09-10T11:36:45.047Z] INTERNALERROR>     sched=self.sched,
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_hooks.py", line 265, in __call__
[2022-09-10T11:36:45.047Z] INTERNALERROR>     return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_manager.py", line 80, in _hookexec
[2022-09-10T11:36:45.047Z] INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 60, in _multicall
[2022-09-10T11:36:45.047Z] INTERNALERROR>     return outcome.get_result()
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_result.py", line 60, in get_result
[2022-09-10T11:36:45.047Z] INTERNALERROR>     raise ex[1].with_traceback(ex[2])
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pluggy/_callers.py", line 39, in _multicall
[2022-09-10T11:36:45.047Z] INTERNALERROR>     res = hook_impl.function(*args)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/pytest_rerunfailures.py", line 341, in pytest_handlecrashitem
[2022-09-10T11:36:45.047Z] INTERNALERROR>     sched.mark_test_pending(crashitem)
[2022-09-10T11:36:45.047Z] INTERNALERROR>   File "/usr/local/lib/python3.7/dist-packages/xdist/scheduler/loadscope.py", line 247, in mark_test_pending
[2022-09-10T11:36:45.047Z] INTERNALERROR>     raise NotImplementedError()
[2022-09-10T11:36:45.047Z] INTERNALERROR> NotImplementedError
[2022-09-10T11:36:45.047Z] 
[2022-09-10T11:36:45.047Z] ============== 1 failed, 14 warnings, 1 error, 3 rerun in 15.63s ===============
[2022-09-10T11:36:45.047Z] + exit_code=3
[2022-09-10T11:36:45.047Z] + '[' 3 -ne 0 ']'
[2022-09-10T11:36:45.047Z] + '[' 3 -ne 5 ']'
[2022-09-10T11:36:45.047Z] + pytest_errors+=("${suite_name}: $@")
[2022-09-10T11:36:45.047Z] + cleanup
[2022-09-10T11:36:45.047Z] + set +x
[2022-09-10T11:36:45.047Z] These pytest invocations failed, the results can be found in the Jenkins 'Tests' tab or by scrolling up through the raw logs here.
[2022-09-10T11:36:45.047Z] Report flaky test shortcut: https://github.com/apache/tvm/issues/new?labels=test%3A+flaky&title=%5BFlaky+Test%5D+%60cython.py%3A%3Atests.python.frontend.pytorch.test_object_detection%60%2C+%60pytest.py%3A%3Ainternal%60%2C+%60tests%2Fpython%2Ffrontend%2Fpytorch%2Fqnn_test.py%3A%3Atest_quantized_imagenet%60&body=%0AThese+tests+were+found+to+be+flaky+%28intermittently+failing+on+%60main%60+or+failed+in+a+PR+with+unrelated+changes%29.+See+%5Bthe+docs%5D%28https%3A%2F%2Fgithub.com%2Fapache%2Ftvm%2Fblob%2Fmain%2Fdocs%2Fcontribute%2Fci.rst%23handling-flaky-failures%29+for+details.%0A%0A%23%23%23+Tests%28s%29%0A%0A++-+%60cython.py%3A%3Atests.python.frontend.pytorch.test_object_detection%60%0A++-+%60pytest.py%3A%3Ainternal%60%0A++-+%60tests%2Fpython%2Ffrontend%2Fpytorch%2Fqnn_test.py%3A%3Atest_quantized_imagenet%60%0A%0A%23%23%23+Jenkins+Links%0A%0A++-+https%3A%2F%2Fci.tlcpack.ai%2Fjob%2Ftvm%2Fjob%2Faarch64_frontend_tests%2F3%2Fdisplay%2Fredirect
[2022-09-10T11:36:45.047Z] =============================== PYTEST FAILURES ================================
[2022-09-10T11:36:45.047Z] These pytest suites failed to execute. The results can be found in the Jenkins 'Tests' tab or by scrolling up through the raw logs here. If there is no test listed below, the failure likely came from a segmentation fault which you can find in the logs above.
[2022-09-10T11:36:45.047Z] 
[2022-09-10T11:36:45.047Z]     - python-frontend-pytorch-shard-0-cython: tests/python/frontend/pytorch
[2022-09-10T11:36:45.047Z] 
[2022-09-10T11:36:45.047Z] You can reproduce these specific failures locally with this command:
[2022-09-10T11:36:45.047Z] 
[2022-09-10T11:36:45.047Z]     python3 tests/scripts/ci.py arm --tests cython.py::tests.python.frontend.pytorch.test_object_detection --tests pytest.py::internal --tests tests/python/frontend/pytorch/qnn_test.py::test_quantized_imagenet
[2022-09-10T11:36:45.047Z] 
script returned exit code 1
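
For reference, the shell trace near the end of the log records `exit_code=3` and treats any pytest exit code other than 0 or 5 as a suite failure. A minimal sketch of that check follows. The enum values mirror pytest's documented exit codes, while `PytestExit`, `is_suite_failure`, and `describe` are hypothetical names for illustration, not the CI script's actual code.

```python
from enum import IntEnum


class PytestExit(IntEnum):
    """Local mirror of pytest's documented exit codes (hypothetical class name)."""

    OK = 0
    TESTS_FAILED = 1
    INTERRUPTED = 2
    INTERNAL_ERROR = 3
    USAGE_ERROR = 4
    NO_TESTS_COLLECTED = 5


def is_suite_failure(exit_code: int) -> bool:
    """Same test as the shell trace: anything other than 0 or 5 counts as a failure."""
    return exit_code not in (PytestExit.OK, PytestExit.NO_TESTS_COLLECTED)


def describe(exit_code: int) -> str:
    """Return the symbolic name for an exit code, or UNKNOWN if it is not listed."""
    try:
        return PytestExit(exit_code).name
    except ValueError:
        return "UNKNOWN"


if __name__ == "__main__":
    code = 3  # value of exit_code in the log above
    print(describe(code), is_suite_failure(code))
```

Read this way, exit code 3 means pytest itself hit an internal error, which is why the run is reported as `pytest.py::internal` rather than as an ordinary failure of the crashing test.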

mark_test_pending isn't implemented here because we use a LoadScopeScheduling instance to serialize some tests (https://github.com/apache/tvm/blob/main/python/tvm/testing/plugin.py#L334), and pytest-rerunfailures calls it from pytest_handlecrashitem when an xdist worker goes down. The segfault should just look like any other test error, without any pytest internal failures.
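
The failure mode in the traceback can be reproduced in miniature without pytest at all. The sketch below is an assumption-laden stand-in: `NoRequeueScheduler`, `RequeueScheduler`, `rerun_crashed_item`, and `session_outcome` are hypothetical names, not xdist or pytest-rerunfailures APIs. It only shows how an unimplemented scheduler method turns a worker crash into an internal error.

```python
class NoRequeueScheduler:
    """Stand-in for a scheduler that cannot put a crashed test back in the queue."""

    def mark_test_pending(self, item):
        raise NotImplementedError()


class RequeueScheduler:
    """Stand-in for a scheduler that does support re-queuing."""

    def __init__(self):
        self.pending = []

    def mark_test_pending(self, item):
        self.pending.append(item)


def rerun_crashed_item(crashitem, sched):
    """Stand-in for a rerun plugin's crash hook: ask the scheduler to re-queue."""
    sched.mark_test_pending(crashitem)


def session_outcome(crashitem, sched):
    """Report how the crash surfaces at the session level."""
    try:
        rerun_crashed_item(crashitem, sched)
    except NotImplementedError:
        # An exception escaping a hook is what shows up as INTERNALERROR.
        return "internal error"
    return "rerun scheduled"


if __name__ == "__main__":
    item = "qnn_test.py::test_serialized_modules"
    print(session_outcome(item, RequeueScheduler()))
    print(session_outcome(item, NoRequeueScheduler()))
```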

cc @Mousius @areusch @gigiblender

driazati added the needs-triage and type:ci labels and removed the needs-triage label Sep 14, 2022
driazati added 12 commits to driazati/tvm that referenced this issue between Sep 20 and Sep 28, 2022 (six on Sep 20, one on Sep 26, two on Sep 27, three on Sep 28), all with the same message:
This is a less-than-ideal fix for apache#12776. It disables reruns for crashes
in xdist workers, which gets rid of the pytest internal failure and
correctly reports the test name. Ideally it would also include the
backtrace in the report and rerun the test but there doesn't seem to be
an easy way to get pytest-rerunfailures to do that alongside the
LoadScopeScheduler.
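
The idea in the commit message, skipping the rerun when a worker crashes and reporting the test by name instead, might look roughly like the sketch below. It is a pure-Python illustration under stated assumptions: `CrashReport`, `Session`, and `handle_worker_crash` are hypothetical names and do not reflect the actual change in the referenced commits or any pytest plugin API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CrashReport:
    """Hypothetical failure record for a test whose worker died."""

    nodeid: str
    outcome: str = "failed"
    longrepr: str = ""


@dataclass
class Session:
    """Hypothetical container for the reports collected during a run."""

    reports: List[CrashReport] = field(default_factory=list)


def handle_worker_crash(
    session: Session,
    nodeid: str,
    rerun_crashes: bool = False,
    requeue: Optional[Callable[[str], None]] = None,
) -> Optional[CrashReport]:
    """Handle a crashed test: re-queue it only if explicitly enabled and supported."""
    if rerun_crashes and requeue is not None:
        requeue(nodeid)
        return None
    # Reruns disabled for crashes: record a normal failure carrying the test name.
    report = CrashReport(nodeid=nodeid, longrepr=f"worker crashed while running {nodeid}")
    session.reports.append(report)
    return report


if __name__ == "__main__":
    session = Session()
    test_id = "tests/python/frontend/pytorch/qnn_test.py::test_serialized_modules"
    print(handle_worker_crash(session, test_id))
```

As the commit message notes, this trades the rerun and the backtrace for a clean report, so a crash that would have passed on retry is reported as a failure.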
areusch added the needs-triage label Oct 19, 2022
janetsc added the dev:test-infra label and removed the needs-triage label Oct 19, 2022
tqchen closed this as completed Sep 20, 2024
4 participants