[FLAKY] tvmc/test_frontends.py #7455

Closed
masahi opened this issue Feb 13, 2021 · 8 comments

@masahi
Member

masahi commented Feb 13, 2021

There is another CI incident on main: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main

https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/561/pipeline/
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/560/pipeline/

The error is the same one discussed in #7366. It could be due to the recurring OpenMP + PyTorch issue.

@jwfromm @comaniac @leandron @altanh

@areusch
Contributor

areusch commented Feb 17, 2021

Happened again this morning: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/570/pipeline

Relevant traceback:

tests/python/driver/tvmc/test_frontends.py::test_load_model__pth munmap_chunk(): invalid pointer
Fatal Python error: Aborted

Current thread 0x00007fca21074740 (most recent call first):
  File "/workspace/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 416 in _conv_forward
  File "/workspace/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 419 in forward
  File "/workspace/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 704 in _slow_forward
  File "/workspace/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 720 in _call_impl
  File "/workspace/.local/lib/python3.6/site-packages/torchvision/models/resnet.py", line 203 in _forward_impl
  File "/workspace/.local/lib/python3.6/site-packages/torchvision/models/resnet.py", line 220 in forward
  File "/workspace/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 704 in _slow_forward
  File "/workspace/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 720 in _call_impl
  File "/workspace/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 1109 in trace_module
  File "/workspace/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 955 in trace
  File "/workspace/tests/python/driver/tvmc/conftest.py", line 113 in pytorch_resnet18
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 930 in call_fixture_func
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 1124 in pytest_fixture_setup
  File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 1070 in execute
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 689 in _compute_fixture_value
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 605 in _get_active_fixturedef
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 585 in getfixturevalue
  File "/usr/local/lib/python3.6/dist-packages/_pytest/fixtures.py", line 572 in _fillfixtures
  File "/usr/local/lib/python3.6/dist-packages/_pytest/python.py", line 1633 in setup
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 448 in prepare
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 151 in pytest_runtest_setup
  File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 256 in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 310 in from_call
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 256 in call_runtest_hook
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 216 in call_and_report
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 121 in runtestprotocol
  File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 110 in pytest_runtest_protocol
  File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 338 in pytest_runtestloop
  File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 313 in _main
  File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 257 in wrap_session
  File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 306 in pytest_cmdline_main
  File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
  File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
  File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
  File "/usr/local/lib/python3.6/dist-packages/_pytest/config/__init__.py", line 165 in main
  File "/usr/local/lib/python3.6/dist-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/usr/local/lib/python3.6/dist-packages/pytest/__main__.py", line 5 in <module>
  File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
  File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
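
The traceback above has the shape of a faulthandler dump (a "Fatal Python error: Aborted" header followed by a "most recent call first" stack). As a minimal sketch of how such a native abort could be isolated and inspected outside the full pytest session, the snippet below runs a stand-in crashing step in a child interpreter. The helper name run_isolated and the use of os.abort() in place of the real crash inside torch.jit.trace are assumptions for illustration; none of this is part of the TVM test suite, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program (hypothetical stand-in): enable faulthandler, then abort the
# process the way a native crash such as "munmap_chunk(): invalid pointer" would.
CHILD_SOURCE = """
import faulthandler, os
faulthandler.enable()
os.abort()
"""


def run_isolated(source):
    """Run Python source in a child interpreter and capture its outcome."""
    return subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


result = run_isolated(CHILD_SOURCE)
# On POSIX, a negative return code means the child was killed by that signal.
crashed_with_abort = result.returncode == -signal.SIGABRT
# faulthandler writes its dump to the child's stderr before the process dies.
has_fault_dump = "Fatal Python error: Aborted" in result.stderr
```

In an actual investigation the child source would presumably import torch and run the tracing step from conftest.py; a SIGABRT return code would then confirm the abort without taking down the parent process.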

@leandron
Contributor

cc @ekalda

@ekalda
Contributor

ekalda commented Feb 18, 2021

Very odd. It looks like CI is always either skipping that test or failing with that error. That test is passing for me locally though, so I'm not sure how to reproduce it...

@areusch
Contributor

areusch commented Feb 18, 2021

@ekalda are you using the docker container to reproduce?

e.g.
docker/bash.sh tlcpack/ci-cpu:v0.72-t0 ./tests/scripts/task_config_build_cpu.sh
docker/bash.sh tlcpack/ci-cpu:v0.72-t0 ./tests/scripts/task_build.sh build -j2
docker/bash.sh tlcpack/ci-cpu:v0.72-t0 ./tests/scripts/task_python_integration.sh
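
The three steps above could also be driven from a small script that stops at the first failing step. This is only a sketch: the function names ci_commands and run_steps are hypothetical, the command lines are copied from the comment above, and it assumes it is run from the root of a TVM checkout with Docker available.

```python
import subprocess

# Image tag quoted in the comment above.
CI_IMAGE = "tlcpack/ci-cpu:v0.72-t0"


def ci_commands(image=CI_IMAGE):
    """Return the configure, build and test commands as argv lists."""
    return [
        ["docker/bash.sh", image, "./tests/scripts/task_config_build_cpu.sh"],
        ["docker/bash.sh", image, "./tests/scripts/task_build.sh", "build", "-j2"],
        ["docker/bash.sh", image, "./tests/scripts/task_python_integration.sh"],
    ]


def run_steps(commands, runner=subprocess.call):
    """Run commands in order; return the index of the first failure, or None."""
    for index, argv in enumerate(commands):
        if runner(argv) != 0:
            return index
    return None
```

Calling run_steps(ci_commands()) would run all three steps in order; the runner parameter exists only so the sequencing can be checked without starting a container.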

@leandron
Contributor

leandron commented Feb 18, 2021

From a quick look at the CI jobs in all the links posted so far (including the ones by @apivovarov), it looks like the failing job was always executed by INFO: NODE_NAME=node.aladdin.cudabuild EXECUTOR_NUMBER=1, whereas I couldn't find any failures on INFO: NODE_NAME=node.aladdin.cudabuild EXECUTOR_NUMBER=0 (list below).

Looking at a random sample of recent builds:
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/567/pipeline/40
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/569/pipeline/40
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/576/pipeline/40
https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/574/pipeline/40
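
The executor comparison above could be made systematic by tallying outcomes per node and executor from the INFO line in each job log. The sketch below assumes the log text has already been downloaded; the function names and the sample records are made up for illustration and do not reflect the outcome of any specific build.

```python
import re

# Matches the "NODE_NAME=... EXECUTOR_NUMBER=..." line quoted above.
NODE_RE = re.compile(r"NODE_NAME=(\S+)\s+EXECUTOR_NUMBER=(\d+)")


def executor_of(log_text):
    """Return (node_name, executor_number) found in a log, or None."""
    match = NODE_RE.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def failures_by_executor(records):
    """Tally failed/total runs per (node, executor).

    records is an iterable of (log_text, failed) pairs.
    """
    counts = {}
    for log_text, failed in records:
        key = executor_of(log_text)
        if key is None:
            continue  # log without node information
        stats = counts.setdefault(key, {"failed": 0, "total": 0})
        stats["total"] += 1
        stats["failed"] += int(failed)
    return counts


# Hypothetical sample data, not taken from real CI runs.
sample_records = [
    ("INFO: NODE_NAME=node.aladdin.cudabuild EXECUTOR_NUMBER=1", True),
    ("INFO: NODE_NAME=node.aladdin.cudabuild EXECUTOR_NUMBER=1", True),
    ("INFO: NODE_NAME=node.aladdin.cudabuild EXECUTOR_NUMBER=0", False),
    ("log without node information", False),
]
summary = failures_by_executor(sample_records)
```

A skew such as every failure landing on one executor would point towards the machine setup rather than the test itself.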

Any idea how this could be an infrastructure failure rather than a flaky test? (cc @areusch @masahi)

@ekalda
Contributor

ekalda commented Feb 18, 2021

@areusch I built and ran the tests in this docker container using the commands you gave me and I see this test being skipped (similarly to the successful CI runs), which makes me suspect that there is some CI problem...

@masahi
Member Author

masahi commented Feb 18, 2021

FYI this test is getting disabled in #7465

lhutton1 added a commit to lhutton1/tvm that referenced this issue Nov 4, 2021
This test was originally disabled due to the issue documented in apache#7455
affecting CI. I believe this has since been resolved by apache#9362.

Note: This patch should not be merged until the changes in
https://github.com/tlc-pack/tlcpack/pull/81 are reflected in CI.

Change-Id: Ib918595a1d9149e3c858ca761861304450cbfe13
lhutton1 added a commit to lhutton1/tvm that referenced this issue Nov 5, 2021
This test was originally disabled due to the issue documented in apache#7455
affecting CI. I believe this has since been resolved by apache#9362.

Note: This patch should not be merged until the changes in
https://github.com/tlc-pack/tlcpack/pull/81 are reflected in CI.

Change-Id: Ib918595a1d9149e3c858ca761861304450cbfe13
masahi pushed a commit that referenced this issue Nov 6, 2021
This test was originally disabled due to the issue documented in #7455
affecting CI. I believe this has since been resolved by #9362.

Note: This patch should not be merged until the changes in
https://github.com/tlc-pack/tlcpack/pull/81 are reflected in CI.

Change-Id: Ib918595a1d9149e3c858ca761861304450cbfe13
@lhutton1
Contributor

I believe this can be closed now.

@tqchen tqchen closed this as completed Nov 11, 2021
mehrdadh pushed a commit to mehrdadh/tvm that referenced this issue Dec 1, 2021
This test was originally disabled due to the issue documented in apache#7455
affecting CI. I believe this has since been resolved by apache#9362.

Note: This patch should not be merged until the changes in
https://github.com/tlc-pack/tlcpack/pull/81 are reflected in CI.

Change-Id: Ib918595a1d9149e3c858ca761861304450cbfe13
ylc pushed a commit to ylc/tvm that referenced this issue Jan 7, 2022
This test was originally disabled due to the issue documented in apache#7455
affecting CI. I believe this has since been resolved by apache#9362.

Note: This patch should not be merged until the changes in
https://github.com/tlc-pack/tlcpack/pull/81 are reflected in CI.

Change-Id: Ib918595a1d9149e3c858ca761861304450cbfe13
ylc pushed a commit to ylc/tvm that referenced this issue Jan 13, 2022
This test was originally disabled due to the issue documented in apache#7455
affecting CI. I believe this has since been resolved by apache#9362.

Note: This patch should not be merged until the changes in
https://github.com/tlc-pack/tlcpack/pull/81 are reflected in CI.

Change-Id: Ib918595a1d9149e3c858ca761861304450cbfe13