ONNX tests failing on master #3251
Comments
Found the offending PR on the pytorch side: pytorch/pytorch#49410. The torch_onnx_test on the master branch started failing at around 2:00 am on 1/14/2021. The root cause is pytorch/pytorch#50163, which merged the pytorch-onnx dev branch into pytorch at 1:56 pm on 1/13/2021. I imported the failure test into pytorch test |
@jiafatom Thanks for the great analysis/investigation. I see you were involved in the review of PR pytorch/pytorch#49410. What would you recommend as the right course of action? Do you plan to raise a ticket and let the author of the original PR know about the issue, or do you plan to send a PR upstream? Let me know if you need anything from my side. Thanks! |
@datumbox I have a PR to fix this issue upstream: pytorch/pytorch#50582. It needs some time to get merged. Under the current policy with Facebook, we merge to the pytorch branch once we have ~10 PRs in a batch, so we estimate this PR may be merged in around 10-14 days. That means the torchvision test_onnx will stay red during this time. Do you have any comments on this? Thanks. Detail [A]: The difference is around rtol=0.0017 and atol=2.7e-5, slightly larger than the bounds rtol=0.001 and atol=1e-05. I feel it is acceptable; we can relax the error bar to unblock the torchvision UT. Further analysis is a separate issue. |
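For readers unfamiliar with how rtol and atol combine, here is a minimal sketch of the usual closeness rule, `|actual - expected| <= atol + rtol * |expected|`. It assumes the test helper follows this common convention; the function name `is_close` and the sample values are hypothetical and are not taken from the failing tests.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Return True when actual is within the combined tolerance of expected.

    Uses the common rule |actual - expected| <= atol + rtol * |expected|.
    This is an illustrative helper, not the torchvision test code.
    """
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical values: a relative deviation of about 0.0015 from the reference.
expected = 1.0
actual = 1.0015

# Bounds quoted in the thread: rtol=0.001, atol=1e-05.
strict = is_close(actual, expected, rtol=1e-3, atol=1e-5)

# Hypothetical relaxed bounds, chosen only for illustration.
relaxed = is_close(actual, expected, rtol=2e-3, atol=3e-5)

print(strict, relaxed)
```

With these made-up numbers the strict bounds reject the value while the relaxed bounds accept it, which is the kind of trade-off "relax the error bar" refers to.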
@datumbox I just brought this issue up at our group meeting. Please feel free to disable the onnx test for now if needed. Thanks. |
@jiafatom Thanks for looking into it. We are currently completing the work of including FasterRCNN with a MobileNetV3 backbone (#3253). Given that this bug affects the tests of the *rcnn models, it is hard to confirm that the new model will be ONNX compatible. I wonder if your team could land the PR faster as an exception for this use-case? |
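If the tests do get disabled temporarily, one way to do it is with the standard `unittest.skip` decorator. This is only a sketch: the class name `OnnxExportTests` and the skip message are hypothetical, and the real test body is replaced by a placeholder.

```python
import unittest


class OnnxExportTests(unittest.TestCase):  # hypothetical class name
    @unittest.skip("Temporarily disabled, see pytorch/vision#3251")
    def test_faster_rcnn(self):
        # Placeholder body; never executed while the skip decorator is present.
        self.fail("should not run while skipped")


# Run the case programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OnnxExportTests)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.wasSuccessful())
```

A skipped test is reported separately from failures, so the run stays green while the skip reason keeps a pointer back to this issue.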
cc @neginraoof, @spandantiwari |
…50582) Fixing pytorch/vision#3251 (PR #49410 triggers the torchvision test build failure on three tests: test_faster_rcnn, test_mask_rcnn, test_keypoint_rcnn). The offending PR is fine on the pytorch UT because there is a gap between the torchvision and pytorch tests when we merge them: we use different test APIs on the two sides, which causes some discrepancy. This PR bridges the gap for the above three tests and disables the _jit_pass_onnx_fold_if pass until it gets fixed. Allow _jit_pass_onnx_fold_if only when dynamic_axes is None.
The above-mentioned pytorch/pytorch#50582 got merged into our dev branch. After our dev branch is merged into master, the torchvision onnx test should be fine. |
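The gating described in that commit message ("allow the pass only when dynamic_axes is None") can be pictured roughly as below. This is a toy sketch, not the actual exporter code: the graph is a plain list of strings, and `fold_if_pass` and `run_export_passes` are hypothetical stand-ins.

```python
from typing import Dict, List, Optional

Graph = List[str]  # toy stand-in for an exported graph


def fold_if_pass(graph: Graph) -> Graph:
    """Hypothetical stand-in for a fold-if pass: drops nodes tagged 'If(const)'."""
    return [node for node in graph if node != "If(const)"]


def run_export_passes(
    graph: Graph, dynamic_axes: Optional[Dict[str, List[int]]] = None
) -> Graph:
    # Only run the folding pass when no dynamic axes were requested;
    # otherwise leave the graph untouched.
    if dynamic_axes is None:
        graph = fold_if_pass(graph)
    return graph


static_result = run_export_passes(["Conv", "If(const)", "Relu"])
dynamic_result = run_export_passes(
    ["Conv", "If(const)", "Relu"], dynamic_axes={"input": [0]}
)
print(static_result, dynamic_result)
```

The point of the guard is that the pass is skipped entirely for exports with dynamic axes, which is the case the three *rcnn tests exercise according to the thread.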
@jiafatom Much appreciated, thanks for the flexibility! |
Another thing I would bring up on torch vision side, |
@jiafatom You mean update the tests on |
…50582) (#50910) Summary: Pull Request resolved: #50910 Fixing pytorch/vision#3251 (PR #49410 triggers the torchvision test build failure on three tests: test_faster_rcnn, test_mask_rcnn, test_keypoint_rcnn). The offending PR is fine on the pytorch UT because there is a gap between the torchvision and pytorch tests when we merge them: we use different test APIs on the two sides, which causes some discrepancy. This PR bridges the gap for the above three tests and disables the _jit_pass_onnx_fold_if pass until it gets fixed. Allow _jit_pass_onnx_fold_if only when dynamic_axes is None. Test Plan: Imported from OSS Reviewed By: pbelevich Differential Revision: D26050886 Pulled By: SplitInfinity fbshipit-source-id: b765ffe30914261866dcc761f0d0999fd16169e3
Yes, what I mean is to add |
🐛 Bug
It seems that the ONNX tests are failing today on the latest master, and the problem is probably related to changes upstream.
This was originally spotted on an unrelated PR, but to confirm we reran the tests on the previous day's passing master and they failed with the following errors:
cc @neginraoof, @spandantiwari, @jiafatom