tests: skip flaky MPS tests on CI #8442
Merged
Summary
I don't know what changed, but recently the MPS partial model loading tests have become increasingly flaky in CI, to the point that today not a single run has finished despite many retries. The tests are running out of memory (OOM).
I tried a few different things: clearing torch caches before and after every model caching/loading test, forcing garbage collection, and calling torch.synchronize everywhere. Nothing worked. To unblock development, I've marked these tests to be skipped in CI.
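For readers unfamiliar with the pattern, here is a minimal sketch of a CI-only skip. It is not the code from this PR: the helper `running_in_ci`, the `skip_in_ci` decorator, and the test class are hypothetical names, and it uses the standard library's `unittest.skipIf` so it runs without extra dependencies. It assumes the runner sets the `CI` environment variable, as GitHub Actions does.

```python
import os
import unittest


def running_in_ci(env=None):
    """Return True when the environment looks like a CI runner.

    Assumption: the runner exports CI=true (GitHub Actions does this).
    """
    env = os.environ if env is None else env
    return env.get("CI", "").lower() in ("1", "true")


# Hypothetical reusable decorator; evaluated once at import time.
skip_in_ci = unittest.skipIf(running_in_ci(), "flaky on CI runners (MPS OOM)")


class PartialLoadTests(unittest.TestCase):  # hypothetical test class
    @skip_in_ci
    def test_partial_load_on_mps(self):
        # Placeholder body; the real tests partially load a model onto MPS.
        self.assertTrue(True)
```

With this approach the tests still run on developer machines, where `CI` is normally unset, so local coverage is kept while CI is unblocked.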
It's weird. If a change in our codebase had broken CI, we'd expect that PR's own CI to never have passed, which would have kept the offending change out of the codebase. Maybe it's a GitHub runner issue or an upstream dependency change.
Related Issues / Discussions
n/a
QA Instructions
n/a
Merge Plan
This needs to merge ASAP so that other PRs that change Python code can pass CI and get merged.
Checklist
What's New copy (if doing a release after this PR)