psychedelicious
Collaborator

Summary

I don't know what changed, but recently the MPS partial model loading tests have been getting flakier and flakier in CI, to the point that today not a single run has finished despite many retries. The tests are running out of memory (OOM).

I tried a few different things: clearing torch caches before and after every model caching/loading test, forcing garbage collection, and calling torch.synchronize everywhere. Nothing worked. To unblock dev, I've just marked these tests to be skipped in CI.
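For readers unfamiliar with the pattern, skipping tests only in CI can be sketched roughly as below. This is a minimal illustration, not the exact change in this PR: the helper `running_in_ci`, the marker name `skip_mps_in_ci`, and the test name are hypothetical. It assumes the runner sets the `CI` environment variable, which GitHub Actions does by default.

```python
import os

import pytest


def running_in_ci(env=None) -> bool:
    """Return True when the CI environment variable is set to a truthy value.

    GitHub Actions sets CI=true on its runners. `env` can be passed
    explicitly to make the check easy to test.
    """
    env = os.environ if env is None else env
    return env.get("CI", "").lower() in {"1", "true"}


# Hypothetical marker; the name and reason string are assumptions.
skip_mps_in_ci = pytest.mark.skipif(
    running_in_ci(),
    reason="MPS partial model loading tests OOM on CI runners",
)


@skip_mps_in_ci
def test_partial_load_on_mps():
    # Placeholder body; the real tests exercise model loading on MPS.
    pass
```

With a marker like this, the tests still run locally on machines without `CI` set, so coverage on developer hardware is kept while CI is unblocked.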

It's weird. If a change in our codebase had broken CI, we'd expect that PR's CI never to have passed, which would have kept the offending change out of the codebase. Maybe it's a GH runner issue or an upstream dependency change.

Related Issues / Discussions

n/a

QA Instructions

n/a

Merge Plan

This needs to merge ASAP so that other changes that touch Python code can pass CI and get merged themselves.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added the python-tests PRs that change python tests label Aug 18, 2025
@psychedelicious psychedelicious merged commit a8a0759 into main Aug 18, 2025
12 checks passed
@psychedelicious psychedelicious deleted the psyche/tests/skip-mps-on-ci branch August 18, 2025 11:14