[Prototype] Run tests in parallel #273


Draft · wants to merge 7 commits into base: main

Conversation

@jlamypoirier jlamypoirier commented May 16, 2025

✨ Description

Allows running tests in parallel using all the available GPUs, so we can run lots of tests fast. pytest-xdist is already relatively good, but it puts everything on the first GPU(s) and risks causing OOMs, port conflicts, and other issues. I made a simple allocation and locking mechanism to prevent such issues, adapted from pytest-xdist-lock.

The system comes in a few steps:

  • Tests request a certain number of GPUs, amount of GPU memory, and ports through the get_test_resources mark or a specialized decorator such as requires_cuda.
  • The lock adapter safely allocates the GPU(s). It sets the default device to the first allocated one and restricts GPU usage through set_per_process_memory_fraction (5 GB by default for requested devices, 0 for other GPUs), which is good enough for many tests.
  • For simple tests nothing more is needed, but more complex ones need to know the allocated GPUs and ports, which they get through the get_test_resources fixture. This includes Fast-LLM runs and distributed configs, for which I added config options and the get_distributed_config fixture, and Megatron runs, which use CUDA_VISIBLE_DEVICES.
  • Once the test is done, the lock adapter checks that the allocation was respected, ensures that the GPU memory is de-allocated, and unlocks the resources for other tests.

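The allocation-and-locking idea in the steps above can be sketched with a shared state file guarded by an `fcntl` lock. This is a minimal illustration, not the PR's actual lock adapter: the file path, `TOTAL_GPUS` constant, and `acquire_gpus` name are all hypothetical, and the real mechanism also tracks memory and ports.

```python
import fcntl
import json
import os
import time
from contextlib import contextmanager

# Hypothetical names for illustration; the PR's lock adapter (adapted from
# pytest-xdist-lock) is more elaborate and also tracks GPU memory and ports.
STATE_PATH = "/tmp/test_gpu_state.json"
TOTAL_GPUS = 8  # assumption: an 8-GPU machine


def _load(f):
    """Read the shared busy-GPU list, defaulting to empty on first use."""
    f.seek(0)
    data = f.read()
    return json.loads(data) if data else {"busy": []}


def _save(f, state):
    """Overwrite the shared state file in place."""
    f.seek(0)
    f.truncate()
    f.write(json.dumps(state))
    f.flush()


@contextmanager
def acquire_gpus(num_gpus):
    """Block until `num_gpus` GPUs are free, mark them busy, yield their ids.

    The flock is held only while reading/updating the state file, so workers
    contend briefly; the GPUs themselves stay reserved until the test exits.
    """
    while True:
        with open(STATE_PATH, "a+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)  # released when the file closes
            state = _load(f)
            free = [i for i in range(TOTAL_GPUS) if i not in state["busy"]]
            if len(free) >= num_gpus:
                allocated = free[:num_gpus]
                state["busy"] += allocated
                _save(f, state)
                break
        time.sleep(0.1)  # another worker holds the GPUs; retry shortly
    try:
        yield allocated
    finally:
        with open(STATE_PATH, "a+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            state = _load(f)
            state["busy"] = [i for i in state["busy"] if i not in allocated]
            _save(f, state)
```

Each xdist worker going through a gate like this sees a consistent view of which devices are taken, which is what prevents everything from piling onto GPU 0.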
What remains is to ensure that dependencies between tests are respected (i.e., that pytest-xdist and pytest-depends are compatible enough), and that shared resource files (e.g. the test dataset) are parallel-safe.

I got things to a relatively stable state up to ~20 workers, but things start to break above that. It's still enough to reduce the slow tests from 8 minutes to ~2 minutes, most of which comes from parallel overhead (~1 minute) and the slowest test (~40 s), so it leaves room for lots of extra tests.
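The port side of the allocation can be sketched independently. This standalone version just binds ephemeral sockets to discover free ports; it is simpler than (and not identical to) the PR's approach, which tracks ports in the shared lock to close the race window between releasing and reusing a port. The function name is hypothetical.

```python
import socket


def allocate_ports(num_ports):
    """Find `num_ports` distinct free TCP ports by binding ephemeral sockets.

    All sockets are held open until every port is chosen, so the OS cannot
    hand the same port out twice within one call. They are closed before
    returning, which leaves a small race window between workers; the PR
    avoids that by recording allocated ports in its shared lock instead.
    """
    socks, ports = [], []
    try:
        for _ in range(num_ports):
            s = socket.socket()
            s.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
            socks.append(s)
            ports.append(s.getsockname()[1])
    finally:
        for s in socks:
            s.close()
    return ports
```

A distributed test that needs, say, a rendezvous port and a few data-server ports could call this once per worker instead of hard-coding port numbers that collide under parallelism.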

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

```python
    num_gpus=2,
    compare=f"test_{TEST_MODEL}",
)
with gpu_lock(num_gpus=2):
```
jlamypoirier (Collaborator, Author) commented:
Ideally I'd use a mark or a better fixture to avoid this. Do any pytest experts here know how to do it? (@bigximik @tscholak)

Contributor commented:
how have you solved this?

@jlamypoirier jlamypoirier mentioned this pull request May 28, 2025
2 participants