[Prototype] Run tests in parallel #273
Draft
✨ Description
Allows running tests in parallel across all the available GPUs, so we can run lots of tests fast. pytest-xdist is already relatively good, but it puts everything on the first GPU(s) and risks causing OOMs, port conflicts and other issues. I made a simple allocation and locking mechanism to prevent such issues, adapted from pytest-xdist-lock.
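To illustrate the kind of allocation and locking described above, here is a minimal sketch, not the code in this PR. It assumes one lock file per device in a directory shared by all workers; the names `try_claim_devices`, `release_devices` and the `device_N.lock` file layout are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def release_devices(lock_dir: Path, indices: List[int]) -> None:
    """Release previously claimed devices by deleting their lock files."""
    for index in indices:
        (lock_dir / f"device_{index}.lock").unlink(missing_ok=True)


def try_claim_devices(lock_dir: Path, num_requested: int, num_total: int) -> Optional[List[int]]:
    """Try to claim `num_requested` of `num_total` devices.

    Returns the claimed device indices, or None if not enough devices are free.
    """
    claimed: List[int] = []
    for index in range(num_total):
        if len(claimed) == num_requested:
            break
        try:
            # O_EXCL makes creation fail if another worker already holds this device.
            fd = os.open(lock_dir / f"device_{index}.lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        # Record the owner's PID to help debug stale locks.
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        claimed.append(index)
    if len(claimed) < num_requested:
        # Not enough free devices: give back the partial claim so others can proceed.
        release_devices(lock_dir, claimed)
        return None
    return claimed


if __name__ == "__main__":
    lock_dir = Path(tempfile.mkdtemp())
    first = try_claim_devices(lock_dir, 2, 4)
    second = try_claim_devices(lock_dir, 2, 4)
    print(first, second, try_claim_devices(lock_dir, 1, 4))
```

A worker that gets `None` back would typically wait and retry. Exclusive file creation is used here because it is atomic across processes on a local filesystem, so it needs no extra coordination between pytest-xdist workers.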
The system comes in a few steps:
get_test_resources mark or a specialized decorator such as requires_cuda
.set_per_process_memory_fraction (5 GB by default for requested devices, 0 for other GPUs), which is good enough for many tests.
get_test_resources fixture. This includes fast-llm runs and distributed configs, for which I added config options and the get_distributed_config fixture, and Megatron runs, which use CUDA_VISIBLE_DEVICES.
What remains is to ensure that dependencies between tests are respected (i.e. that pytest-xdist and pytest-depends are compatible enough), and that shared resource files (e.g. the test dataset) are parallel-safe.
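The two isolation mechanisms mentioned above (a per-process memory cap and `CUDA_VISIBLE_DEVICES` for subprocess runs) could look roughly like the sketch below. This is an illustration under assumptions, not this PR's implementation: the helper names are hypothetical, 5 GB is taken as 5 GiB, and the cap is converted to the fraction that `torch.cuda.set_per_process_memory_fraction` expects.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

GIB = 1024**3


def memory_fraction(budget_bytes: int, total_bytes: int) -> float:
    """Convert an absolute memory budget into a fraction of device memory, capped at 1."""
    return min(1.0, budget_bytes / total_bytes)


def subprocess_env(device_indices: Iterable[int], base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an environment for a subprocess that should only see the given devices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(index) for index in device_indices)
    return env


def apply_memory_limits(requested: Iterable[int], budget_bytes: int = 5 * GIB) -> None:
    """Cap memory on requested devices and set a zero budget on all the others."""
    import torch  # Imported lazily so the helpers above work without PyTorch.

    requested = set(requested)
    for index in range(torch.cuda.device_count()):
        total = torch.cuda.get_device_properties(index).total_memory
        budget = budget_bytes if index in requested else 0
        torch.cuda.set_per_process_memory_fraction(memory_fraction(budget, total), index)


if __name__ == "__main__":
    print(memory_fraction(5 * GIB, 80 * GIB))
    print(subprocess_env([2, 3], {})["CUDA_VISIBLE_DEVICES"])
```

The memory cap only constrains PyTorch's caching allocator in the current process, so runs launched as separate processes need the environment-variable route instead.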
I got things to a relatively stable state up to ~20 workers, but things start to break beyond that. It's still enough to reduce slow tests from 8 minutes to ~2 minutes, most of which comes from parallel overhead (~1 minute) and the slowest test (~40 s), so it leaves room for lots of extra tests.
🔍 Type of change
Select all that apply: