Parallel test with pytest-xdist #518
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/518
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 4c3c1fd with merge base 4f53882. This comment was automatically generated by Dr. CI and updates every 15 minutes.
What I had tried messing around with before was using https://docs.pytest.org/en/stable/how-to/mark.html, which is useful for marking the 2-GPU and 1-GPU tests independently; in the GitHub Actions file you can then have two pytest commands to run them one after the other.
Yup, that's what I have in mind too. Just want to make sure only those 2 tests use 2 GPUs.
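For concreteness, here is a minimal sketch of the marker-based split being discussed; the `multi_gpu` marker name and the test name are hypothetical, not something defined by this PR:

```python
import pytest
import torch


# conftest.py (hypothetical): register the custom marker so pytest
# does not warn about an unknown mark.
def pytest_configure(config):
    config.addinivalue_line("markers", "multi_gpu: test needs at least 2 GPUs")


# In a test file: tag only the few tests that really need 2 GPUs.
@pytest.mark.multi_gpu
def test_fsdp_roundtrip():  # hypothetical test name
    assert torch.cuda.device_count() >= 2


# The GitHub Actions job could then run the two groups back to back:
#   pytest -m "not multi_gpu" test/
#   pytest -m multi_gpu test/
```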
We can merge this, but let's keep an eye on whether this makes test failures trickier to debug in the future.
@msaroufim I keep seeing this test failing
Not exactly clear what's causing it. Should we revert or try to fix it?
Revert and ask questions later
This reverts commit cbaff6c.
* try pytest-xdist
* run multi_gpu tests separately
* each worker runs 1 file
* use loadscope to balance load better (for test_integration.py)
* change back to loadfile
* per-test schedule
* per-test sharding; avoid name collision
Currently this PR creates 2 workers, each with 2 GPUs. I think only the FSDP tests require 2 GPUs:
* ao/test/dtypes/test_nf4.py, lines 484 to 494 (at 4f53882)
* ao/test/prototype/test_low_bit_optim.py, lines 161 to 168 (at 4f53882)
Thus, CI runtime can be further improved by running most tests across more 1-GPU workers and keeping only the 2-GPU FSDP tests in a separate, sequential run.
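As an aside, here is a minimal, hedged sketch of the kind of 2-GPU guard such multi-GPU tests typically carry (the decorator arguments and test name are assumptions, not copied from the files above); a guard like this is also what makes it easy to skip these tests on 1-GPU workers:

```python
import pytest
import torch


# Skip unless at least 2 CUDA devices are visible, so the test runs on a
# 2-GPU worker and is skipped cleanly everywhere else.
@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="needs at least 2 GPUs")
def test_fsdp_state_dict():  # hypothetical test name
    ...
```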
UPDATE: parallel tests with four 1-GPU workers plus sequential multi-GPU tests are implemented now. Some problems I encountered along the way (for future reference and improvement): pytest-xdist provides multiple methods to distribute work across workers (refer to the pytest-xdist documentation for the full list). I have tried the following:
* `--dist load` (send each test to the next available worker): tests in `TestSaveLoadMeta` use the same filename `test.pth`, which is problematic when multiple tests write/read the same file. I made a unique name for each test to solve it for now, though the better way is to use pytest's `tmp_path` fixture (not compatible with `unittest.TestCase`); see the sketch after this list. Also, `test_quant_llm_quantize_ebits_2_mbits_2` gets a strange torch.compile warning in PyTorch 2.3 only: `torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py:436] [0/48] failed to eagerly compile backwards for dynamic, suppressing in case backwards not needed`. Looks harmless though.
* `--dist loadscope` (send a TestClass to the next available worker): `TestSaveLoadMeta` gets a strange error with no traceback: `RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent forward run. Stack trace: [Could not find stack trace]. To prevent overwriting, clone the tensor outside of torch.compile() or call torch.compiler.cudagraph_mark_step_begin() before each model invocation.` Since tests within a class are run by the same worker, there shouldn't be a filename conflict on `test.pth`. Perhaps some strange interaction between how pytest-xdist sends tests to the subprocess worker and CUDA graphs + torch.compile?
* `--dist loadfile` (send each file to the next available worker): this should be the least error-prone. However, there is little speedup (24 min vs 35 min) since the `test_integration.py` file is huge -> load imbalance (it's probably a good idea to split it into smaller files anyway).
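Here is a minimal sketch of the two filename strategies from the first bullet above; the test names are hypothetical and the bodies are trimmed down to the save/load pattern that causes the collision:

```python
import torch


# Option 1 (what this PR does): derive a unique checkpoint name per test so
# that tests running on different xdist workers never touch the same path.
def test_save_load_fp16():
    path = "test_save_load_fp16.pth"  # unique per test instead of a shared "test.pth"
    torch.save({"w": torch.zeros(2)}, path)
    assert "w" in torch.load(path)


# Option 2 (cleaner, but the fixture cannot be injected into
# unittest.TestCase methods): pytest's built-in tmp_path fixture gives each
# test its own temporary directory, so the file name no longer matters.
def test_save_load_bf16(tmp_path):
    path = tmp_path / "test.pth"  # same name is fine; the directory is unique
    torch.save({"w": torch.zeros(2)}, path)
    assert "w" in torch.load(path)
```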
The current approach of this PR is `--dist load`. To summarize the cost breakdown: it takes about 3 min to pull the Docker image and another 3-4 min to download and install packages, so two 2-GPU workers scale pretty well (7 + 14 min vs 7 + 28 min), while four 1-GPU workers are not much of a further improvement (7 + 12 min vs 7 + 28 min). I guess the 2-GPU FSDP tests could be run in parallel too in the latest configuration.
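Finally, a rough sketch of the kind of invocation the current approach amounts to, written with pytest's programmatic entry point; the worker count, test path, and the idea of a separate call for the FSDP tests are illustrative assumptions rather than the exact contents of the workflow file:

```python
import pytest

if __name__ == "__main__":
    # Single-GPU tests: four xdist workers with per-test scheduling (--dist load).
    rc = pytest.main(["-n", "4", "--dist", "load", "test/"])
    # The 2-GPU FSDP tests would then run in a separate, sequential pytest
    # invocation (no -n / --dist flags) so they keep both GPUs to themselves.
    raise SystemExit(rc)
```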