Clean up CUDA state between tests #2296

rraminen · 2025-06-26T07:33:42Z

This PR fixes the following failing unit test:

test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED [0.1163s]

Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]

The error comes from an integer overflow caused by state left behind by another unit test. test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel creates a tensor with a huge numel, which leaves torch.cuda.max_memory_reserved() at a much higher value. When test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction runs afterward, the size it computes for its allocation becomes negative. To avoid this, we call torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() to clean up CUDA state between the tests.
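A minimal sketch of the failure mode and the cleanup, for illustration only. It assumes the failing test sizes its allocation as a fraction of total device memory minus torch.cuda.max_memory_reserved(); that formula, the helper names allocation_size and reset_cuda_memory_state, and the memory figures are hypothetical and not taken from this PR's diff. Only torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() are named in the description above.

```python
try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is not installed
    torch = None


def allocation_size(total_memory: int, fraction: float, peak_reserved: int) -> int:
    """Hypothetical size computation: a share of device memory minus the recorded peak."""
    return int(total_memory * fraction) - peak_reserved


def reset_cuda_memory_state() -> bool:
    """Release cached blocks, then reset peak counters. Returns False if CUDA is unusable."""
    if torch is None or not torch.cuda.is_available():
        return False
    # Free cached blocks first so the reset peak starts from the lower current value.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    return True


GIB = 1024 ** 3
# Illustrative numbers only, not measurements from the failing run.
stale = allocation_size(total_memory=16 * GIB, fraction=0.499, peak_reserved=12 * GIB)
clean = allocation_size(total_memory=16 * GIB, fraction=0.499, peak_reserved=0)
print(stale < 0, clean > 0)
```

With a stale peak larger than the requested share, the computed size is negative, which matches the shape of the "negative dimension" error in the traceback. Calling the cleanup helper before the size is computed (for example at the start of the test or in a setUp hook) is one way to keep the peak statistic from leaking across tests.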

JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295

rocm-repo-management-api · 2025-06-26T07:55:46Z

Jenkins build for acea51aab3ae9c443b19a072ff7fa8791afe58a6 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Clean up CUDA state between tests

acea51a

rraminen requested review from jithunnair-amd and pruthvistony June 26, 2025 07:35
