Fix C10_CUDA_CHECK for failing to capture last cuda error occasionall… · Mookel/pytorch@f40183d

Commit

Fix C10_CUDA_CHECK for failing to capture last cuda error occasionally (pytorch#93192)

Fix C10_CUDA_CHECK for failing to capture last cuda error occasionally

This error was accidentally introduced by pytorch#92227, which was trying to fix pytorch#91758, itself introduced in pytorch#85256.

The unit test `TestCuda.test_events_multi_gpu_elapsed_time` has been failing since that PR was merged (on CUDA 11.8 and CUDA 12.0). The test requires at least 2 GPUs, so it's probably not run in the OSS CI?
```
python test/test_cuda.py -v -k TestCuda.test_events_multi_gpu_elapsed_time
```

E.g. in https://github.com/pytorch/pytorch/actions/runs/4026926691/jobs/6922406192
```
2023-01-27T19:41:32.2312162Z   test_events_multi_gpu_elapsed_time (__main__.TestCuda) ... skip: detected only one GPU (0.001s)
```

The original C10_CUDA_CHECK, before pytorch#85256, had an extra `cudaGetLastError` call that captured (and thereby reset) those CUDA errors: https://github.com/pytorch/pytorch/pull/85256/files#diff-0823e63e781acf56e93a5553ed7feee0db0bda05d86e2560c7b80e87e32e0024L41-L42

This extra `cudaGetLastError` was originally introduced in pytorch#17337, as explained in this review comment: https://github.com/pytorch/pytorch/pull/17337/files#r259104503

> soumith on Feb 21, 2019:
> Without this, a previously raised error was still lingering and falsely being triggered for a subsequent CUDA call. colesbury suggested that this is the right thing to do.

Pull Request resolved: pytorch#93192
Approved by: https://github.com/ezyang


xwang233 authored and pytorchmergebot committed Jan 28, 2023

1 parent aac9e52 commit f40183d

c10/cuda/CUDAException.cpp

@@ -24,6 +24,9 @@ void c10_cuda_check_implementation(
         return;
       }
+      auto error_unused C10_UNUSED = cudaGetLastError();
+      (void)error_unused;
       std::string check_message;
     #ifndef STRIP_ERROR_MESSAGES
       check_message.append("CUDA error: ");
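
To illustrate why the extra `cudaGetLastError` matters, here is a minimal sketch in plain C++ that models the runtime's "last error" slot without requiring CUDA or a GPU. Everything in the `mock` namespace is a hypothetical stand-in invented for this illustration, not a PyTorch or CUDA API, and the model assumes (as the quoted review comment describes) that a recorded error lingers until something reads and resets it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical model of a per-thread "last error" slot. These names are
// illustrative assumptions only; they are not real CUDA or c10 symbols.
namespace mock {

enum Error { kSuccess = 0, kInvalidHandle = 1 };

thread_local Error last_error = kSuccess;

// Models a runtime call that fails: it returns the error and records it.
inline Error failing_call() {
  last_error = kInvalidHandle;
  return kInvalidHandle;
}

// Models a runtime call that succeeds; the recorded error is left untouched.
inline Error ok_call() { return kSuccess; }

// Models cudaGetLastError: return the recorded error and reset the slot.
inline Error get_last_error() {
  Error e = last_error;
  last_error = kSuccess;
  return e;
}

// Check without the fix: throws, but leaves the recorded error in place.
inline void check_without_clear(Error e) {
  if (e == kSuccess) return;
  throw std::runtime_error("mock error " + std::to_string(static_cast<int>(e)));
}

// Check with the fix: reset the recorded error before throwing.
inline void check_with_clear(Error e) {
  if (e == kSuccess) return;
  (void)get_last_error();
  throw std::runtime_error("mock error " + std::to_string(static_cast<int>(e)));
}

// Runs a failing call through `check`, swallows the exception (as a unit test
// that expects the failure would), then polls the last error the way a later
// launch-style check might. Returns true if a stale error is reported.
template <typename Check>
bool later_check_sees_stale_error(Check check) {
  last_error = kSuccess;
  try {
    check(failing_call());
  } catch (const std::runtime_error&) {
    // The caller handled the failure and carries on.
  }
  check(ok_call());  // an unrelated call that succeeds
  return get_last_error() != kSuccess;
}

// Small demonstration of both variants.
inline void demo() {
  assert(later_check_sees_stale_error(check_without_clear));
  assert(!later_check_sees_stale_error(check_with_clear));
}

}  // namespace mock
```

Calling `mock::demo()` exercises both variants: in this model, only the check that resets the slot before throwing leaves a clean state for later calls, which is the behavior the two added lines in the diff aim to restore.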