Description
Describe the bug
Wrong order of the ~ur_device_handle_t_()
destructor and a user-app static buffer destructor.
The ur_device_handle_t_::~ur_device_handle_t_()
destructor of a CUDA device is incorrectly called too early before the sycl::~buffer
destructor of a user-app static buffer is called. As a result the CUDA device is destroyed before a memory allocated from this device is freed and the test segfaults - see a part of a log from the sycl/test-e2e/Regression/static-buffer-dtor.cpp
test:
UR ---> ~ur_device_handle_t_() ---> cuDevicePrimaryCtxRelease(0) // <--- destroying the CUDA device
UR <--- ~ur_device_handle_t_() <--- cuDevicePrimaryCtxRelease(0)
---> urEventWait
<--- urEventWait(.numEvents = 1, .phEventWaitList = 0x7fffee576128 {0x39f39d80}) -> UR_RESULT_SUCCESS;
---> urMemRelease // <--- freeing memory from the already destroyed CUDA device
[PID:459547 TID:459547 DEBUG UMF] umfFree: calling umfPoolFree(hPool = 0x7f2df7287268, ptr = 0x7f2dd4200200)
[PID:459547 TID:459547 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7f2df7295068, pool=0x7f2df7287268, ptr=0x7f2dd4200200, size=256
[PID:459547 TID:459547 DEBUG UMF] cu_memory_provider_free: cu_memory_provider_free(0x7f2dd4200200, 256)
[PID:459547 TID:459547 ERROR UMF] set_context: cuCtxGetCurrent() failed (cu_result = 4)
[PID:459547 TID:459547 ERROR UMF] cu2umf_result: CUDA driver has been deinitialized
[PID:459547 TID:459547 ERROR UMF] cu_memory_provider_free: Failed to set CUDA context, ret = 7
[PID:459547 TID:459547 ERROR UMF] trackingFree: upstream provider failed to free the memory
[PID:459547 TID:459547 DEBUG UMF] umfMemoryTrackerAdd: memory region is added, tracker=0x7f2df7295068, pool=0x7f2df7287268, ptr=0x7f2dd4200200, size=256
<--- urMemRelease(.hMem = 0x39678070) -> UR_RESULT_SUCCESS;
See: #17411 (comment)
Ref: #17411
To reproduce
-
Include a code snippet that is as short as possible - the
sycl/test-e2e/Regression/static-buffer-dtor.cpp
SYCL test. -
Specify the command which should be used to compile the program
$ build/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda sycl/test-e2e/Regression/static-buffer-dtor.cpp -Xarch_device -fsanitize=address -DMALLOC_DEVICE -O0 -g -o static_buffer_dtor
- Specify the command which should be used to launch the program
$ ONEAPI_DEVICE_SELECTOR="cuda:gpu" SYCL_UR_TRACE=2 UMF_LOG="level:debug;flush:debug;output:stderr;pid:yes" ./static_buffer_dtor
- Indicate what is wrong and what was expected
The sycl/test-e2e/Regression/static-buffer-dtor.cpp test segfaults on the PR #17468
See the reproduction in CI: https://github.com/intel/llvm/actions/runs/13855853921/job/38773587493?pr=17468
Environment
- OS: Linux (Ubuntu 22.04.4 LTS)
- DPC++ version: on the PR: [UR] Bump UMF to v0.11.0-dev4 #17468
- Dependencies version: [e.g. the output of
sycl-ls --verbose
]
$ sycl-ls --verbose
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to cuda:gpu.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.
[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
Platforms: 1
Platform [#1]:
Version : CUDA 12.6
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 8.6
Name : NVIDIA GeForce RTX 3060
Vendor : NVIDIA Corporation
Driver : CUDA 12.6
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_widthur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_non_uniform_groups
info::device::sub_group_sizes: 32
default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : No device of requested type available. Please chec...
gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
custom_selector(cpu) : No device of requested type available. Please chec...
custom_selector(acc) : No device of requested type available. Please chec...
Additional context
This bug can happen if SYCL calls urAdapterRelease()
before this static sycl::buffer
is destroyed.