Skip to content

[SYCL][UR][CUDA] wrong order of the ~ur_device_handle_t_() destructor and a user-app static buffer destructor #17450

Closed
@ldorau

Description

@ldorau

Describe the bug

Wrong order of the ~ur_device_handle_t_() destructor and a user-app static buffer destructor.

The ur_device_handle_t_::~ur_device_handle_t_() destructor of a CUDA device is incorrectly called too early before the sycl::~buffer destructor of a user-app static buffer is called. As a result the CUDA device is destroyed before a memory allocated from this device is freed and the test segfaults - see a part of a log from the sycl/test-e2e/Regression/static-buffer-dtor.cpp test:

UR ---> ~ur_device_handle_t_() ---> cuDevicePrimaryCtxRelease(0) // <--- destroying the CUDA device
UR <--- ~ur_device_handle_t_() <--- cuDevicePrimaryCtxRelease(0)
   ---> urEventWait
   <--- urEventWait(.numEvents = 1, .phEventWaitList = 0x7fffee576128 {0x39f39d80}) -> UR_RESULT_SUCCESS;
   ---> urMemRelease // <--- freeing memory from the already destroyed CUDA device
[PID:459547 TID:459547 DEBUG UMF] umfFree: calling umfPoolFree(hPool = 0x7f2df7287268, ptr = 0x7f2dd4200200)
[PID:459547 TID:459547 DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7f2df7295068, pool=0x7f2df7287268, ptr=0x7f2dd4200200, size=256
[PID:459547 TID:459547 DEBUG UMF] cu_memory_provider_free: cu_memory_provider_free(0x7f2dd4200200, 256)
[PID:459547 TID:459547 ERROR UMF] set_context: cuCtxGetCurrent() failed (cu_result = 4)
[PID:459547 TID:459547 ERROR UMF] cu2umf_result: CUDA driver has been deinitialized
[PID:459547 TID:459547 ERROR UMF] cu_memory_provider_free: Failed to set CUDA context, ret = 7
[PID:459547 TID:459547 ERROR UMF] trackingFree: upstream provider failed to free the memory
[PID:459547 TID:459547 DEBUG UMF] umfMemoryTrackerAdd: memory region is added, tracker=0x7f2df7295068, pool=0x7f2df7287268, ptr=0x7f2dd4200200, size=256
   <--- urMemRelease(.hMem = 0x39678070) -> UR_RESULT_SUCCESS;

See: #17411 (comment)
Ref: #17411

To reproduce

  1. Include a code snippet that is as short as possible - the sycl/test-e2e/Regression/static-buffer-dtor.cpp SYCL test.

  2. Specify the command which should be used to compile the program

$ build/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda sycl/test-e2e/Regression/static-buffer-dtor.cpp -Xarch_device -fsanitize=address -DMALLOC_DEVICE -O0 -g -o static_buffer_dtor
  1. Specify the command which should be used to launch the program
$ ONEAPI_DEVICE_SELECTOR="cuda:gpu" SYCL_UR_TRACE=2 UMF_LOG="level:debug;flush:debug;output:stderr;pid:yes" ./static_buffer_dtor
  1. Indicate what is wrong and what was expected

The sycl/test-e2e/Regression/static-buffer-dtor.cpp test segfaults on the PR #17468

See the reproduction in CI: https://github.com/intel/llvm/actions/runs/13855853921/job/38773587493?pr=17468

Environment

$ sycl-ls --verbose
Warning: ONEAPI_DEVICE_SELECTOR environment variable is set to cuda:gpu.
To see the correct device id, please unset ONEAPI_DEVICE_SELECTOR.

[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]

Platforms: 1
Platform [#1]:
    Version  : CUDA 12.6
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type       : gpu
        Version    : 8.6
        Name       : NVIDIA GeForce RTX 3060
        Vendor     : NVIDIA Corporation
        Driver     : CUDA 12.6
        Aspects    : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_bfloat16_math_functions ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_widthur_print: Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
 ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_interop_memory_import ext_oneapi_interop_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_non_uniform_groups
        info::device::sub_group_sizes: 32
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
accelerator_selector()  : No device of requested type available. Please chec...
cpu_selector()          : No device of requested type available. Please chec...
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3060 8.6 [CUDA 12.6]
custom_selector(cpu)    : No device of requested type available. Please chec...
custom_selector(acc)    : No device of requested type available. Please chec...

Additional context

This bug can happen if SYCL calls urAdapterRelease() before this static sycl::buffer is destroyed.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions