Describe the bug
Running SYCL E2E tests that compile OpenCL kernels at runtime fails when the system has two distinct Level Zero (L0) capable GPUs (e.g. a Battlemage GPU and an iGPU).
This happens because SYCL passes different flags to ocloc in this scenario. If a single device is used, SYCL just passes the -device flag to ocloc. However, when there are distinct devices, SYCL passes a list of extensions instead, which triggers the bug. This logic can be found in kernel_compiler_opencl.cpp#L257.
Compilation error log: ocloc_compilation_error.log
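To make the two code paths described above easier to follow, here is a minimal sketch of the decision. It is not the code from kernel_compiler_opencl.cpp: `DeviceInfo`, `ipVersion`, `TargetMode` and `chooseOclocTarget` are hypothetical names, and comparing devices by a single numeric identifier is an assumption. How the extension list itself is built is not described in this report, so the sketch only records which path would be taken.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a GPU as seen by the runtime compiler.
struct DeviceInfo {
  std::string name;
  std::uint32_t ipVersion; // assumed identifier used to tell devices apart
};

// The two ways of describing the target to ocloc mentioned in this report.
enum class TargetMode {
  SingleDevice,  // pass the -device flag
  ExtensionList  // pass a list of extensions instead (the failing path)
};

struct OclocTarget {
  TargetMode mode;
  std::vector<std::uint32_t> distinctIpVersions; // sorted, duplicates removed
};

// Picks the path based on how many distinct devices are present.
OclocTarget chooseOclocTarget(const std::vector<DeviceInfo> &devices) {
  assert(!devices.empty() && "at least one device is required");

  std::set<std::uint32_t> distinct;
  for (const DeviceInfo &d : devices)
    distinct.insert(d.ipVersion);

  OclocTarget target;
  target.distinctIpVersions.assign(distinct.begin(), distinct.end());
  target.mode = (distinct.size() == 1) ? TargetMode::SingleDevice
                                       : TargetMode::ExtensionList;
  return target;
}
```

With the machine from this report (a B580 plus a UHD 770), a check like this would select `TargetMode::ExtensionList`, which is the path that ends in the ocloc compilation error.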
To reproduce
1- Use a server that contains two distinct Intel GPUs. For example:
fabio@ed-dlpc-2e11:~/projects/dpcpp/llvm/cmake-build-l0-release-slurm-bmg/bin$ ./sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) UHD Graphics 770 12.2.0 [1.6.32536]
2- Compile DPC++ with L0 support.
3- Run the E2E tests that compile OpenCL kernels using kernel bundles:
cd <build-dir>/tools/sycl/test-e2e
../../../bin/llvm-lit -sva RawKernelArg
Environment
- OS: Linux
- Target device and vendor: System with both an Intel iGPU and a Battlemage GPU.
- DPC++ version: 194ec74
- Dependencies version:
fabio@ed-dlpc-2e11:~/projects/dpcpp/llvm/cmake-build-l0-release-slurm-bmg/bin$ ./sycl-ls --verbose
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) UHD Graphics 770 12.2.0 [1.6.32536]
Platforms: 1
Platform [#1]:
Version : 1.6
Name : Intel(R) oneAPI Unified Runtime over Level-Zero
Vendor : Intel(R) Corporation
Devices : 2
Device [#0]:
Type : gpu
Version : 20.1.0
Name : Intel(R) Arc(TM) B580 Graphics
Vendor : Intel(R) Corporation
Driver : 1.6.32536
UUID : 13412811226000030000000
DeviceID : 57867
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_matrix ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_virtual_functions
info::device::sub_group_sizes: 16 32
Architecture: intel_gpu_bmg_g21
Device [#1]:
Type : gpu
Version : 12.2.0
Name : Intel(R) UHD Graphics 770
Vendor : Intel(R) Corporation
Driver : 1.6.32536
UUID : 134128128167400002000000
DeviceID : 42880
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_virtual_functions
info::device::sub_group_sizes: 8 16 32
Architecture: intel_gpu_adl_s
default_selector() : gpu, Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
accelerator_selector() : No device of requested type available.
cpu_selector() : No device of requested type available.
gpu_selector() : gpu, Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
custom_selector(gpu) : gpu, Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
custom_selector(cpu) : No device of requested type available.
custom_selector(acc) : No device of requested type available.
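A rough way to check whether another machine matches this setup is to count the distinct `Architecture:` values in the `sycl-ls --verbose` output. The helper below is only a sketch: `distinctArchitectures` is a hypothetical name, it expects the output as a string, and this report does not state whether the runtime compares devices by architecture in the same way.

```cpp
#include <set>
#include <sstream>
#include <string>

// Collects the distinct values that follow "Architecture:" in the given text.
std::set<std::string> distinctArchitectures(const std::string &syclLsVerbose) {
  const std::string key = "Architecture:";
  std::set<std::string> archs;
  std::istringstream in(syclLsVerbose);
  std::string line;

  while (std::getline(in, line)) {
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos)
      continue; // not an architecture line

    // Take the text after the key and trim surrounding whitespace.
    const std::string value = line.substr(pos + key.size());
    const std::size_t first = value.find_first_not_of(" \t\r");
    if (first == std::string::npos)
      continue; // key present but no value
    const std::size_t last = value.find_last_not_of(" \t\r");
    archs.insert(value.substr(first, last - first + 1));
  }
  return archs;
}
```

For the output above this yields two entries, `intel_gpu_adl_s` and `intel_gpu_bmg_g21`, so the machine has distinct devices in the sense of this report.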