L0 E2E tests fail when system has 2 distinct gpu devices #17294

Open
@fabiomestre

Describe the bug

Running SYCL E2E tests that compile OpenCL kernels at runtime fails when the system has 2 distinct L0-capable GPUs (e.g. a Battlemage GPU and an iGPU).

This happens because SYCL passes different flags to ocloc in this scenario. If a single device is used, SYCL just passes the -device flag to ocloc. However, when there are distinct devices, SYCL passes a list of extensions instead, which triggers the bug. This logic can be found in kernel_compiler_opencl.cpp#L257.
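To make the two code paths easier to see, here is a minimal sketch of the branching described above. It is not the code from kernel_compiler_opencl.cpp: `DeviceInfo`, `TargetMode`, `TargetSelection` and `selectTarget` are hypothetical names, and the choice to keep only the extensions shared by all devices is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU; not a type from the SYCL runtime.
struct DeviceInfo {
  std::string target;               // illustrative target name
  std::set<std::string> extensions; // extensions the device reports
};

// Which kind of target description would be handed to the offline compiler.
enum class TargetMode { SingleDevice, ExtensionList };

struct TargetSelection {
  TargetMode mode;
  std::vector<std::string> values; // one target name, or a list of extensions
};

TargetSelection selectTarget(const std::vector<DeviceInfo> &devices) {
  assert(!devices.empty() && "at least one device is required");

  // Collect the distinct targets present in the system.
  std::set<std::string> targets;
  for (const DeviceInfo &d : devices)
    targets.insert(d.target);

  // Single kind of device: name it directly (the "-device" path).
  if (targets.size() == 1)
    return {TargetMode::SingleDevice, {*targets.begin()}};

  // Distinct devices: fall back to an extension list (the failing path).
  // ASSUMPTION: only extensions common to every device are kept.
  std::set<std::string> common = devices.front().extensions;
  for (const DeviceInfo &d : devices) {
    std::set<std::string> kept;
    std::set_intersection(common.begin(), common.end(),
                          d.extensions.begin(), d.extensions.end(),
                          std::inserter(kept, kept.begin()));
    common = std::move(kept);
  }
  return {TargetMode::ExtensionList,
          std::vector<std::string>(common.begin(), common.end())};
}
```

With the two devices from this report (architectures intel_gpu_bmg_g21 and intel_gpu_adl_s), `selectTarget` would take the `ExtensionList` branch, which corresponds to the scenario where the tests fail. On a machine with only one of them it would take the `SingleDevice` branch.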

Compilation error log: ocloc_compilation_error.log

To reproduce

1 - Use a server that contains 2 distinct Intel GPUs. For example:

fabio@ed-dlpc-2e11:~/projects/dpcpp/llvm/cmake-build-l0-release-slurm-bmg/bin$ ./sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) UHD Graphics 770 12.2.0 [1.6.32536]

2 - Compile DPCPP with L0 support

3 - Run E2E tests that compile OpenCL kernels using kernel bundles:

cd <build-dir>/tools/sycl/test-e2e
../../../bin/llvm-lit -sva RawKernelArg

Environment

  • OS: Linux
  • Target device and vendor: System with both an Intel iGPU and a Battlemage GPU.
  • DPC++ version: 194ec74
  • Dependencies version:
fabio@ed-dlpc-2e11:~/projects/dpcpp/llvm/cmake-build-l0-release-slurm-bmg/bin$ ./sycl-ls --verbose
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
[level_zero:gpu][level_zero:1] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) UHD Graphics 770 12.2.0 [1.6.32536]

Platforms: 1
Platform [#1]:
    Version  : 1.6
    Name     : Intel(R) oneAPI Unified Runtime over Level-Zero
    Vendor   : Intel(R) Corporation
    Devices  : 2
        Device [#0]:
        Type              : gpu
        Version           : 20.1.0
        Name              : Intel(R) Arc(TM) B580 Graphics
        Vendor            : Intel(R) Corporation
        Driver            : 1.6.32536
        UUID              : 13412811226000030000000
        DeviceID          : 57867
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_oneapi_bindless_images ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_matrix ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_virtual_functions
        info::device::sub_group_sizes: 16 32
        Architecture: intel_gpu_bmg_g21
        Device [#1]:
        Type              : gpu
        Version           : 12.2.0
        Name              : Intel(R) UHD Graphics 770
        Vendor            : Intel(R) Corporation
        Driver            : 1.6.32536
        UUID              : 134128128167400002000000
        DeviceID          : 42880
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address ext_intel_gpu_eu_count ext_intel_gpu_eu_simd_width ext_intel_gpu_slices ext_intel_gpu_subslices_per_slice ext_intel_gpu_eu_count_per_subslice atomic64 ext_intel_device_info_uuid ext_intel_gpu_hw_threads_per_eu ext_oneapi_cuda_async_barrier ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_intel_legacy_image ext_intel_esimd ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_limited_graph ext_oneapi_private_alloca ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_virtual_functions
        info::device::sub_group_sizes: 8 16 32
        Architecture: intel_gpu_adl_s
default_selector()      : gpu, Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
accelerator_selector()  : No device of requested type available.
cpu_selector()          : No device of requested type available.
gpu_selector()          : gpu, Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
custom_selector(gpu)    : gpu, Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.32536]
custom_selector(cpu)    : No device of requested type available.
custom_selector(acc)    : No device of requested type available.

Additional context

No response
