Skip to content

[SYCL][SYCL_INTEL_enqueue_barrier] submit_barrier and LevelZero backend. #3000

Closed
@al42and

Description

@al42and

Hello!

When a barrier is submitted using the queue::submit_barrier function (from SYCL_INTEL_enqueue_barrier), then waiting on it never finishes when running on the LevelZero backend.

Example code

#include <stdlib.h>
#include <CL/sycl.hpp>

int main()
{
    cl::sycl::device dev(cl::sycl::gpu_selector{});
    cl::sycl::queue q{dev};

    cl::sycl::event e = q.submit_barrier();
    e.wait_and_throw();

    return 0;
}

Built using the most recent (0877be6) version of the clang++: clang++ sycl_hang.cpp -fsycl -o sycl_hang.

Output:

$ SYCL_BE=PI_OPENCL ./sycl_hang && echo OK

WARNING: The legacy environment variables SYCL_BE and SYCL_DEVICE_TYPE are deprecated. Please use SYCL_DEVICE_FILTER instead. For details, please refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
OK
# Everything is fine

$ SYCL_BE=PI_LEVEL_ZERO ./sycl_hang && echo OK

WARNING: The legacy environment variables SYCL_BE and SYCL_DEVICE_TYPE are deprecated. Please use SYCL_DEVICE_FILTER instead. For details, please refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
# Just hangs here...

Some analysis

Looking at the output of ze_intercept, it seems the barrier is enqueued using zeCommandListAppendBarrier to a command list, but there are no subsequent zeCommandListClose/zeCommandQueueExecuteCommandLists calls to actually execute the list:

[some boilerplate initialization omitted]
>>>> [5385845] zeCommandListCreate: hContext = 0xf30750 hDevice = 0xced9f0 desc = 0x7fffe4ee8c30 phCommandList = 0x7fffe4ee8cc8 (hCommandList = 0)
<<<< [5430639] zeCommandListCreate [44794 ns] hCommandList = 0xf36730 -> ZE_RESULT_SUCCESS (0)
>>>> [5445320] zeFenceCreate: hCommandQueue = 0xf35340 desc = 0x7fffe4ee8c50 phFence = 0x7fffe4ee8cd0 (hFence = 0)
<<<< [5462520] zeFenceCreate [17200 ns] hFence = 0xf377f0 -> ZE_RESULT_SUCCESS (0)
>>>> [5484902] zeEventPoolCreate: hContext = 0xf30750 desc = 0x7fffe4ee8b90 numDevices = 1 phDevices = 0xf339b0 (hDevices = 0xced9f0) phEventPool = 0xd3f7f0 (hEventPool = 0)
<<<< [5515106] zeEventPoolCreate [30204 ns] hDevices = 0xced9f0 hEventPool = 0xf366c0 -> ZE_RESULT_SUCCESS (0)
>>>> [5538637] zeEventCreate: hEventPool = 0xf366c0 desc = 0x7fffe4ee8c40 phEvent = 0x7fffe4ee8c30 (hEvent = 0xf)
<<<< [5560631] zeEventCreate [21994 ns] hEvent = 0xf32170 -> ZE_RESULT_SUCCESS (0)
>>>> [5577792] zeCommandListAppendBarrier: hCommandList = 0xf36730 hSignalEvent = 0xf32170 numWaitEvents = 0 phWaitEvents = 0xf339b0 (hWaitEvents = 0xf369b3)
<<<< [5594327] zeCommandListAppendBarrier [16535 ns] hWaitEvents = 0xf369b3 -> ZE_RESULT_SUCCESS (0)
>>>> [5625409] zeEventHostSynchronize: hEvent = 0xf32170 timeout = 4294967295
# Just hangs here...

Note about the example code

The reduced example above hits an edge case in specs:

The returned event enters the info::event_command_status::complete state when all events that the barrier is dependent on (implicitly from all previously submitted commands to the same queue) have entered the info::event_command_status::complete state.

So, if there are no previous commands, the event is not guaranteed ever to enter the ::complete state. (I personally find the current behavior of OpenCL more logical: if the event has no dependencies, it should triggered immediately. But that is not relevant.)

However, the issue described here happens even if there are kernel launches before and after the barrier is submitted. I.e., this code also hangs on wait_and_throw.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions