Description
Hello!
When a barrier is submitted using the queue::submit_barrier
function (from SYCL_INTEL_enqueue_barrier), then waiting on it never finishes when running on the LevelZero backend.
Example code
#include <stdlib.h>
#include <CL/sycl.hpp>
int main()
{
cl::sycl::device dev(cl::sycl::gpu_selector{});
cl::sycl::queue q{dev};
cl::sycl::event e = q.submit_barrier();
e.wait_and_throw();
return 0;
}
Built using the most recent (0877be6) version of the clang++: clang++ sycl_hang.cpp -fsycl -o sycl_hang
.
Output:
$ SYCL_BE=PI_OPENCL ./sycl_hang && echo OK
WARNING: The legacy environment variables SYCL_BE and SYCL_DEVICE_TYPE are deprecated. Please use SYCL_DEVICE_FILTER instead. For details, please refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
OK
# Everything is fine
$ SYCL_BE=PI_LEVEL_ZERO ./sycl_hang && echo OK
WARNING: The legacy environment variables SYCL_BE and SYCL_DEVICE_TYPE are deprecated. Please use SYCL_DEVICE_FILTER instead. For details, please refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
# Just hangs here...
Some analysis
Looking at the output of ze_intercept
, it seems the barrier is enqueued using zeCommandListAppendBarrier
to a command list, but there are no subsequent zeCommandListClose
/zeCommandQueueExecuteCommandLists
calls to actually execute the list:
[some boilerplate initialization omitted]
>>>> [5385845] zeCommandListCreate: hContext = 0xf30750 hDevice = 0xced9f0 desc = 0x7fffe4ee8c30 phCommandList = 0x7fffe4ee8cc8 (hCommandList = 0)
<<<< [5430639] zeCommandListCreate [44794 ns] hCommandList = 0xf36730 -> ZE_RESULT_SUCCESS (0)
>>>> [5445320] zeFenceCreate: hCommandQueue = 0xf35340 desc = 0x7fffe4ee8c50 phFence = 0x7fffe4ee8cd0 (hFence = 0)
<<<< [5462520] zeFenceCreate [17200 ns] hFence = 0xf377f0 -> ZE_RESULT_SUCCESS (0)
>>>> [5484902] zeEventPoolCreate: hContext = 0xf30750 desc = 0x7fffe4ee8b90 numDevices = 1 phDevices = 0xf339b0 (hDevices = 0xced9f0) phEventPool = 0xd3f7f0 (hEventPool = 0)
<<<< [5515106] zeEventPoolCreate [30204 ns] hDevices = 0xced9f0 hEventPool = 0xf366c0 -> ZE_RESULT_SUCCESS (0)
>>>> [5538637] zeEventCreate: hEventPool = 0xf366c0 desc = 0x7fffe4ee8c40 phEvent = 0x7fffe4ee8c30 (hEvent = 0xf)
<<<< [5560631] zeEventCreate [21994 ns] hEvent = 0xf32170 -> ZE_RESULT_SUCCESS (0)
>>>> [5577792] zeCommandListAppendBarrier: hCommandList = 0xf36730 hSignalEvent = 0xf32170 numWaitEvents = 0 phWaitEvents = 0xf339b0 (hWaitEvents = 0xf369b3)
<<<< [5594327] zeCommandListAppendBarrier [16535 ns] hWaitEvents = 0xf369b3 -> ZE_RESULT_SUCCESS (0)
>>>> [5625409] zeEventHostSynchronize: hEvent = 0xf32170 timeout = 4294967295
# Just hangs here...
Note about the example code
The reduced example above hits an edge case in specs:
The returned event enters the info::event_command_status::complete state when all events that the barrier is dependent on (implicitly from all previously submitted commands to the same queue) have entered the info::event_command_status::complete state.
So, if there are no previous commands, the event is not guaranteed ever to enter the ::complete
state. (I personally find the current behavior of OpenCL more logical: if the event has no dependencies, it should triggered immediately. But that is not relevant.)
However, the issue described here happens even if there are kernel launches before and after the barrier is submitted. I.e., this code also hangs on wait_and_throw
.