Description
A test added in #18721 fails on the level_zero backend but passes on the cuda backend.
Note that the test does pass on L0 with the latest drivers, but it still fails on Windows with L0. We have explicitly tested this example on Windows with cuda and know that it passes there.
The test allocates a single-element array of `sampled_image_handle*`, pointing to a valid `sampled_image_handle` that is dereferenced on the device. The values returned from `fetch_image` using this handle are incorrect. Note that performing a single dereference (i.e., using a 1D array) does work. The failure is seen only when the extra `sampled_image_handle**` level is allocated and dereferenced. The failure also occurs identically if `unsampled_image_handle` is used instead of `sampled_image_handle`.
The IR generated for this test on the L0 backend that deals with the dereference looks like the IR generated for the cuda backend: the dereferencing of the pointers looks correct. It is also straightforward to show that using a non-`image_handle` type (e.g. `long`) to create a similar 2D dynamic C array that is dereferenced in the same way works correctly even on the level_zero backend, so this bug appears to be isolated to the image case covered by the test linked above.
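For reference, the pointer layout being discussed can be sketched on the host as below. This is only an illustration of the double indirection, not the test from #18721: the real test performs the allocation and the dereference on the device, and `fake_image_handle`, `double_indirection`, and `load_via_double_deref` are hypothetical names introduced here, not SYCL types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for an opaque image handle. This is NOT the SYCL
// sampled_image_handle type; it only models "a small struct wrapping a raw
// handle value".
struct fake_image_handle {
  std::uint64_t raw_handle;
};

// Owns a single-element outer array (T*[1]) whose only entry points at a
// single-element inner array (T[1]), mirroring the T** layout described above.
template <typename T>
struct double_indirection {
  std::unique_ptr<T[]> inner;    // T[1], holds the actual value
  std::unique_ptr<T *[]> outer;  // T*[1], holds a pointer to inner[0]

  // inner is declared first, so it is initialized before outer reads it.
  explicit double_indirection(const T &value)
      : inner(new T[1]{value}), outer(new T *[1]{inner.get()}) {}

  T **get() { return outer.get(); }
};

// Performs the two dereferences that the failing case exercises:
// first outer[0] (a T*), then [0] on that pointer (a T).
template <typename T>
T load_via_double_deref(T **outer) {
  assert(outer != nullptr && outer[0] != nullptr);
  return outer[0][0];
}
```

Instantiating `double_indirection<long>` corresponds to the control case that works on level_zero, while instantiating it with a handle-like struct corresponds to the shape of the failing case. On the host both instantiations behave identically, which matches the observation that the pointer dereferencing itself is not where the two cases diverge.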
`fetch_image` itself does differ in its L0 implementation compared to cuda, but I see nothing in it that could cause this failure. The only opaque function called within `fetch_image` is the corresponding SPIR-V function `__spirv_ImageRead`, which our team suspects could be the culprit. This could possibly be related to #17807, an issue for the corresponding SPIR-V write function.
We do not think that anything involved in allocating or copying the `sampled_image_handle` is likely to cause problems, especially at the SYCL runtime level. The L0 UR implementation of image allocation and copy does differ considerably from the cuda backend, involves many complicated `reinterpret_cast`s between different UR types, and is hard to follow, but our team thinks it should generally work as currently implemented.