Description
I'm unable to pinpoint where the error occurs, since debugging KernelAbstractions kernels differs from debugging ordinary Julia code, so some guidance here would be helpful.
The issue is that some elements of the output array are 0.
I also can't test this with any driver other than PoCL, since NVIDIA doesn't support SPIR-V, so it would be nice if someone could check whether the behavior differs with Intel drivers.
This probably points to a problem somewhere in our current code. The behavior is the same both before and after the USM PR.
Edit: Here's a reproducer:
using OpenCL, pocl_jll, KernelAbstractions
@kernel inbounds=true function _mwe!(@Const(v))
    temp = @localmem Int8 (1,)
    i = @index(Global, Linear)
    @print i "\n"
    @synchronize()
end
v = CLArray(rand(Float32, 10))
_mwe!(OpenCLBackend(), 256)(v, ndrange=length(v))
This prints 1...256, even though ndrange is 10.
The CUDA version of the same code prints 1...10, as expected.
using CUDA
b = CuArray(rand(Float32, 10))
_mwe!(CUDABackend(false, false), 256)(b, ndrange=length(b))
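The issue is probably that ndrange is not being respected on the OpenCL backend: all 256 work-items of the workgroup run, instead of only the first 10.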
Consequently, creating a CLArray whose size is a multiple of 256 works without any issues for the any and all functions, as well as the merge_sort function.
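To illustrate the hypothesis, here is a minimal pure-Julia sketch (no GPU needed) of what I think is happening. It assumes the launch size is rounded up to a whole number of workgroups and that nothing masks the extra work-items. This is not KernelAbstractions' or OpenCL.jl's actual implementation; padded_global_size and simulated_indices are hypothetical names made up for this example.

```julia
# Hypothetical helper: round the requested ndrange up to a whole number
# of workgroups, as a padded launch would do.
padded_global_size(ndrange::Integer, workgroupsize::Integer) =
    cld(ndrange, workgroupsize) * workgroupsize

# Collect the global linear indices a kernel body would observe,
# with or without a bounds guard against ndrange.
function simulated_indices(ndrange::Integer, workgroupsize::Integer; guard::Bool)
    seen = Int[]
    for i in 1:padded_global_size(ndrange, workgroupsize)
        # With the guard enabled, work-items past ndrange do nothing.
        guard && i > ndrange && continue
        push!(seen, i)
    end
    return seen
end

unguarded = simulated_indices(10, 256; guard=false)  # mimics the OpenCL output above
guarded   = simulated_indices(10, 256; guard=true)   # mimics the CUDA output above

println(length(unguarded))  # 256
println(length(guarded))    # 10
```

Under this assumption, an array whose length is already a multiple of 256 has no padding work-items at all, which would explain why those sizes work.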