So... I messed up with #429. Turns out that while `inbounds=true` works fine for the GPU kernels I tried, it does not work for CPU kernels because of an issue with the `@index(...)` macro:
```julia
julia> using KernelAbstractions

julia> @kernel function check()
           idx = @index(Global, Linear)
       end

julia> kernel! = check(get_backend([]), 10)
KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}(CPU(false), cpu_check)

julia> kernel!(ndrange=1)

julia> @kernel function check()
           @inbounds idx = @index(Global, Linear)
       end

julia> kernel!(ndrange=1)
ERROR: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}})

Closest candidates are:
  __index_Global_Linear(::Any, ::CartesianIndex)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:134

Stacktrace:
 [1] cpu_check
   @ ~/projects/gpu/KernelAbstractions.jl/src/macros.jl:282 [inlined]
 [2] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:115
 [3] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{}, dynamic::KernelAbstractions.NDIteration.DynamicCheck, static_threads::Bool)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:82
 [4] (::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)})(; ndrange::Int64, workgroupsize::Nothing)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:44
 [5] top-level scope
   @ REPL[7]:1
```
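If it helps with debugging, Base's `@macroexpand` can show what the `@kernel` macro actually generates for the CPU path; the interesting part is the call shape of `__index_Global_Linear` once `@index` is wrapped in `@inbounds`. A minimal sketch (just plain `@macroexpand`, nothing KernelAbstractions-specific):

```julia
using KernelAbstractions

# Expand the @kernel macro and look at the generated cpu_check body to see
# whether __index_Global_Linear is emitted with the CartesianIndex argument
# or with the context alone.
expanded = @macroexpand @kernel function check()
    @inbounds idx = @index(Global, Linear)
end
println(expanded)
```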
Note that this is because, with `@inbounds`, `idx` is for some reason no longer a `CartesianIndex` for this function:
```julia
@inline function __index_Global_Linear(ctx, idx::CartesianIndex)
    I = @inbounds expand(__iterspace(ctx), __groupindex(ctx), idx)
    @inbounds LinearIndices(__ndrange(ctx))[I]
end
```
For ROCKernels, such functions do not dispatch on `idx` and only work off of `ctx`:
```julia
@device_override @inline function __index_Global_Linear(ctx)
    I = @inbounds expand(__iterspace(ctx), AMDGPU.Device.blockIdx().x, AMDGPU.Device.threadIdx().x)
    # TODO: This is unfortunate, can we get the linear index cheaper
    @inbounds LinearIndices(__ndrange(ctx))[I]
end
```
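In the meantime, a workaround that should keep CPU kernels working (my suggestion, not an official fix; `fill_linear!` and `a` are just illustrative names) is to leave the `@index` call itself outside of `@inbounds` and restrict `@inbounds` to the array accesses:

```julia
using KernelAbstractions

# Workaround sketch: don't wrap @index in @inbounds; elide bounds checks
# only on the memory accesses that need it.
@kernel function fill_linear!(a)
    idx = @index(Global, Linear)  # left outside @inbounds
    @inbounds a[idx] = idx
end

a = zeros(Int, 10)
fill_linear!(get_backend(a), 10)(a, ndrange=length(a))
KernelAbstractions.synchronize(get_backend(a))
```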
I don't think we need to revert #429 because, well, it's not like older kernels are broken and it's a developmental feature (as noted in the docs).