
inbounds=true broken for CPU kernels #430

Closed
@leios

Description


So... I messed up with #429. It turns out that inbounds=true works fine for the GPU kernels I tried, but it does not work for CPU kernels because of an issue with the @index(...) macro.
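For context, #429 added an option that wraps the whole kernel body in @inbounds; the snippet below shows the form I mean (syntax from memory, so treat it as an assumption rather than a quote from the docs):

@kernel inbounds=true function copy_kernel!(A, B)
    # with inbounds=true, this @index call sits inside an implicit @inbounds
    I = @index(Global, Linear)
    A[I] = B[I]
end

An explicit @inbounds around @index(...) seems to exercise the same path, which gives a minimal reproduction: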

julia> using KernelAbstractions

julia> @kernel function check()
           idx = @index(Global, Linear)
       end

julia> kernel! = check(get_backend([]), 10)
KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}(CPU(false), cpu_check)

julia> kernel!(ndrange=1)

julia> @kernel function check()
           @inbounds idx = @index(Global, Linear)
       end

julia> kernel!(ndrange=1)
ERROR: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}})

Closest candidates are:
  __index_Global_Linear(::Any, ::CartesianIndex)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:134

Stacktrace:
 [1] cpu_check
   @ ~/projects/gpu/KernelAbstractions.jl/src/macros.jl:282 [inlined]
 [2] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:115
 [3] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{}, dynamic::KernelAbstractions.NDIteration.DynamicCheck, static_threads::Bool)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:82
 [4] (::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)})(; ndrange::Int64, workgroupsize::Nothing)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:44
 [5] top-level scope
   @ REPL[7]:1

Note that this is because, with @inbounds, idx is for some reason no longer a CartesianIndex for this function:

@inline function __index_Global_Linear(ctx, idx::CartesianIndex)
    I = @inbounds expand(__iterspace(ctx), __groupindex(ctx), idx)
    @inbounds LinearIndices(__ndrange(ctx))[I]
end
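To make the failure mode concrete, here is a minimal, self-contained sketch of the dispatch mismatch (illustrative names only, not the actual KernelAbstractions internals): the CPU backend defines only the two-argument method, while the @inbounds expansion apparently emits a one-argument, GPU-style call.

# Illustrative stand-ins, not KernelAbstractions internals.
struct FakeCtx end

# CPU-style method: needs the per-item index alongside the context.
index_global_linear(ctx::FakeCtx, idx::CartesianIndex) = LinearIndices((10,))[idx]

index_global_linear(FakeCtx(), CartesianIndex(1))  # two-argument call: works

# One-argument, GPU-style call with no matching CPU method: the same
# MethodError shape as in the stacktrace above.
try
    index_global_linear(FakeCtx())
catch err
    println(err isa MethodError)  # prints true
end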

For ROCKernels, such functions do not dispatch on idx and only work off of ctx:

@device_override @inline function __index_Global_Linear(ctx)
    I = @inbounds expand(__iterspace(ctx), AMDGPU.Device.blockIdx().x, AMDGPU.Device.threadIdx().x)
    # TODO: This is unfortunate, can we get the linear index cheaper
    @inbounds LinearIndices(__ndrange(ctx))[I]
end
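That asymmetry seems to be the crux: on the GPU the work-item index can be recovered from hardware intrinsics, so ctx alone is enough, whereas on the CPU the index only exists as the loop variable of the threaded driver loop and has to be passed along as an explicit argument. A rough, self-contained sketch of that split (hypothetical names, not the real driver):

# Hypothetical sketch, not the real KernelAbstractions driver.

# CPU-style driver: idx is a loop variable, so the kernel body must
# receive it as an argument; a ctx-only body has nowhere to read it from.
function cpu_run(body, ctx, n)
    for idx in CartesianIndices((n,))
        body(ctx, idx)
    end
end

# GPU-style: an intrinsic (faked here with a Ref) supplies the index,
# so a ctx-only body works.
const FAKE_THREADIDX = Ref(1)
gpu_body(ctx) = CartesianIndex(FAKE_THREADIDX[])

cpu_run((ctx, idx) -> idx, nothing, 4)  # index flows through the arguments
gpu_body(nothing)                       # index comes from the "intrinsic"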

I don't think we need to revert #429 because, well, it's not like older kernels are broken and it's a developmental feature (as noted in the docs).
