
inbounds=true broken for CPU kernels #430

Closed
@leios

Description


So... I messed up with #429. It turns out that inbounds=true works fine for the GPU kernels I tried, but it does not work for CPU kernels because of an issue with the @index(...) macro.
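For context, #429 added an option that wraps the whole kernel body in @inbounds; the snippet below shows the form I mean (syntax from memory, so treat it as an assumption rather than a quote from the docs):

@kernel inbounds=true function copy_kernel!(A, B)
    # with inbounds=true, this @index call sits inside an implicit @inbounds
    I = @index(Global, Linear)
    A[I] = B[I]
end

An explicit @inbounds around @index(...) seems to exercise the same path, which gives a minimal reproduction: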

julia> using KernelAbstractions

julia> @kernel function check()
           idx = @index(Global, Linear)
       end

julia> kernel! = check(get_backend([]), 10)
KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}(CPU(false), cpu_check)

julia> kernel!(ndrange=1)

julia> @kernel function check()
           @inbounds idx = @index(Global, Linear)
       end

julia> kernel!(ndrange=1)
ERROR: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}})

Closest candidates are:
  __index_Global_Linear(::Any, ::CartesianIndex)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:134

Stacktrace:
 [1] cpu_check
   @ ~/projects/gpu/KernelAbstractions.jl/src/macros.jl:282 [inlined]
 [2] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{}, dynamic::KernelAbstractions.NDIteration.DynamicCheck)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:115
 [3] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(10,)}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, Nothing}, args::Tuple{}, dynamic::KernelAbstractions.NDIteration.DynamicCheck, static_threads::Bool)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:82
 [4] (::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(10,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_check)})(; ndrange::Int64, workgroupsize::Nothing)
   @ KernelAbstractions ~/projects/gpu/KernelAbstractions.jl/src/cpu.jl:44
 [5] top-level scope
   @ REPL[7]:1

Note that this is because, with @inbounds, idx is for some reason no longer a CartesianIndex for this function:

@inline function __index_Global_Linear(ctx, idx::CartesianIndex)
    I = @inbounds expand(__iterspace(ctx), __groupindex(ctx), idx)
    @inbounds LinearIndices(__ndrange(ctx))[I]
end
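To make the failure mode concrete, here is a minimal, self-contained sketch of the dispatch mismatch (illustrative names only, not the actual KernelAbstractions internals): the CPU backend defines only the two-argument method, while the @inbounds expansion apparently emits a one-argument, GPU-style call.

# Illustrative stand-ins, not KernelAbstractions internals.
struct FakeCtx end

# CPU-style method: needs the per-item index alongside the context.
index_global_linear(ctx::FakeCtx, idx::CartesianIndex) = LinearIndices((10,))[idx]

index_global_linear(FakeCtx(), CartesianIndex(1))  # two-argument call: works

# One-argument, GPU-style call with no matching CPU method: the same
# MethodError shape as in the stacktrace above.
try
    index_global_linear(FakeCtx())
catch err
    println(err isa MethodError)  # prints true
end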

For ROCKernels, such functions do not dispatch on idx and only work off of ctx:

@device_override @inline function __index_Global_Linear(ctx)
    I = @inbounds expand(__iterspace(ctx), AMDGPU.Device.blockIdx().x, AMDGPU.Device.threadIdx().x)
    # TODO: This is unfortunate, can we get the linear index cheaper
    @inbounds LinearIndices(__ndrange(ctx))[I]
end
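That asymmetry seems to be the crux: on the GPU the work-item index can be recovered from hardware intrinsics, so ctx alone is enough, whereas on the CPU the index only exists as the loop variable of the threaded driver loop and has to be passed along as an explicit argument. A rough, self-contained sketch of that split (hypothetical names, not the real driver):

# Hypothetical sketch, not the real KernelAbstractions driver.

# CPU-style driver: idx is a loop variable, so the kernel body must
# receive it as an argument; a ctx-only body has nowhere to read it from.
function cpu_run(body, ctx, n)
    for idx in CartesianIndices((n,))
        body(ctx, idx)
    end
end

# GPU-style: an intrinsic (faked here with a Ref) supplies the index,
# so a ctx-only body works.
const FAKE_THREADIDX = Ref(1)
gpu_body(ctx) = CartesianIndex(FAKE_THREADIDX[])

cpu_run((ctx, idx) -> idx, nothing, 4)  # index flows through the arguments
gpu_body(nothing)                       # index comes from the "intrinsic"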

I don't think we need to revert #429 because, well, it's not like older kernels are broken and it's a developmental feature (as noted in the docs).
