
gather is not friendly with matrix of size 0 on GPU #411

Open

YichengDWu opened this issue May 31, 2022 · 12 comments
@YichengDWu

using NNlib, CUDA
julia>  NNlib.gather(rand(0,32),[2,3,4]) #on CPU
0×3 Matrix{Float64} 


julia> a = rand(0,32) |> cu
0×32 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}

julia> idx = cu[2,4,6]
3-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
 2
 4
 6

julia>  NNlib.gather(a,idx)
ERROR: DivideError: integer division error
Stacktrace:
 [1] div
   @ .\int.jl:284 [inlined]
 [2] div
   @ .\div.jl:257 [inlined]
 [3] div
   @ .\div.jl:312 [inlined]
 [4] cld
   @ .\div.jl:269 [inlined]
 [5] gather!(dst::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, src::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, idx::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
   @ NNlibCUDA C:\Users\Luffy\.julia\packages\NNlibCUDA\vECff\src\gather.jl:62
 [6] gather(src::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, idx::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:77
 [7] top-level scope
   @ REPL[135]:1
 [8] top-level scope
   @ C:\Users\Luffy\.julia\packages\CUDA\qAl31\src\initialization.jl:52
@mcabbott
Member

Looks like gather allocates an empty array of the right size:

https://github.com/FluxML/NNlib.jl/blob/master/src/gather.jl#L76

so this can probably be fixed by adding a short-circuit like isempty(dst) && return dst in gather!, before it launches kernels?

https://github.com/FluxML/NNlibCUDA.jl/blob/master/src/gather.jl#L52
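A minimal sketch of what that short-circuit amounts to, written here as a standalone wrapper rather than a patch to the actual NNlibCUDA gather! (the wrapper name and the extra dst allocation are purely illustrative):

using NNlib, NNlibCUDA, CUDA

# Hypothetical wrapper, not the NNlibCUDA method itself: return the (empty)
# destination before any kernel work happens, which is what an
# `isempty(dst) && return dst` line at the top of gather! would achieve.
function gather_empty_safe(src::CuArray, idx::CuArray{<:Integer})
    dst = similar(src, eltype(src), (size(src)[1:end-1]..., size(idx)...))
    isempty(dst) && return dst           # skip the kernel launch; no DivideError
    return NNlib.gather!(dst, src, idx)  # non-empty case: existing GPU kernel
end

# gather_empty_safe(CUDA.rand(0, 32), cu([2, 4, 6]))  # 0×3 CuArray, no error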

Should be an easy PR if you're interested. Would want tests in this file:

https://github.com/FluxML/NNlibCUDA.jl/blob/master/test/gather.jl

@YichengDWu
Author

so this can probably be fixed by adding a short-circuit like isempty(dst) && return dst in gather!, before it launches kernels?

We also need to check that max(index) <= size(src)[end-M:end]; that check is missing in general when index is on the GPU. Or we can simply fall back to a CPU index when src is empty:

if size(src, 1) == 0
    return NNlib.gather(src, cpu(idx_gpu))  # empty src: gather via the CPU index
end

This works fine but is not ideal.
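For reference, a hedged sketch of the kind of host-side bounds check being described, for the simple case of integer indices (the helper name is made up and is not part of NNlib):

using CUDA

# Hypothetical check, run on the host before any kernel launch. For integer
# indices, gather reads slices src[:, ..., idx[k]], so every index must fit
# within the last dimension of src.
function check_gather_bounds(src::AbstractArray, idx::AbstractArray{<:Integer})
    isempty(idx) && return nothing
    mx = maximum(idx)            # a device-side reduction when idx is a CuArray
    mx <= size(src)[end] || throw(BoundsError(src, Int(mx)))
    return nothing
end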

@YichengDWu
Author

Maybe it's a good idea to remove the @inbounds macro in
https://github.com/FluxML/NNlibCUDA.jl/blob/fb6fe8efa4764e989d4a328232433ca0fde129bd/src/gather.jl#L32
and
https://github.com/FluxML/NNlibCUDA.jl/blob/fb6fe8efa4764e989d4a328232433ca0fde129bd/src/gather.jl#L43

After doing this, we get the desired error message:

10×1 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
Error showing value of type CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce RTX 3070 Laptop GPU
Stacktrace:

@YichengDWu
Author

@yuehhua

@yuehhua
Member

yuehhua commented Jun 1, 2022

It's not a good idea to remove @inbounds in a GPU kernel. The dimensions of a CuArray should be checked outside the kernel so that the kernel itself works properly. I agree with @mcabbott's idea.

@YichengDWu
Author

So the problem is that we need to add a bounds-checking function here and make it compatible with empty arrays.

@yuehhua
Member

yuehhua commented Jun 2, 2022

Originally, gather was not designed to accept empty arrays; the CPU case working is a coincidence. If it is intended to be supported on the GPU, returning an empty array is reasonable. If throwing an error is expected instead, just check whether an empty array was received and throw; there is no need to touch the bounds check. The empty-array input is the root cause, and the indexing failure is a derived issue.
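A hedged sketch of those two behaviours, as an illustrative wrapper around the existing gather (the function name and keyword are invented for the example):

using NNlib, NNlibCUDA, CUDA

# allow_empty = true:  treat an empty src as valid and return an empty result.
# allow_empty = false: reject the empty input up front, so no bounds check is
#                      ever needed for it.
function gather_with_policy(src::CuArray, idx::CuArray{<:Integer}; allow_empty::Bool = true)
    if isempty(src)
        allow_empty || throw(ArgumentError("gather: src has no elements"))
        return similar(src, eltype(src), (size(src)[1:end-1]..., size(idx)...))
    end
    return NNlib.gather(src, idx)
end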

@YichengDWu
Author

YichengDWu commented Jun 2, 2022

The CPU case is a coincidence

It is a coincidence we should avoid. Using NNlib alone does not call NNlibCUDA:

using NNlib, CUDA

src = CUDA.rand(2,3)

NNlib.gather(src,cu[1,4])

ERROR: BoundsError: attempt to access 2×3 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer} at index [1:2, 4]
Stacktrace:
 [1] throw_boundserror(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, I::Tuple{Base.Slice{Base.OneTo{Int64}}, Int64})
   @ Base .\abstractarray.jl:691
 [2] checkbounds
   @ .\abstractarray.jl:656 [inlined]
 [3] view
   @ C:\Users\Luffy\.julia\packages\CUDA\qAl31\src\array.jl:617 [inlined]
 [4] _view(X::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, colons::Tuple{Colon}, k::Int64)
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\scatter.jl:38
 [5] gather!(dst::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, src::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, idx::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:27
 [6] gather(src::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, idx::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
   @ NNlib C:\Users\Luffy\.julia\packages\NNlib\hydo3\src\gather.jl:77
 [7] top-level scope
   @ c:\Users\Luffy\gather_test.jl:5

unless we write

using NNlib, CUDA
using NNlibCUDA

src = CUDA.rand(2,3)

NNlib.gather(src,cu[1,4])

2×2 CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}:
 0.430532  0.0
 0.474528  0.0

Now we run into the bounds-checking issue: the out-of-bounds index silently produces zeros instead of an error. This causes problems downstream, as in CarloLucibello/GraphNeuralNetworks.jl#181, so at this point it's more like a bug.

the indexing is derived issue

As you can see above, bounds checking is one issue and the empty array is another; you need to deal with both.

@yuehhua
Member

yuehhua commented Jun 2, 2022

Oh! Now I get your point.

@YichengDWu
Author

YichengDWu commented Jun 2, 2022

The so-called coincidence may be a third issue. For instance, if I'm using Flux.jl, which imports NNlibCUDA, I have no idea that I should always put the index on the GPU. Very easily, I could write something like

NNlib.gather(src, [2,3,4])

It won't throw an error, since under the hood we are calling NNlib.gather!, not NNlibCUDA.gather!. Assuming everything is fine with NNlibCUDA.gather!, then either we should automatically move the index to the GPU when NNlibCUDA.gather! is in the namespace (it should always be there, since we already know src is on the GPU), or it should throw an error like "src is on the GPU, but idx is on the CPU".
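A hedged sketch of those two options, as an illustrative wrapper (the function name and keyword are invented; this is not current NNlib behaviour):

using NNlib, NNlibCUDA, CUDA

# Hypothetical guard for a GPU src with a CPU idx: either move the index to
# the GPU automatically or raise a clear device-mismatch error.
function gather_device_checked(src::CuArray, idx::AbstractArray{<:Integer}; move::Bool = false)
    if !(idx isa CuArray)
        move || throw(ArgumentError("src is on the GPU, but idx is on the CPU; pass cu(idx)"))
        idx = cu(idx)        # optionally move the index to the GPU
    end
    return NNlib.gather(src, idx)
end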

@yuehhua
Member

yuehhua commented Jun 2, 2022

For these points, feel free to open corresponding issues.

@YichengDWu
Author

#416 and #415
