Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve GPU functionality #780

Merged
merged 10 commits into from
Aug 24, 2024
Merged

Improve GPU functionality #780

merged 10 commits into from
Aug 24, 2024

Conversation

ptiede
Copy link
Contributor

@ptiede ptiede commented Aug 23, 2024

This PR adds some missing methods required to ensure the DimensionalData works on the GPU.

The PR makes the following possible.

using CUDA
using DimensionalData

d = DimArray(CUDA.rand(64, 64), (X(1:64), Y(1:64))

# This is what GPUArraysExt enables
d .= CUDA.zeros(64, 64)

# copyto!(dest::AbstractDimArray, bc::Broadcast{<:AbstractArrayStyle{0}}) enables
d .= 0

# copyto!(dest::AbstractDimArray, bc::Broadcast{Nothing})
x = 1:2.0:128
d .= x .+ x'

# Change to mapreduce
mapreduce(x->x^2, + d)

I've also added some tests based on JLArrays to ensure test for potential scalar indexing problems in the future.

The biggest thing I am unsure of is the changes to mapreduce. The new implementation is pretty simple so I was uncertain if there was a specific case I missed. All the tests pass, but maybe this impacts a downstream package?

@lazarusA
Copy link
Collaborator

will this also work with Metal.jl?

@rafaqz
Copy link
Owner

rafaqz commented Aug 23, 2024

Is the extension really needed? Cant we just always do that with a more generic copyto! ?

I had also assumed that this already worked, we just can't test on GPUs in CI so it can break.

But DimensionalStyle should rewrap the AbstractGPUArrayStyle so the copyto! we have will work as it already forwards to the parent of the AbstractDimArray.

Maybe something has changed... but there is surely a more elegant way to do this.

src/array/broadcast.jl Outdated Show resolved Hide resolved
ptiede and others added 2 commits August 23, 2024 14:22
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
@ptiede
Copy link
Contributor Author

ptiede commented Aug 23, 2024

Is the extension really needed? Cant we just always do that with a more generic copyto! ?

I had also assumed that this already worked, we just can't test on GPUs in CI so it can break.

The JLArray extension should catch most of these. The only one it can't is
copyto!(::AbstractDimArray, ::Broadcasted{<:AbstractGPUArrayStyle})

But DimensionalStyle should rewrap the AbstractGPUArrayStyle so the copyto! we have will work as it already forwards to the parent of the AbstractDimArray.

Maybe something has changed... but there is surely a more elegant way to do this.

I added the methods when bc doesn't have DimensionalStyle, which happens at least in my code quite a bit.
I agree the method seems hacky, but I quickly ran into ambiguity errors when I tried to be more generic. I probably don't understand the broadcasting internals well enough tbh.

An an example, I tried to replace the extension with

function Base.copyto!(dest::AbstractDimArray, bc::Broadcasted{<:Broadcast.AbstractArrayStyle})
    copyto!(parent(dest), bc)
    dest
end

But I get the following ambiguity error

Candidates:
  copyto!(dest::AbstractDimArray, bc::Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle})
    @ DimensionalData ~/.julia/dev/DimensionalData/src/array/broadcast.jl:81
  copyto!(dest::AbstractArray, bc::Base.Broadcast.Broadcasted{<:GPUArraysCore.AbstractGPUArrayStyle})
    @ GPUArrays ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:44

Possible fix, define
  copyto!(::AbstractDimArray, ::Base.Broadcast.Broadcasted{<:GPUArraysCore.AbstractGPUArrayStyle})

which is annoying...

@ptiede
Copy link
Contributor Author

ptiede commented Aug 23, 2024

@lazarusA I believe it should work for any Julia supported GPU. I don't have a Mac to try this on, however.

@rafaqz
Copy link
Owner

rafaqz commented Aug 23, 2024

This is a potential workaround without dependencies:

@inline function Base.Broadcast.materialize!(dest::AbstractDimArray, bc::Base.Broadcast.Broadcasted{<:Any})
    style = DimensionalData.DimensionalStyle(Base.Broadcast.combine_styles(parent(dest), bc))
    return Base.materialize!(style, parent(dest), bc)
end

We will need to handle the dimension checks befor we do parent(dest) but that should be relatively easy using the code already in broadcast.jl.

@ptiede
Copy link
Contributor Author

ptiede commented Aug 23, 2024

This is a potential workaround without dependencies:

@inline function Base.Broadcast.materialize!(dest::AbstractDimArray, bc::Base.Broadcast.Broadcasted{<:Any})
    style = DimensionalData.DimensionalStyle(Base.Broadcast.combine_styles(parent(dest), bc))
    return Base.materialize!(style, parent(dest), bc)
end

We will need to handle the dimension checks befor we do parent(dest) but that should be relatively easy using the code already in broadcast.jl.

That works great for me! I just made the following modifications

@inline function Base.Broadcast.materialize!(dest::AbstractDimArray, bc::Base.Broadcast.Broadcasted{<:Any})
    style = DimensionalData.DimensionalStyle(Base.Broadcast.combine_styles(parent(dest), bc))
    Base.Broadcast.materialize!(style, parent(dest), bc)
    return dest
end

What dimension checks do you mean? I thought the dimension checks in

function Base.copyto!(dest::AbstractArray, bc::Broadcasted{DimensionalStyle{S}}) where S
_dims = comparedims(dims(dest), _broadcasted_dims(bc); ignore_length_one=true, order=true)
copyto!(dest, _unwrap_broadcasted(bc))
A = _firstdimarray(bc)
if A isa Nothing || _dims isa Nothing
dest
else
rebuild(A, dest, _dims, refdims(A))
end
end

should cover us since materialize will always call that.

@rafaqz
Copy link
Owner

rafaqz commented Aug 23, 2024

Oh you're right, that's it then! A few more tests that all this works with JLArrays could be nice.

@rafaqz
Copy link
Owner

rafaqz commented Aug 23, 2024

No actually dest won't be a DimArray in copyto! So we need that check in materialize!

@ptiede
Copy link
Contributor Author

ptiede commented Aug 23, 2024

Ya I just realized that as well. Working on the fix.

@ptiede
Copy link
Contributor Author

ptiede commented Aug 23, 2024

Quick question. Is it expected behavior that

ab = DimArray(rand(2,2), (X, Y))
ba = DimArray(rand(2,2), (Y, X))
 z = zeros(2,2)

function inplace!(z, ab, ba)
     z .= ab .+ ba
end

returns a DimArray? It seems strange that the output type of an in place broadcast changes, but this may be wanted.

The only reason I bring this up is that is indeed the wanted behavior then I need the following implementation

function Base.copyto!(dest::AbstractArray, bc::Broadcasted{DimensionalStyle{S}}) where S
    _dims = comparedims(dims(dest), _broadcasted_dims(bc); ignore_length_one=true, order=true)
    copyto!(dest, _unwrap_broadcasted(bc))
    A = _firstdimarray(bc)
    if A isa Nothing || _dims isa Nothing
        dest
    else
        rebuild(A, dest, _dims, refdims(A))
    end
end

@inline function Base.Broadcast.materialize!(dest::AbstractDimArray, bc::Base.Broadcast.Broadcasted{<:Any})
    # needed because we need to check whether the dims are compatible in dest which are already 
    # stripped when sent to copyto!
    _dims = comparedims(dims(dest), _broadcasted_dims(bc); ignore_length_one=true, order=true)
    style = DimensionalData.DimensionalStyle(Base.Broadcast.combine_styles(parent(dest), bc))
    Base.Broadcast.materialize!(style, parent(dest), bc)
    A = _firstdimarray(bc)
    if A isa Nothing || _dims isa Nothing
        dest
    else
        rebuild(A, parent(dest), _dims, refdims(A))
    end
end

to do get the tests to pass. The is slightly non-ideal because it means comparedims is getting called twice for most broadcasts.

@ptiede
Copy link
Contributor Author

ptiede commented Aug 23, 2024

Ya actually this is probably needed otherwise some dimension matches will happen.

Copy link

codecov bot commented Aug 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.01%. Comparing base (d10d0fc) to head (7786d77).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #780      +/-   ##
==========================================
+ Coverage   82.64%   85.01%   +2.36%     
==========================================
  Files          44       47       +3     
  Lines        4132     4277     +145     
==========================================
+ Hits         3415     3636     +221     
+ Misses        717      641      -76     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rafaqz
Copy link
Owner

rafaqz commented Aug 23, 2024

Ahh probably your right it should just return dest as-is. There may be a reason for it... but this was a log time ago and I can't remember it.

We may need to look at some base broadcast docs to see the expected behavior

@rafaqz
Copy link
Owner

rafaqz commented Aug 23, 2024

Turns out not returning dest is the cause of some of the performance issue in #777. Lets just return dest

src/array/methods.jl Outdated Show resolved Hide resolved
Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
@rafaqz
Copy link
Owner

rafaqz commented Aug 24, 2024

One thought... Should we instead loop over test objects rather than duplicating all that code?

@ptiede
Copy link
Contributor Author

ptiede commented Aug 24, 2024

Probably. The annoying part is that some of the tests won't work on the GPU because there isn't a compatible implementation using the more generic GPUArrays, e.g., it only works on CUDA or it is missing all together. Some examples I ran into are:

  • mapslices
  • median(a, dims=1)
  • all(a; dims=a)*
  • sort*
  • cumsum*, cumsum!*
  • sortslices
  • unique

*: missing JLArray method but works on CUDA.

I can try to rewrite the tests to be more generic, but I probably won't have time for a little bit, given all the cases. I am about to head out for a conference next week so a lot of my attention will be on that.

@rafaqz
Copy link
Owner

rafaqz commented Aug 24, 2024

Ahh typical the test array isn't a full working implementation.

That makes sense to do separately then, let's merge as is

@ptiede
Copy link
Contributor Author

ptiede commented Aug 24, 2024

Ok thanks for all your help with the PR btw!

@rafaqz rafaqz merged commit 2b2b380 into rafaqz:main Aug 24, 2024
8 checks passed
@rafaqz
Copy link
Owner

rafaqz commented Aug 24, 2024

Apologies I actually need to revert this PR, it has breaking changes. I'll lump it with the performance PR instead.

rafaqz added a commit that referenced this pull request Aug 24, 2024
rafaqz added a commit that referenced this pull request Sep 1, 2024
* Improve GPU functionality

* Add missing weakdeps

* Update src/array/broadcast.jl

Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>

* Update src/array/broadcast.jl

Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>

* Push materialize fix

* Clean up mapreduce and add a bunch of tests for JLArray broadcast

* Add some more JLArray tests

* Just return dest in broadcast

* Update src/array/methods.jl

Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>

* Format

---------

Co-authored-by: Rafael Schouten <rafaelschouten@gmail.com>
rafaqz added a commit that referenced this pull request Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants