CuArrays -> CUDA #1204

Closed · wants to merge 21 commits

Conversation

@DhairyaLGandhi (Member) commented Jun 1, 2020

This updates the compatibility bounds, moving from CuArrays to CUDA.jl. CUDA.jl seems to have a lower bound of Julia 1.4 (@maleadt any chance we could run it on 1.3 somehow?).
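For downstream code the move is essentially a rename of the GPU array package. A minimal sketch of what user code sees, assuming a functional GPU; the layer and sizes are made up for illustration and are not taken from this PR's diff:

    # Before: the GPU array type and `cu` came from CuArrays
    # using CuArrays

    # After: CUDA.jl provides the same functionality
    using CUDA
    using Flux

    m = Dense(10, 5) |> gpu        # Flux's `gpu` moves parameters to the device
    x = cu(rand(Float32, 10))      # `cu` is now exported by CUDA.jl
    y = m(x)                       # forward pass on the GPU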

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @MikeInnes or @dhairyagandhi96 (for API changes).

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jun 1, 2020

bors bot commented Jun 1, 2020

try

Build failed:

@cossio (Contributor) commented Jun 1, 2020

fix #1194

@cossio (Contributor) commented Jun 1, 2020

Why do we need to support Julia 1.3?

@DhairyaLGandhi (Member, Author)

We try to maintain backwards compatibility with older and LTS versions of the language; support for 1.0 was dropped not too long ago.

@DhairyaLGandhi (Member, Author)

I'm a bit confused: does this fix #1194, or is that a blocker?

@cossio (Contributor) commented Jun 1, 2020

This PR fixes #1194, insofar as it makes Flux compatible with CUDA. I thought my comment would automatically close the issue when this gets merged, but apparently only the PR author can add these auto-close references.

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jun 3, 2020

bors bot commented Jun 3, 2020

try

Build failed:

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jun 3, 2020

bors bot commented Jun 3, 2020

try

Build failed:

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jun 5, 2020

bors bot commented Jun 5, 2020

try

Build succeeded:

@DhairyaLGandhi (Member, Author)

How about that! Needs FluxML/Zygote.jl#673

@DhairyaLGandhi (Member, Author) commented Jun 5, 2020

I believe CUDA.jl now supports Julia 1.3 as well, so we don't have to raise the version bound on Flux. As suggested by Tim, I'll run the tests against CUDA#master.
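For reference, tracking the development branch locally can be done with the standard Pkg API; this is a sketch, not the exact CI configuration used here:

    using Pkg

    # Track CUDA.jl's master branch instead of a registered release,
    # then run the Flux test suite against it.
    Pkg.add(PackageSpec(name="CUDA", rev="master"))
    Pkg.test("Flux")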

bors bot added a commit that referenced this pull request Jun 22, 2020

bors bot commented Jun 22, 2020

try

Build failed:

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jun 22, 2020
@CarloLucibello (Member)

Some of the failures are due to #1245; we should fix that in Zygote, although it's not clear when the regression was introduced.


bors bot commented Jun 22, 2020

try

Build failed:

@CarloLucibello (Member)

This warning
https://travis-ci.org/github/FluxML/Flux.jl/jobs/700819230#L360
exposes some mixed-type problems that were previously handled by NNlib through conversion. I think this should be fixed on the Flux side, probably by finding where the tests use Float64 and changing it to Float32.
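A sketch of the kind of change this implies in the tests; the layer and data below are made up for illustration and are not the actual failing test:

    using Flux

    m = Dense(3, 2)            # Flux layers default to Float32 parameters

    x64 = rand(3)              # rand defaults to Float64, mixing element types
    x32 = rand(Float32, 3)     # explicit Float32 keeps the types consistent

    y = m(x32)                 # no promotion or conversion needed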

@CarloLucibello (Member)

This error, on the other hand, is weird: https://travis-ci.org/github/FluxML/Flux.jl/jobs/700819230#L339
Maybe we can just deactivate NNPACK in NNlib and reconsider later.

@DhairyaLGandhi (Member, Author)

Yeah, that's what we had done with the NNLIB_USE_NNPACK environment variable.
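A minimal sketch of that approach; the exact value NNlib checks and the point at which it reads the variable are assumptions here, not confirmed in this thread:

    # Assumed: NNlib reads NNLIB_USE_NNPACK when it initializes, so the
    # variable must be set before `using NNlib` (directly or via Flux).
    ENV["NNLIB_USE_NNPACK"] = "false"

    using Flux   # loads NNlib with NNPACK disabled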

@DhairyaLGandhi (Member, Author)

Why would handling it on Flux be beneficial?

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jun 29, 2020

bors bot commented Jun 29, 2020

try

Build failed:

@DhairyaLGandhi (Member, Author) commented Jun 30, 2020

Looking closer at the failures, it seems the error is in the reported gradients, which worked fine up to Zygote@0.4.20:

julia> cuback(1.f0)
(Base.RefValue{Any}((cell = Base.RefValue{Any}((Wi = Float32[-0.028762825 -0.06328906  -0.003087131 -0.046078723; -0.009768781 -0.021495003  -0.0010484891 -0.015649816;  ; -0.013579503 -0.029880026  -0.0014574961 -0.021754682; -0.010848033 -0.023869764  -0.0011643259 -0.017378805], Wh = Float32[0.0 0.0  0.0 0.0; 0.0 0.0  0.0 0.0;  ; 0.0 0.0  0.0 0.0; 0.0 0.0  0.0 0.0], b = Float32[-0.06590673, -0.022384048, 0.069970295, -0.03814131, -0.019469338, 0.0, 0.0, 0.0, 0.0, 0.0, 0.26342788, 0.26144245, 0.09559605, 0.1319688, 0.32197332, -0.05778172, -0.019014964, 0.044791963, -0.03111588, -0.024857031], h = nothing, c = nothing)), init = nothing, state = (Float32[-0.08846992, 0.07109468, -0.15598866, -0.12647739, -0.002516523], Float32[0.48143172, 0.38970852, 0.38037091, 0.29649684, 0.40764716]))), Float32[-0.104312934; -0.112346165;  ; 0.091621324; -0.23611495])

but not from Zygote@0.4.21 onwards:

julia> cuback(1.f0)
(Base.RefValue{Any}((cell = nothing, init = nothing, state = (Float32[-0.04327198, 0.024098, -0.077963844, -0.0560417, -0.017322173], Float32[0.46127376, 0.26323906, 0.18762967, 0.38597986, 0.39259586]))), Float32[0.17995863; 0.12015681;  ; 0.021471716; 0.031247895])

Notice the nothing in the gradient for cell.

Edit: this is true for both the CPU and GPU cases.
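A rough reconstruction of the check above, since the pullback construction isn't shown in the thread: the LSTM sizes are inferred from the printed Wi (20×4) and state arrays, and the scalar loss is assumed.

    using Flux, Zygote

    m = LSTM(4, 5)                  # sizes inferred from the printed gradient
    x = rand(Float32, 4)

    # Pullback w.r.t. both the model and the input, matching the two-element
    # gradient tuple above; apply `gpu` to m and x for the GPU case.
    y, cuback = Zygote.pullback((m, x) -> sum(m(x)), m, x)

    dm, dx = cuback(1f0)
    # In the output above, dm[].cell carries Wi/Wh/b gradients up to
    # Zygote 0.4.20 but comes back as `nothing` from 0.4.21 (see #1245).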

@CarloLucibello (Member)

Yes, it's #1245. Maybe we should get this merged, marking those tests as broken, and fix it elsewhere.
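Marking them as broken would use the standard Test idiom; this sketch uses a placeholder condition rather than the actual gradient test from the suite:

    using Test

    @testset "RNN gradients (blocked by #1245)" begin
        # @test_broken records the failure without failing CI, and will
        # start flagging the test once the Zygote regression is fixed.
        @test_broken false
    end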

@DhairyaLGandhi (Member, Author)

bors try

bors bot added a commit that referenced this pull request Jul 1, 2020

bors bot commented Jul 1, 2020

try

Build failed:

@DhairyaLGandhi (Member, Author)

I think we can merge this now; we still need to patch NNlib and check the regression in Zygote.

@CarloLucibello mentioned this pull request Jul 2, 2020
bors bot added commits that referenced this pull request Jul 2, 2020
1258: move to CUDA.jl r=CarloLucibello a=CarloLucibello

continuing #1204

Co-authored-by: Dhairya Gandhi <dhairya@juliacopmuting.com>
Co-authored-by: Dhairya Gandhi <dhairya@juliacomputing.com>
Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>