
Check for CUDA availability at run time. #916

Merged — 7 commits, Nov 6, 2019
Conversation

@maleadt (Collaborator) commented Nov 2, 2019

WIP: necessary changes over at CuArrays/CUDAdrv/CUDAnative have not been pushed/released yet.

Continuing the package loading / conditional dependency saga, here is another attempt to cover all requirements. With the recent CUDAapi.jl-based scheme, we made it possible to add regular Pkg dependencies on GPU packages, but that had one important flaw: once the application (e.g. Flux) was precompiled without GPU support, there was no easy way for the user to "fix" GPU support and reload Flux. So we added some hacks to detect that and remove the compile cache during __init__.
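The cache-removal hack described above looked roughly like the following sketch. This is hypothetical illustration, not Flux's actual code: `has_cuda` is CUDAapi.jl's detection routine, and the cache-path lookup (`Base.compilecache_path`) is internal and varies across Julia versions.

```julia
using CUDAapi

# Baked in at precompile time: whether CUDA was detected back then.
const precompiled_with_gpu = has_cuda()

function __init__()
    # If the module was precompiled without GPU support but a CUDA
    # installation is detected now, delete the stale compile cache so
    # the next `using Flux` precompiles again with GPU support.
    if !precompiled_with_gpu && has_cuda()
        cachefile = Base.compilecache_path(Base.PkgId(@__MODULE__))
        cachefile !== nothing && rm(cachefile; force=true)
        @warn "Precompiled without GPU support; restart Julia to enable it."
    end
end
```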

Furthermore, several users want the GPU packages and applications like Flux to be precompilable on a system without a GPU, e.g. the login node of a cluster, or during the build step of a container. This is similarly incompatible with conditionally loading GPU packages, which would bake the GPU-less state into the precompilation image.

There are also certain parts of our infrastructure, like Documenter (being tightly linked to Travis) and the new automatic package registry automerge bot, that expect packages to be loadable at all times.

Bottom line: the GPU stack should be loadable and precompilable regardless of whether it can and will be used. I'm working on exactly that right now, where CUDAdrv/CUDAnative/CuArrays would always be loadable, print a (silenceable) warning if something goes wrong, and have a $module.functional() method to query the state of the package. In this PR, I adapt Flux to use those APIs.
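With that API, a downstream package can always load the GPU stack and branch at run time. A minimal sketch (the helper name `maybe_gpu` is made up for illustration; `CuArrays.functional()` is the query method mentioned above):

```julia
using CuArrays

# Move an array to the GPU only if CUDA is actually usable on this
# system; otherwise leave it on the CPU. Loading CuArrays itself
# always succeeds under the new scheme.
function maybe_gpu(x::AbstractArray)
    if CuArrays.functional()
        return CuArrays.cu(x)  # working CUDA setup: upload to the device
    else
        return x               # no GPU (e.g. cluster login node): stay on the CPU
    end
end
```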

Advantages:

  • Flux can be precompiled, including optional GPU support
  • Changes to the system that affect GPU support are reflected by just reloading Flux

Disadvantages:

  • functions like gpu are now type-unstable (returning either an Array or a CuArray)
  • CUDNN is mandatory, as there's currently no easy way to conditionally enable that functionality based on a runtime flag

Both disadvantages could be worked around by evaluating code at run time, but that would negate some of the precompilation advantages.
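To illustrate the first disadvantage: once the decision is a runtime value, gpu's return type depends on data inference cannot see, so callers get a Union of array types. A sketch (`use_cuda` stands in for whatever runtime flag is used; it is not necessarily Flux's actual name):

```julia
using CuArrays

# Flipped to true in __init__ when CUDA turns out to be functional.
const use_cuda = Ref(false)

# The return type depends on a runtime Ref, so inference only sees
# Union{Array, CuArray} — this is the type instability noted above.
gpu(x::AbstractArray) = use_cuda[] ? CuArrays.cu(x) : x
```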

Thoughts? @MikeInnes

@maleadt maleadt mentioned this pull request Nov 2, 2019
@Sleort Sleort mentioned this pull request Nov 4, 2019
@MikeInnes (Member)
Seems sensible to me, given the options we have right now. gpu not being inferable is a very minor downside. Requiring CUDNN is bigger, but if we need to eval some code later on that seems ok (there is very little code on Flux's side, so the eval doesn't take too long).
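The runtime eval being discussed would look something like the following sketch (the file path and the `has_cudnn` check are assumptions for illustration, not Flux's actual code):

```julia
function __init__()
    if CuArrays.functional() && CuArrays.has_cudnn()
        # Load the CUDNN-dependent code only when it can actually be
        # used. Because this runs at run time rather than precompile
        # time, none of it lands in the precompilation image — which
        # is why it costs a little extra load time.
        @eval include(joinpath(@__DIR__, "cuda", "cudnn.jl"))
    end
end
```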

@maleadt force-pushed the tb/runtime_use_cuda branch 2 times, most recently from 59c4889 to 117a823 on November 4, 2019
@maleadt maleadt changed the title RFC/WIP: Check for CUDA availability at run time. Check for CUDA availability at run time. Nov 4, 2019
@maleadt (Collaborator, Author) commented Nov 4, 2019

Updated to include newly released GPU packages that contain the necessary functionality. That also means only supporting Julia 1.2+ though.

Requiring CUDNN is bigger, but if we need to eval some code later on that seems ok (there is very little code on Flux's side, so the eval doesn't take too long).

OK, done.

@maleadt maleadt merged commit 08804a0 into master Nov 6, 2019
@maleadt maleadt deleted the tb/runtime_use_cuda branch November 6, 2019 08:47
@MikeInnes (Member)
Dropping support for 1.0 is not ideal; is there no way the loading changes can be backported to previous Cu* releases?

@DhairyaLGandhi (Member) commented Nov 6, 2019
Yeah, maintaining backwards compatibility would be nice.

@maleadt (Collaborator, Author) commented Nov 6, 2019
Yes, JuliaLang/julia#31403 as used in CUDAnative. I'll have a look at providing llvmcall alternatives for certain functionality.

@maleadt maleadt added the cuda label Nov 6, 2019
@findmyway (Contributor)
Hi, I think I encountered an error after this PR. Using the latest master branch with the following code:

```julia
module TestFlux
using Flux
end
```

```
julia> using TestFlux
[ Info: Precompiling TestFlux [158674fa-8238-5cab-b5ba-03dfc80d1318]
ERROR: LoadError: InitError: UndefVarError: functional not defined
Stacktrace:
 [1] getproperty at ./Base.jl:13 [inlined]
...
```

@maleadt (Collaborator, Author) commented Nov 7, 2019
Yes, thanks for the report, looking into it.
