
Check for CUDA availability at run time. #916

Merged — 7 commits, Nov 6, 2019
Conversation

@maleadt (Collaborator) commented Nov 2, 2019

WIP: necessary changes over at CuArrays/CUDAdrv/CUDAnative have not been pushed/released yet.

Continuing the package loading / conditional dependency saga, here is another attempt to cover all requirements. With the recent CUDAapi.jl-based scheme, we made it possible to add regular Pkg dependencies on GPU packages, but that had one important flaw: once the application (e.g. Flux) was precompiled without GPU support, there was no easy way for the user to "fix" GPU support and reload Flux. So we added some hacks to detect that and remove the compile cache during __init__.
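The cache-removal hack described above looked roughly like the following sketch. This is hypothetical illustration, not Flux's actual code: `has_cuda` is CUDAapi.jl's detection routine, and the cache-path lookup (`Base.compilecache_path`) is internal and varies across Julia versions.

```julia
using CUDAapi

# Baked in at precompile time: whether CUDA was detected back then.
const precompiled_with_gpu = has_cuda()

function __init__()
    # If the module was precompiled without GPU support but a CUDA
    # installation is detected now, delete the stale compile cache so
    # the next `using Flux` precompiles again with GPU support.
    if !precompiled_with_gpu && has_cuda()
        cachefile = Base.compilecache_path(Base.PkgId(@__MODULE__))
        cachefile !== nothing && rm(cachefile; force=true)
        @warn "Precompiled without GPU support; restart Julia to enable it."
    end
end
```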

Furthermore, several users want the GPU packages and applications like Flux to be precompilable on a system without a GPU, e.g. the login node of a cluster, or during the build step of a container. This is similarly incompatible with conditionally loading GPU packages, which would bake the GPU-less state into the precompilation image.

There are also certain parts of our infrastructure, like Documenter (being tightly linked to Travis) and the new automatic package registry automerge bot, that expect packages to be loadable at all times.

Bottom line: the GPU stack should be loadable and precompilable regardless of whether it can and will be used. I'm working on exactly that right now, where CUDAdrv/CUDAnative/CuArrays would always be loadable, print a (silenceable) warning if something goes wrong, and have a $module.functional() method to query the state of the package. In this PR, I adapt Flux to use those APIs.
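With that API, a downstream package can always load the GPU stack and branch at run time. A minimal sketch (the helper name `maybe_gpu` is made up for illustration; `CuArrays.functional()` is the query method mentioned above):

```julia
using CuArrays

# Move an array to the GPU only if CUDA is actually usable on this
# system; otherwise leave it on the CPU. Loading CuArrays itself
# always succeeds under the new scheme.
function maybe_gpu(x::AbstractArray)
    if CuArrays.functional()
        return CuArrays.cu(x)  # working CUDA setup: upload to the device
    else
        return x               # no GPU (e.g. cluster login node): stay on the CPU
    end
end
```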

Advantages:

  • Flux can be precompiled, including optional GPU support
  • Changes to the system that affect GPU support are reflected by just reloading Flux

Disadvantages:

  • functions like gpu are now type-unstable (returning either an Array or a CuArray)
  • CUDNN is mandatory, as there's currently no easy way to conditionally enable that functionality based on a runtime flag

Both disadvantages could be worked around by evaluating code at run time, but that would negate some of the precompilation advantages.
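To illustrate the first disadvantage: once the decision is a runtime value, gpu's return type depends on data inference cannot see, so callers get a Union of array types. A sketch (`use_cuda` stands in for whatever runtime flag is used; it is not necessarily Flux's actual name):

```julia
using CuArrays

# Flipped to true in __init__ when CUDA turns out to be functional.
const use_cuda = Ref(false)

# The return type depends on a runtime Ref, so inference only sees
# Union{Array, CuArray} — this is the type instability noted above.
gpu(x::AbstractArray) = use_cuda[] ? CuArrays.cu(x) : x
```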

Thoughts? @MikeInnes

@maleadt maleadt mentioned this pull request Nov 2, 2019
@Sleort Sleort mentioned this pull request Nov 4, 2019
@MikeInnes (Member)
Seems sensible to me, given the options we have right now. gpu not being inferable is a very minor downside. Requiring CUDNN is bigger, but if we need to eval some code later on that seems ok (there is very little code on Flux's side, so the eval doesn't take too long).
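The runtime eval being discussed would look something like the following sketch (the file path and the `has_cudnn` check are assumptions for illustration, not Flux's actual code):

```julia
function __init__()
    if CuArrays.functional() && CuArrays.has_cudnn()
        # Load the CUDNN-dependent code only when it can actually be
        # used. Because this runs at run time rather than precompile
        # time, none of it lands in the precompilation image — which
        # is why it costs a little extra load time.
        @eval include(joinpath(@__DIR__, "cuda", "cudnn.jl"))
    end
end
```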

@maleadt force-pushed the tb/runtime_use_cuda branch 2 times, most recently from 59c4889 to 117a823 on November 4, 2019
@maleadt maleadt changed the title RFC/WIP: Check for CUDA availability at run time. Check for CUDA availability at run time. Nov 4, 2019
@maleadt (Collaborator, Author) commented Nov 4, 2019

Updated to include newly released GPU packages that contain the necessary functionality. That also means only supporting Julia 1.2+ though.

Requiring CUDNN is bigger, but if we need to eval some code later on that seems ok (there is very little code on Flux's side, so the eval doesn't take too long).

OK, done.

@maleadt maleadt merged commit 08804a0 into master Nov 6, 2019
@maleadt maleadt deleted the tb/runtime_use_cuda branch November 6, 2019 08:47
@MikeInnes (Member)
Dropping support for 1.0 is not ideal; is there no way the loading changes can be backported to previous Cu* releases?

@DhairyaLGandhi (Member) commented Nov 6, 2019
Yeah, maintaining backwards compatibility would be nice.

@maleadt (Collaborator, Author) commented Nov 6, 2019
Yes, JuliaLang/julia#31403 as used in CUDAnative. I'll have a look at providing llvmcall alternatives for certain functionality.

@maleadt maleadt added the cuda label Nov 6, 2019
@findmyway (Contributor)
Hi, I think I encountered an error after this PR. Using the latest master branch with the following code:

```julia
module TestFlux
using Flux
end
```

```
julia> using TestFlux
[ Info: Precompiling TestFlux [158674fa-8238-5cab-b5ba-03dfc80d1318]
ERROR: LoadError: InitError: UndefVarError: functional not defined
Stacktrace:
 [1] getproperty at ./Base.jl:13 [inlined]
...
```

@maleadt (Collaborator, Author) commented Nov 7, 2019
Yes, thanks for the report, looking into it.
