Closed
Describe the bug
I believe this is a CUDA.jl bug rather than a Flux one. The environment variable CUDA_VISIBLE_DEVICES is the canonical way to control which devices CUDA can see. However, setting it to the empty string (i.e. disabling all devices) causes CUDA.jl to error when used via Flux:
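For context, a sketch of the variable's documented semantics (the values below are illustrative, and the variable must be set before the CUDA driver is first queried):

```julia
# An empty value hides every device from CUDA applications.
ENV["CUDA_VISIBLE_DEVICES"] = ""

# A comma-separated list exposes only the named devices,
# e.g. only device 0 and device 1:
# ENV["CUDA_VISIBLE_DEVICES"] = "0,1"
```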
julia> using Flux
julia> ENV["CUDA_VISIBLE_DEVICES"]=""
""
julia> x = gpu(rand(Float32,2,2))
ERROR: CUDA error: initialization error (code 3, ERROR_NOT_INITIALIZED)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:91
[2] macro expansion
@ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:101 [inlined]
[3] cuDeviceGet
@ ~/.julia/packages/CUDA/nYggH/lib/utils/call.jl:26 [inlined]
[4] CuDevice
@ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:16 [inlined]
[5] TaskLocalState
@ ~/.julia/packages/CUDA/nYggH/src/state.jl:50 [inlined]
[6] task_local_state!()
@ CUDA ~/.julia/packages/CUDA/nYggH/src/state.jl:73
[7] active_state
@ ~/.julia/packages/CUDA/nYggH/src/state.jl:106 [inlined]
[8] #_alloc#178
@ ~/.julia/packages/CUDA/nYggH/src/pool.jl:183 [inlined]
[9] #alloc#177
@ ~/.julia/packages/CUDA/nYggH/src/pool.jl:173 [inlined]
[10] alloc
@ ~/.julia/packages/CUDA/nYggH/src/pool.jl:169 [inlined]
[11] CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
@ CUDA ~/.julia/packages/CUDA/nYggH/src/array.jl:44
[12] CuArray
@ ~/.julia/packages/CUDA/nYggH/src/array.jl:287 [inlined]
[13] adapt_storage
@ ~/.julia/packages/CUDA/nYggH/src/array.jl:536 [inlined]
[14] adapt_structure
@ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
[15] adapt
@ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
[16] #cu#189
@ ~/.julia/packages/CUDA/nYggH/src/array.jl:546 [inlined]
[17] cu
@ ~/.julia/packages/CUDA/nYggH/src/array.jl:546 [inlined]
[18] adapt_storage
@ ~/.julia/packages/Flux/BPPNj/src/functor.jl:66 [inlined]
[19] adapt_structure
@ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
[20] adapt
@ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
[21] #132
@ ~/.julia/packages/Flux/BPPNj/src/functor.jl:146 [inlined]
[22] fmap(f::Flux.var"#132#133", x::Matrix{Float32}; exclude::typeof(Flux._isbitsarray), walk::typeof(Functors._default_walk), cache::IdDict{Any, Any})
@ Functors ~/.julia/packages/Functors/hIysk/src/functor.jl:121
[23] gpu(x::Matrix{Float32})
@ Flux ~/.julia/packages/Flux/BPPNj/src/functor.jl:146
[24] top-level scope
@ REPL[7]:1
[25] top-level scope
@ ~/.julia/packages/CUDA/nYggH/src/initialization.jl:52
Manifest.toml
(jl_QuH7kn) pkg> st --manifest
Status `/tmp/jl_QuH7kn/Manifest.toml`
[621f4979] AbstractFFTs v1.1.0
[1520ce14] AbstractTrees v0.3.4
[79e6a3ab] Adapt v3.3.3
[4fba245c] ArrayInterface v3.2.2
[ab4f0b2a] BFloat16s v0.2.0
[fa961155] CEnum v0.4.1
[052768ef] CUDA v3.7.0
[082447d4] ChainRules v1.23.0
[d360d2e6] ChainRulesCore v1.11.6
[9e997f8a] ChangesOfVariables v0.1.2
[944b1d66] CodecZlib v0.7.0
[3da002f7] ColorTypes v0.11.0
[5ae59095] Colors v0.12.8
[bbf7d656] CommonSubexpressions v0.3.0
[34da2185] Compat v3.41.0
[9a962f9c] DataAPI v1.9.0
[864edb3b] DataStructures v0.18.11
[163ba53b] DiffResults v1.0.3
[b552c78f] DiffRules v1.9.0
[ffbed154] DocStringExtensions v0.8.6
[e2ba6199] ExprTools v0.1.8
[1a297f60] FillArrays v0.12.7
[53c48c17] FixedPointNumbers v0.8.4
[587475ba] Flux v0.12.8
[f6369f11] ForwardDiff v0.10.25
[d9f16b24] Functors v0.2.7
[0c68f7d7] GPUArrays v8.1.3
[61eb1bfa] GPUCompiler v0.13.11
[7869d1d1] IRTools v0.4.4
[615f187c] IfElse v0.1.1
[3587e190] InverseFunctions v0.1.2
[92d709cd] IrrationalConstants v0.1.1
[692b3bcd] JLLWrappers v1.4.0
[e5e0dc1b] Juno v0.8.4
[929cbde3] LLVM v4.7.1
[2ab3a3ac] LogExpFunctions v0.3.6
[1914dd2f] MacroTools v0.5.9
[e89f7d12] Media v0.5.0
[e1d29d7a] Missings v1.0.2
[872c559c] NNlib v0.7.34
[a00861dc] NNlibCUDA v0.1.11
[77ba4419] NaNMath v0.3.6
[bac558e1] OrderedCollections v1.4.1
[21216c6a] Preferences v1.2.3
[74087812] Random123 v1.4.2
[e6cf234a] RandomNumbers v1.5.3
[c1ae055f] RealDot v0.1.0
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.3.0
[a2af1166] SortingAlgorithms v1.0.1
[276daf66] SpecialFunctions v2.0.0
[aedffcd0] Static v0.4.1
[90137ffa] StaticArrays v1.3.3
[82ae8749] StatsAPI v1.2.0
[2913bbd2] StatsBase v0.33.14
[a759f4b9] TimerOutputs v0.5.15
[3bb67fe8] TranscodingStreams v0.9.6
[a5390f91] ZipFile v0.9.4
[e88e6eb3] Zygote v0.6.34
[700de1a5] ZygoteRules v0.2.2
[dad2f222] LLVMExtra_jll v0.0.13+1
[efe28fd5] OpenSpecFun_jll v0.5.5+0
[0dad84c5] ArgTools
[56f22d72] Artifacts
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[f43a241f] Downloads
[b77e0a4c] InteractiveUtils
[4af54fe1] LazyArtifacts
[b27032c2] LibCURL
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[ca575930] NetworkOptions
[44cfe95a] Pkg
[de0858da] Printf
[9abbd945] Profile
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[fa267f1f] TOML
[a4e569a6] Tar
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[e66e0078] CompilerSupportLibraries_jll
[deac9b47] LibCURL_jll
[29816b5a] LibSSH2_jll
[c8ffd9c3] MbedTLS_jll
[14a3606d] MozillaCACerts_jll
[05823500] OpenLibm_jll
[83775a58] Zlib_jll
[8e850ede] nghttp2_jll
[3f19e933] p7zip_jll
Expected behavior
CUDA.jl should silently act as if no CUDA-capable GPU were available, so that code can fall back to the CPU.
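As a sketch of what I'd expect to work (the `CUDA.functional()` guard is my assumption about how user code would detect the no-device case; it is not what currently happens):

```julia
# Hide all devices *before* CUDA.jl initializes, i.e. before `using`.
ENV["CUDA_VISIBLE_DEVICES"] = ""

using Flux, CUDA

# With no visible devices, CUDA.functional() would ideally return false,
# letting code fall back to the CPU instead of erroring:
x = CUDA.functional() ? gpu(rand(Float32, 2, 2)) : rand(Float32, 2, 2)
```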
Version info
Details on Julia:
julia> versioninfo()
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD Ryzen 9 5950X 16-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, generic)
Details on CUDA:
julia> Flux.CUDA.versioninfo()
CUDA toolkit 188160.51, artifact installation
NVIDIA driver 460.91.3, for CUDA 11.2
CUDA driver 11.2
Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 11.0.0+460.91.3
- CUDNN: 8.30.2 (for CUDA 11.5.0)
Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.6.5
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80
ERROR: CUDA error: initialization error (code 3, ERROR_NOT_INITIALIZED)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:91
[2] macro expansion
@ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:101 [inlined]
[3] cuDeviceGetCount
@ ~/.julia/packages/CUDA/nYggH/lib/utils/call.jl:26 [inlined]
[4] ndevices
@ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:166 [inlined]
[5] length
@ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:150 [inlined]
[6] iterate (repeats 2 times)
@ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:147 [inlined]
[7] isempty
@ ./essentials.jl:767 [inlined]
[8] versioninfo(io::Base.TTY)
@ CUDA ~/.julia/packages/CUDA/nYggH/src/utilities.jl:70
[9] versioninfo()
@ CUDA ~/.julia/packages/CUDA/nYggH/src/utilities.jl:32
[10] top-level scope
@ REPL[12]:1
[11] top-level scope
@ ~/.julia/packages/CUDA/nYggH/src/initialization.jl:52