
Error when env var CUDA_VISIBLE_DEVICES is set but empty #1336

Closed

Description

Describe the bug

I believe this is a CUDA.jl bug rather than a Flux bug.

The env var CUDA_VISIBLE_DEVICES is the canonical way to control which devices CUDA can see.

However, setting it to an empty string (i.e. hiding all devices) causes CUDA.jl to throw an error when it is used via Flux.

julia> using Flux

julia> ENV["CUDA_VISIBLE_DEVICES"]=""
""

julia> x = gpu(rand(Float32,2,2))
ERROR: CUDA error: initialization error (code 3, ERROR_NOT_INITIALIZED)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:91
  [2] macro expansion
    @ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:101 [inlined]
  [3] cuDeviceGet
    @ ~/.julia/packages/CUDA/nYggH/lib/utils/call.jl:26 [inlined]
  [4] CuDevice
    @ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:16 [inlined]
  [5] TaskLocalState
    @ ~/.julia/packages/CUDA/nYggH/src/state.jl:50 [inlined]
  [6] task_local_state!()
    @ CUDA ~/.julia/packages/CUDA/nYggH/src/state.jl:73
  [7] active_state
    @ ~/.julia/packages/CUDA/nYggH/src/state.jl:106 [inlined]
  [8] #_alloc#178
    @ ~/.julia/packages/CUDA/nYggH/src/pool.jl:183 [inlined]
  [9] #alloc#177
    @ ~/.julia/packages/CUDA/nYggH/src/pool.jl:173 [inlined]
 [10] alloc
    @ ~/.julia/packages/CUDA/nYggH/src/pool.jl:169 [inlined]
 [11] CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
    @ CUDA ~/.julia/packages/CUDA/nYggH/src/array.jl:44
 [12] CuArray
    @ ~/.julia/packages/CUDA/nYggH/src/array.jl:287 [inlined]
 [13] adapt_storage
    @ ~/.julia/packages/CUDA/nYggH/src/array.jl:536 [inlined]
 [14] adapt_structure
    @ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
 [15] adapt
    @ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
 [16] #cu#189
    @ ~/.julia/packages/CUDA/nYggH/src/array.jl:546 [inlined]
 [17] cu
    @ ~/.julia/packages/CUDA/nYggH/src/array.jl:546 [inlined]
 [18] adapt_storage
    @ ~/.julia/packages/Flux/BPPNj/src/functor.jl:66 [inlined]
 [19] adapt_structure
    @ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:42 [inlined]
 [20] adapt
    @ ~/.julia/packages/Adapt/wASZA/src/Adapt.jl:40 [inlined]
 [21] #132
    @ ~/.julia/packages/Flux/BPPNj/src/functor.jl:146 [inlined]
 [22] fmap(f::Flux.var"#132#133", x::Matrix{Float32}; exclude::typeof(Flux._isbitsarray), walk::typeof(Functors._default_walk), cache::IdDict{Any, Any})
    @ Functors ~/.julia/packages/Functors/hIysk/src/functor.jl:121
 [23] gpu(x::Matrix{Float32})
    @ Flux ~/.julia/packages/Flux/BPPNj/src/functor.jl:146
 [24] top-level scope
    @ REPL[7]:1
 [25] top-level scope
    @ ~/.julia/packages/CUDA/nYggH/src/initialization.jl:52
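The title turns on the difference between the variable being unset (all devices visible) and being set to an empty string (no devices visible). As a rough illustration only, the hypothetical helper below (not part of CUDA.jl or Flux) tells those cases apart. It assumes a plain comma-separated list of entries and ignores the driver's finer parsing rules, such as device UUIDs.

```julia
# Hypothetical helper, not part of CUDA.jl or Flux.
# Simplified reading of CUDA_VISIBLE_DEVICES:
#   - returns `nothing` if the variable is unset (the driver sees all devices);
#   - otherwise returns the comma-separated entries, so an empty vector means
#     the variable is set but empty, i.e. every device is hidden.
function visible_device_entries(env::AbstractDict = ENV)
    val = get(env, "CUDA_VISIBLE_DEVICES", nothing)
    val === nothing && return nothing
    entries = String[]
    for s in split(val, ',')
        e = strip(s)
        isempty(e) || push!(entries, String(e))
    end
    return entries
end

# Usage with explicit dictionaries, so the real environment is not touched:
visible_device_entries(Dict{String,String}())                  # nothing
visible_device_entries(Dict("CUDA_VISIBLE_DEVICES" => ""))     # String[]
visible_device_entries(Dict("CUDA_VISIBLE_DEVICES" => "0,1"))  # ["0", "1"]
```

In the session above, after `ENV["CUDA_VISIBLE_DEVICES"]=""`, calling `visible_device_entries()` with no arguments would return an empty vector: the "set but empty" case this issue is about.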


Manifest.toml

(jl_QuH7kn) pkg> st --manifest
      Status `/tmp/jl_QuH7kn/Manifest.toml`
  [621f4979] AbstractFFTs v1.1.0
  [1520ce14] AbstractTrees v0.3.4
  [79e6a3ab] Adapt v3.3.3
  [4fba245c] ArrayInterface v3.2.2
  [ab4f0b2a] BFloat16s v0.2.0
  [fa961155] CEnum v0.4.1
  [052768ef] CUDA v3.7.0
  [082447d4] ChainRules v1.23.0
  [d360d2e6] ChainRulesCore v1.11.6
  [9e997f8a] ChangesOfVariables v0.1.2
  [944b1d66] CodecZlib v0.7.0
  [3da002f7] ColorTypes v0.11.0
  [5ae59095] Colors v0.12.8
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v3.41.0
  [9a962f9c] DataAPI v1.9.0
  [864edb3b] DataStructures v0.18.11
  [163ba53b] DiffResults v1.0.3
  [b552c78f] DiffRules v1.9.0
  [ffbed154] DocStringExtensions v0.8.6
  [e2ba6199] ExprTools v0.1.8
  [1a297f60] FillArrays v0.12.7
  [53c48c17] FixedPointNumbers v0.8.4
  [587475ba] Flux v0.12.8
  [f6369f11] ForwardDiff v0.10.25
  [d9f16b24] Functors v0.2.7
  [0c68f7d7] GPUArrays v8.1.3
  [61eb1bfa] GPUCompiler v0.13.11
  [7869d1d1] IRTools v0.4.4
  [615f187c] IfElse v0.1.1
  [3587e190] InverseFunctions v0.1.2
  [92d709cd] IrrationalConstants v0.1.1
  [692b3bcd] JLLWrappers v1.4.0
  [e5e0dc1b] Juno v0.8.4
  [929cbde3] LLVM v4.7.1
  [2ab3a3ac] LogExpFunctions v0.3.6
  [1914dd2f] MacroTools v0.5.9
  [e89f7d12] Media v0.5.0
  [e1d29d7a] Missings v1.0.2
  [872c559c] NNlib v0.7.34
  [a00861dc] NNlibCUDA v0.1.11
  [77ba4419] NaNMath v0.3.6
  [bac558e1] OrderedCollections v1.4.1
  [21216c6a] Preferences v1.2.3
  [74087812] Random123 v1.4.2
  [e6cf234a] RandomNumbers v1.5.3
  [c1ae055f] RealDot v0.1.0
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [a2af1166] SortingAlgorithms v1.0.1
  [276daf66] SpecialFunctions v2.0.0
  [aedffcd0] Static v0.4.1
  [90137ffa] StaticArrays v1.3.3
  [82ae8749] StatsAPI v1.2.0
  [2913bbd2] StatsBase v0.33.14
  [a759f4b9] TimerOutputs v0.5.15
  [3bb67fe8] TranscodingStreams v0.9.6
  [a5390f91] ZipFile v0.9.4
  [e88e6eb3] Zygote v0.6.34
  [700de1a5] ZygoteRules v0.2.2
  [dad2f222] LLVMExtra_jll v0.0.13+1
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [0dad84c5] ArgTools
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8bb1440f] DelimitedFiles
  [8ba89e20] Distributed
  [f43a241f] Downloads
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions
  [44cfe95a] Pkg
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics
  [fa267f1f] TOML
  [a4e569a6] Tar
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll
  [deac9b47] LibCURL_jll
  [29816b5a] LibSSH2_jll
  [c8ffd9c3] MbedTLS_jll
  [14a3606d] MozillaCACerts_jll
  [05823500] OpenLibm_jll
  [83775a58] Zlib_jll
  [8e850ede] nghttp2_jll
  [3f19e933] p7zip_jll

Expected behavior

CUDA.jl should silently behave as if no CUDA-capable GPU is available.
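Until this is handled inside CUDA.jl, a caller could approximate the expected behavior by checking the variable before moving data. The sketch below is a hypothetical workaround, not a CUDA.jl or Flux API: `to_device_or_cpu` and its keywords are made up for illustration, and `move` would typically be Flux's `gpu`.

```julia
# Hypothetical workaround, not a CUDA.jl or Flux API.
# Applies `move` (e.g. Flux's `gpu`) unless CUDA_VISIBLE_DEVICES is set to an
# empty string, in which case `x` is returned unchanged and stays on the CPU.
function to_device_or_cpu(x; move = identity, env::AbstractDict = ENV)
    hidden = get(env, "CUDA_VISIBLE_DEVICES", nothing) == ""
    return hidden ? x : move(x)
end

# With all devices hidden, `move` is never called:
to_device_or_cpu([1.0f0, 2.0f0]; move = x -> error("should not move"),
                 env = Dict("CUDA_VISIBLE_DEVICES" => ""))  # Float32[1.0, 2.0]
```

In the reproduction above this would read `x = to_device_or_cpu(rand(Float32, 2, 2); move = gpu)`. It only covers the empty-string case and does not detect other reasons a GPU might be unusable.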

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, generic)

Details on CUDA:

julia> Flux.CUDA.versioninfo()
CUDA toolkit 188160.51, artifact installation
NVIDIA driver 460.91.3, for CUDA 11.2
CUDA driver 11.2

Libraries: 
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 11.0.0+460.91.3
- CUDNN: 8.30.2 (for CUDA 11.5.0)
  Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.6.5
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

ERROR: CUDA error: initialization error (code 3, ERROR_NOT_INITIALIZED)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:91
  [2] macro expansion
    @ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/error.jl:101 [inlined]
  [3] cuDeviceGetCount
    @ ~/.julia/packages/CUDA/nYggH/lib/utils/call.jl:26 [inlined]
  [4] ndevices
    @ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:166 [inlined]
  [5] length
    @ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:150 [inlined]
  [6] iterate (repeats 2 times)
    @ ~/.julia/packages/CUDA/nYggH/lib/cudadrv/devices.jl:147 [inlined]
  [7] isempty
    @ ./essentials.jl:767 [inlined]
  [8] versioninfo(io::Base.TTY)
    @ CUDA ~/.julia/packages/CUDA/nYggH/src/utilities.jl:70
  [9] versioninfo()
    @ CUDA ~/.julia/packages/CUDA/nYggH/src/utilities.jl:32
 [10] top-level scope
    @ REPL[12]:1
 [11] top-level scope
    @ ~/.julia/packages/CUDA/nYggH/src/initialization.jl:52
