More sporadic 1.11 hangs #412

@christiangnrd

Now that compilation on Julia 1.11 is fixed, we're back to sporadic CI hangs.

See example 1, example 2, example 3.

Bonus: output from a local run
(Metal) pkg> test
     Testing Metal
      Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_aiC8B7/Project.toml`
  [79e6a3ab] Adapt v4.0.4
  [ab4f0b2a] BFloat16s v0.5.0
  [6e4b80f9] BenchmarkTools v1.5.0
  [0c68f7d7] GPUArrays v10.3.1
  [63c18a36] KernelAbstractions v0.9.25
  [929cbde3] LLVM v9.0.0
  [dde4c033] Metal v1.3.0 `~/.julia/dev/Metal`
  [e86c9b32] ObjectiveC v3.0.0
  [276daf66] SpecialFunctions v2.4.0
  [90137ffa] StaticArrays v1.9.7
  [10745b16] Statistics v1.11.1
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [de0858da] Printf v1.11.0
  [3fa0cd96] REPL v1.11.0
  [9a3f8284] Random v1.11.0
  [2f01184e] SparseArrays v1.11.0
  [8dfed614] Test v1.11.0
      Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_aiC8B7/Manifest.toml`
  [79e6a3ab] Adapt v4.0.4
  [a9b6321e] Atomix v0.1.0
  [ab4f0b2a] BFloat16s v0.5.0
  [6e4b80f9] BenchmarkTools v1.5.0
  [fa961155] CEnum v0.5.0
  [523fee87] CodecBzip2 v0.8.4
  [ffbed154] DocStringExtensions v0.9.3
  [e2ba6199] ExprTools v0.1.10
  [0c68f7d7] GPUArrays v10.3.1
  [46192b85] GPUArraysCore v0.1.6
  [61eb1bfa] GPUCompiler v0.27.4
  [92d709cd] IrrationalConstants v0.2.2
  [692b3bcd] JLLWrappers v1.5.0
  [682c06a0] JSON v0.21.4
  [63c18a36] KernelAbstractions v0.9.25
  [929cbde3] LLVM v9.0.0
  [2ab3a3ac] LogExpFunctions v0.3.28
  [1914dd2f] MacroTools v0.5.13
  [dde4c033] Metal v1.3.0 `~/.julia/dev/Metal`
  [d8793406] ObjectFile v0.4.2
  [e86c9b32] ObjectiveC v3.0.0
  [69de0a69] Parsers v2.8.1
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [6c6a2e73] Scratch v1.2.1
  [276daf66] SpecialFunctions v2.4.0
  [90137ffa] StaticArrays v1.9.7
  [1e83bf80] StaticArraysCore v1.4.3
  [10745b16] Statistics v1.11.1
  [53d494c1] StructIO v0.3.1
  [a759f4b9] TimerOutputs v0.5.24
  [3bb67fe8] TranscodingStreams v0.11.2
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.2.1
  [6e34b625] Bzip2_jll v1.0.8+1
  [2e619515] Expat_jll v2.6.2+0
  [f52de702] LLVMDowngrader_jll v0.3.0+1
  [dad2f222] LLVMExtra_jll v0.0.33+0
  [7106de7a] LibMPDec_jll v2.5.1+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
  [458c3c95] OpenSSL_jll v3.0.14+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [93d3a430] Python_jll v3.10.14+0
  [76ed43ae] SQLite_jll v3.45.3+0
  [ffd25f8a] XZ_jll v5.4.6+0
  [0dad84c5] ArgTools v1.1.2
  [56f22d72] Artifacts v1.11.0
  [2a0f44e3] Base64 v1.11.0
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [4af54fe1] LazyArtifacts v1.11.0
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2 v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [a63ad114] Mmap v1.11.0
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.11.0
  [de0858da] Printf v1.11.0
  [9abbd945] Profile v1.11.0
  [3fa0cd96] REPL v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [6462fe0b] Sockets v1.11.0
  [2f01184e] SparseArrays v1.11.0
  [f489334b] StyledStrings v1.11.0
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test v1.11.0
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.6.0+0
  [e37daf67] LibGit2_jll v1.7.2+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.6+0
  [14a3606d] MozillaCACerts_jll v2023.12.12
  [4536629a] OpenBLAS_jll v0.3.27+1
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.7.0+0
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.11.0+0
  [8e850ede] nghttp2_jll v1.59.0+0
  [3f19e933] p7zip_jll v17.4.0+2
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading.
     Testing Running tests...
┌ Info: System information:
│ macOS 14.6.1, Darwin 23.6.0
│ 
│ Toolchain:
│ - Julia: 1.11.0-rc3
│ - LLVM: 16.0.6
│ 
│ Julia packages: 
│ - Metal.jl: 1.3.0
│ - LLVMDowngrader_jll: 0.3.0+1
│ 
│ 1 device:
└ - Apple M2 Max (64.000 KiB allocated)
[ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable.
                                                  |          | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
metallib                                      (8) |     0.62 |   0.01 |  2.0 |     214.88 |   547.77 |
pool                                          (9) |     1.14 |   0.03 |  2.5 |     325.48 |   595.09 |
      From worker 8:	Starting recording with the Blank template and GPU, Time Profiler, Metal Application, Metal GPU Counters, Metal Resource Events, os_signpost Instruments. Attaching to: julia (23790). 
      From worker 8:	Ctrl-C to stop the recording
      From worker 8:	Stopping recording...
metal                                         (7) |     4.59 |   0.17 |  3.8 |     930.59 |   813.38 |
      From worker 7:	┌ Warning: Skipping script tests
      From worker 7:	└ @ Main ~/.julia/dev/Metal/test/scripts.jl:9
scripts                                       (7) |     0.55 |   0.00 |  0.0 |      78.42 |   822.00 |
      From worker 10:	2024-08-29 10:43:58.916 julia[23792:284856] Metal GPU Frame Capture Enabled
      From worker 8:	Recording completed. Saving output file...
      From worker 8:	Output file saved as: julia_1.trace
      From worker 8:	[ Info: System trace saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_pNVkk4/julia_1.trace; open the resulting trace in Instruments
profiling                                     (8) |     9.36 |   0.01 |  0.1 |     114.46 |   570.70 |
      From worker 10:	[ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_UfApxE/julia_1.gputrace; open the resulting trace in Xcode
      From worker 10:	[ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_UfApxE/julia_2.gputrace; open the resulting trace in Xcode
capturing                                    (10) |     9.87 |   0.18 |  1.8 |    1080.20 |   719.06 |
execution                                     (5) |    18.02 |   0.28 |  1.6 |    2198.88 |   766.20 |
mps/matrix                                   (11) |     0.58 |   0.00 |  0.0 |      82.76 |   556.66 |
mps/size                                     (11) |     0.04 |   0.00 |  0.0 |       1.41 |   564.39 |
examples                                      (4) |    23.70 |   0.57 |  2.4 |    2662.96 |  2038.67 |
mps/vector                                   (11) |     0.13 |   0.00 |  0.0 |      12.66 |   565.38 |
random                                        (9) |    24.50 |   0.40 |  1.6 |    3500.97 |   978.94 |
gpuarrays/indexing scalar                     (4) |     6.65 |   0.08 |  1.2 |    1091.20 |  2089.61 |
kernelabstractions                            (6) |    31.11 |   0.65 |  2.1 |    4947.82 |  1138.53 |
mps/linalg                                    (5) |    16.84 |   0.35 |  2.1 |    3096.05 |   976.09 |
mps/copy                                      (8) |    33.03 |   0.75 |  2.3 |    6544.86 |   989.70 |
array                                         (2) |    45.54 |   0.71 |  1.6 |    6176.39 |  1094.27 |
gpuarrays/interface                           (2) |     0.94 |   0.01 |  1.4 |     236.47 |  1142.72 |
gpuarrays/indexing find                       (5) |    15.72 |   0.42 |  2.7 |    5267.93 |  1181.33 |
gpuarrays/math/power                          (4) |    20.75 |   0.48 |  2.3 |    4750.50 |  2305.39 |
gpuarrays/reductions/any all count            (2) |     8.32 |   0.12 |  1.5 |    1691.97 |  1204.00 |
gpuarrays/uniformscaling                      (4) |     4.67 |   0.02 |  0.5 |     502.92 |  2437.33 |
gpuarrays/linalg/mul!/vector-matrix           (6) |    28.20 |   0.41 |  1.5 |    5208.05 |  1341.78 |
gpuarrays/math/intrinsics                     (6) |     0.91 |   0.00 |  0.0 |     107.54 |  1347.39 |
gpuarrays/indexing multidimensional           (8) |    25.80 |   0.45 |  1.7 |    5203.15 |  1279.70 |
gpuarrays/reductions/mapreducedim!_large      (2) |    27.58 |   0.79 |  2.9 |    6800.30 |  1778.45 |
gpuarrays/linalg/norm                         (6) |    28.24 |   0.54 |  1.9 |    6401.18 |  1584.08 |
gpuarrays/reductions/reducedim!              (11) |    67.84 |   1.30 |  1.9 |   12473.56 |  1249.77 |
gpuarrays/vectors                            (11) |     0.18 |   0.00 |  0.0 |      27.23 |  1252.45 |
gpuarrays/linalg/mul!/matrix-matrix           (4) |    38.05 |   0.45 |  1.2 |    5872.34 |  2530.66 |
gpuarrays/statistics                          (8) |    26.77 |   0.40 |  1.5 |    4574.18 |  1491.39 |
gpuarrays/linalg                              (9) |    73.68 |   1.55 |  2.1 |   14299.43 |  1578.47 |
gpuarrays/random                             (11) |     8.23 |   0.08 |  1.0 |    1146.98 |  1366.86 |
gpuarrays/constructors                        (6) |    16.62 |   0.15 |  0.9 |    1804.54 |  1701.91 |
gpuarrays/base                                (4) |    17.28 |   0.44 |  2.6 |    3418.14 |  2573.62 |
gpuarrays/reductions/== isequal               (8) |    25.87 |   0.53 |  2.1 |    5815.18 |  1610.95 |
gpuarrays/reductions/mapreducedim!           (11) |    26.83 |   0.43 |  1.6 |    5249.56 |  1554.47 |
gpuarrays/reductions/minimum maximum extrema  (5) |   105.34 |   2.37 |  2.2 |   22907.86 |  1966.19 |
gpuarrays/reductions/reduce                   (6) |    56.45 |   1.32 |  2.3 |   13019.02 |  2042.95 |
gpuarrays/reductions/mapreduce                (2) |    81.31 |   1.81 |  2.2 |   17933.78 |  2046.83 |
gpuarrays/reductions/sum prod                 (4) |    77.72 |   1.54 |  2.0 |   16463.69 |  3235.77 |
gpuarrays/broadcasting                        (9) |   108.09 |   1.93 |  1.8 |   19730.51 |  2196.94 |
┌ Warning: Test timed out: device/intrinsics
└ @ Main ~/.julia/dev/Metal/test/runtests.jl:247
      From worker 7:	
      From worker 7:	======================================================================================
      From worker 7:	Information request received. A stacktrace will print followed by a 1.0 second profile
      From worker 7:	======================================================================================
      From worker 7:	
      From worker 7:	signal (29): Information request: 29
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	
      From worker 7:	==============================================================
      From worker 7:	Profile collected. A report will print at the next yield point
      From worker 7:	==============================================================
      From worker 7:	
┌ Warning: Couldn't kill worker 7, killing process 23789 forcefully
└ @ Main ~/.julia/dev/Metal/test/runtests.jl:256
      From worker 7:	
      From worker 7:	[23789] signal 15: Terminated: 15
      From worker 7:	in expression starting at /Users/christian/.julia/dev/Metal/test/device/intrinsics.jl:4
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	Allocations: 35316542 (Pool: 35315514; Big: 1028); GC: 24
device/intrinsics                             (7) |         failed at 2024-08-29T10:52:12.607
Testing finished in 8 minutes, 23 seconds, 33 milliseconds
device/intrinsics: Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(7)

Test Summary:                                  | Pass  Error  Broken  Total  Time
  Overall                                      | 9617      1       9   9627      
    metallib                                   |   25                    25      
    pool                                       |    5                     5      
    metal                                      |  131                   131      
    scripts                                    |                          0      
    profiling                                  |    1                     1      
    capturing                                  |   24                    24      
    execution                                  |   35                    35      
    mps/matrix                                 |   75                    75      
    mps/size                                   |    9                     9      
    examples                                   |    4                     4      
    mps/vector                                 |   34                    34      
    random                                     |  817              1    818      
    gpuarrays/indexing scalar                  |  399                   399      
    kernelabstractions                         | 2179              8   2187      
    mps/linalg                                 |   45                    45      
    mps/copy                                   |  144                   144      
    array                                      |  374                   374      
    gpuarrays/interface                        |    7                     7      
    gpuarrays/indexing find                    |   45                    45      
    gpuarrays/math/power                       |   60                    60      
    gpuarrays/reductions/any all count         |  101                   101      
    gpuarrays/uniformscaling                   |   56                    56      
    gpuarrays/linalg/mul!/vector-matrix        |  140                   140      
    gpuarrays/math/intrinsics                  |   10                    10      
    gpuarrays/indexing multidimensional        |   89                    89      
    gpuarrays/reductions/mapreducedim!_large   |   40                    40      
    gpuarrays/linalg/norm                      |  264                   264      
    gpuarrays/reductions/reducedim!            |  160                   160      
    gpuarrays/vectors                          |   10                    10      
    gpuarrays/linalg/mul!/matrix-matrix        |  360                   360      
    gpuarrays/statistics                       |   52                    52      
    gpuarrays/linalg                           |  397                   397      
    gpuarrays/random                           |   52                    52      
    gpuarrays/constructors                     |  832                   832      
    gpuarrays/base                             |   93                    93      
    gpuarrays/reductions/== isequal            |  248                   248      
    gpuarrays/reductions/mapreducedim!         |  260                   260      
    gpuarrays/reductions/minimum maximum extrema |  555                   555      
    gpuarrays/reductions/reduce                |  220                   220      
    gpuarrays/reductions/mapreduce             |  330                   330      
    gpuarrays/reductions/sum prod              |  636                   636      
    gpuarrays/broadcasting                     |  299                   299      
    device/intrinsics                          |           1              1      
    FAILURE

Error in testset device/intrinsics:
Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(7)
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:410
ERROR: Package Metal errored during testing

Maybe related to #329?
