I think we should run all API Validation tests and allow them to fail, instead of running only a subset, so that we are aware of the failures. Once we've dealt with them, we can make it mandatory.
Currently failing tests:
gpuarrays/linalg/mul!/vector-matrix
mps/linalg
mps/copy (heisenbug, see local example)
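To reproduce a run like this locally, something along these lines should work. This is only a sketch: the two variables are the ones reported under "Environment" in the log, but setting them through the shell is an assumption (they may equally have been set by the test setup), and `run_validated_tests` is a hypothetical helper name, not part of Metal.jl.

```shell
# Validation settings as reported in the "Environment" section of the log.
# Assumption: exporting them in the shell is enough to enable validation.
export MTL_DEBUG_LAYER=1
export MTL_SHADER_VALIDATION=1

# Hypothetical helper: run the full test suite of the package in the
# current directory (expected to be a Metal.jl checkout).
run_validated_tests() {
    julia --project=. -e 'using Pkg; Pkg.test()'
}
```

Calling `run_validated_tests` from the package directory then runs the whole suite with both validation layers requested, rather than a subset.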
Local example:
(Metal) pkg> test
Testing Metal
...
Testing Running tests...
2024-10-18 16:29:58.279 julia[36961:444153] Metal API Validation Enabled
2024-10-18 16:29:58.279 julia[36961:444153] Metal GPU Validation Enabled
┌ Info: System information:
│ macOS 15.0.1, Darwin 24.0.0
│
│ Toolchain:
│ - Julia: 1.11.1
│ - LLVM: 16.0.6
│
│ Julia packages:
│ - Metal.jl: 1.4.0
│ - GPUArrays: 11.0.0
│ - GPUCompiler: 1.0.0
│ - KernelAbstractions: 0.9.28
│ - ObjectiveC: 3.1.0
│ - LLVM: 9.1.2
│ - LLVMDowngrader_jll: 0.3.0+1
│
│ Environment:
│ - MTL_SHADER_VALIDATION: 1
│ - MTL_DEBUG_LAYER: 1
│
│ 1 device:
└ - Apple M2 Max (192.000 KiB allocated)
[ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable.
From worker 7: 2024-10-18 16:30:06.803 julia[36969:444278] Metal API Validation Enabled
From worker 7: 2024-10-18 16:30:06.803 julia[36969:444278] Metal GPU Validation Enabled
From worker 4: 2024-10-18 16:30:06.814 julia[36966:444275] Metal API Validation Enabled
From worker 4: 2024-10-18 16:30:06.814 julia[36966:444275] Metal GPU Validation Enabled
From worker 9: 2024-10-18 16:30:06.817 julia[36971:444280] Metal API Validation Enabled
From worker 9: 2024-10-18 16:30:06.817 julia[36971:444280] Metal GPU Validation Enabled
From worker 3: 2024-10-18 16:30:06.828 julia[36965:444274] Metal API Validation Enabled
From worker 3: 2024-10-18 16:30:06.829 julia[36965:444274] Metal GPU Validation Enabled
From worker 8: 2024-10-18 16:30:06.831 julia[36970:444279] Metal API Validation Enabled
From worker 8: 2024-10-18 16:30:06.831 julia[36970:444279] Metal GPU Validation Enabled
From worker 6: 2024-10-18 16:30:06.842 julia[36968:444277] Metal API Validation Enabled
From worker 6: 2024-10-18 16:30:06.843 julia[36968:444277] Metal GPU Validation Enabled
From worker 2: 2024-10-18 16:30:06.843 julia[36964:444270] Metal API Validation Enabled
From worker 2: 2024-10-18 16:30:06.843 julia[36964:444270] Metal GPU Validation Enabled
From worker 5: 2024-10-18 16:30:06.863 julia[36967:444276] Metal API Validation Enabled
From worker 5: 2024-10-18 16:30:06.864 julia[36967:444276] Metal GPU Validation Enabled
| | ---------------- CPU ---------------- |
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
metallib (8) | 0.68 | 0.01 | 1.9 | 209.75 | 573.78 |
pool (9) | 1.12 | 0.03 | 2.5 | 320.76 | 596.08 |
From worker 10: 2024-10-18 16:30:14.310 julia[36977:444462] Metal API Validation Enabled
From worker 10: 2024-10-18 16:30:14.311 julia[36977:444462] Metal GPU Validation Enabled
From worker 8: Starting recording with the Blank template and GPU, Time Profiler, Metal Application, Metal GPU Counters, Metal Resource Events, os_signpost Instruments. Attaching to: julia (36970).
From worker 8: Ctrl-C to stop the recording
From worker 8: Stopping recording...
metal (7) | 4.18 | 0.12 | 2.8 | 528.20 | 712.36 |
From worker 7: ┌ Warning: Skipping script tests
From worker 7: └ @ Main ~/.julia/dev/Metal/test/scripts.jl:9
scripts (7) | 0.86 | 0.00 | 0.0 | 76.59 | 716.12 |
From worker 8: Recording completed. Saving output file...
From worker 8: Output file saved as: julia_1.trace
From worker 8: [ Info: System trace saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_6ZIMtu/julia_1.trace; open the resulting trace in Instruments
profiling (8) | 6.73 | 0.00 | 0.0 | 99.23 | 593.14 |
From worker 10: ┌ Warning: Skipping capturing tests; capturing is not supported with Metal Shader Validation enabled
From worker 10: └ @ Main ~/.julia/dev/Metal/test/capturing.jl:4
capturing (10) | 0.82 | 0.00 | 0.0 | 85.42 | 560.77 |
From worker 11: 2024-10-18 16:30:24.248 julia[37027:445348] Metal API Validation Enabled
From worker 11: 2024-10-18 16:30:24.248 julia[37027:445348] Metal GPU Validation Enabled
execution (5) | 16.66 | 0.25 | 1.5 | 1773.28 | 793.55 |
mps/matrix (5) | 0.37 | 0.00 | 0.0 | 52.49 | 798.92 |
mps/size (5) | 0.04 | 0.00 | 0.0 | 1.41 | 799.62 |
mps/vector (5) | 0.14 | 0.00 | 0.0 | 19.17 | 800.42 |
examples (4) | 25.74 | 0.64 | 2.5 | 2717.08 | 2026.69 |
gpuarrays/indexing scalar (5) | 9.58 | 0.11 | 1.2 | 1401.12 | 881.42 |
kernelabstractions (6) | 30.00 | 0.56 | 1.9 | 3955.16 | 1033.52 |
random (9) | 31.51 | 0.49 | 1.5 | 3735.46 | 990.28 |
device/intrinsics (7) | 36.95 | 0.47 | 1.3 | 4235.62 | 1026.02 |
From worker 11:
From worker 11: [37027] signal 10 (1): Bus error: 10
From worker 11: in expression starting at /Users/christian/.julia/dev/Metal/test/mps/linalg.jl:3
From worker 11: objc_msgSend at /usr/lib/libobjc.A.dylib (unknown line)
From worker 11: _ZN24resolvedSharedPacketDataI23GPUDebugBadAccessPacketEC2ERKS0_15MTLFunctionTypeP24MTLGPUDebugCommandBufferP17MTLGPUDebugGPULog at /System/Library/PrivateFrameworks/MetalTools.framework/Versions/A/MetalTools (unknown line)
From worker 11: Allocations: 85721828 (Pool: 85719313; Big: 2515); GC: 47
mps/linalg (11) | failed at 2024-10-18T16:30:59.210
Worker 11 terminated.
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
[1] (::Base.var"#wait_locked#832")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base ./stream.jl:970
[2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base ./stream.jl:978
[3] unsafe_read
@ ./io.jl:891 [inlined]
[4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base ./io.jl:890
[5] read!
@ ./io.jl:895 [inlined]
[6] deserialize_hdr_raw
@ ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/messages.jl:167 [inlined]
[7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:172
[8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:133
[9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Distributed/src/process_messages.jl:121
From worker 12: 2024-10-18 16:31:03.011 julia[37573:446632] Metal API Validation Enabled
From worker 12: 2024-10-18 16:31:03.011 julia[37573:446632] Metal GPU Validation Enabled
gpuarrays/math/power (6) | 26.93 | 0.53 | 2.0 | 4850.07 | 1274.00 |
array (2) | 64.57 | 1.23 | 1.9 | 8220.88 | 1769.12 |
gpuarrays/indexing find (7) | 23.17 | 0.57 | 2.4 | 5480.38 | 1208.73 |
gpuarrays/linalg/mul!/vector-matrix (9) | failed at 2024-10-18T16:31:20.021
gpuarrays/reductions/any all count (6) | 11.26 | 0.13 | 1.1 | 1729.80 | 1403.30 |
From worker 13: 2024-10-18 16:31:24.098 julia[37969:447437] Metal API Validation Enabled
From worker 13: 2024-10-18 16:31:24.098 julia[37969:447437] Metal GPU Validation Enabled
gpuarrays/uniformscaling (7) | 7.15 | 0.04 | 0.6 | 635.54 | 1348.91 |
gpuarrays/math/intrinsics (7) | 3.70 | 0.03 | 0.7 | 374.92 | 1410.30 |
mps/copy (8) | failed at 2024-10-18T16:31:34.072
From worker 14: 2024-10-18 16:31:37.928 julia[38219:447992] Metal API Validation Enabled
From worker 14: 2024-10-18 16:31:37.928 julia[38219:447992] Metal GPU Validation Enabled
gpuarrays/indexing multidimensional (12) | 52.60 | 0.74 | 1.4 | 6553.91 | 1061.39 |
gpuarrays/reductions/reducedim! (4) | 85.07 | 1.29 | 1.5 | 11756.37 | 2308.91 |
gpuarrays/linalg/norm (7) | 38.33 | 0.49 | 1.3 | 5759.17 | 1591.80 |
gpuarrays/vectors (7) | 0.17 | 0.00 | 0.0 | 22.95 | 1593.03 |
gpuarrays/linalg/mul!/matrix-matrix (6) | 56.91 | 0.43 | 0.8 | 5222.27 | 1547.66 |
gpuarrays/random (7) | 12.45 | 0.08 | 0.6 | 1200.88 | 1678.14 |
gpuarrays/linalg (5) | 104.86 | 1.66 | 1.6 | 14532.96 | 1559.11 |
gpuarrays/reductions/mapreducedim!_large (13) | 57.07 | 1.34 | 2.3 | 8654.05 | 1452.14 |
gpuarrays/constructors (4) | 22.09 | 0.19 | 0.9 | 2061.44 | 2376.53 |
gpuarrays/statistics (14) | 48.62 | 0.70 | 1.4 | 5975.28 | 955.05 |
gpuarrays/base (6) | 25.21 | 0.58 | 2.3 | 4725.99 | 1839.33 |
gpuarrays/reductions/== isequal (7) | 43.42 | 0.54 | 1.2 | 6198.30 | 2041.39 |
gpuarrays/reductions/reduce (4) | 61.15 | 1.22 | 2.0 | 11113.19 | 2376.53 |
gpuarrays/reductions/minimum maximum extrema (2) | 140.04 | 2.39 | 1.7 | 21722.48 | 2168.75 |
gpuarrays/reductions/mapreduce (12) | 114.96 | 1.89 | 1.6 | 17942.78 | 1959.66 |
gpuarrays/reductions/mapreducedim! (13) | 104.12 | 1.57 | 1.5 | 14456.61 | 2160.31 |
gpuarrays/reductions/sum prod (14) | 109.30 | 1.71 | 1.6 | 16192.86 | 2012.47 |
gpuarrays/broadcasting (5) | 152.22 | 2.05 | 1.3 | 19808.61 | 2611.33 |
Testing finished in 4 minutes, 47 seconds, 973 milliseconds
mps/linalg: Error During Test at none:1
Got exception outside of a @test
ProcessExitedException(11)
Worker 9 failed running test gpuarrays/linalg/mul!/vector-matrix:
Some tests did not pass: 139 passed, 1 failed, 0 errored, 0 broken.
gpuarrays/linalg/mul!/vector-matrix: Test Failed at /Users/christian/.julia/dev/GPUArrays/test/testsuite/linalg.jl:315
Expression: compare(*, AT, f(A), x)
Stacktrace:
[1] backtrace()
@ Base ./error.jl:114
[2] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}; print_result::Bool)
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1107
[3] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail})
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1100
[4] top-level scope
@ ~/.julia/dev/Metal/test/runtests.jl:379
[5] include(fname::String)
@ Main ./sysimg.jl:38
[6] top-level scope
@ none:6
[7] eval
@ ./boot.jl:430 [inlined]
[8] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:296
[9] _start()
@ Base ./client.jl:531
Worker 8 failed running test mps/copy:
Some tests did not pass: 143 passed, 1 failed, 0 errored, 64 broken.
mps/copy: Test Failed at /Users/christian/.julia/dev/Metal/test/mps/copy.jl:46
Expression: dstMat == srcMat
Evaluated: Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51] == Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51]
Stacktrace:
[1] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail}; print_result::Bool)
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1107
[2] record(ts::Test.DefaultTestSet, t::Union{Test.Error, Test.Fail})
@ Test ~/.julia/juliaup/julia-1.11.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1100
[3] top-level scope
@ ~/.julia/dev/Metal/test/runtests.jl:379
[4] include(fname::String)
@ Main ./sysimg.jl:38
[5] top-level scope
@ none:6
[6] eval
@ ./boot.jl:430 [inlined]
[7] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:296
[8] _start()
@ Base ./client.jl:531
Test Summary: | Pass Fail Error Broken Total Time
Overall | 9688 2 1 104 9795
metallib | 25 25
pool | 5 5
metal | 128 128
scripts | 0
profiling | 1 1
capturing | 0
execution | 37 37
mps/matrix | 76 76
mps/size | 9 9
mps/vector | 34 34
examples | 4 4
gpuarrays/indexing scalar | 399 399
kernelabstractions | 2179 8 2187
random | 818 818
device/intrinsics | 129 129
mps/linalg | 1 1
gpuarrays/math/power | 60 60
array | 409 32 441
gpuarrays/indexing find | 45 45
gpuarrays/linalg/mul!/vector-matrix | 139 1 140
gpuarrays/reductions/any all count | 101 101
gpuarrays/uniformscaling | 56 56
gpuarrays/math/intrinsics | 10 10
mps/copy | 143 1 64 208
gpuarrays/indexing multidimensional | 89 89
gpuarrays/reductions/reducedim! | 160 160
gpuarrays/linalg/norm | 264 264
gpuarrays/vectors | 10 10
gpuarrays/linalg/mul!/matrix-matrix | 360 360
gpuarrays/random | 52 52
gpuarrays/linalg | 397 397
gpuarrays/reductions/mapreducedim!_large | 40 40
gpuarrays/constructors | 832 832
gpuarrays/statistics | 52 52
gpuarrays/base | 95 95
gpuarrays/reductions/== isequal | 230 230
gpuarrays/reductions/reduce | 220 220
gpuarrays/reductions/minimum maximum extrema | 555 555
gpuarrays/reductions/mapreduce | 330 330
gpuarrays/reductions/mapreducedim! | 260 260
gpuarrays/reductions/sum prod | 636 636
gpuarrays/broadcasting | 299 299
FAILURE
Error in testset mps/linalg:
Error During Test at none:1
Got exception outside of a @test
ProcessExitedException(11)
Error in testset gpuarrays/linalg/mul!/vector-matrix:
Test Failed at /Users/christian/.julia/dev/GPUArrays/test/testsuite/linalg.jl:315
Expression: compare(*, AT, f(A), x)
Error in testset mps/copy:
Test Failed at /Users/christian/.julia/dev/Metal/test/mps/copy.jl:46
Expression: dstMat == srcMat
Evaluated: Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51] == Int8[-7 -37 … -28 -9; -23 -38 … -89 -106; … ; 77 12 … 71 116; -92 -6 … -103 -51]
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:410
ERROR: Package Metal errored during testing