Skip to content

Conversation

Moelf
Copy link
Contributor

@Moelf Moelf commented Jul 26, 2025

fix #59

depends on JuliaGPU/Metal.jl#636

@Moelf
Copy link
Contributor Author

Moelf commented Jul 27, 2025

@testset "AtomixMetalExt:test_inc_threadgroup" begin
    A = Metal.MtlVector(Float32(1):Float32(3))
    metal() do
        GC.@preserve A begin
            B = MtlThreadGroupArray(Float32, 3)
            refB = Atomix.IndexableRef(B, (1,))
            pre, post = Atomix.modify!(refB, +, Float32(1))
            A[1] = B[1]
            A[2] = pre
            A[3] = post
        end
    end
    @test collect(A) == Float32[2, 1, 2]
end

@maleadt I'm trying to add this as a test, but it doesn't work and I'm not entirely sure why:

 caused by: NSError: Compiler encountered an internal error (AGXMetalG16G_B0, code 3)
  Stacktrace:
    [1] Metal.MTL.MTLComputePipelineState(dev::Metal.MTL.MTLDeviceInstance, fun::Metal.MTL.MTLFunctionInstance)
      @ Metal.MTL ~/.julia/packages/Metal/Pwc41/lib/mtl/compute_pipeline.jl:28
    [2] macro expansion
      @ ~/.julia/packages/Metal/Pwc41/src/compiler/compilation.jl:187 [inlined]
    [3] macro expansion
      @ ~/.julia/packages/ObjectiveC/UNTzb/src/os.jl:264 [inlined]
    [4] macro expansion
      @ ~/.julia/packages/Metal/Pwc41/src/compiler/compilation.jl:182 [inlined]
    [5] (::Metal.var"#175#176"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, @NamedTuple{ir::String, air::Vector{UInt8}, metallib::Vector{UInt8}, entry::String}})()
      @ Metal ~/.julia/packages/ObjectiveC/UNTzb/src/foundation.jl:644
    [6] macro expansion
      @ ~/.julia/packages/ObjectiveC/UNTzb/src/foundation.jl:572 [inlined]
    [7] macro expansion
      @ ./lock.jl:273 [inlined]
    [8] ObjectiveC.Foundation.NSAutoreleasePool(f::Metal.var"#175#176"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, @NamedTuple{ir::String, air::Vector{UInt8}, metallib::Vector{UInt8}, entry::String}})
      @ ObjectiveC.Foundation ~/.julia/packages/ObjectiveC/UNTzb/src/foundation.jl:564
    [9] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{ir::String, air::Vector{UInt8}, metallib::Vector{UInt8}, entry::String})
      @ Metal ~/.julia/packages/ObjectiveC/UNTzb/src/foundation.jl:643
   [10] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
      @ GPUCompiler ~/.julia/packages/GPUCompiler/Ecaql/src/execution.jl:270
   [11] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/Ecaql/src/execution.jl:159
   [12] macro expansion
      @ ~/.julia/packages/Metal/Pwc41/src/compiler/execution.jl:189 [inlined]
   [13] macro expansion
      @ ./lock.jl:273 [inlined]
   [14] mtlfunction(f::var"#g#1"{var"#6#7"{MtlDeviceVector{Float32, 1}}}, tt::Type{Tuple{}}; name::Nothing, kwargs::@Kwargs{})
      @ Metal ~/.julia/packages/Metal/Pwc41/src/compiler/execution.jl:184
   [15] mtlfunction
      @ ~/.julia/packages/Metal/Pwc41/src/compiler/execution.jl:182 [inlined]
   [16] macro expansion
      @ ~/.julia/packages/Metal/Pwc41/src/compiler/execution.jl:85 [inlined]

@Moelf
Copy link
Contributor Author

Moelf commented Jul 27, 2025

https://buildkite.com/julialang/atomix-dot-jl/builds/49/steps/canvas?jid=019849e5-5701-482e-9e07-adb46e4c32f5#019849e5-5701-482e-9e07-adb46e4c32f5/235-353

this is weird, on CI:

  • 1.11 runs, wrong result for the B[1] but pre, post are correct
  • 1.10 runs, wrong results for everything

Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works fine with JuliaGPU/Metal.jl#640

Moelf and others added 2 commits July 28, 2025 08:12
Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Tim Besard <tim.besard@gmail.com>
@Moelf
Copy link
Contributor Author

Moelf commented Jul 30, 2025

this should be ready to go now

Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI still fails.

Moelf and others added 2 commits July 30, 2025 16:32
Co-authored-by: Tim Besard <tim.besard@gmail.com>
Copy link

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.20%. Comparing base (e60c518) to head (7462933).
⚠️ Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
ext/AtomixMetalExt.jl 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #60       +/-   ##
===========================================
- Coverage   92.95%   52.20%   -40.76%     
===========================================
  Files           6       10        +4     
  Lines         142      272      +130     
===========================================
+ Hits          132      142       +10     
- Misses         10      130      +120     
Flag Coverage Δ
Pkg.test 52.20% <0.00%> (-40.76%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Moelf
Copy link
Contributor Author

Moelf commented Jul 30, 2025

Test Summary:                  | Pass  Total  Time
AtomixMetalExt:extension_found |    1      1  0.2s
Test Summary:           | Pass  Total   Time
AtomixMetalExt:test_cas |    1      1  11.4s
Test Summary:           | Pass  Total  Time
AtomixMetalExt:test_inc |    1      1  0.1s
AtomixMetalExt:test_inc_threadgroup: Error During Test at /Users/akako/Documents/github/Atomix.jl/test/test_atomix_metal.jl:72
  Got exception outside of a @test
  Compilation to native code failed; see below for details.
  If you think this is a bug, please file an issue and attach the following files:
  - /var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_VU9Q78LeDG.ll
  - /var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_BaDW10pM3Q.air
  - /var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_3y7YHUfMlQ.metallib
  - 

idk why it still fails for my M4, but at least CI is happy?

debug.zip

@maleadt
Copy link
Member

maleadt commented Jul 31, 2025

Are you using the latest Metal.jl?

@Moelf
Copy link
Contributor Author

Moelf commented Aug 2, 2025

julia> Pkg.test("Atomix", test_args=["--Metal"])

seems to pull the latest, because I see:

   Resolving package versions...
    Updating `/private/var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_9mq3wf/Project.toml`
  [dde4c033] + Metal v1.7.0
    Updating `/private/var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_9mq3wf/Manifest.toml`
  [79e6a3ab] + Adapt v4.3.0
  [ab4f0b2a] + BFloat16s v0.5.1
  [fa961155] + CEnum v0.5.0
  [523fee87] + CodecBzip2 v0.8.5
  [e2ba6199] + ExprTools v0.1.10
  [0c68f7d7] + GPUArrays v11.2.3
  [46192b85] + GPUArraysCore v0.2.0
  [61eb1bfa] + GPUCompiler v1.6.1
  [096a3bc2] + GPUToolbox v0.3.0
  [076d061b] + HashArrayMappedTries v0.2.0
  [692b3bcd] + JLLWrappers v1.7.1
  [63c18a36] + KernelAbstractions v0.9.38
  [929cbde3] + LLVM v9.4.2
  [1914dd2f] + MacroTools v0.5.16
  [dde4c033] + Metal v1.7.0
  [e86c9b32] + ObjectiveC v3.4.2
⌅ [aea7be01] + PrecompileTools v1.2.1
  [21216c6a] + Preferences v1.4.3
  [189a3867] + Reexport v1.2.2
  [ae029012] + Requires v1.3.1
  [7e506255] + ScopedValues v1.4.0
  [6c6a2e73] + Scratch v1.3.0
  [90137ffa] + StaticArrays v1.9.14
  [1e83bf80] + StaticArraysCore v1.4.3
  [10745b16] + Statistics v1.11.1
  [e689c965] + Tracy v0.1.5
  [3bb67fe8] + TranscodingStreams v0.11.3
  [6e34b625] + Bzip2_jll v1.0.9+0
  [f52de702] + LLVMDowngrader_jll v0.6.0+1
  [dad2f222] + LLVMExtra_jll v0.0.37+2
  [ad6e5548] + LibTracyClient_jll v0.9.1+6
  [4af54fe1] + LazyArtifacts v1.11.0
  [37e2e46d] + LinearAlgebra v1.11.0
  [e66e0078] + CompilerSupportLibraries_jll v1.1.1+0
  [4536629a] + OpenBLAS_jll v0.3.27+1
  [8e850b90] + libblastrampoline_jll v5.11.0+0
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
Test Summary:                  | Pass  Total  Time
AtomixMetalExt:extension_found |    1      1  0.2s
Test Summary:           | Pass  Total   Time
AtomixMetalExt:test_cas |    1      1  11.5s
Test Summary:           | Pass  Total  Time
AtomixMetalExt:test_inc |    1      1  0.2s
AtomixMetalExt:test_inc_threadgroup: Error During Test at /Users/akako/Documents/github/Atomix.jl/test/test_atomix_metal.jl:72
  Got exception outside of a @test
  Compilation to native code failed; see below for details.
  If you think this is a bug, please file an issue and attach the following files:
  - /var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_O7gzaLs9O9.ll
  - /var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_4JGT2zeNuL.air
  - /var/folders/d3/2k4zv7hs7gx1r19fkjb7__3r0000gn/T/jl_QHOFKhIz6T.metallib
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35

@maleadt
Copy link
Member

maleadt commented Aug 5, 2025

It crashes on:

call float @air.atomic.local.add.f32(float addrspace(3)* bitcast ([12 x i8] addrspace(3)* @threadgroup_memory to float addrspace(3)*), float 1.000000e+00, i32 0, i32 1, i1 true)
❯ /private/var/run/com.apple.security.cryptexd/mnt/com.apple.MobileAsset.MetalToolchain-v17.1.5285.9.ldlovp/Metal.xctoolchain/usr/metal/32023/bin/applegpu-nt -arch applegpu_g15s -platform_version macos 15 15.5 -sysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk broken.metallib -N broken.mtlp-json -o /dev/null
zsh: segmentation fault   -arch applegpu_g15s -platform_version macos 15 15.5 -sysroot  broken.metallib

Can you try removing the Float32 from: https://github.com/JuliaGPU/Metal.jl/blob/9b8c6043c223433a9b601ffc3bf2dcc060017cd9/src/device/intrinsics/atomics.jl#L79-L80

If that works, we'll have to make sure Metal.jl only dispatches to air.atomic.local.add.f32 for non-threadgroup memory (we can't do the bitcasting here, so should fall back to the CAS loop).

@Moelf
Copy link
Contributor Author

Moelf commented Aug 5, 2025

unfortunately, removing that results in

AtomixMetalExt:test_inc |    1      1  0.2s
AtomixMetalExt:test_inc_threadgroup: Error During Test at /Users/akako/Documents/github/Atomix.jl/test/test_atomix_metal.jl:72
  Got exception outside of a @test
  InvalidIRError: compiling MethodInstance for (::var"#g#1"{var"#6#7"{MtlDeviceVector{Float32, 1}}})() resulted in invalid LLVM IR
  Reason: unsupported call to an unknown function (call to gpu_malloc)
  Stacktrace:
   [1] malloc
     @ ~/.julia/packages/GPUCompiler/Ecaql/src/runtime.jl:85
   [2] gc_pool_alloc
     @ ~/.julia/packages/GPUCompiler/Ecaql/src/runtime.jl:116
   [3] modify!
     @ ~/.julia/packages/Atomix/UFuJD/ext/AtomixMetalExt.jl:38
   [4] modify!
     @ ~/.julia/packages/Atomix/UFuJD/src/generic.jl:120
   [5] #6
     @ ~/Documents/github/Atomix.jl/test/test_atomix_metal.jl:79
   [6] g
     @ ~/Documents/github/Atomix.jl/test/test_atomix_metal.jl:12
  Reason: unsupported dynamic function invocation (call to atomic_fetch_add_explicit)
  Stacktrace:
   [1] modify!
     @ ~/.julia/packages/Atomix/UFuJD/ext/AtomixMetalExt.jl:38
   [2] modify!
     @ ~/.julia/packages/Atomix/UFuJD/src/generic.jl:120
   [3] #6
     @ ~/Documents/github/Atomix.jl/test/test_atomix_metal.jl:79
   [4] g
     @ ~/Documents/github/Atomix.jl/test/test_atomix_metal.jl:12
  Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erroneous code with Cthulhu.jl

@Moelf
Copy link
Contributor Author

Moelf commented Aug 5, 2025

I also don't understand why it seemingly works on older Apple M chips (CI?), not M4

@maleadt
Copy link
Member

maleadt commented Aug 5, 2025

Ah right, that's because Metal.jl doesn't have atomic_fetch_add_explicit dispatch to atomic_fetch_op_explicit. We should probably add that here. Or add a fallback to Metal.jl for only the ThreadGroup case...

I also don't understand why it seemingly works on older Apple M chips (CI?), not M4

I don't think this is related to M4; your code is somehow still dispatching to an add intrinsic of Float32 without doing a CAS loop. Maybe check @device_code_llvm to see where that call to air.atomic.local.add.f32 is coming from?

@Moelf
Copy link
Contributor Author

Moelf commented Aug 5, 2025

I'm getting the error by just running tests of this repo locally

@maleadt
Copy link
Member

maleadt commented Aug 5, 2025

That's... interesting. Still, can you look at the debug info where the intrinsic comes from?

@Moelf
Copy link
Contributor Author

Moelf commented Aug 5, 2025

idk why, but it works now...

(@gpu) pkg> st
Status `~/.julia/environments/gpu/Project.toml`
  [a9b6321e] Atomix v1.1.1 `../../../Documents/github/Atomix.jl`
  [dde4c033] Metal v1.7.0

using Test
using Metal, Atomix
using Atomix
using Atomix.Internal: referenceable
using Atomix: @atomic, @atomicreplace, @atomicswap

function metal(f)
    function g()
        f()
        nothing
    end
    Metal.@metal g()
end

@testset "" begin
A = Metal.MtlVector(Float32(1):Float32(3))

metal() do
    GC.@preserve A begin
        B = Metal.MtlThreadGroupArray(Float32, 3)
        B[1] = A[1]
        refB = Atomix.IndexableRef(B, (1,))
        pre, post = Atomix.modify!(refB, +, Float32(1))
        A[1] = B[1]
        A[2] = pre
        A[3] = post
    end
end
@test collect(A) == Float32[2, 1, 2]
end



> julia --startup-file=no --project=@gpu test.jl
Test Summary: | Pass  Total  Time
              |    1      1  7.0s

@maleadt
Copy link
Member

maleadt commented Aug 6, 2025

So running tests works fine? Maybe you had a Manifest.toml in your test folder changing which packages were being used?

@Moelf
Copy link
Contributor Author

Moelf commented Aug 10, 2025

Maybe you had a Manifest.toml in your test folder changing which packages were being used?

yes, the Manifest.toml was the problem. But now I'm confused why was this not enough:

diff --git a/test/runtests.jl b/test/runtests.jl
index 80f8d1f..e3ad3fd 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -139,7 +139,8 @@ end
 #   julia> Pkg.test("Atomix", test_args=["--Metal"])
 if "--Metal" in ARGS
     import Pkg
-    Pkg.add("Metal")
+    Pkg.develop(path="/Users/akako/Documents/github/Metal.jl")
+    Pkg.update()
     include("test_atomix_metal.jl")
 elseif "--CUDA" in ARGS
     import Pkg

I had to delete the tests/Manifest.toml in addition to the above change, ugh, this can be merged now!

@Moelf Moelf requested a review from maleadt August 10, 2025 15:07
@maleadt
Copy link
Member

maleadt commented Aug 11, 2025

I'm not sure. In any case, the compat of Metal.jl should probably be bumped to v1.7.0, which incidentally should have caught the above issue.

@Moelf Moelf force-pushed the cas_fp_threadgroup branch from cb803c5 to 7462933 Compare August 11, 2025 14:53
@Moelf
Copy link
Contributor Author

Moelf commented Aug 11, 2025

@maleadt true, ok now ready to merge I think

@maleadt maleadt merged commit 7400237 into JuliaConcurrent:main Aug 11, 2025
6 of 8 checks passed
@Moelf Moelf deleted the cas_fp_threadgroup branch August 11, 2025 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FP atomic add not lowering to CAS on Metal.jl
2 participants