Skip to content

codegen: add a pass for late conversion of known modify ops to call atomicrmw #57010

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jan 10, 2025

The ExpandAtomicModify can recognize our pseudo-intrinsic julia.atomicmodify and convert it into some of known atomicrmw expressions, or simplify it with more inlining, as applicable. Ideally we could get this pass upstreamed, since there's nothing specific to julia about this pass, and LLVM's IR cannot express this quite correctly without making it a new intrinsic.

This ensures that now our @atomic modify is as fast as Threads.Atomic!

julia> @code_llvm Threads.atomic_add!(r, 10)
; Function Signature: atomic_add!(Base.Threads.Atomic{Int64}, Int64)
;  @ atomics.jl:307 within `atomic_add!`
define i64 @"julia_atomic_add!_2680"(ptr noundef nonnull align 8 dereferenceable(8) %"x::Atomic", i64 signext %"v::Int64") #0 {
top:
; ┌ @ Base_compiler.jl:94 within `modifyproperty!`
   %0 = atomicrmw add ptr %"x::Atomic", i64 %"v::Int64" acq_rel, align 8
; └
; ┌ @ Base_compiler.jl:54 within `getproperty`
   ret i64 %0
; └
}

@gbaraldi
Copy link
Member

Why is there such a large aotcompile.cpp diff? Looks unrelated

@oscardssmith oscardssmith added performance Must go faster multithreading Base.Threads and related functionality atomics labels Jan 10, 2025
@vtjnash
Copy link
Member Author

vtjnash commented Jan 13, 2025

Why is there such a large aotcompile.cpp diff? Looks unrelated

It is mostly support code for this, since if you don't have the IR emitted for the target, then it has to use a loop&call, which isn't want people want to see, so we need to make sure to emit the target code too into all of the correct compile units. We could land some of it separately, but it wouldn't do anything (be mostly untested) until this landed

@vtjnash vtjnash force-pushed the jn/atomic-modify-opt branch 2 times, most recently from 666985c to c98b43f Compare January 17, 2025 14:32
@oscardssmith
Copy link
Member

oscardssmith commented Jan 27, 2025

Is this ready to merge? Edit: no. CI seems pretty unhappy with this currently.

@DilumAluthge
Copy link
Member

Also there's a merge conflict.

@vtjnash
Copy link
Member Author

vtjnash commented Jan 28, 2025

Please review, but hold off on merging. I think there might be one other PR I try to merge first

@oscardssmith
Copy link
Member

CI failures look real: Specifically the following seems to be a multithreading test that breaks with this PR:

threads_exec.jl with JULIA_NUM_THREADS == 1,0: Test Failed at /cache/build/tester-demeter6-15/julialang/julia-master/julia-67ad4b675e/share/julia/test/threads_exec.jl:444
  Expression: accum[] + Int(var[]) === sum(0:nloops)
   Evaluated: 129821317250305000 === 500500

@vtjnash vtjnash force-pushed the jn/atomic-modify-opt branch from 67ad4b6 to dbecbe8 Compare February 8, 2025 20:38
@@ -9476,7 +9552,7 @@ static jl_llvm_functions_t
Instruction &prologue_end = ctx.builder.GetInsertBlock()->back();

// step 11a. Emit the entry safepoint
if (JL_FEAT_TEST(ctx, safepoint_on_entry))
if (params.safepoint_on_entry && JL_FEAT_TEST(ctx, safepoint_on_entry))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird, shouldn't ctx be derived from params?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx derives from params.params not params (I agree the naming choice here was a choice)

vtjnash added 2 commits March 28, 2025 14:35
…tomicrmw

The ExpandAtomicModify can recognize our pseudo-intrinsic
julia.atomicmodify and convert it into some of known atomicrmw
expressions, or simplify it with more inlining, as applicable.

This ensures that now our `@atomic` modify is as fast as
`Threads.Atomic` for the cases we implement now.
@vtjnash vtjnash force-pushed the jn/atomic-modify-opt branch from 7eb8691 to 5cfdf66 Compare March 28, 2025 14:35
@vtjnash
Copy link
Member Author

vtjnash commented Mar 28, 2025

This is now ready for someone to review, particularly any correctness implications of optimizing atomics that I didn't already address

@oscardssmith oscardssmith requested a review from gbaraldi March 28, 2025 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
atomics multithreading Base.Threads and related functionality performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants