Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When adopting a thread, spin until GC isn't running. #49934

Merged
merged 9 commits into from
May 26, 2023
Merged

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented May 23, 2023

Fixes #49928, alternative to #49930.

@maleadt
Copy link
Member Author

maleadt commented May 23, 2023

I'm not sure how to fix the analyzer error; jl_adopt_thread is currently annotated JL_NOTSAFEPOINT_LEAVE (what does the LEAVE mean?), but many of the functions it uses call into the GC. I guess the analyzer wasn't visiting this function before?

julia/src/threading.c:424:21: error: Calling potential safepoint as SimpleFunctionCall from function annotated JL_NOTSAFEPOINT [julia.GCChecker]
    jl_task_t *ct = jl_init_root_task(ptls, stack_lo, stack_hi);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
julia/src/julia_internal.h:863:12: note: Tried to call method defined here
jl_task_t *jl_init_root_task(jl_ptls_t ptls, void *stack_lo, void *stack_hi);
           ^~~~~~~~~~~~~~~~~

We can't annotate jl_init_root_task JL_NOTSAFEPOINT as it calls jl_gc_alloc.

@vchuravy
Copy link
Member

This only fixes part of the issue. You still need to fix the issue where we get past this check and then GC starts and we hit a safepoint without being able to execute that safepoint on this thread yet.

@maleadt maleadt force-pushed the tb/adopt_gc branch 2 times, most recently from c7cb0b0 to 1bd2f6a Compare May 23, 2023 12:36
src/threading.c Outdated Show resolved Hide resolved
src/safepoint.c Outdated Show resolved Hide resolved
@vchuravy
Copy link
Member

Besides the discussion around memory model this looks.good to me.

I will leave final review to @vtjnash and @JeffBezanson

@gbaraldi
Copy link
Member

What happens if a thread calls GC.enable() and tries to enter GC in the meanwhile?

@maleadt
Copy link
Member Author

maleadt commented May 23, 2023

What happens if a thread calls GC.enable() and tries to enter GC in the meanwhile?

GC.enable() only does anything if you've previously disabled GC, so that should be fine.

@gbaraldi
Copy link
Member

Oh, I see it now we have a per thread check.

@maleadt maleadt added multithreading Base.Threads and related functionality bugfix This change fixes an existing bug backport 1.9 Change should be backported to release-1.9 labels May 24, 2023
@maleadt
Copy link
Member Author

maleadt commented May 24, 2023

Couple of CI hangs, with one printing the following backtrace:

KILLING BY THREAD TEST WATCHDOG
======================================================================================
Information request received. A stacktrace will print followed by a 1.0 second profile
======================================================================================
cmd: /cache/build/default-armageddon-1/julialang/julia-master/julia-c2ef05e211/bin/julia 9614 running 2 of 5
signal (10): User defined signal 1
pthread_cond_wait at /lib/aarch64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/default-armageddon-7/julialang/julia-master/src/safepoint.c:172
jl_safepoint_start_gc at /cache/build/default-armageddon-7/julialang/julia-master/src/safepoint.c:124
ijl_gc_collect at /cache/build/default-armageddon-7/julialang/julia-master/src/gc.c:3513
gc at ./gcutils.jl:129 [inlined]
macro expansion at /cache/build/default-armageddon-1/julialang/julia-master/julia-c2ef05e211/share/julia/test/threads_exec.jl:209 [inlined]
#251#threadsfor_fun#21 at ./threadingconstructs.jl:184
#251#threadsfor_fun at ./threadingconstructs.jl:151 [inlined]
#1 at ./threadingconstructs.jl:129
unknown function (ip: 0xffffa1e52f4b)

So that seems related.

@maleadt maleadt marked this pull request as draft May 24, 2023 09:58
maleadt and others added 6 commits May 25, 2023 12:43
Otherwise GC might run after the foreign thread exits the spin loop,
but before it is initialized and ready to handle GC safepoints.
Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
// introduce a race between it and this thread checking if the GC is enabled and only
// then setting jl_gc_running. To avoid that, check again now that we won that race.
if (jl_atomic_load_acquire(&jl_gc_disable_counter)) {
jl_atomic_store_release(&jl_gc_running, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem logical to be a release, since there are no other stores in this function to act upon

Copy link
Member

@vtjnash vtjnash May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you intended the much more expensive seq-cst semantics so that it establishes an atomic ordering with the prior load?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. The load during jl_adopt_thread is an atomic acquire, which I intended to match with this release store. I'm not sure why it would need to be matched up with stores in this very function?

@KristofferC KristofferC mentioned this pull request Jun 6, 2023
36 tasks
maleadt added a commit that referenced this pull request Jun 22, 2023
Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>

(cherry-picked from commit 4ef9fb1)
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

macOS: Adopting a thread during GC results in bus error
5 participants