Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Foreign threads] Install signal-handlers early #49930

Closed
wants to merge 1 commit into from

Conversation

vchuravy
Copy link
Member

In #49928 we identified a couple of problems:

  1. jl_init_threadtls sets the gc_state to 0, effectivly bypassing the gc_unsafe_enter transition
  2. jl_init_root_tasks assumes that GC is turned off and installs the signal-handler very late

This moves the signal handler installation before the the gc_unsafe transition.

@vchuravy
Copy link
Member Author

@JeffBezanson and @vtjnash this

julia/src/task.c

Lines 1624 to 1638 in 944b28c

// We need `gcstack` in `Task` to allocate Julia objects; *including* the `Task` type.
// However, to allocate a `Task` via `jl_gc_alloc` as done in `jl_init_root_task`,
// we need the `Task` type itself. We use stack-allocated "raw" `jl_task_t` struct to
// workaround this chicken-and-egg problem. Note that this relies on GC to be turned
// off as GC fails because we don't/can't allocate the type tag.
struct {
jl_value_t *type;
jl_task_t value;
} bootstrap_task = {0};
jl_set_pgcstack(&bootstrap_task.value.gcstack);
bootstrap_task.value.ptls = ptls;
if (jl_nothing == NULL) // make a placeholder
jl_nothing = jl_gc_permobj(0, jl_nothing_type);
jl_task_t *ct = (jl_task_t*)jl_gc_alloc(ptls, sizeof(jl_task_t), jl_task_type);
jl_set_typetagof(ct, jl_task_tag, 0);
being executed with GC running seems problematic, but I don't see a solution.

Besides turning GC off in jl_adopt_thread.

@vchuravy
Copy link
Member Author

Something like:

diff --git a/src/threading.c b/src/threading.c
index 7dcefaf067..3889c7e704 100644
--- a/src/threading.c
+++ b/src/threading.c
@@ -424,10 +424,15 @@ JL_DLLEXPORT jl_gcframe_t **jl_adopt_thread(void) JL_NOTSAFEPOINT_LEAVE
     if (jl_options.handle_signals == JL_OPTIONS_HANDLE_SIGNALS_ON)
         jl_install_thread_signal_handler(ptls);
 
+    // Contains a safepoint thus signal-handlers must be installed
     (void)jl_gc_unsafe_enter(ptls);
+    int en = jl_gc_enable(0);
+
     // warning: this changes `jl_current_task`, so be careful not to call that from this function
     jl_task_t *ct = jl_init_root_task(ptls, stack_lo, stack_hi, 0);
     JL_GC_PROMISE_ROOTED(ct);
+
+    jl_gc_enable(en);
     uv_random(NULL, NULL, &ct->rngState, sizeof(ct->rngState), 0, NULL);
     return &ct->gcstack;
 }

Maybe?

@maleadt
Copy link
Member

maleadt commented May 23, 2023

Running this as-is results in a failed assertion: jl_gc_add_finalizer checks that ptls->gc_state == 0. This call happens from jl_init_threadtls -> jl_gc_add_quiescent, i.e., before jl_adopt_thread has called jl_gc_unsafe_enter.

Temporarily disabling that call to free just results in failure during jl_install_thread_signal_handler though, which calls alloc_sigstack -> jl_malloc_stack where jl_current_task is assumed to have been initialized. However, I presume that only happens in jl_init_root_task, and this PR moved jl_install_thread_signal_handler to before that.

So both moving the signal handler set-up earlier and changing the default gc_state seem to violate what the existing task set-up code assumes. At the risk of missing something obvious here, can't we do something simpler and have the foreign thread spin until it's actually safe to enter GC unsafe region, i.e., until GC has run?

@vtjnash
Copy link
Member

vtjnash commented May 23, 2023

At the risk of missing something obvious here, can't we do something simpler and have the foreign thread spin until it's actually safe to enter GC unsafe region, i.e., until GC has run?

That is what a GC-unsafe transition is.

@vchuravy
Copy link
Member Author

That is what a GC-unsafe transition is.

While true the current implementation is a using the signal handler so we either need to set that up before we first transition a thread or use a slightly different mechanism for first transition.

I think we need to more carefully audit what code is run during setup off a foreign thread / replace them with safe alternatives

@vchuravy
Copy link
Member Author

Owed in favor to #49934, I think that approach is easier to make sound.

@vchuravy vchuravy closed this May 23, 2023
@vchuravy vchuravy deleted the vc/foreign_thread branch May 23, 2023 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants