The Nanny manages a subprocess in which a Worker is started. If that process exits, a cascade of on_exit callbacks is triggered.
The callbacks fire in the following order:
AsyncProcess._on_exit does not do much: it sets an event so that the process becomes joinable, then triggers the next on_exit.
WorkerProcess._on_exit just calls WorkerProcess.mark_stopped.
WorkerProcess.mark_stopped resets some state in WorkerProcess and calls the next on_exit.
Nanny._on_worker_exit_sync schedules a coroutine on the loop, which is the next on_exit.
Nanny._on_worker_exit unregisters the worker from the scheduler and, if need be, restarts the worker process.
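To make the chain easier to follow, here is a minimal sketch with simplified stand-ins for the classes named above. It is not the actual distributed code: the constructor arguments, the `calls` list, the `done` event and the `run_chain` helper are assumptions made for illustration only. It only mirrors the order of the hand-offs described above.

```python
import asyncio

calls = []  # records the order in which the callbacks fire


class AsyncProcess:
    """Simplified stand-in, not the real distributed class."""

    def __init__(self, on_exit):
        self._exit_event = asyncio.Event()
        self._on_exit_callback = on_exit

    def _on_exit(self, exitcode):
        calls.append("AsyncProcess._on_exit")
        self._exit_event.set()  # the process is now joinable
        self._on_exit_callback(exitcode)


class WorkerProcess:
    """Simplified stand-in, not the real distributed class."""

    def __init__(self, on_exit):
        self.status = "running"
        self._on_exit_callback = on_exit
        self.process = AsyncProcess(on_exit=self._on_exit)

    def _on_exit(self, exitcode):
        calls.append("WorkerProcess._on_exit")
        self.mark_stopped(exitcode)

    def mark_stopped(self, exitcode):
        calls.append("WorkerProcess.mark_stopped")
        self.status = "stopped"  # reset some state
        self._on_exit_callback(exitcode)


class Nanny:
    """Simplified stand-in, not the real distributed class."""

    def __init__(self, loop):
        self.loop = loop
        self.done = asyncio.Event()  # hypothetical, lets the demo wait
        self.worker = WorkerProcess(on_exit=self._on_worker_exit_sync)

    def _on_worker_exit_sync(self, exitcode):
        calls.append("Nanny._on_worker_exit_sync")

        def schedule():
            # Keep a reference so the task is not garbage collected.
            self._exit_task = self.loop.create_task(self._on_worker_exit(exitcode))

        # The final callback is deferred to a later loop iteration.
        self.loop.call_soon(schedule)

    async def _on_worker_exit(self, exitcode):
        calls.append("Nanny._on_worker_exit")
        # The real method would unregister the worker and maybe restart it.
        self.done.set()


async def main():
    calls.clear()
    nanny = Nanny(asyncio.get_running_loop())
    nanny.worker.process._on_exit(1)  # simulate the subprocess exiting
    # Everything up to the sync callback ran synchronously;
    # the final coroutine has not started yet.
    assert calls[-1] == "Nanny._on_worker_exit_sync"
    await nanny.done.wait()
    return list(calls)


def run_chain():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_chain())
```

In this sketch the first four steps run synchronously in one call stack, and only the last one is handed to the event loop.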
This chain of events is not only confusing but also prone to race conditions. In particular, the final and most relevant on_exit callback is scheduled with loop.call_soon, so it does not run synchronously with the rest of the chain and other code can run, and change state, before it does.
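The kind of race this opens up can be illustrated with plain asyncio. The sketch below is not taken from distributed: `process_exited`, `close`, `on_exit_final` and the `state` dict are hypothetical names. It only shows that a callback deferred with loop.call_soon sees whatever state exists when it eventually runs, not the state at the time it was scheduled.

```python
import asyncio


async def demo():
    loop = asyncio.get_running_loop()
    events = []
    state = {"status": "running"}
    finished = asyncio.Event()

    def on_exit_final():
        # Runs on a later loop iteration and observes the current state.
        events.append(f"on_exit saw status={state['status']}")
        finished.set()

    def process_exited():
        events.append("process exited")
        loop.call_soon(on_exit_final)  # deferred, not run immediately

    def close():
        state["status"] = "closing"
        events.append("close requested")

    process_exited()
    close()  # slips in before the deferred callback runs
    await finished.wait()
    return events


def run_race_demo():
    return asyncio.run(demo())


if __name__ == "__main__":
    for line in run_race_demo():
        print(line)
```

Here the deferred callback was scheduled while the status was still "running", but by the time it executes the status has already changed.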
These race conditions are not currently a direct problem. Most of them are absorbed by the idempotent implementations of close/start, but the structure becomes fragile as soon as we touch it.
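As a rough illustration of why idempotency absorbs such races, here is a sketch of a close method that can safely be called several times. It is an assumption about the general pattern, not the actual Nanny or WorkerProcess implementation; the `Closeable` class, its `status` values and the `close_count` counter are made up for the example.

```python
import asyncio


class Closeable:
    """Hypothetical object with an idempotent close()."""

    def __init__(self):
        self.status = "running"
        self.close_count = 0  # how often the real teardown ran
        self._closed = asyncio.Event()

    async def close(self):
        if self.status in ("closing", "closed"):
            # Someone else is already closing (or has closed) this object:
            # wait for that to finish instead of tearing down twice.
            await self._closed.wait()
            return
        self.status = "closing"
        self.close_count += 1
        await asyncio.sleep(0)  # stands in for the real teardown work
        self.status = "closed"
        self._closed.set()


async def main():
    obj = Closeable()
    # Three racing callers, e.g. a user call and two exit callbacks.
    await asyncio.gather(obj.close(), obj.close(), obj.close())
    return obj.status, obj.close_count


def run_idempotent_close():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_idempotent_close())
```

The guard only helps as long as every entry point goes through it, which is why changes to the callback structure can expose the races again.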
I debugged this while investigating #7312, but this chain is not directly causing that issue. This issue mostly serves to document the situation.
fjetter changed the title from "Confusing Nanny on_exit callback structure" to "[Draft] Confusing Nanny on_exit callback structure" on Nov 16, 2022