Do not drain worker tasks on main thread #47452
Conversation
Are https://chromium-review.googlesource.com/c/v8/v8/+/4403395 and https://chromium-review.googlesource.com/c/v8/v8/+/4405689 necessary for this to work? |
Yes, they are necessary for this. |
I think @addaleax would be the best person to answer these questions since she wrote some of the initial patches for the migration to the V8 platform, but my 2 cents:
Yes.
This seems to date back to the initial patches of the V8 platform integration, but the same question was asked in https://github.com/v8/node/pull/69/files#r187129604 too. From some local testing, it seems that if we don't drain the worker tasks here (specifically, in the event loop), some foreground task (e.g. a wasm async compile task, which probably comes from the cjs-module-lexer) can get posted after the event loop has already finished, and the general processing order can get messed up. That is why there are a lot of exit code 13 failures in the tests here: it means we have leftover microtasks that were never cleared from the queue during shutdown. We run the microtasks after running foreground tasks (line 423 in 284e6ac, via the InternalCallbackScope destructor) when the Node.js environment is still around, e.g. when we are still running the event loop. (Personally I am a bit puzzled why we are using InternalCallbackScope for this, because its destructor does... a lot of things, probably way more than what PerIsolatePlatformData::RunForegroundTask needs.)
|
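For context on the mechanism described in the comment above, here is a minimal sketch (not the actual Node.js source; the function name is hypothetical) of a foreground task run followed by a microtask checkpoint, which is roughly what PerIsolatePlatformData::RunForegroundTask achieves through the InternalCallbackScope destructor:

```cpp
// Illustrative sketch only: in Node.js the checkpoint happens indirectly,
// via the destructor of InternalCallbackScope (which also does much more).
#include <v8.h>
#include <v8-platform.h>
#include <memory>

void RunForegroundTaskSketch(v8::Isolate* isolate,
                             std::unique_ptr<v8::Task> task) {
  v8::HandleScope handle_scope(isolate);
  task->Run();
  // If a foreground task is posted after the event loop has already
  // finished, nothing performs this checkpoint anymore, leaving microtasks
  // in the queue at shutdown (the exit-code-13 failures mentioned above).
  isolate->PerformMicrotaskCheckpoint();
}
```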
It looks like there are linter and test failures, could you please rebase and address those? |
This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open. |
Closing this because it has stalled. Feel free to reopen if this issue/PR is still relevant, or to ping the collaborator who labelled it stalled if you have any questions. |
I checked out the PR, rebased it and fixed the linter issues but for some reason I'm unable to push it back:
In any case, here's a copy on my fork: https://github.com/targos/node/tree/do-not-drain-worker-tasks |
Branch force-pushed from 14b1a3e to 1bc4970.
@aduh95 Thanks! Now I'm curious: how did you do it? |
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##              main   #47452      +/-  ##
==========================================
+ Coverage    88.04%   88.06%   +0.01%
==========================================
  Files          651      652       +1
  Lines       183409   183543     +134
  Branches     35828    35864      +36
==========================================
+ Hits        161487   161635     +148
+ Misses       15174    15160      -14
  Partials      6748     6748
|
I rebased using the GH "Update branch" API – I initially tried pushing your branch using git, and saw the same error as you. |
This might be already known, but from my testing, it's the drain in the event loop that causes this; more specifically, it's the blocking at lines 639 to 641 in 2545b9e
|
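To illustrate where that blocking sits in the shutdown path, here is a hedged sketch of an event-loop spin that drains platform tasks after libuv runs out of work (function name and shape are illustrative, not the actual SpinEventLoop code):

```cpp
// Illustrative only: the loop alternates between running libuv and letting
// the platform drain its V8 tasks, so a DrainTasks() that blocks on worker
// tasks stalls the main thread whenever the loop runs out of immediate work.
#include <uv.h>
#include <v8.h>
#include "node.h"  // for node::MultiIsolatePlatform (assumed available)

void SpinEventLoopSketch(uv_loop_t* loop,
                         node::MultiIsolatePlatform* platform,
                         v8::Isolate* isolate) {
  bool more;
  do {
    uv_run(loop, UV_RUN_DEFAULT);
    platform->DrainTasks(isolate);  // may block on worker tasks here
    more = uv_loop_alive(loop) != 0;
  } while (more);
}
```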
I think the tests here are failing because the user-blocking tasks still need to be blocked on in the main thread (otherwise e.g. wasm compilation tasks might get dropped even though the user script needs them to complete). Opened #58047 to try that idea. |
According to the documentation, the V8 tasks should be executed based on priority. Previously we always executed the tasks in FIFO order; this changes the NodePlatform implementation to execute the higher-priority tasks first. The tasks used to schedule timers for the delayed tasks are run in FIFO order, since priority is irrelevant for the timer scheduling part, while the tasks unwrapped by the timer callbacks are still ordered by priority. PR-URL: #58047 Refs: #47452 Refs: #54918 Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
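As an illustration of the priority-ordered execution that commit message describes, here is a hedged sketch; the queue layout and class name are assumptions, not the actual NodePlatform data structures:

```cpp
// Hedged sketch: tasks are grouped per v8::TaskPriority and higher-priority
// groups are drained first, FIFO within each group.
#include <v8-platform.h>
#include <map>
#include <memory>
#include <queue>

class PriorityTaskQueue {
 public:
  void Push(v8::TaskPriority priority, std::unique_ptr<v8::Task> task) {
    queues_[priority].push(std::move(task));
  }

  // Pops the highest-priority task; FIFO within the same priority.
  std::unique_ptr<v8::Task> Pop() {
    // kUserBlocking > kUserVisible > kBestEffort, so iterate the map in
    // reverse key order to find the highest non-empty priority bucket.
    for (auto it = queues_.rbegin(); it != queues_.rend(); ++it) {
      if (!it->second.empty()) {
        auto task = std::move(it->second.front());
        it->second.pop();
        return task;
      }
    }
    return nullptr;
  }

 private:
  std::map<v8::TaskPriority, std::queue<std::unique_ptr<v8::Task>>> queues_;
};
```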
We should not be blocking on the worker tasks on the main thread in one go. Doing so leads to two problems:

1. If any of the worker tasks posts another foreground task and waits for it to complete, and that foreground task is posted right after we flush the foreground task queue and before the foreground thread goes to sleep, we will never wake up to execute that foreground task; in turn the worker task will never complete, and we have a deadlock.
2. Worker tasks can be posted from any thread, not necessarily one associated with the current isolate, so in the event loop we can end up blocking on a worker task that belongs to a completely unrelated isolate. This is suboptimal.

However, not blocking on the worker tasks at all can lead to the loss of some critical user-blocking worker tasks, e.g. wasm async compilation tasks, which should block the main thread until they are completed, as the documentation suggests. As a compromise, we currently only block on user-blocking tasks, to reduce the chance of deadlocks while making sure that critical user-blocking tasks are not lost.

PR-URL: #58047 Refs: #47452 Refs: #54918 Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
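A minimal sketch of the compromise described in that commit message, i.e. blocking only on outstanding user-blocking worker tasks during a drain. The class and member names are assumptions; the real node_platform.cc bookkeeping differs:

```cpp
// Hedged sketch: the main thread waits only until kUserBlocking worker
// tasks have completed, instead of waiting for every worker task.
#include <condition_variable>
#include <mutex>

class WorkerTaskTracker {
 public:
  void OnUserBlockingTaskPosted() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++pending_user_blocking_;
  }

  void OnUserBlockingTaskCompleted() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (--pending_user_blocking_ == 0) cv_.notify_all();
  }

  // Called from a drain on the main thread: wait only for the user-blocking
  // tasks (e.g. wasm async compilation) the user script depends on; other
  // worker tasks are left to finish on their own (V8 joins them when the
  // isolate is disposed).
  void BlockUntilUserBlockingTasksDone() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return pending_user_blocking_ == 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int pending_user_blocking_ = 0;
};
```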
Hi!
This PR stops draining worker tasks on the main thread. I believe it's not necessary, since V8 will join its own worker tasks as part of v8::Isolate::Dispose. It may also cause deadlocks now that those worker tasks may ask the main thread to perform a GC; see the discussion in this V8 issue here. This should help enable concurrent Sparkplug again, which is otherwise being disabled.
Do you know why Node is draining worker tasks here? Am I seeing right that NodePlatform::DrainTasks is also invoked from the event loop here? If so, this might be a problem for performance as we block the main thread until those worker tasks are finished when the event loop is empty.
I have to admit that I haven't run the tests for this PR yet, and it may require V8 patches as well (see the V8 issue linked above), but I opened the PR already to discuss this.
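To make the deadlock risk concrete, here is a small, self-contained C++ sketch of the interleaving described above (hypothetical names, not Node.js or V8 code). If the "worker" posts its foreground task after the "main thread" has flushed the queue, the program hangs, which is the point:

```cpp
// Hypothetical illustration of the race: a main thread that blocks on all
// worker tasks can deadlock with a worker task that needs the main thread.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

int main() {
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::function<void()>> foreground_queue;
  bool worker_done = false;

  // "Worker": posts a foreground task and waits for it to run, like a
  // worker task asking the main thread to perform a GC.
  std::thread worker([&] {
    std::mutex fg_mutex;
    std::condition_variable fg_cv;
    bool fg_ran = false;
    {
      std::lock_guard<std::mutex> lock(m);
      foreground_queue.push([&] {
        std::lock_guard<std::mutex> fg_lock(fg_mutex);
        fg_ran = true;
        fg_cv.notify_all();
      });
    }
    std::unique_lock<std::mutex> fg_lock(fg_mutex);
    fg_cv.wait(fg_lock, [&] { return fg_ran; });  // needs the main thread

    std::lock_guard<std::mutex> lock(m);
    worker_done = true;
    cv.notify_all();
  });

  // "Main thread" draining: flush the foreground queue once, then block
  // until the worker has finished.
  std::unique_lock<std::mutex> lock(m);
  while (!foreground_queue.empty()) {
    auto task = std::move(foreground_queue.front());
    foreground_queue.pop();
    lock.unlock();
    task();
    lock.lock();
  }
  // If the worker posted its task after the flush above, nothing will ever
  // run it, the worker never finishes, and this wait never returns.
  cv.wait(lock, [&] { return worker_done; });
  lock.unlock();
  worker.join();
  return 0;
}
```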