-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
100% CPU caused by thread spike (in Fibers, Meteor) #20083
Comments
node already has the patch, and v8 is oss, why not just submit the patch to them? this also seems like the wrong place to be asking. |
@KoenLav please note v8 doesn't accept prs to that repo, they use gerrit. you can follow this guide to learn how to submit a patch: http://dev.chromium.org/developers/contributing-code i'm going to close this for now, feel free to ping me if you have any other questions. |
ping @nodejs/v8 just in case. Although seems like this is on the right track as is. |
@devsnek no problem to leave this here closed (people should still be able to find it more easily). |
@devsnek the V8 patch has landed (https://chromium-review.googlesource.com/c/v8/v8/+/1014407). Any indication of when it will be available in Node.js? |
@KoenLav I have requested an upstream merge for V8 6.7 and 6.6 so that that can end up in Node 10. Let's see if upstream is willing to back merge. Otherwise I'm +1 on floating it as a patch here otherwise. |
@KoenLav you can run |
Yes, but wait for a disposition upstream first. |
Yes, that is possible. The LTS policy is that a fix has to be released on the current branch for a bit before we back-port to LTS branches. This helps shake any potential instability. The process as I see it, from here:
If all looks good, this will subsequently be considered for a merge back to older release branches. |
@ofrobots as the merge has been approved for V8 6.7, what would be the proper course of action from here? |
Our fuzzers found a crash so I would like to understand what's going on there before moving forward with this. @ofrobots has been CC'ed on the issue. |
@hashseed link? |
@KoenLav unfortunately I can't share the link as it is security sensitive by default. Kenton has access to it though. |
@Hasheed would be happy to try and shed some light if you CC ;) |
It appears to have been a problem with the fuzzer setup, not an actual bug in the code. |
@kentonv good to hear! Have they communicated any ETA on merging this commit to V8 6.7? Or is it already done? |
I've opened the 6.7 merge change upstream. Once there is LGTM, this can land upstream, and will be picked up by Node.js when we move up to V8 6.7 (~ end of the month). For V8 6.6, we need to float the patch here. I've opened PR here: #20727 |
Original commit message: ThreadDataTable: Change global linked list to per-Isolate hash map. For use cases with a large number of threads or a large number of isolates (or both), ThreadDataTable can be a major performance bottleneck due to O(n) lookup time of the linked list. Switching to a hash map reduces this to O(1). Example 1: Sandstorm.io, a Node.js app that utilizes "fibers", was observed spending the majority of CPU time iterating over the ThreadDataTable. See: https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story Example 2: Cloudflare's Workers engine, a high-multi-tenancy web server framework built on V8 (but not Node), creates large numbers of threads and isolates per-process. It saw a 34x improvement in throughput when we applied this patch. Cloudflare has been using a patch in production since the Workers launch which replaces the linked list with a hash map -- but still global. This commit builds on that but goes further and creates a separate hash map and mutex for each isolate, with the table being a member of the Isolate class. This avoids any globals and should reduce lock contention. Bug: v8:5338 Change-Id: If0d11509afb2e043b888c376e36d3463db931b47 Reviewed-on: https://chromium-review.googlesource.com/1014407 Reviewed-by: Yang Guo <yangguo@chromium.org> Commit-Queue: Yang Guo <yangguo@chromium.org> Cr-Commit-Position: refs/heads/master@{nodejs#52753} PR-URL: nodejs#20727 Ref: nodejs#20083
Original commit message: ThreadDataTable: Change global linked list to per-Isolate hash map. For use cases with a large number of threads or a large number of isolates (or both), ThreadDataTable can be a major performance bottleneck due to O(n) lookup time of the linked list. Switching to a hash map reduces this to O(1). Example 1: Sandstorm.io, a Node.js app that utilizes "fibers", was observed spending the majority of CPU time iterating over the ThreadDataTable. See: https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story Example 2: Cloudflare's Workers engine, a high-multi-tenancy web server framework built on V8 (but not Node), creates large numbers of threads and isolates per-process. It saw a 34x improvement in throughput when we applied this patch. Cloudflare has been using a patch in production since the Worker launch which replaces the linked list with a hash map -- but still global. This commit builds on that but goes further and creates a separate hash map and mutex for each isolate, with the table being a member of the Isolate class. This avoids any globals and should reduce lock contention. Bug: v8:5338 Change-Id: If0d11509afb2e043b888c376e36d3463db931b47 Reviewed-on: https://chromium-review.googlesource.com/1014407 Reviewed-by: Yang Guo <yangguo@chromium.org> Commit-Queue: Yang Guo <yangguo@chromium.org> Cr-Commit-Position: refs/heads/master@{#52753} PR-URL: #20727 Ref: #20083 Refs: #20083 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>
Original commit message: ThreadDataTable: Change global linked list to per-Isolate hash map. For use cases with a large number of threads or a large number of isolates (or both), ThreadDataTable can be a major performance bottleneck due to O(n) lookup time of the linked list. Switching to a hash map reduces this to O(1). Example 1: Sandstorm.io, a Node.js app that utilizes "fibers", was observed spending the majority of CPU time iterating over the ThreadDataTable. See: https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story Example 2: Cloudflare's Workers engine, a high-multi-tenancy web server framework built on V8 (but not Node), creates large numbers of threads and isolates per-process. It saw a 34x improvement in throughput when we applied this patch. Cloudflare has been using a patch in production since the Worker launch which replaces the linked list with a hash map -- but still global. This commit builds on that but goes further and creates a separate hash map and mutex for each isolate, with the table being a member of the Isolate class. This avoids any globals and should reduce lock contention. Bug: v8:5338 Change-Id: If0d11509afb2e043b888c376e36d3463db931b47 Reviewed-on: https://chromium-review.googlesource.com/1014407 Reviewed-by: Yang Guo <yangguo@chromium.org> Commit-Queue: Yang Guo <yangguo@chromium.org> Cr-Commit-Position: refs/heads/master@{#52753} PR-URL: #20727 Ref: #20083 Refs: #20083 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>
This is the v8.x backport of nodejs#20727. Original commit message: ThreadDataTable: Change global linked list to per-Isolate hash map. For use cases with a large number of threads or a large number of isolates (or both), ThreadDataTable can be a major performance bottleneck due to O(n) lookup time of the linked list. Switching to a hash map reduces this to O(1). Example 1: Sandstorm.io, a Node.js app that utilizes "fibers", was observed spending the majority of CPU time iterating over the ThreadDataTable. See: https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story Example 2: Cloudflare's Workers engine, a high-multi-tenancy web server framework built on V8 (but not Node), creates large numbers of threads and isolates per-process. It saw a 34x improvement in throughput when we applied this patch. Cloudflare has been using a patch in production since the Worker launch which replaces the linked list with a hash map -- but still global. This commit builds on that but goes further and creates a separate hash map and mutex for each isolate, with the table being a member of the Isolate class. This avoids any globals and should reduce lock contention. Bug: v8:5338 Change-Id: If0d11509afb2e043b888c376e36d3463db931b47 Reviewed-on: https://chromium-review.googlesource.com/1014407 Reviewed-by: Yang Guo <yangguo@chromium.org> Commit-Queue: Yang Guo <yangguo@chromium.org> Cr-Commit-Position: refs/heads/master@{nodejs#52753} Backport-PR-URL: nodejs#21529 PR-URL: nodejs#20727 Refs: nodejs#20083 Refs: nodejs#20083 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>
This is the v8.x backport of #20727. Original commit message: ThreadDataTable: Change global linked list to per-Isolate hash map. For use cases with a large number of threads or a large number of isolates (or both), ThreadDataTable can be a major performance bottleneck due to O(n) lookup time of the linked list. Switching to a hash map reduces this to O(1). Example 1: Sandstorm.io, a Node.js app that utilizes "fibers", was observed spending the majority of CPU time iterating over the ThreadDataTable. See: https://sandstorm.io/news/2016-09-30-fiber-bomb-debugging-story Example 2: Cloudflare's Workers engine, a high-multi-tenancy web server framework built on V8 (but not Node), creates large numbers of threads and isolates per-process. It saw a 34x improvement in throughput when we applied this patch. Cloudflare has been using a patch in production since the Worker launch which replaces the linked list with a hash map -- but still global. This commit builds on that but goes further and creates a separate hash map and mutex for each isolate, with the table being a member of the Isolate class. This avoids any globals and should reduce lock contention. Bug: v8:5338 Change-Id: If0d11509afb2e043b888c376e36d3463db931b47 Reviewed-on: https://chromium-review.googlesource.com/1014407 Reviewed-by: Yang Guo <yangguo@chromium.org> Commit-Queue: Yang Guo <yangguo@chromium.org> Cr-Commit-Position: refs/heads/master@{#52753} Backport-PR-URL: #21529 PR-URL: #20727 Refs: #20083 Refs: #20083 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>
Version: 8.11.1
Platform: any
We have been experiencing 100% CPU in production, presumably caused by the use of Fibers in Node.js by Meteor and the way the data lookup for threads is implemented in V8.
While this issue probably should not be fixed in this repository it seemed worthwhile to track it here as well, in order to find out whether more users of Node.js are suffering from this issue.
As @kentonv describes the issue in the Meteor repository:
V8 uses a linked list to map thread IDs to some thread-local data. Fibers aren't threads, but the fibers package fakes out V8 into thinking that fibers are threads, so this applies to fibers too. If a large number of fibers are created, the list becomes long, and so every lookup into the table becomes slow, because a lookup in a linked list is O(n). These lookups happen frequently, slowing down everything. Using a hash table makes the lookups O(1).
Issue on Meteor repository:
meteor/meteor#9796
Issue on Fibers repository:
laverdet/node-fibers#371
Commit which fixes the issue in Node.js:
0b88256
V8 issue:
https://bugs.chromium.org/p/v8/issues/detail?id=5338
Proposed contribution which fixes the issue in V8:
Not available yet.
The text was updated successfully, but these errors were encountered: