Fix some scaling issues with the global queue in the thread pool #69386
Conversation
- The global concurrent queue is not scaling well in some situations on machines with a large number of processors; a lot of contention is seen when dequeuing work in some benchmarks.
- I initially tried switching to having thread pool threads queue work more independently with more efficient work stealing, but it looks like that may need more investigation/experimentation. This is a simpler change that seems to work reasonably well for now.
- Beyond 32 procs, additional concurrent queues are added, one per 16 procs. When a worker thread begins dispatching work, it assigns one of those additional queues to itself, trying to limit assignments to 16 worker threads per queue.
- Work items queued by a worker thread go to its assigned queue. The worker thread dequeues from its assigned queue first and only later tries to dequeue from other queues (see the sketch after this list). Work items queued by non-thread-pool threads continue to go into the global queue.
- When a worker thread stops dispatching work items, it unassigns itself from the queue and may transfer work items from it if it was the last thread assigned to that queue.
- In the observed cases, work items are distributed across the different queues and contention is reduced because a limited number of threads operate on each queue.
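To make the scheme above concrete, here is a minimal sketch of the queue assignment, dequeue order, and unassignment it describes. This is not the actual `ThreadPoolWorkQueue` code; all type, field, and method names here are invented for illustration, and the synchronization is simplified.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical sketch only; the numbers (32 procs, 16 procs per extra queue) come from the
// description above, everything else is illustrative.
internal static class PartitionedGlobalQueueSketch
{
    private const int ProcsPerAssignableQueue = 16;

    // Index 0 stands in for the global queue; any further queues are assignable to worker threads.
    private static readonly ConcurrentQueue<Action>[] s_queues = CreateQueues();
    private static readonly int[] s_assignedWorkerCounts = new int[s_queues.Length];

    [ThreadStatic]
    private static int t_assignedQueueIndex; // 0 = not assigned, use the global queue

    private static ConcurrentQueue<Action>[] CreateQueues()
    {
        // Only add extra queues beyond 32 procs, one per 16 procs.
        int procs = Environment.ProcessorCount;
        int extra = procs > 32 ? (procs - 32 + ProcsPerAssignableQueue - 1) / ProcsPerAssignableQueue : 0;
        var queues = new ConcurrentQueue<Action>[1 + extra];
        for (int i = 0; i < queues.Length; i++)
        {
            queues[i] = new ConcurrentQueue<Action>();
        }
        return queues;
    }

    // Called when a worker thread begins dispatching: pick the assignable queue with the
    // fewest assigned workers, aiming for roughly 16 worker threads per queue.
    public static void AssignQueueToCurrentWorker()
    {
        if (s_queues.Length == 1)
        {
            return; // 32 procs or fewer: everyone uses the single global queue
        }

        int best = 1;
        for (int i = 2; i < s_queues.Length; i++)
        {
            if (Volatile.Read(ref s_assignedWorkerCounts[i]) < Volatile.Read(ref s_assignedWorkerCounts[best]))
            {
                best = i;
            }
        }
        Interlocked.Increment(ref s_assignedWorkerCounts[best]);
        t_assignedQueueIndex = best;
    }

    // Called when a worker thread stops dispatching: unassign, and if it was the last worker
    // on that queue, move any remaining items to the global queue so they are not stranded.
    public static void UnassignQueueFromCurrentWorker()
    {
        int index = t_assignedQueueIndex;
        if (index == 0)
        {
            return;
        }
        t_assignedQueueIndex = 0;
        if (Interlocked.Decrement(ref s_assignedWorkerCounts[index]) == 0)
        {
            while (s_queues[index].TryDequeue(out Action? item))
            {
                s_queues[0].Enqueue(item);
            }
        }
    }

    // Worker threads enqueue to their assigned queue; other threads use the global queue.
    public static void Enqueue(Action workItem) => s_queues[t_assignedQueueIndex].Enqueue(workItem);

    // Dequeue from the assigned queue first, then fall back to the remaining queues.
    public static bool TryDequeue(out Action? workItem)
    {
        int assigned = t_assignedQueueIndex;
        if (s_queues[assigned].TryDequeue(out workItem))
        {
            return true;
        }
        for (int i = 0; i < s_queues.Length; i++)
        {
            if (i != assigned && s_queues[i].TryDequeue(out workItem))
            {
                return true;
            }
        }
        return false;
    }
}
```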
Tagging subscribers to this area: @mangod9
RPS numbers for ASP.NET perf on an 80-proc arm64 Linux machine are below. For this machine's comparison I included dotnet/aspnetcore#40476 on both the before and after sides, because in some cases the thread pool global queue bottleneck was only showing up after that fix.
No significant change on a 48-proc x64 Linux machine.
No significant change on other citrine machines, where only one queue is used.
FYI - @sebastienros @JulieLeeMSFT
And these numbers are without solving the same issue in aspnet? In my tests that was still required, so I find this very promising.
@kouvel is this an RPS measurement?
For the Ampere machine I included your fix from dotnet/aspnetcore#40476 in both the before and after columns, so those numbers include the aspnet fix, as in some tests the thread pool scaling issue was only showing up after that. For the other machines I didn't include the aspnet fix.
Yes, these are RPS numbers. I'll check the mean latency on the Ampere machine and will add those.
Here are the mean latency numbers for ASP.NET perf on the 80-proc arm64 Linux machine:
@kouvel - any update on this PR? Is there anything left?
LGTM with one nit. I'm sorry it took me so long; I'd forgotten about this PR.
This maintains the FIFO nature of work items queued by a given thread to the global queue. Technically it loses the FIFO nature of multiple threads queueing to the global queue, right? e.g. if thread 1 queues a work item, then signals to thread 2 which in turn queues a work item, it's possible thread 2's work item could be dequeued before thread 1's, yes? I assume we think the chances of that negatively impacting something are slim.
Yeah, it's a tradeoff. Maintaining a strict global FIFO order for work items comes with scalability issues. Based on my other testing, breaking the FIFO order further (such as with fully independent queuing of work by threads plus work stealing) currently has some issues with the ordering of certain work items, although that model may work better eventually. This change tries to maintain some FIFO ordering while breaking some of it to scale better. I don't anticipate many issues from it, since where it applies, the scalability problem seems to be the larger issue. Another potential tradeoff is that in some cases on larger machines under low load, some threads may have to spend more CPU time searching more queues for work, resulting in more CPU time for the same work done. I haven't seen that yet, though it may be workable if it becomes an issue.
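As a hypothetical illustration of the ordering caveat discussed above (this is not code from the PR): each queue is individually FIFO, but once two threads' work items land in different assignable queues, a later-queued item can be dispatched first because a worker checks its own assigned queue before searching the others.

```csharp
using System;
using System.Collections.Concurrent;

class CrossQueueOrderingExample
{
    static void Main()
    {
        // Two queues standing in for queues assigned to two different worker threads.
        var queueA = new ConcurrentQueue<string>();
        var queueB = new ConcurrentQueue<string>();

        // Thread 1 queues its item, then (conceptually) signals thread 2, which queues second.
        queueA.Enqueue("queued first (by thread 1)");
        queueB.Enqueue("queued second (by thread 2)");

        // A worker assigned to queueB checks its own queue before searching the others,
        // so the second-queued item can run before the first even though each queue is FIFO.
        queueB.TryDequeue(out string? first);
        queueA.TryDequeue(out string? second);
        Console.WriteLine($"Dispatched '{first}' before '{second}'");
    }
}
```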
* Update SOS to show thread pool work items from new queues
  - Depends on dotnet/runtime#69386
  - The PR above added new queues of work items. This change updates the `ThreadPool -wi` command to also show work items from those new queues.
  - Previously on x86 it looks like it was reading 8 bytes from an array element of pointer size and only using the lower 4 bytes. Fixed to read only pointer size in a few places (see the sketch below).
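Purely as a hypothetical, self-contained illustration of the pointer-size point above (this is not SOS code, and none of these names exist in SOS): reading a fixed 8 bytes from a pointer-sized array element pulls in bytes from the neighboring element on a 32-bit target, so the read should follow the target's pointer size.

```csharp
using System;

static class PointerSizeReadSketch
{
    // Reads one pointer-sized element from a little-endian byte buffer representing target memory.
    // Reading a fixed 8 bytes here would pull in part of the next element on a 32-bit (x86) target,
    // so only targetPointerSize bytes (4 on x86, 8 on x64) are read.
    static ulong ReadPointerSizedElement(byte[] targetMemory, int elementIndex, int targetPointerSize)
    {
        ulong value = 0;
        int offset = elementIndex * targetPointerSize;
        for (int i = 0; i < targetPointerSize; i++)
        {
            value |= (ulong)targetMemory[offset + i] << (8 * i);
        }
        return value;
    }

    static void Main()
    {
        // Two consecutive 4-byte elements in a simulated x86 target: 0x11111111 and 0x22222222.
        byte[] memory = { 0x11, 0x11, 0x11, 0x11, 0x22, 0x22, 0x22, 0x22 };
        // Prints 11111111; an unconditional 8-byte read would have produced 2222222211111111.
        Console.WriteLine($"{ReadPointerSizedElement(memory, 0, 4):X}");
    }
}
```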
Improvements on Linux-arm64: dotnet/perf-autofiling-issues#6172, dotnet/perf-autofiling-issues#6157
@kouvel is there any chance you are going to blog about it, or do a Platform talk? It looks like a very interesting problem; I would love to hear the whole story behind it.
There's some more info about the issue I ran into in dotnet/aspnetcore#41391. There are more things to investigate, including polling for IO on worker threads. It remains to be determined what other changes may help and whether they would work better than the current approach. We can chat if you'd like.
Fixes #67845