-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fix SemaphoreSlim throughput #13766
Fix SemaphoreSlim throughput #13766
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
|
@@ -56,7 +56,13 @@ public class SemaphoreSlim : IDisposable | |
// The number of synchronously waiting threads, it is set to zero in the constructor and increments before blocking the | ||
// threading and decrements it back after that. It is used as flag for the release call to know if there are | ||
// waiting threads in the monitor or not. | ||
private volatile int m_waitCount; | ||
private int m_waitCount; | ||
|
||
/// <summary> | ||
/// This is used to help prevent waking more waiters than necessary. It's not perfect and sometimes more waiters than | ||
/// necessary may still be woken, see <see cref="WaitUntilCountOrTimeout"/>. | ||
/// </summary> | ||
private int m_countOfWaitersPulsedToWake; | ||
|
||
// Dummy object used to in lock statements to protect the semaphore count, wait handle and cancelation | ||
private object m_lockObj; | ||
|
@@ -348,15 +354,15 @@ public bool Wait(int millisecondsTimeout, CancellationToken cancellationToken) | |
// | ||
|
||
// Monitor.Enter followed by Monitor.Wait is much more expensive than waiting on an event as it involves another | ||
// spin, contention, etc. The usual number of spin iterations that would otherwise be used here is doubled to | ||
// spin, contention, etc. The usual number of spin iterations that would otherwise be used here is increased to | ||
// lessen that extra expense of doing a proper wait. | ||
int spinCount = SpinWait.SpinCountforSpinBeforeWait * 2; | ||
int sleep1Threshold = SpinWait.Sleep1ThresholdForSpinBeforeWait * 2; | ||
int spinCount = SpinWait.SpinCountforSpinBeforeWait * 4; | ||
int sleep1Threshold = SpinWait.Sleep1ThresholdForSpinBeforeWait * 4; | ||
|
||
SpinWait spin = new SpinWait(); | ||
while (true) | ||
SpinWait spinner = new SpinWait(); | ||
while (spinner.Count < spinCount) | ||
{ | ||
spin.SpinOnce(sleep1Threshold); | ||
spinner.SpinOnce(sleep1Threshold); | ||
|
||
if (m_currentCount != 0) | ||
{ | ||
|
@@ -481,8 +487,22 @@ private bool WaitUntilCountOrTimeout(int millisecondsTimeout, uint startTime, Ca | |
return false; | ||
} | ||
} | ||
|
||
// ** the actual wait ** | ||
if (!Monitor.Wait(m_lockObj, remainingWaitMilliseconds)) | ||
bool waitSuccessful = Monitor.Wait(m_lockObj, remainingWaitMilliseconds); | ||
|
||
// This waiter has woken up and this needs to be reflected in the count of waiters pulsed to wake. Since we | ||
// don't have thread-specific pulse state, there is not enough information to tell whether this thread woke up | ||
// because it was pulsed. For instance, this thread may have timed out and may have been waiting to reacquire | ||
// the lock before returning from Monitor.Wait, in which case we don't know whether this thread got pulsed. So | ||
// in any woken case, decrement the count if possible. As such, timeouts could cause more waiters to wake than | ||
// necessary. | ||
if (m_countOfWaitersPulsedToWake != 0) | ||
{ | ||
--m_countOfWaitersPulsedToWake; | ||
} | ||
|
||
if (!waitSuccessful) | ||
{ | ||
return false; | ||
} | ||
|
@@ -804,13 +824,27 @@ public int Release(int releaseCount) | |
// Increment the count by the actual release count | ||
currentCount += releaseCount; | ||
|
||
// Signal to any synchronous waiters | ||
// Signal to any synchronous waiters, taking into account how many waiters have previously been pulsed to wake | ||
// but have not yet woken | ||
int waitCount = m_waitCount; | ||
|
||
int waitersToNotify = Math.Min(releaseCount, waitCount); | ||
for (int i = 0; i < waitersToNotify; i++) | ||
Debug.Assert(m_countOfWaitersPulsedToWake <= waitCount); | ||
int waitersToNotify = Math.Min(currentCount, waitCount) - m_countOfWaitersPulsedToWake; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Can't we instead of keep tracking of pulsed count, we just do Min(currentCount, waitCount, releaseCount)? this should give the min # of threads we can pulse it to awake? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that's sufficient, for instance suppose the max semaphore count is 1 and there are three threads:
As long as the last step repeats, it should not pulse any waiter until the waiter pulsed on the first release wakes up and acknowledges the wake. While T1 repeatedly acquires and releases the lock, all waiters will be pulsed unnecessarily when only one of them can try for the lock anyway. The new variable tries to track as best as it can (with the information available) how many waiters were pulsed but have not woken up yet, so that that number can be discounted on the next release. It is a conservative number, timeouts will also acknowledge wakeup as though the thread was pulsed because the info is not there to say whether the thread was pulsed. That just means an extra waiter will be pulsed, it shouldn't affect correctness. |
||
if (waitersToNotify > 0) | ||
{ | ||
Monitor.Pulse(m_lockObj); | ||
// Ideally, limiting to a maximum of releaseCount would not be necessary and could be an assert instead, but | ||
// since WaitUntilCountOrTimeout() does not have enough information to tell whether a woken thread was | ||
// pulsed, it's possible for m_countOfWaitersPulsedToWake to be less than the number of threads that have | ||
// actually been pulsed to wake. | ||
if (waitersToNotify > releaseCount) | ||
{ | ||
waitersToNotify = releaseCount; | ||
} | ||
|
||
m_countOfWaitersPulsedToWake += waitersToNotify; | ||
for (int i = 0; i < waitersToNotify; i++) | ||
{ | ||
Monitor.Pulse(m_lockObj); | ||
} | ||
} | ||
|
||
// Now signal to any asynchronous waiters, if there are any. While we've already | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why you removed the volatile on this field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always accessed from inside a lock (m_lockObj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure accessing it inside the lock is sufficient? My understanding is the lock implemented using managed thread Id with some compareexchange mechanism. so theoretically, there is a possibility the nonvolatile field will not get updated inside the lock block. I am not sure though :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's at least an interlocked operation at acquiring and releasing the lock, so it should get updated properly, otherwise it would be a bug in the lock. Another possibility is that volatile ensures some kind of ordering inside the lock, but I don't see anything like that, all of the state updates and most of the reads are inside the lock so changing ordering would not observe any difference in value of the fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a couple of enter/leave paths:
These are C++ versions of the assembly code for the fast paths that actually run, there are interlocked operations on enter and exit. Also checked all of the contention paths and wait paths, they also issue an interlocked operation.