Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Fix SemaphoreSlim throughput #13766

Merged
merged 1 commit into from
Sep 5, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/mscorlib/shared/System/Threading/SpinWait.cs
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ public struct SpinWait
/// only a suggested value and typically works well when the proper wait is something like an event.
///
/// Spinning less can lead to early waiting and more context switching, spinning more can decrease latency but may use
/// up some CPU time unnecessarily. Depends on the situation too, for instance SemaphoreSlim uses double this number
/// up some CPU time unnecessarily. Depends on the situation too, for instance SemaphoreSlim uses more iterations
/// because the waiting there is currently a lot more expensive (involves more spinning, taking a lock, etc.). It also
/// depends on the likelihood of the spin being successful and how long the wait would be but those are not accounted
/// for here.
Expand Down
62 changes: 48 additions & 14 deletions src/mscorlib/src/System/Threading/SemaphoreSlim.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
// Licensed to the .NET Foundation under one or more agreements.
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

Expand Down Expand Up @@ -56,7 +56,13 @@ public class SemaphoreSlim : IDisposable
// The number of synchronously waiting threads, it is set to zero in the constructor and increments before blocking the
// threading and decrements it back after that. It is used as flag for the release call to know if there are
// waiting threads in the monitor or not.
private volatile int m_waitCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volatile [](start = 16, length = 8)

why you removed the volatile on this field?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always accessed from inside a lock (m_lockObj)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure accessing it inside the lock is sufficient? My understanding is the lock implemented using managed thread Id with some compareexchange mechanism. so theoretically, there is a possibility the nonvolatile field will not get updated inside the lock block. I am not sure though :-)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's at least an interlocked operation at acquiring and releasing the lock, so it should get updated properly, otherwise it would be a bug in the lock. Another possibility is that volatile ensures some kind of ordering inside the lock, but I don't see anything like that, all of the state updates and most of the reads are inside the lock so changing ordering would not observe any difference in value of the fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a couple of enter/leave paths:

These are C++ versions of the assembly code for the fast paths that actually run, there are interlocked operations on enter and exit. Also checked all of the contention paths and wait paths, they also issue an interlocked operation.

private int m_waitCount;

/// <summary>
/// This is used to help prevent waking more waiters than necessary. It's not perfect and sometimes more waiters than
/// necessary may still be woken, see <see cref="WaitUntilCountOrTimeout"/>.
/// </summary>
private int m_countOfWaitersPulsedToWake;

// Dummy object used to in lock statements to protect the semaphore count, wait handle and cancelation
private object m_lockObj;
Expand Down Expand Up @@ -348,15 +354,15 @@ public bool Wait(int millisecondsTimeout, CancellationToken cancellationToken)
//

// Monitor.Enter followed by Monitor.Wait is much more expensive than waiting on an event as it involves another
// spin, contention, etc. The usual number of spin iterations that would otherwise be used here is doubled to
// spin, contention, etc. The usual number of spin iterations that would otherwise be used here is increased to
// lessen that extra expense of doing a proper wait.
int spinCount = SpinWait.SpinCountforSpinBeforeWait * 2;
int sleep1Threshold = SpinWait.Sleep1ThresholdForSpinBeforeWait * 2;
int spinCount = SpinWait.SpinCountforSpinBeforeWait * 4;
int sleep1Threshold = SpinWait.Sleep1ThresholdForSpinBeforeWait * 4;

SpinWait spin = new SpinWait();
while (true)
SpinWait spinner = new SpinWait();
while (spinner.Count < spinCount)
{
spin.SpinOnce(sleep1Threshold);
spinner.SpinOnce(sleep1Threshold);

if (m_currentCount != 0)
{
Expand Down Expand Up @@ -481,8 +487,22 @@ private bool WaitUntilCountOrTimeout(int millisecondsTimeout, uint startTime, Ca
return false;
}
}

// ** the actual wait **
if (!Monitor.Wait(m_lockObj, remainingWaitMilliseconds))
bool waitSuccessful = Monitor.Wait(m_lockObj, remainingWaitMilliseconds);

// This waiter has woken up and this needs to be reflected in the count of waiters pulsed to wake. Since we
// don't have thread-specific pulse state, there is not enough information to tell whether this thread woke up
// because it was pulsed. For instance, this thread may have timed out and may have been waiting to reacquire
// the lock before returning from Monitor.Wait, in which case we don't know whether this thread got pulsed. So
// in any woken case, decrement the count if possible. As such, timeouts could cause more waiters to wake than
// necessary.
if (m_countOfWaitersPulsedToWake != 0)
{
--m_countOfWaitersPulsedToWake;
}

if (!waitSuccessful)
{
return false;
}
Expand Down Expand Up @@ -804,13 +824,27 @@ public int Release(int releaseCount)
// Increment the count by the actual release count
currentCount += releaseCount;

// Signal to any synchronous waiters
// Signal to any synchronous waiters, taking into account how many waiters have previously been pulsed to wake
// but have not yet woken
int waitCount = m_waitCount;

int waitersToNotify = Math.Min(releaseCount, waitCount);
for (int i = 0; i < waitersToNotify; i++)
Debug.Assert(m_countOfWaitersPulsedToWake <= waitCount);
int waitersToNotify = Math.Min(currentCount, waitCount) - m_countOfWaitersPulsedToWake;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int waitersToNotify = Math.Min(currentCount, waitCount) [](start = 16, length = 55)

Can't we instead of keep tracking of pulsed count, we just do

Min(currentCount, waitCount, releaseCount)? this should give the min # of threads we can pulse it to awake?
The issue I am seeing in m_countOfWaitersPulsedToWake, is not really reflecting the pulse count all the time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's sufficient, for instance suppose the max semaphore count is 1 and there are three threads:

  • T1 acquires the semaphore
  • Semaphore count is 0, T2 is waiting, T3 is waiting
  • T1 releases the semaphore - count is 1, releaseCount is 1, waitCount is 2. One waiter is pulsed, say it's T2.
  • T1 acquires and releases the semaphore again - T2 hasn't woken up yet, count is 1, releaseCount is 1, waitCount is 2. Another waiter is pulsed, this time it will be T3.

As long as the last step repeats, it should not pulse any waiter until the waiter pulsed on the first release wakes up and acknowledges the wake. While T1 repeatedly acquires and releases the lock, all waiters will be pulsed unnecessarily when only one of them can try for the lock anyway.

The new variable tries to track as best as it can (with the information available) how many waiters were pulsed but have not woken up yet, so that that number can be discounted on the next release. It is a conservative number, timeouts will also acknowledge wakeup as though the thread was pulsed because the info is not there to say whether the thread was pulsed. That just means an extra waiter will be pulsed, it shouldn't affect correctness.

if (waitersToNotify > 0)
{
Monitor.Pulse(m_lockObj);
// Ideally, limiting to a maximum of releaseCount would not be necessary and could be an assert instead, but
// since WaitUntilCountOrTimeout() does not have enough information to tell whether a woken thread was
// pulsed, it's possible for m_countOfWaitersPulsedToWake to be less than the number of threads that have
// actually been pulsed to wake.
if (waitersToNotify > releaseCount)
{
waitersToNotify = releaseCount;
}

m_countOfWaitersPulsedToWake += waitersToNotify;
for (int i = 0; i < waitersToNotify; i++)
{
Monitor.Pulse(m_lockObj);
}
}

// Now signal to any asynchronous waiters, if there are any. While we've already
Expand Down