Conversation
In #13670, by mistake I made the spin loop infinite; that is now fixed. As a result, the numbers I had provided in that PR for SemaphoreSlim were skewed, and fixing it caused the throughput to get even lower. To compensate, I have found and fixed one culprit for the low-throughput problem:
- Every release wakes up a waiter. Effectively, when there is a thread acquiring and releasing the semaphore, waiters don't get to remain in a wait state.
- Added a field to keep track of how many waiters were pulsed to wake but have not yet woken, and took that into account in Release() so as not to wake up more waiters than necessary.
- Retuned and increased the number of spin iterations. The total spin delay is still less than before the above PR.
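The pulsed-waiter accounting described above can be sketched in Java monitor terms (synchronized/wait/notify standing in for Monitor.Enter/Wait/Pulse). This is a minimal illustrative sketch, not the actual SemaphoreSlim code; all names here are made up for the example:

```java
// Illustrative sketch of the "count of waiters pulsed to wake" idea.
// Not the real SemaphoreSlim implementation.
public class SlimSemaphoreSketch {
    private final Object lockObj = new Object();
    private int count;        // current semaphore count
    private int waitCount;    // threads blocked waiting
    private int pulsedToWake; // pulsed to wake but not yet awake

    public SlimSemaphoreSketch(int initialCount) {
        count = initialCount;
    }

    public void acquire() throws InterruptedException {
        synchronized (lockObj) {
            waitCount++;
            boolean waited = false;
            try {
                while (count == 0) {
                    waited = true;
                    lockObj.wait();
                }
            } finally {
                waitCount--;
                // Conservatively acknowledge a pulse on any wakeup, since we
                // can't tell whether this particular thread was the one pulsed.
                if (waited && pulsedToWake > 0) {
                    pulsedToWake--;
                }
            }
            count--;
        }
    }

    public void release(int releaseCount) {
        synchronized (lockObj) {
            count += releaseCount;
            // Discount waiters already pulsed, so a hot acquire/release loop
            // doesn't keep waking waiters that haven't had a chance to run yet.
            int waitersToNotify = Math.min(count, waitCount) - pulsedToWake;
            for (int i = 0; i < waitersToNotify; i++) {
                pulsedToWake++;
                lockObj.notify();
            }
        }
    }

    public int availablePermits() {
        synchronized (lockObj) {
            return count;
        }
    }
}
```

The key point is in release(): waiters that were already pulsed but have not yet re-entered the lock are subtracted before deciding how many more to notify.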
@dotnet-bot test CentOS7.1 x64 Debug Build and Test

This also fixes the CoreFX threading test hangs seen in dotnet/corefx#23784
@@ -56,7 +56,13 @@ public class SemaphoreSlim : IDisposable
// The number of synchronously waiting threads; it is set to zero in the constructor, incremented before blocking the
// thread, and decremented back after that. It is used as a flag for the release call to know whether there are
// waiting threads in the monitor or not.
private volatile int m_waitCount;
> volatile
Why did you remove the volatile on this field?
It's always accessed from inside a lock (m_lockObj)
Are we sure accessing it inside the lock is sufficient? My understanding is that the lock is implemented using the managed thread ID with some compare-exchange mechanism, so theoretically there is a possibility that the non-volatile field will not get updated inside the lock block. I am not sure though :-)
There's at least an interlocked operation at acquiring and releasing the lock, so it should get updated properly; otherwise it would be a bug in the lock. Another possibility is that volatile ensures some kind of ordering inside the lock, but I don't see anything like that: all of the state updates and most of the reads are inside the lock, so changing the ordering would not observe any difference in the values of the fields.
Here are a couple of enter/leave paths:
- https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.inl#L47
- https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.inl#L218
These are the C++ versions of the assembly code for the fast paths that actually run; there are interlocked operations on enter and exit. I also checked all of the contention paths and wait paths, and they also issue an interlocked operation.
int waitersToNotify = Math.Min(releaseCount, waitCount);
for (int i = 0; i < waitersToNotify; i++)
Debug.Assert(m_countOfWaitersPulsedToWake <= waitCount);
int waitersToNotify = Math.Min(currentCount, waitCount) - m_countOfWaitersPulsedToWake;
> int waitersToNotify = Math.Min(currentCount, waitCount)
Instead of keeping track of the pulsed count, can't we just do
Min(currentCount, waitCount, releaseCount)? That should give the minimum number of threads we can pulse awake.
The issue I am seeing with m_countOfWaitersPulsedToWake is that it does not really reflect the pulse count all the time.
I don't think that's sufficient, for instance suppose the max semaphore count is 1 and there are three threads:
- T1 acquires the semaphore
- Semaphore count is 0, T2 is waiting, T3 is waiting
- T1 releases the semaphore - count is 1, releaseCount is 1, waitCount is 2. One waiter is pulsed, say it's T2.
- T1 acquires and releases the semaphore again - T2 hasn't woken up yet, count is 1, releaseCount is 1, waitCount is 2. Another waiter is pulsed, this time it will be T3.
As long as the last step repeats, no waiter should be pulsed until the waiter pulsed on the first release wakes up and acknowledges the wake. With Min(currentCount, waitCount, releaseCount), while T1 repeatedly acquires and releases the semaphore, all waiters would be pulsed unnecessarily when only one of them can take the semaphore anyway.
The new field tracks, as best it can with the information available, how many waiters were pulsed but have not yet woken, so that number can be discounted on the next release. It is a conservative number: a waiter that times out also acknowledges a wakeup as though it had been pulsed, because the information is not there to say whether it actually was. That just means an extra waiter may be pulsed; it shouldn't affect correctness.
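The difference between the two formulas in the scenario above can be checked with a small arithmetic sketch (method names are illustrative, not the actual SemaphoreSlim code):

```java
// Comparing the two candidate formulas from the discussion on the
// scenario: count is 1, two waiters, one waiter already pulsed.
public class PulseMath {
    // The approach in the PR: discount waiters already pulsed
    // but not yet awake.
    static int withPulsedTracking(int currentCount, int waitCount,
                                  int pulsedToWake) {
        return Math.max(0, Math.min(currentCount, waitCount) - pulsedToWake);
    }

    // The suggested alternative: min of the three counts.
    static int minOfThree(int currentCount, int waitCount,
                          int releaseCount) {
        return Math.min(Math.min(currentCount, waitCount), releaseCount);
    }

    public static void main(String[] args) {
        // min-of-three still pulses another waiter on the repeat release...
        System.out.println(minOfThree(1, 2, 1));         // 1
        // ...while discounting the pulsed count pulses nobody new.
        System.out.println(withPulsedTracking(1, 2, 1)); // 0
    }
}
```

This matches the T1/T2/T3 walkthrough: with min-of-three, every release by T1 pulses one more waiter, while the pulsed-count version pulses none until an already-pulsed waiter wakes.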
LGTM

Thanks!

LGTM2