Description
Experiments indicate that ReaderWriterLockSlim "falls apart" (i.e., its throughput drops catastrophically) when it is subjected to high contention across a large number of processors.
I have been using the following test app to investigate this (against .NET 4.7): https://gist.github.com/ChrisAhna/37731dc47c30fa4080e9b21f5158bd14
Running on a 16-core machine with Hyperthreading enabled (i.e., %NUMBER_OF_PROCESSORS% is 32), ReaderWriterLockSlim EnterWriteLock/ExitWriteLock generates the following aggregate throughput as contention is added:
$ start /AFFINITY 0x55555555 TestWriteLockContention.exe ReentrantRwlockSlim
Running ReentrantRwlockSlim write lock contention scenarios with process affinity set to 0x0000000055555555...
Initial output should appear in approximately 5 seconds...
ThreadCount=0001: Elapsed=5031ms FinalCount=78629820 CountsPerSecond=15627939.6240608
ThreadCount=0002: Elapsed=5015ms FinalCount=58395460 CountsPerSecond=11642530.0315396
ThreadCount=0004: Elapsed=5031ms FinalCount=41754073 CountsPerSecond=8298913.8566536
ThreadCount=0008: Elapsed=5062ms FinalCount=28070417 CountsPerSecond=5544650.8425038
ThreadCount=0016: Elapsed=5046ms FinalCount=20933858 CountsPerSecond=4147862.49292784
ThreadCount=0032: Elapsed=5045ms FinalCount=2378450 CountsPerSecond=471384.048135497
ThreadCount=0064: Elapsed=5109ms FinalCount=334673 CountsPerSecond=65494.322219392
ThreadCount=0128: Elapsed=5125ms FinalCount=122824 CountsPerSecond=23965.3905918281
ThreadCount=0256: Elapsed=5078ms FinalCount=8056 CountsPerSecond=1586.39200187468
ThreadCount=0512: Elapsed=5343ms FinalCount=4740 CountsPerSecond=887.010489273302
ThreadCount=1024: Elapsed=5921ms FinalCount=6589 CountsPerSecond=1112.64013055698
ThreadCount=2048: Elapsed=7454ms FinalCount=49008 CountsPerSecond=6574.34819405038
ThreadCount=4096: Elapsed=5734ms FinalCount=44959 CountsPerSecond=7840.13593312397
All scenarios are complete, press any key to exit...
Note that the total throughput with 512 threads (about 887 counts per second) is roughly 13,000 times lower than the total throughput with 2 threads (about 11.6 million counts per second).
In contrast, on the same machine, using "lock (obj) { ... }" instead of the EnterWriteLock/ExitWriteLock sequence generates throughput at high thread counts that is less than 2 times lower than the throughput with 2 threads:
$ start /AFFINITY 0x55555555 TestWriteLockContention.exe StandardObjectLock
Running StandardObjectLock write lock contention scenarios with process affinity set to 0x0000000055555555...
Initial output should appear in approximately 5 seconds...
ThreadCount=0001: Elapsed=5015ms FinalCount=164436010 CountsPerSecond=32784088.17451
ThreadCount=0002: Elapsed=5015ms FinalCount=108796243 CountsPerSecond=21691151.399297
ThreadCount=0004: Elapsed=5015ms FinalCount=96353028 CountsPerSecond=19210275.7290728
ThreadCount=0008: Elapsed=5015ms FinalCount=111299873 CountsPerSecond=22190412.096161
ThreadCount=0016: Elapsed=5015ms FinalCount=73382682 CountsPerSecond=14630694.0463446
ThreadCount=0032: Elapsed=5015ms FinalCount=80234151 CountsPerSecond=15996616.1819402
ThreadCount=0064: Elapsed=5015ms FinalCount=83002540 CountsPerSecond=16548564.3732314
ThreadCount=0128: Elapsed=5015ms FinalCount=78919963 CountsPerSecond=15734601.3716025
ThreadCount=0256: Elapsed=5015ms FinalCount=84043801 CountsPerSecond=16756164.9441036
ThreadCount=0512: Elapsed=5031ms FinalCount=83017116 CountsPerSecond=16500069.7332038
ThreadCount=1024: Elapsed=5078ms FinalCount=84650561 CountsPerSecond=16669410.6192323
ThreadCount=2048: Elapsed=5140ms FinalCount=69318837 CountsPerSecond=13484212.9644246
ThreadCount=4096: Elapsed=5296ms FinalCount=82805924 CountsPerSecond=15632759.585396
All scenarios are complete, press any key to exit...
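For readers who do not want to open the gist, the kind of measurement loop behind both runs can be sketched roughly as follows. This is a minimal illustration, not the code from the linked test app: the names ContentionSketch and Run, the SupportsRecursion policy (guessed from the "ReentrantRwlockSlim" scenario name), and the reduced thread counts are all assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch of a write-lock contention test; NOT the gist's code.
public static class ContentionSketch
{
    // Starts threadCount threads that repeatedly take the lock and increment a
    // shared counter until duration has elapsed, then returns the final count.
    public static long Run(int threadCount, TimeSpan duration, bool useRwLock)
    {
        // Assumption: the recursion policy is inferred from the scenario name.
        using (var rwLock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion))
        {
            var gate = new object();
            long count = 0;
            bool stop = false;

            ThreadStart body = () =>
            {
                while (!Volatile.Read(ref stop))
                {
                    if (useRwLock)
                    {
                        rwLock.EnterWriteLock();
                        try { count++; }
                        finally { rwLock.ExitWriteLock(); }
                    }
                    else
                    {
                        lock (gate) { count++; }
                    }
                }
            };

            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(body);
                threads[i].Start();
            }

            Thread.Sleep(duration);
            Volatile.Write(ref stop, true);
            foreach (var t in threads) t.Join();

            // All workers have exited, so the counter can be read without the lock.
            return count;
        }
    }

    public static void Main()
    {
        foreach (bool useRwLock in new[] { true, false })
        {
            Console.WriteLine(useRwLock ? "ReaderWriterLockSlim:" : "lock (obj):");
            foreach (int threadCount in new[] { 1, 2, 4, 8 })
            {
                var sw = Stopwatch.StartNew();
                long count = Run(threadCount, TimeSpan.FromSeconds(1), useRwLock);
                sw.Stop();
                Console.WriteLine(
                    "ThreadCount={0:D4}: Elapsed={1}ms FinalCount={2} CountsPerSecond={3}",
                    threadCount, sw.ElapsedMilliseconds, count,
                    count / sw.Elapsed.TotalSeconds);
            }
        }
    }
}
```

The sketch does not set processor affinity itself. As in the runs above, that can be applied from outside with start /AFFINITY, and the thread counts would need to be raised well past the number of logical processors to reach the region where the ReaderWriterLockSlim numbers collapse.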