-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Call TryGetValue in overload of ConcurrentDictionary.GetOrAdd to avoid acquiring lock #63
Conversation
acquiring lock. There are two overloads of ConcurrentDictionary.GetOrAdd. One accepts a factory method to generate the value if the key is not found, and calls TryGetValue for the existence check which is a lock-free operation. The other overload, however, does not call TryGetValue - it simply calls straight through to TryAddInternal, which will obtain a bucket lock *before it does the existence check*. This violates the documented claim of ConcurrentDictionary of lock-free reads. This commit simply adds the same TryGetValue check, making the two overloads consistent.
Thanks, @russgray. The fix LGTM. Stylewise, I'd prefer to see this simplified to avoid multiple return statements for the same thing, e.g. instead of: TValue resultingValue;
if (TryGetValue(key, out resultingValue))
{
return resultingValue;
}
TryAddInternal(key, value, false, true, out resultingValue);
return resultingValue; doing: TValue resultingValue;
if (!TryGetValue(key, out resultingValue))
{
TryAddInternal(key, value, false, true, out resultingValue);
}
return resultingValue; I realize you were just copying the style that was already there for the other overload. Want to change both? |
Actually, scratch that. This should be on the CoreFX repository, not in CoreCLR (I didn't notice which repository we were talking about). Can you close this PR and reopen an appropriate one over on CoreFX for the ConcurrentDictionary there rather than the ConcurrentDictionary in mscorlib? (Note that the ConcurrentDictionary in CoreFX has evolved from the one here, so you'll see some slightly different code involving TryGetValueInternal in order to avoid duplicate hashcode generation.) |
No problem, I'll duplicate the change with the style tweak on the other repo |
Thanks. |
For 4 byte integer multiplication, JIT emits a bad-code which is valid only for 8 byte (i8) multiplication. For the fix, I use ```smull```(signed)/```umull```(unsigned) instructions that contain 8 byte results from 4 byte by 4 byte multiplication. So only one multiplication is needed instead of two for this case. By simply shifting the results, we could get the upper results that is used for sign check to detect overflow. Similar transform is made for the unsigned case. But for this case, we might trash the destination register by the extra register, I added register check to optionally emit additional multiplication. I guess we might revisit this to allocate two register like the signed one to ensure only one multiplication for 4 bye case, but I think we should follow this up later if needed. Before smulh w10, w8, w9 --> Incorrect use: smulh is for obtaining the upper bits [127:64] of x8 * x9 mul w8, w8, w9 cmp x10, x8, ASR dotnet#63 After smull x8, x8, x9 --> x8 = w8 * w9 lsr x10, x8, dotnet#32 --> shift the upper bit of x8 to get sign bit cmp w10, w8, ASR #31 --> check sign bit
For 4 byte integer multiplication, JIT emits a bad-code which is valid only for 8 byte (i8) multiplication. For the fix, I use ```smull```(signed)/```umull```(unsigned) instructions that contain 8 byte results from 4 byte by 4 byte multiplication. So only one multiplication is needed instead of two for this case. By simply shifting the results, we could get the upper results that is used to detect overflow. Similar transform is made for the unsigned case. Lower is also changed to reserve a register for overflow check. Before smulh w10, w8, w9 --> Incorrect use: smulh is for obtaining the upper bits [127:64] of x8 * x9 mul w8, w8, w9 cmp x10, x8, ASR dotnet#63 After smull x8, x8, x9 --> x8 = w8 * w9 lsr x10, x8, dotnet#32 --> shift the upper bit of x8 to get sign bit cmp w10, w8, ASR #31 --> check sign bit
For 4 byte integer multiplication, JIT emits a bad-code which is valid only for 8 byte (i8) multiplication. For the fix, I use ```smull```(signed)/```umull```(unsigned) instructions that contain 8 byte results from 4 byte by 4 byte multiplication. So only one multiplication is needed instead of two for this case. By simply shifting the results, we could get the upper results that is used to detect overflow. Similar transform is made for the unsigned case. Lower is also changed to reserve a register for overflow check. Before smulh w10, w8, w9 --> Incorrect use: smulh is for obtaining the upper bits [127:64] of x8 * x9 mul w8, w8, w9 cmp x10, x8, ASR dotnet#63 After smull x8, x8, x9 --> x8 = w8 * w9 lsr x10, x8, dotnet#32 --> shift the upper bit of x8 to get sign bit cmp w10, w8, ASR #31 --> check sign bit
There are two overloads of ConcurrentDictionary.GetOrAdd. One accepts a
factory method to generate the value if the key is not found, and calls
TryGetValue for the existence check which is a lock-free operation.
The other overload, however, does not call TryGetValue - it simply calls
straight through to TryAddInternal, which will obtain a bucket lock before it
does the existence check. This violates the documented claim of
ConcurrentDictionary of lock-free reads. This commit simply adds the same
TryGetValue check, making the two overloads consistent.