Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Dictionary<>.TryInsert() by using ref var #108119

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SergeiPavlov
Copy link
Contributor

@SergeiPavlov SergeiPavlov commented Sep 23, 2024

Benchmarking shows some improvement:
Windows, x64, .NET9

| Method                          | Mean     | Error     | StdDev    | Median   | Ratio | RatioSD |
|-------------------------------- |---------:|----------:|----------:|---------:|------:|--------:|
| 'Old Dictionary this[i] setter' | 4.938 ns | 0.2048 ns | 0.5710 ns | 4.976 ns |  1.01 |    0.16 |
| 'New Dictionary this[i] setter' | 4.592 ns | 0.1819 ns | 0.5071 ns | 4.440 ns |  0.94 |    0.15 |

Code:
    const int N = 100_000;
    Bench.Dictionary<int, int> newDict = new(N);

    [Benchmark(Description = "New Dictionary this[i] setter", OperationsPerInvoke = N)]
    public void AddToNewDictionary()
    {
        for (int n = N; n-- > 0;)
            newDict[n] = n;
    }

The ref var is called Entry e to avoid collision with Entry entry in the outer scope.

Also:

  • Pre-store entries.Length into variable to avoid extracting it two times on every iteration.

CollectionsMarshalHelper.GetValueRefOrAddDefault() is modified for consistency

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Sep 23, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@hez2010
Copy link
Contributor

hez2010 commented Sep 23, 2024

avoid extracting it two times on every iteration

I don't think this should be done. We were intently calling array.Length in the branch condition every time because the JIT can recognize the pattern to remove unnecessary bound checks for array access inside the branch.

@SergeiPavlov
Copy link
Contributor Author

SergeiPavlov commented Sep 23, 2024

I don't think this should be done.

The JIT is not always smart.
I'd suggest to change following code:

uint collisionCount = 0;
...
collisionCount++;
if (collisionCount > (uint)entries.Length)
    throw ...;

to

uint collisionsRemaining = entries.Length;
...
if (--collisionsRemaining == 0)
    throw ...;

Comparing with zero would be easier

@huoyaoyuan
Copy link
Member

Comparing with zero would be easier

JIT already does this if it can prove for safety.

The code you are touching includes some tricks for JIT. We should inspect the assembly code to ensure everything is working properly.

@SergeiPavlov
Copy link
Contributor Author

SergeiPavlov commented Sep 23, 2024

We should inspect the assembly code

Looked at the asm with [DisassemblyDiagnoser]:

   --collisionCount;
   if (collisionCount == 0)

translates into

       dec       r11d
       je        near ptr M01_L22

and the current code

collisionCount++;
if (collisionCount > (uint)entries.Length)

translates into

       inc       r12d
       cmp       [rsp+30],r12d
       jae       near ptr M01_L09

So the JIT has proven that entries.Length can be stored in the local var [rsp+30], but it still requires more CPU instructions

ref int bucket = ref GetBucket(hashCode);
int i = bucket - 1; // Value in _buckets is 1-based

if (typeof(TKey).IsValueType && // comparer can only be null for value types; enable JIT to eliminate entire if block for ref types
comparer == null)
{
// ValueType: Devirtualize with EqualityComparer<TKey>.Default intrinsic
while ((uint)i < (uint)entries.Length)
while ((uint)i < entriesLength)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: (uint)i <= (uint)array.Length is already a trick for the JIT, when it can't prove i for non-negative. Extracting array length into a variable is an anti-pattern and will only increase the chance for JIT to mess things up.

{
if (entries[i].hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entries[i].key, key))
ref Entry e = ref entries[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this really matters. Array accessing is a trivial expression and JIT is capable to do the CSE. Is there any assembly difference for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array accessing is a trivial expression and JIT is capable to do the CSE.

The Dictionary uses ref Entry already in other place:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/Dictionary.cs#L889

@huoyaoyuan
Copy link
Member

huoyaoyuan commented Sep 23, 2024

We should inspect the assembly code

Looked at the asm with [DisassemblyDiagnoser]:

I'm referring to the disassembly for current change. Based on my knowledge I'd expect the change to result in zero difference.

Comparing with zero would be easier

This is already an optimization employed by JIT:

https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-9/#loops

Also note that such micro tweaks are mostly handled the JIT folks. They know the current behavior of the JIT to optimize the code, and the architectural capabilities to optimize them in the near future. We do want to write code in more intuitive ways and only do such manual tweaks and tricks for critical cases.

@SergeiPavlov
Copy link
Contributor Author

SergeiPavlov commented Sep 23, 2024

I'm referring to the disassembly for current change. Based on my knowledge I'd expect the change to result in zero difference.

Well
Translation of

if (entries[i].hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entries[i].key, key))

to

       mov       r9d,r10d
       shl       r9,4
       cmp       [r14+r9+10],r13d
       jne       short M01_L08
       cmp       [r14+r9+18],esi

ref Entry e = ref entries[i];
if (e.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(e.key, key))

to

       mov       eax,r8d
       shl       rax,4
       lea       r8,[r14+rax+10]
       cmp       [r8],r13d
       jne       near ptr M01_L11
       cmp       [r8+8],esi

The code looks equivally efficient.
But the final criterion is performance. I see ~10% improvement over many benchmarks runs

@eiriktsarpalis
Copy link
Member

Have you tried running the full benchmark suite as defined in https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Collections?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Collections community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants