Skip to content

Specialize Dictionary<,> for string key #116045

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented May 27, 2025

A quick search on grep.app reveals 243 574 exact, case-sensitive matches for Dictionary<. Of these, 172 763 (about 71%) use String as the TKey type; many of the remaining dictionaries use value-types as keys.

Presumably, in most of these cases the default (ordinal) comparer is used, and per @GrabYourPitchforks dictionary rehashing when we switch to the randomized comparer is extremely rare. That suggests it would be worthwhile to specialize Dictionary<string, TValue> for the common ordinal-comparer scenario, a sort of hand-made PGO or generic specialization for ref types.

Under the hood, FindValue on a string-keyed dictionary is always a shared generic method (_Canon). The JIT is unable to devirtualize any call inside it (There was an issue logged about this, but I couldn’t locate it.)

The check I’m proposing is lightweight and should speed up lookups by roughly 50% when using the OrdinalComparer, while incurring only about a 5% slowdown for other reference-keyed dictionaries. With some extra efforts we can, in theory, limit the overhead to String keys only with help of IsKnownConstant(typeof(TKey)) && typeof(TKey) == typeof(string), but my attempt to do so wasn't pretty.

@stephentoub @jkotas @kg - is this something we think is worth doing or the regression is not acceptable?

Bonus idea: constant fold "string literal key".GetNonRandomizedHashCode() in JIT.

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 27, 2025
@EgorBo
Copy link
Member Author

EgorBo commented May 27, 2025

@EgorBot -amd -arm

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static Dictionary<string, int> Dictionary_OC = Enumerable.Range(0, 1000)
        .ToDictionary(i => i.ToString(), i => i);

    static Dictionary<string, int> Dictionary_OCIC = Enumerable.Range(0, 1000)
        .ToDictionary(i => i.ToString(), i => i, StringComparer.OrdinalIgnoreCase);

    [Benchmark]
    [Arguments("1")]
    [Arguments("999")]
    public int Lookup_best(string key) => Dictionary_OC[key];

    [Benchmark]
    [Arguments("1000000")]
    public bool Contains_best(string key) => Dictionary_OC.ContainsKey(key);


    [Benchmark]
    [Arguments("1")]
    [Arguments("999")]
    public int Lookup_worst(string key) => Dictionary_OCIC[key];

    [Benchmark]
    [Arguments("1000000")]
    public bool Contains_worst(string key) => Dictionary_OCIC.ContainsKey(key);
}

@jkotas jkotas added area-System.Collections and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 27, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@stephentoub
Copy link
Member

while incurring only about a 5% slowdown for other reference-keyed dictionaries.

While I'm certainly excited at the idea of a significant improvement for the 70% case, a 5% regression for other reference-type keys seems problematically large. We also have other similar types, like HashSet, ConcurrentDictionary, etc., and it'd be unfortunate if this was constrained to only Dictionary.

@kg
Copy link
Member

kg commented May 27, 2025

while incurring only about a 5% slowdown for other reference-keyed dictionaries.

While I'm certainly excited at the idea of a significant improvement for the 70% case, a 5% regression for other reference-type keys seems problematically large. We also have other similar types, like HashSet, ConcurrentDictionary, etc., and it'd be unfortunate if this was constrained to only Dictionary.

Agreed. I love the improvement for string keys but 5% for everything else feels like a hard sell. I guess for struct-typed keys we won't be paying the cost, right, since the JIT specializes code for those instead of using Canon? That would mean users of record structs and tuples neither benefit from faster string key lookups nor are penalized by this. Users of record classes would pay the penalty but not benefit, yeah?

Does this add overhead for custom comparers or would those be a way to bypass the overhead here?

@EgorBo
Copy link
Member Author

EgorBo commented May 27, 2025

5% regression for other reference-type keys seems problematically large.

Does this add overhead for custom comparers or would those be a way to bypass the overhead here?

So with some acrobatics in JIT, I can limit the overhead to String TKey only (when comparer is not the default one e.g. OrdinalComparerIgnoreCase), presumably it will be even less than 5% (e.g. 3-4%) due to slow HashCode/Equals anyway.

Would that be acceptable?

@danmoseley
Copy link
Member

What is the remaining cost for the non default string comparer - just the check for it?

But, I'd be inclined to say this is worth it based on the counter factual argument: if we already had such performance we would not reverse it to get a few % in the non default case.

Btw can grep.app be used to guess at how common default comparer is?

@jkotas
Copy link
Member

jkotas commented May 27, 2025

speed up lookups by roughly 50%

Are the key lengths used by your benchmark representative of typical real-world use case?

My guess is that the string lengths for typical real-world use cases are quite a bit longer. We may want to use XxHash3 for non-randomized comparer to optimize those cases (see #98838 for the discussion - IIRC, it was motivated by analyzing real workload).

@EgorBo
Copy link
Member Author

EgorBo commented May 27, 2025

What is the remaining cost for the non default string comparer - just the check for it?

Yep, presumably it's just two asm instructions (on x64 at least)

Btw can grep.app be used to guess at how common default comparer is?

Not sure, I guess CodeQL might be a better option here

Are the key lengths used by EgorBot/runtime-utils#360 (comment) representative of typical real-world use case?

@jkotas, Can't easily tell, my analysis on some 1P implied they're rather small than large (e.g. not more than 50 chars).
From quick experiments, it seems XxHash32 shines only on large inputs, e.g. check this bechmark. The cutoff is around 40-50 chars (it's a lot lower if we compare against the randomized Marvin)

That's why I tried to optimize the shared generics/virtual call overhead rather than the hash function. Although, I think it won't hurt to re-use some parts of XxHash3 in corelib for long inputs, I've made an attempt a while ago: EgorBo@8ab3135

@kg
Copy link
Member

kg commented May 27, 2025

5% regression for other reference-type keys seems problematically large.

Does this add overhead for custom comparers or would those be a way to bypass the overhead here?

So with some acrobatics in JIT, I can limit the overhead to String TKey only (when comparer is not the default one e.g. OrdinalComparerIgnoreCase), presumably it will be even less than 5% (e.g. 3-4%) due to slow HashCode/Equals anyway.

Would that be acceptable?

IMO if it's string tkey only that sounds like a reasonable tradeoff. People who care about perf there are using the ordinal comparer, and if they're using something else it's probably for business requirements (case insensitive lookups, etc) where the requirements are more important than being blazing fast.

I do think it's probably worth doing tests with longer string keys like jkotas pointed out though. I would expect the average key length is much higher than 4. Xml tag names, people's full names, filesystem paths, stringified enum values, language names, etc.

@stephentoub
Copy link
Member

But, I'd be inclined to say this is worth it based on the counter factual argument

There's a difference between making tradeoffs in something that's brand new and making tradeoffs in something that folks have been building on top of for > 20 years.

@danmoseley
Copy link
Member

But, I'd be inclined to say this is worth it based on the counter factual argument

There's a difference between making tradeoffs in something that's brand new and making tradeoffs in something that folks have been building on top of for > 20 years.

Indeed, a few % could put someone out of SLA, that's why i was only inclined..

@EgorBo EgorBo force-pushed the faster-findvalue-string branch from 3332a2a to 3c7b67a Compare May 28, 2025 00:34
@EgorBo
Copy link
Member Author

EgorBo commented May 28, 2025

Alternative solution is to inline FindValue when its callsite is not a shared generic (e.g. when indexer is inlined). Hence, FindValue(_Canon) becomes specialized as is.

Instead of AggressiveInlining, we can hack in JIT to be more precise (and check profile weight).
I've kicked off a benchmark here (with long strings): EgorBot/runtime-utils#360 (comment)

@AndyAyersMS
Copy link
Member

Under the hood, FindValue on a string-keyed dictionary is always a shared generic method (_Canon). The JIT is unable to devirtualize any call inside it (There was an issue logged about this, but I couldn’t locate it.)

Feel free to open a new issue, I think this is the shared-shared case where we don't even add probes today. See eg #108913 (comment) .

It's on my list of things to try and squeeze into .NET 10.

@EgorBo
Copy link
Member Author

EgorBo commented May 28, 2025

Feel free to open a new issue, I think this is the shared-shared case where we don't even add probes today. See eg #108913 (comment) .

It's on my list of things to try and squeeze into .NET 10.

Yep, it is that one. I've just filed #116055

@EgorBo
Copy link
Member Author

EgorBo commented May 28, 2025

Looks like inlining may already happen naturally with help of PGO even today, I've filed a small BDN-centric improvement for the inliner #116054 that significantly improved lookup in the benchmark: EgorBot/runtime-utils#362

@@ -394,6 +394,7 @@ public virtual void GetObjectData(SerializationInfo info, StreamingContext conte
}
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this frequently result in this whole method being inlined into consumer call sites of methods like Dictionary.ContainsKey? That seems bad.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was not suggesting this as the real fix as I said in #116045 (comment), just wanted to run benchmarks against it, but then I noticed that we already can inline it even without this attribute. I still have a small idea I want to try here (as a workaround for #116055)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants