Specialize Dictionary<,> for string key #116045
Conversation
@EgorBot -amd -arm

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    static Dictionary<string, int> Dictionary_OC = Enumerable.Range(0, 1000)
        .ToDictionary(i => i.ToString(), i => i);

    static Dictionary<string, int> Dictionary_OCIC = Enumerable.Range(0, 1000)
        .ToDictionary(i => i.ToString(), i => i, StringComparer.OrdinalIgnoreCase);

    [Benchmark]
    [Arguments("1")]
    [Arguments("999")]
    public int Lookup_best(string key) => Dictionary_OC[key];

    [Benchmark]
    [Arguments("1000000")]
    public bool Contains_best(string key) => Dictionary_OC.ContainsKey(key);

    [Benchmark]
    [Arguments("1")]
    [Arguments("999")]
    public int Lookup_worst(string key) => Dictionary_OCIC[key];

    [Benchmark]
    [Arguments("1000000")]
    public bool Contains_worst(string key) => Dictionary_OCIC.ContainsKey(key);
}
```
Tagging subscribers to this area: @dotnet/area-system-collections
While I'm certainly excited at the idea of a significant improvement for the 70% case, a 5% regression for other reference-type keys seems problematically large. We also have other similar types, like HashSet, ConcurrentDictionary, etc., and it'd be unfortunate if this were constrained to only Dictionary.
Agreed. I love the improvement for string keys, but 5% for everything else feels like a hard sell. I guess for struct-typed keys we won't be paying the cost, right, since the JIT specializes code for those instead of using __Canon? That would mean users of record structs and tuples neither benefit from faster string key lookups nor are penalized by this. Users of record classes would pay the penalty but not benefit, yeah? Does this add overhead for custom comparers, or would those be a way to bypass the overhead here?
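To illustrate the point about value-type keys, here is a minimal, hypothetical sketch (not the actual Dictionary code): a `typeof(TKey) == typeof(string)` guard is free for struct keys because the JIT compiles a separate body per value-type instantiation and folds the check to a constant, while all reference-type keys share one `__Canon` body and must pay for the check at runtime.

```csharp
using System;

static class LookupSketch
{
    // Hypothetical fast-path guard, mirroring the proposed check.
    // For value-type TKey the JIT emits a specialized instantiation,
    // so the typeof comparison folds away and the branch disappears.
    // For reference-type TKey the body is shared (__Canon), so the
    // comparison costs a couple of instructions per call.
    public static string Describe<TKey>() =>
        typeof(TKey) == typeof(string)
            ? "specialized string path"
            : typeof(TKey).IsValueType
                ? "value-type path (guard folded away by the JIT)"
                : "shared reference-type path (guard paid at runtime)";
}
```

Under this model, `Describe<int>()` takes the folded value-type path, `Describe<string>()` hits the string fast path, and `Describe<object>()` represents the shared code that would see the ~5% regression.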
So with some acrobatics in the JIT, I can limit the overhead to string TKey only (when the comparer is not the default one). Would that be acceptable?
What is the remaining cost for the non-default string comparer, just the check for it? But I'd be inclined to say this is worth it based on the counterfactual argument: if we already had such performance, we would not reverse it to gain a few % in the non-default case. Btw, can grep.app be used to guess how common the default comparer is?
Are the key lengths used by your benchmark representative of typical real-world use cases? My guess is that the string lengths for typical real-world use cases are quite a bit longer. We may want to use XxHash3 for the non-randomized comparer to optimize those cases (see #98838 for the discussion; IIRC it was motivated by analyzing a real workload).
Yep, presumably it's just two asm instructions (on x64 at least).
Not sure; I guess CodeQL might be a better option here.
@jkotas, can't easily tell; my analysis on some 1P workloads implied the keys are small rather than large (e.g. not more than 50 chars). That's why I tried to optimize the shared-generics/virtual-call overhead rather than the hash function. Although I think it wouldn't hurt to re-use some parts of XxHash3 in corelib for long inputs; I made an attempt a while ago: EgorBo@8ab3135
IMO if it's string TKey only, that sounds like a reasonable tradeoff. People who care about perf there are using the ordinal comparer, and if they're using something else it's probably for business requirements (case-insensitive lookups, etc.) where the requirements are more important than being blazing fast. I do think it's probably worth doing tests with longer string keys like jkotas pointed out, though. I would expect the average key length to be much higher than 4: XML tag names, people's full names, filesystem paths, stringified enum values, language names, etc.
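As a sketch of what "longer keys" could look like, the benchmark above could be extended with path-shaped keys of ~60 characters instead of 1-4 digit strings, to probe hash-function cost on more realistic inputs (the class name and key format below are made up; in practice the method would carry `[Benchmark]` attributes like the original):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LongKeyBench
{
    // Keys shaped like filesystem paths (~60 chars each), to exercise the
    // hash function on lengths closer to real-world string keys.
    static readonly Dictionary<string, int> s_dict = Enumerable.Range(0, 1000)
        .ToDictionary(i => $"/usr/lib/dotnet/shared/Microsoft.NETCore.App/9.0.0/lib_{i}.dll", i => i);

    public int Lookup(string key) => s_dict[key];
}
```

Comparing this variant against the short-key numbers would show whether the win from skipping the virtual comparer call still dominates once hashing itself gets more expensive.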
There's a difference between making tradeoffs in something that's brand new and making tradeoffs in something that folks have been building on top of for > 20 years.
Indeed, a few % could put someone out of SLA; that's why I was only "inclined".
Force-pushed from 3332a2a to 3c7b67a.
An alternative solution is inlining. Instead of AggressiveInlining, we can hack the JIT to be more precise (and check profile weight).
Feel free to open a new issue; I think this is the shared-shared case where we don't even add probes today. See e.g. #108913 (comment). It's on my list of things to try and squeeze into .NET 10.
Yep, it is that one. I've just filed #116055
Looks like inlining may already happen naturally with the help of PGO even today. I've filed a small BDN-centric improvement for the inliner, #116054, that significantly improved lookup in the benchmark: EgorBot/runtime-utils#362
```diff
@@ -394,6 +394,7 @@ public virtual void GetObjectData(SerializationInfo info, StreamingContext conte
         }
     }

+    [MethodImpl(MethodImplOptions.AggressiveInlining)]
```
Won't this frequently result in this whole method being inlined into consumer call sites of methods like Dictionary.ContainsKey? That seems bad.
Yeah, I was not suggesting this as the real fix, as I said in #116045 (comment); I just wanted to run benchmarks against it, but then I noticed that we can already inline it even without this attribute. I still have a small idea I want to try here (as a workaround for #116055).
A quick search on grep.app reveals 243,574 exact, case-sensitive matches for `Dictionary<`. Of these, 172,763 (about 71%) use string as the `TKey` type; many of the remaining dictionaries use value types as keys. Presumably, in most of these cases the default (ordinal) comparer is used, and per @GrabYourPitchforks, dictionary rehashing when we switch to the randomized comparer is extremely rare. That suggests it would be worthwhile to specialize `Dictionary<string, TValue>` for the common ordinal-comparer scenario, a sort of hand-made PGO or generic specialization for reference types.

Under the hood, `FindValue` on a string-keyed dictionary is always a shared generic method (`__Canon`), and the JIT is unable to devirtualize any call inside it. (There was an issue logged about this, but I couldn't locate it.)

The check I'm proposing is lightweight and should speed up lookups by roughly 50% when using the ordinal comparer, while incurring only about a 5% slowdown for other reference-keyed dictionaries. With some extra effort we can, in theory, limit the overhead to string keys only with the help of `IsKnownConstant(typeof(TKey)) && typeof(TKey) == typeof(string)`, but my attempt to do so wasn't pretty.

@stephentoub @jkotas @kg - is this something we think is worth doing, or is the regression not acceptable?

Bonus idea: constant-fold `"string literal key".GetNonRandomizedHashCode()` in the JIT.
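To make the constant-folding idea concrete, here is a simplified sketch of a deterministic, non-randomized ordinal hash (a djb2-xor variant; the actual `String.GetNonRandomizedHashCode` algorithm in corelib differs in its details). Because the result depends only on the characters, a JIT could in principle evaluate it at compile time for a string literal and replace the call with a constant:

```csharp
using System;

static class OrdinalHash
{
    // Simplified non-randomized ordinal hash (djb2-xor variant).
    // Deterministic across runs, so for a string literal the whole
    // computation could, in principle, be folded to a constant by the JIT.
    // NOTE: this is NOT the exact corelib algorithm, just an illustration.
    public static uint Hash(string s)
    {
        uint hash = 5381;
        foreach (char c in s)
            hash = ((hash << 5) + hash) ^ c; // hash * 33 ^ c
        return hash;
    }
}
```

The real corelib hash is randomized by default precisely to resist hash-flooding attacks; the folding idea only applies to the non-randomized variant used before a collision threshold triggers a rehash.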