What are the acceptance/success criteria for an improved S.C.G.Dictionary? #107830


Description

I'm looking into vectorizing S.C.G.Dictionary using Vector128. I'm at the point of figuring out how much work it would be to prototype it in-tree, as I have a basic standalone prototype (based on dn_simdhash) with good microbenchmark numbers. Some questions that come to mind:

  • How comprehensive is our automatic benchmark coverage for dictionaries? e.g. corner cases like bad hash functions, memory usage in specific scenarios, etc.
  • How comprehensive is our test coverage for some of the nuances I see marked with comments in the current implementation?
  • Are all of the nuances in the current implementation ship requirements? e.g. changing which comparer is used for strings once buckets become unbalanced; detecting concurrent modification; the Keys and Values collections being classes instead of structs.
  • Is it reasonable if the number of GetHashCode or Equals calls on the comparer changes due to the different algorithm? (See the counting-comparer sketch after this list.)
  • Is it reasonable to assume basic Vector128 support on most targets, and accept that a scalar implementation might be slightly slower than the current one?
  • Is it reasonable for memory usage to increase at certain collection sizes as long as it decreases at most sizes? Or do we need to try to match memory usage at all sizes even if it makes performance less ideal?
  • Is the speed of allocating a new dictionary as important as the speed of Add/TryGetValue/Remove operations?
  • Is it important for any new implementation to have code size roughly equivalent to (or smaller than) the existing one, since there are typically lots of different instantiations of the dictionary type + its enumerator?
  • Is code hygiene a high priority for BCL classes like this? My original prototype was nicely encapsulated, but that made the code generated by RyuJIT much worse compared to manually duplicating logic for each operation. Manual duplication makes the resulting code more fragile to maintain, but ideally it won't need to change much. The code for the existing Dictionary is somewhat messy for performance reasons, but this one might end up worse due to the way SIMD bloats method bodies.
  • Would higher dictionary performance be impactful enough to justify RyuJIT and/or type system changes to support it? My instinct is that dictionary perf has wide impact, but it would need to be an order of magnitude faster to justify that kind of investment.
  • Which targets do we need to support before it's mergeable? Is x86/x64 with SSE2 and ARM with NEON good enough? Or do we need to hit other targets like RISC-V and WASM as well?
  • Is the BCL team comfortable adopting a new implementation of a container like this that's harder to maintain (due to the need for an understanding of SIMD)?
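
To make the GetHashCode/Equals question concrete, a wrapper comparer that counts calls lets a test assert on exactly how many times each operation hashes and compares keys. This is a hypothetical test helper (CountingComparer is not existing BCL test infrastructure), shown as a minimal sketch:

```csharp
using System.Collections.Generic;

// Hypothetical helper: wraps the default comparer and counts calls so a test
// can observe how often a dictionary implementation hashes/compares keys.
sealed class CountingComparer<T> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<T> _inner = EqualityComparer<T>.Default;
    public int GetHashCodeCalls;
    public int EqualsCalls;

    public int GetHashCode(T obj)
    {
        GetHashCodeCalls++;
        return _inner.GetHashCode(obj!);
    }

    public bool Equals(T? x, T? y)
    {
        EqualsCalls++;
        return _inner.Equals(x, y);
    }
}
```

A test could then do `var c = new CountingComparer<string>(); var d = new Dictionary<string, int>(c); d["a"] = 1;` and assert on `c.GetHashCodeCalls` / `c.EqualsCalls`, which is exactly the kind of count that would shift under a different probing algorithm.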

EDIT: Adding a link to the F14 blog post, which describes a similar 14-wide vectorized dictionary: https://engineering.fb.com/2019/04/25/developer-tools/f14/?r=1
It explains a lot of the concepts at work here + the tradeoffs quite well. A lot of what they do isn't possible in the current .NET type system.
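
For a flavor of what the vectorized probe looks like, here is a minimal sketch of the bucket search at the heart of a dn_simdhash/F14-style table. The layout and names are illustrative assumptions, not the actual prototype: each bucket is assumed to keep one-byte hash suffixes for its slots in a 16-byte control word, so a single Vector128 comparison finds every candidate slot at once.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics;

static class SimdProbeSketch
{
    // Returns the index of the first slot whose stored one-byte hash suffix
    // matches the key's suffix, or -1 if this bucket has no candidates.
    // A real lookup must still confirm each candidate with comparer.Equals,
    // since distinct keys can share a suffix.
    public static int FindSuffixInBucket(Vector128<byte> bucketSuffixes, byte suffix)
    {
        // Compare all 16 control bytes against the broadcast suffix at once.
        Vector128<byte> matches = Vector128.Equals(bucketSuffixes, Vector128.Create(suffix));
        // One bit per byte lane; bit i set means slot i matched.
        uint mask = Vector128.ExtractMostSignificantBits(matches);
        return mask == 0 ? -1 : BitOperations.TrailingZeroCount(mask);
    }
}
```

On targets where Vector128 isn't hardware accelerated, this runs through the software fallback (or a hand-written scalar loop over the 16 bytes), which is the source of the question above about scalar performance.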
