Skip to content

[Proposal] Efficient Dictionary Initialization #109029

Description

Proposal: Efficient Dictionary Initialization

Background and motivation

Many applications initialize dictionaries during startup or object instance creation, i.e. mapping strings to integer IDs, or mapping integer IDs to types. At present, the initialization of these containers will generate IL for each key-value pair, either to invoke the Add method (when using a dictionary initializer i.e. new () { {k, v} }) or to initialize each element of a KeyValuePair[] array. If the dictionary is large enough, this can produce huge methods that strain the limits of the JIT, AOT compiler, or interpreter - increasing both startup/build time and memory usage.

At present modern C# has support for efficient initialization of simple constant arrays i.e. int[] using RuntimeHelpers.InitializeArray and RuntimeHelpers.CreateSpan. Unfortunately, this support does not apply to KeyValuePair<K, V>[], which means there is no efficient way to initialize simple constant dictionaries. Relying on KeyValuePair also makes it difficult to optimize cases where only the keys or values are constant instead of both - a common scenario.

By adding a standardized way to efficiently initialize dictionary types to the BCL we can enable potential language and compiler improvements in this area in addition to giving customers an easy way to improve their startup performance.

API Proposal / Usage

We add a new method to Dictionary and (potentially) ConcurrentDictionary:

// requirement: keys.Length == values.Length
void AddRange (ReadOnlySpan<K> keys, ReadOnlySpan<V> values);

Which could be used like so:

var dict = new Dictionary<int, string>();
// Keys will be initialized using RuntimeHelpers.CreateSpan. Runtime+compiler improvements could do that for values too.
dict.AddRange([1, 2, 3, 4], ["a", "b", "c", "d"]);

Extended motivation

As mentioned above, there are common scenarios where all of a dictionary's keys or values may be constant, but not both the keys and values are constant. By splitting the keys and values into separate spans, we can take advantage of initialization optimizations for whichever of the two sequences is constant and only programmatically initialize the other sequence at runtime.

Unresolved questions

  • Should we also add a new constructor to Dictionary with a similar signature?
  • Should this be part of a new interface? That would make it easier to gate adoption of this new approach to dictionary initialization on third-party types if Roslyn and C# adopt it.
  • Should this be restricted to creation of new instances from scratch, instead of being the more general AddRange operation?

Risks

  • Adding this new method without accompanying language and compiler support would be low value. I think this is not a blocker because users would be able to manually target this new API even if not all the accompanying language and compiler support is there.
  • May defy user expectations in terms of side effects:
    For the current new () { {key1, value1}, {key2, value2}, ... }, because it generates Add method calls, the expression key2 could have a dependency on global state when it is evaluated, and the Add(key1, value1) operation could have side effects that influence the result of the second Add call. In that scenario, using this new initialization API would produce a different result. I believe this is not a problem, since the same would apply to the existing AddRange.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    api-suggestionEarly API idea and discussion, it is NOT ready for implementationarea-System.CollectionsuntriagedNew issue has not been triaged by the area owner

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions