Description
openedon Oct 18, 2024
Proposal: Efficient Dictionary Initialization
Background and motivation
Many applications initialize dictionaries during startup or object instance creation, i.e. mapping strings to integer IDs, or mapping integer IDs to types. At present, the initialization of these containers will generate IL for each key-value pair, either to invoke the Add
method (when using a dictionary initializer i.e. new () { {k, v} }
) or to initialize each element of a KeyValuePair[]
array. If the dictionary is large enough, this can produce huge methods that strain the limits of the JIT, AOT compiler, or interpreter - increasing both startup/build time and memory usage.
At present modern C# has support for efficient initialization of simple constant arrays i.e. int[]
using RuntimeHelpers.InitializeArray
and RuntimeHelpers.CreateSpan
. Unfortunately, this support does not apply to KeyValuePair<K, V>[]
, which means there is no efficient way to initialize simple constant dictionaries. Relying on KeyValuePair
also makes it difficult to optimize cases where only the keys or values are constant instead of both - a common scenario.
By adding a standardized way to efficiently initialize dictionary types to the BCL we can enable potential language and compiler improvements in this area in addition to giving customers an easy way to improve their startup performance.
API Proposal / Usage
We add a new method to Dictionary
and (potentially) ConcurrentDictionary
:
// requirement: keys.Length == values.Length
void AddRange (ReadOnlySpan<K> keys, ReadOnlySpan<V> values);
Which could be used like so:
var dict = new Dictionary<int, string>();
// Keys will be initialized using RuntimeHelpers.CreateSpan. Runtime+compiler improvements could do that for values too.
dict.AddRange([1, 2, 3, 4], ["a", "b", "c", "d"]);
Extended motivation
As mentioned above, there are common scenarios where all of a dictionary's keys or values may be constant, but not both the keys and values are constant. By splitting the keys and values into separate spans, we can take advantage of initialization optimizations for whichever of the two sequences is constant and only programmatically initialize the other sequence at runtime.
Unresolved questions
- Should we also add a new constructor to
Dictionary
with a similar signature? - Should this be part of a new interface? That would make it easier to gate adoption of this new approach to dictionary initialization on third-party types if Roslyn and C# adopt it.
- Should this be restricted to creation of new instances from scratch, instead of being the more general
AddRange
operation?
Risks
- Adding this new method without accompanying language and compiler support would be low value. I think this is not a blocker because users would be able to manually target this new API even if not all the accompanying language and compiler support is there.
- May defy user expectations in terms of side effects:
For the currentnew () { {key1, value1}, {key2, value2}, ... }
, because it generatesAdd
method calls, the expressionkey2
could have a dependency on global state when it is evaluated, and theAdd(key1, value1)
operation could have side effects that influence the result of the secondAdd
call. In that scenario, using this new initialization API would produce a different result. I believe this is not a problem, since the same would apply to the existingAddRange
.