[Proposal] Efficient Dictionary Initialization

# Proposal: Efficient Dictionary Initialization

## Background and motivation

Many applications initialize dictionaries during startup or object instance creation, i.e. mapping strings to integer IDs, or mapping integer IDs to types. At present, the initialization of these containers will generate IL for each key-value pair, either to invoke the `Add` method (when using a dictionary initializer i.e. `new () { {k, v} }`) or to initialize each element of a `KeyValuePair[]` array. If the dictionary is large enough, this can produce huge methods that strain the limits of the JIT, AOT compiler, or interpreter - increasing both startup/build time and memory usage.

At present modern C# has support for efficient initialization of simple constant arrays i.e. `int[]` using `RuntimeHelpers.InitializeArray` and `RuntimeHelpers.CreateSpan`. Unfortunately, this support does not apply to `KeyValuePair<K, V>[]`, which means there is no efficient way to initialize simple constant dictionaries. Relying on `KeyValuePair` also makes it difficult to optimize cases where only the keys or values are constant instead of both - a common scenario.

By adding a standardized way to efficiently initialize dictionary types to the BCL we can enable potential language and compiler improvements in this area in addition to giving customers an easy way to improve their startup performance.

## API Proposal / Usage

We add a new method to `Dictionary` and (potentially) `ConcurrentDictionary`:

```csharp
// requirement: keys.Length == values.Length
void AddRange (ReadOnlySpan<K> keys, ReadOnlySpan<V> values);
```

Which could be used like so:

```csharp
var dict = new Dictionary<int, string>();
// Keys will be initialized using RuntimeHelpers.CreateSpan. Runtime+compiler improvements could do that for values too.
dict.AddRange([1, 2, 3, 4], ["a", "b", "c", "d"]);
```

## Extended motivation

As mentioned above, there are common scenarios where all of a dictionary's keys or values may be constant, but not *both* the keys and values are constant. By splitting the keys and values into separate spans, we can take advantage of initialization optimizations for whichever of the two sequences is constant and only programmatically initialize the other sequence at runtime.

## Unresolved questions

* Should we also add a new constructor to `Dictionary` with a similar signature?
* Should this be part of a new interface? That would make it easier to gate adoption of this new approach to dictionary initialization on third-party types if Roslyn and C# adopt it.
* Should this be restricted to creation of new instances from scratch, instead of being the more general `AddRange` operation?

## Risks

* Adding this new method without accompanying language and compiler support would be low value. I think this is not a blocker because users would be able to manually target this new API even if not all the accompanying language and compiler support is there.
* May defy user expectations in terms of side effects:
    For the current `new () { {key1, value1}, {key2, value2}, ... }`, because it generates `Add` method calls, the expression `key2` could have a dependency on global state when it is evaluated, and the `Add(key1, value1)` operation could have side effects that influence the result of the second `Add` call. In that scenario, using this new initialization API would produce a different result. I believe this is not a problem, since the same would apply to the existing `AddRange`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] Efficient Dictionary Initialization #109029

kg
openedon Oct 18, 2024

Proposal: Efficient Dictionary Initialization

Background and motivation

API Proposal / Usage

Extended motivation

Unresolved questions

Risks

Assignees

Labels

Type

Projects

Milestone

Relationships

Development