Skip to content

Commit

Permalink
Merge pull request #7866 from dotnet/CyrusNajmabadi-patch-1
Browse files Browse the repository at this point in the history
  • Loading branch information
CyrusNajmabadi authored Jan 24, 2024
2 parents 5c4f63b + aba3288 commit 5134f34
Showing 1 changed file with 143 additions and 0 deletions.
143 changes: 143 additions & 0 deletions meetings/working-groups/collection-literals/CL-2024-01-23.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
# Collection expressions: design restart meeting 01/23/24

The collection expressions group met again for the first time after shipping C# 12 and going through the holidays. The agenda was to discuss and outline all the interesting areas we want to consider in C# 13 (along with a loose prioritization).

## C# 13 Collection expression design areas (loosely prioritized)

### Dictionary expressions

Broadly speaking, we consider dictionary expressions (tentatively `[k1:v1, k2:v2]`) to be the most important design space for us to front load. Specifically, because it both seems like the area of collections most commonly used in the BCL and in APIs, but which has no support today with collection expressions. We also believe that the decisions made here will impact any decisions we make in the [Natural Type](#natural-type) space.

Broken down, we see important design needing to be made in the following areas:

1. Pattern correspondence.

Collection literals started with a strong emphasis on the construction side needing to have parity with the pattern-matching side. e.g. `x = [a, b, .. c]` and `x is [var x, var b, .. var c]`. We strongly think that this should be kept in mind when it comes to dictionary-expressions as well. That said, dictionary-patterns will be quite interesting in their own right and have several design challenges for them. For example:

- We have to cognizant that we likely should only offer patterns that have a sensible translation. For example: `x is [var key: 0, ..]` is not likely something that could be implemented (as it would need a linear walk of most dictionaries to find all the elements).
- Similarly, `x is [key1: 0, .. var rest]` would likely be extremely expensive, as it would require somehow cloning everything, and removing elements that matched.
- Similarly, certain types of constructs would likely not make sense, like `x is [key1: 0, .., key2: 1]`. Specifically, as dictionaries are not ordered, allowing `slice` patterns in the middle is somewhat nonsensical as there is no before/after concept.

All of the above likely means that the pattern form that dictionaries should support would be limited to: `[(constant-pattern: pattern)+, (..)?]`. In other words, some amount of key/value patterns, where the keys are all constants, and the values associated with those keys are then matched against a pattern, followed by an optional spread, to indicate if the dictionary can have more elements or not beyond those which are matched.

2. The overlap/duality between lists and dictionaries, and what should `k:v` mean in that world.

For example, we could consider things in the following ways:

- `k:v` is simply a shorthand for `KeyValuePair<TKey,TValue>`. And as such, the following would be legal: `List<KeyValuePair<string, int>> nameToAgeList = ["mads": 21]`
- `k:v` represents a key-value *association*, and thus should only be usable with dictionary-esque (tbd) types, not sequence-esque types.

As part of this, we want to figure out what expressions elements can mix/match with what target types. For example (as shown above) can a `k:v` element be used with a sequence-type? Can a normal expression element be used with a dictionary type (e.g. `Dictionary<string, int> x = [kvp1, kvp2]`)?

3. The semantics of dictionary expressions wrt 'adding' vs 'overwriting' key/value associations.

We could go with the simple approach of adding each association one at a time, left to right. This would align both with how collection-expressions work today for sequence collections, as well as how dictionary-initializers work (e.g. `new D<string, int> { { "mads": 21 }, { "cyrus": 22 } }`). However, there is a definite feeling that this approach is limiting and cuts out certain valuable scenarios, all specifically around `spread` elements. For example:

```
// Making a copy, but overwriting some elements
Dictionary<string, int> dest = [.. source, "mads": 21];
// Having some default values, but allowing them to be supplanted.
Dictionary<string, int> dest = ["mads": 21, .. source];
// Merging two dictionaries, with the later ones winning.
Dictionary<string, int> dest = [.. source1, .. source2];
```
Design group feels that they've run into all these situations, and they're generally useful. Having 'Add+Throw' semantics here would be painfully limiting.
If we do decide on 'overwrite' semantics, we could consider having the compiler warn though in the case of multiple `k:v` elements, with the same constant value for `k`.
4. Adopting `JSON` syntax here.
Specifically, allowing `{ "key": value }` to work as legal dictionary-expression syntax. Working group leans no, but we definitely want to run by LDM for thoughts.
- Pros: Great parity with a very popular data format. This would allow users to also instantiate Newtonsoft J-Etcs or System.Text.JXXX types just with real JSON literals. Copy/pasting to/from C# becomes very nice.
- Cons:
- Moves us away from `[...]` being the lingua franca for all collection types.
- Adds a lot of parsing/ambiguity complexity around `{...}`.
- Impacts our future design space around `{...}` (for example, expression blocks).
- Is very difficult to have pattern-parity. `{ k: ... }` is already legal as a property pattern. Needing that to work as a dictionary-pattern is non-trivial (and potentially very confusing for users).
5. Mechanisms to specify a `comparer` or `capacity` for a dictionary.
Discussions with the community have already indicated a strong desire to be able to specify these values (esp. the `comparer`). We think this is common enough to want to be able to have support, ideally in a way that doesn't feel like one is taking a big step back wrt brevity and simplicity of the dictionary-expression. Importantly, it would be a shame if the collection-expression form weren't better than a user just using a dictionary-initializer today (e.g. `new(myComparer) { { "mads": 21 } }`).
Strawman proposals include:
- `[comparer: ..., "mads": 21]` (where `comparer` was now a keyword in this context, and would be stylized as such by the ide). If a user actually wanted to reference a variable called `comparer` they'd then do `@comparer`, like how we normally separate out keywords vs identifiers.
- `[new: (comparer, capacity), "mads": 21]`. A special construct allowing one to state what arguments to pass to the constructor. Note: this would likely be beneficial for normal sequence-collections as well. As `new` is already a keyword and `new:` is not legal in the language today, this would have no concerns around ambiguity.
6. Targeting interfaces.
Like with sequence-expressions, we believe that dictionary-expressions should be able to at least target the well known type `IDictionary<TKey, TValue>` as well as `IReadOnlyDictionary<TKey, TValue>`. We expect to take a similar approach to what we did with sequence-expressions, where we choose `Dictionary<TKey, TValue>` for the former, and allow the compiler to synthesize a read-only type for the latter.
7. Immutable collections.
Like with sequence-expressions, we believe that dictionary-expressions should be able to target types like `ImmutableDictionary<TKey, TValue>`. Our hope is that `CollectionBuilderAttribute` can be used for this purpose, likely pointing at signatures like: `ImmutableDictionary.CreateRange<TKey, TValue>(ReadOnlySpan<KeyValuePair<TKey, TValue>> items)`.
This will tie into the decisions on '5' though wrt to how to pass items like the comparers along.
### Natural type
We consider natural types the next biggest area we would want to tackle. Specifically, we think it would be so influenced by the decisions on dictionary-expressions that it would not be sensible to design this first without seeing how dictionaries play out. For example, in the absence of dictionaries, we might consider the natural type of a sequence expression to be `ReadOnlySpan<SomeT>`. However, would such a decision have a sensible glow-up story to tackle natural types for dictionary expressions? Having a sensible story for both will very likely influence our decisions here.
For natural types, we see two broad areas we would like to examine:
1. The natural type of a collection expression in the absence of *any* *element-target-typing* whatsoever. This would apply when a collection expression was targeted to something like `object` or `var`, where all information about the final type would have to come from the collection expression itself. e.g. `object o = [1, 2, 3]`.
2. The natural type of a collection expression with some contextual *element-target-typing*. This would apply to cases like so:
- `foreach (byte b in [1, 2, 3])`. Here, we believe that the `byte` type should help influence the collection type being created (so that this case is legal).
- `List<Predicate> list = [a, b, .. c ? [_ => true] : []]`. Similarly, the use of `Predicate` here in the target should help influence the type of `[_ => true]` such that this code succeeds.
Overall though, this space is enormous and will need a lot of future design. Open areas include, but are not limited to:
1. Could we envision a world where `var v = [];` is ever allowed? This would involve flow analysis to determine the element type based on how `v` was used.
2. Should we align on a well known type for the natural type, or should we consider it just a special language type (like anonymous-types) that the compiler makes its own determinations about. For example, we could pick a well known type like `List<T>` or we could consider the language having a special ``builtin-list<T>`` type with special properties.
3. Should we try to mandate a very efficient type (like Span/InlineArrays) for perf? How would that work with async/await?
4. Would it be possible to pick an inefficient type (like `List<T>`) but then give the compiler broad leeway to optimize in the common case where it would not be observable (for example, actually emitting on the stack when safe).
### Extension methods
We are still interesting in enabling `[a, b, c].ExtensionMethod()`. However, we think this is not something that should be limited to collection expressions. Broadly speaking, the language is very limited wrt to the interaction of extension methods and target-typed constructs. For example, you cannot do this `(x => true).ExtensionOnStringPredicate()`. The language requires the item being invoked have a type *prior* to lookup of the extension method. To make cases like this work, we'd need to allow them to not have a type, and then say that lookup should still find the extensions, which are then tested for applicability in the rewritten form (e.g. `ExtensionMethod([a, b, c])`).
As we are investing heavily in extensions in C# 13, we may want to roll the exploration of this area into that work.
### Addressing existing types that should work with collection expressions, but don't.
There are several types we've found that annoying don't work with collection expressions. For example:
- `ArraySegment<T>`
- `ReadOnlyMemory<T>`. Commonly used as the analog of `ReadOnlySpan<T>` when you're working with async methods.
- InlineArray
We want to continue collecting these types to see what would be a good approach to expanding out support in the future. Specifically, if the set of types is very small, it might be acceptable to just hardcode support for them. However, if the set grows large, we may need to identify patterns to allow them to participate. For example, we might say that if a type has a `public static implicit operator ThisType(T[] array)` conversion operator, that it would then be constructible with a collection expression.
It's worth noting that making any type now be collection expression constructible would always be a potential breaking change with overload resolution, unless we made these always have a lower priority than the core rules we shipped with.
### Supporting non-generic interfaces
There is an open question on what the (currently illegal) semantics should be for `IEnumerable x = [1, 2, 3];`
There are a few paths we could conceivably take here.
- Have no special behavior for this, and allow 'natural type' to take care of it. If, for example, the natural type of `[1, 2, 3]` was `List<int>` then this would be equivalent to: `IEnumerable x = (List<int>)[1, 2, 3];`. Note that this would then be making a mutable collection under the covers, strongly typed to 'int'.
- Treat the non-generic interfaces as the generic versions with 'object'. In this case, that would mean the above would be equivalent to: `IEnumerable x = (IEnumerable<object>)[1, 2, 3];`. This would then mean the compiler would synthesize the read-only collection, but also that every element would be boxed as object.
- A hybrid approach where the element type might be picked using whatever natural-type system we had, but the collection type itself was the generic interface. In this case, that would mean the above would be equivalent to: `IEnumerable x = (IEnumerable<int>)[1, 2, 3];`, This would then mean the compiler would synthesize the read-only collection, and also that every element would be non-boxed in the underlying storage.
Regardless of what we decide, we have to keep this in mind when doing the [Natural type](#natural-type) work. This will start working (with the first version above) once we have natural types. If we do not like the code that would happen there, we would have to either block or come up with an alternate set of semantics when we do natural types.
### Relaxing restrictions.
Recent feedback has presented real-world examples where our restrictions around api shapes can be annoying. As an example, the Roslyn team themselves have identified a case of a type that they'd like to be collection-expression constructible, which they cannot use. Specifically, while they can add a CollectionBuilderAttribute to it, the type itself is not `IEnumerable<T>`, and so it fails that restriction we have. This restriction seems onerous, especially as the user is stating explicitly that they think their type is a collection through the user of that attribute.
Similarly, we have recently added a restriction that a collection-initializer type used with a collection-expression must have an instance `Add` method that works with the iteration type of the collection. This is more restrictive than what collection-initializers themselves require (simply that they can find a suitable 'Add' method (including extensions)). We may want to relax this if we see worthwhile examples provided.

0 comments on commit 5134f34

Please sign in to comment.