[Proposal]: Nominal and Collection Deconstruction

# Nominal and Collection Deconstruction

Discussion: https://github.com/dotnet/csharplang/discussions/8707

## Summary
[summary]: #summary

We allow deconstructing an instance into its constituent properties/fields in a way paralleling how property patterns can conditionally deconstruct an instance, and positional deconstruction can deconstruct instances with a suitable `Deconstruct` method.

We similarly allow deconstructing a collection into its constituent elements in a way paralleling [list patterns](https://github.com/dotnet/csharplang/pull/3245).

## Motivation
[motivation]: #motivation

It is common to want to extract a number of fields/properties from an instance. Currently this can be done declaratively using property patterns, but the fields/properties are only assigned when the pattern matches. This forces you to put your code inside an `if` statement if you want to use pattern matching to declaratively extract a number of properties from an instance. In order to keep this brief I will link to a motivating example from an earlier discussion: https://github.com/dotnet/csharplang/discussions/3546.
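
To make the limitation concrete, here is a minimal sketch in current C#. The `textRange` value and its `Start`/`End`/`Line`/`Column` members are assumptions borrowed from the examples later in this proposal, and are modelled with anonymous types purely to keep the snippet self-contained.

```csharp
using System;

// Hypothetical data, modelled with anonymous types for illustration only.
var textRange = new
{
    Start = new { Line = 1, Column = 5 },
    End = new { Line = 3, Column = 9 },
};

var span = -1;

// Today the pattern variables are only definitely assigned where the
// pattern is known to have matched, so the code that uses them has to
// live inside the `if`.
if (textRange is { Start: { Line: var startLine }, End: { Line: var endLine } })
{
    span = endLine - startLine;
}

Console.WriteLine(span);
```

Using `startLine` or `endLine` after the `if` would be rejected as a use of an unassigned local, which is the friction this proposal aims to remove.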

Additionally there's an aspect of symmetry in the language (see #3107 for more on this theme):

There is currently a parallelism in two dimensions between positional data, nominal data, and collections on one axis, and declaration, construction, deconstruction, and pattern matching on the other.

You can declare types positionally using positional records/primary constructors. You can construct an instance positionally using a constructor, deconstruct it using positional deconstruction, and pattern match it using a positional pattern.

You can declare types nominally through properties/fields. You can construct an instance nominally through an object initializer, and you can pattern match it using property patterns.

You can construct a collection using a collection initializer, and you will likely soon be able to pattern match it using list patterns.

This proposal fills in two of the three missing squares here by introducing nominal and sequence deconstructions.

## Detailed design
[design]: #detailed-design

### High level overview.
We have 3 aims which inform this design:

1. Make the most common cases as easy as possible.
2. Maintain symmetry with existing constructs (positional deconstructions and patterns).
3. Don't block ourselves from making enhancements in future language versions.

The most common case is simply wanting to declare a bunch of variables. Here we take a cue from positional deconstruction, which allows you to preface a deconstruction with `var` to automatically declare locals for all identifiers within the deconstruction:

```csharp
var {
    Start: { Line: startLine, Column: startColumn },
    End: { Line: endLine, Column: endColumn },
} = textRange;
```

This declares 4 variables, `startLine`, `startColumn`, `endLine`, `endColumn`.
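
As a rough illustration of the intended meaning, the declaration above would behave approximately like the following code, which compiles today. This is a sketch and not the normative lowering: `textRange` is a hypothetical stand-in built from anonymous types, and the `startTemp`/`endTemp` names are invented for the intermediate values.

```csharp
using System;

// Hypothetical stand-in for `textRange`.
var textRange = new
{
    Start = new { Line = 1, Column = 5 },
    End = new { Line = 3, Column = 9 },
};

// Proposed syntax (does not compile today):
//   var {
//       Start: { Line: startLine, Column: startColumn },
//       End: { Line: endLine, Column: endColumn },
//   } = textRange;

// Approximate hand-written equivalent.
var startTemp = textRange.Start;      // temp for the nested `Start` designation
var startLine = startTemp.Line;
var startColumn = startTemp.Column;
var endTemp = textRange.End;          // temp for the nested `End` designation
var endLine = endTemp.Line;
var endColumn = endTemp.Column;

Console.WriteLine($"{startLine}:{startColumn} - {endLine}:{endColumn}");
```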

Positional deconstruction also allows you to specify the type explicitly, and assign to arbitrary lValues, so we allow that by leaving off the `var`:

```csharp
{
    Start: { Line: long startLine, Column: C.Column },
    End: var { Line: endLine, Column: endColumn },
} = textRange;
```

Patterns can contain any arbitrary pattern, so we allow nesting any deconstruction in any other, e.g.:

```csharp
var ({ A: [ a, b, c] }, d) = (new { A = new[]{1, 2, 3} }, 4);
```
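
For comparison, the nested example can be written out by hand in current C# roughly as follows. This is only a sketch of the intended result. The names `source` and `inner` are invented here, and the `sequence` step assumes plain `int` indexing as described in the detailed spec.

```csharp
using System;

var source = (new { A = new[] { 1, 2, 3 } }, 4);

// Proposed: var ({ A: [a, b, c] }, d) = source;

var (inner, d) = source;   // positional step: ordinary tuple deconstruction
var elements = inner.A;    // nominal step: read the member `A`
var a = elements[0];       // sequence step: read indices 0, 1 and 2
var b = elements[1];
var c = elements[2];

Console.WriteLine(a + b + c + d);
```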

Patterns can assign a pattern to a variable, even if the pattern itself contains other nested patterns, so we allow that:

```csharp
var {
    Start: { Line: startLine, Column: startColumn } start,
    End: { Line: endLine, Column: endColumn } end,
} = textRange;
```

It's useful to be able to assign such a variable to an existing local, like so:

```csharp
TextPoint start;
{ Start: { ... } start } = textRange;
```

On the other hand, we want to be able to declare a new local. We can't do so by putting `var` beforehand, since that makes all nested identifiers declare new locals. We don't want to do so by putting an explicit type beforehand, since that would lead to a confusing difference between `var` and other types. Instead we say that `{} identifier` declares a new local if one does not exist, and otherwise assigns to the existing local. This is very different to how C# works so far and may be reconsidered.

We apply all these principles to positional and collection deconstructions as well, so the grammar and spec for the 3 deconstructions are very similar.

Unlike patterns, deconstruction performs no null checks or bounds checks, and will throw a `NullReferenceException` or an `IndexOutOfRangeException` if these are violated. As ever, the compiler will warn you if you deconstruct a maybe-null reference.
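
The difference can be sketched in current C# by comparing a pattern with a hand-written equivalent of the proposed deconstruction. The variable names are invented for illustration, and the hand-written code is an assumption about what the lowering would look like for an array and a `string`.

```csharp
#nullable enable
using System;

int[] shortArray = { 1, 2 };
string? missing = null;

// A pattern simply fails to match a null input.
bool matched = missing is { Length: _ };

// Hand-written equivalent of the proposed `var [x, y, z] = shortArray;`:
// each element is read through the indexer with no length check.
string boundsResult;
try
{
    var t = shortArray;
    var x = t[0];
    var y = t[1];
    var z = t[2]; // index 2 does not exist
    boundsResult = "no exception";
}
catch (IndexOutOfRangeException)
{
    boundsResult = "IndexOutOfRangeException";
}

// Hand-written equivalent of the proposed `var { Length: n } = missing;`.
string nullResult;
try
{
    var n = missing!.Length;
    nullResult = "no exception";
}
catch (NullReferenceException)
{
    nullResult = "NullReferenceException";
}

Console.WriteLine($"{matched}, {boundsResult}, {nullResult}");
```

The exact exception type depends on the indexer being called. Arrays throw `IndexOutOfRangeException`, while other collection types may throw something else.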

### Changes to grammar

```antlr

statement
    : ...
    | declaration_statement
    ;

declaration_statement
    : declaration_target '=' expression ';'
    | type single_variable_designation ('=' expression)? (',' single_variable_designation ('=' expression)?)* ';'
    ;

foreach_statement
    : ...
    | 'foreach' '(' declaration 'in' expression ')' embedded_statement
    ;

declaration
  : 'var' var_variable_designation
  | type single_variable_designation
  ;

variable_designation
  : var_variable_designation
  | single_variable_designation
  | discard_designation
  ;
  
var_variable_designation
  : parenthesized_variable_designation
  | nominal_variable_designation // new
  | sequence_variable_designation // new
  ;

parenthesized_variable_designation
  : '(' variable_designation ',' variable_designation (',' variable_designation)* ')' identifier?
  ;

nominal_variable_designation
  : '{' named_variable_designation (',' named_variable_designation)* ','? '}' identifier?
  ;
  
sequence_variable_designation
  : '[' variable_designation (',' variable_designation)* ']' identifier?
  ;

named_variable_designation
  : identifier ':' variable_designation
  ;

single_variable_designation
  : identifier
  ;

discard_designation
  : '_'
  ;
  
declaration_target
  : declaration
  | deconstruction
  ;
  
deconstruction
  : positional_deconstruction
  | nominal_deconstruction // new
  | sequence_deconstruction // new
  ;
  
declaration_target_or_expression
  : declaration_target
  | expression // see https://github.com/dotnet/roslyn/blob/fbf1583ed659db06e903d877b35c3cbd45eb7e1d/src/Compilers/CSharp/Portable/Generated/CSharp.Generated.g4#L685 for complete list
  ; 
  
positional_deconstruction
  : '(' declaration_target_or_expression ',' declaration_target_or_expression (',' declaration_target_or_expression)* ')' identifier?
  ;
  
nominal_deconstruction
  : '{' nominal_deconstruction_element (',' nominal_deconstruction_element)* ','? '}' identifier?
  ;
  
sequence_deconstruction
  : '[' declaration_target_or_expression (',' declaration_target_or_expression)* ']' identifier?
  ;
  
nominal_deconstruction_element 
  : identifier ':' declaration_target_or_expression
  ;
```
Examples:
```csharp
// Short-hand deconstruction
var (x, y) = e;   
var [x, y] = e;      
var { A: a } = e;      

// Recursive deconstruction
(var x, var y) = e;
[var x, var y] = e;
{ A: var a } = e;

// Bind to an existing l-value
(x, y) = e;
[x, y] = e;
{ A: a } = e;
```

### Detailed Spec

#### `variable_designation`

A `var_variable_designation` is lowered recursively as follows:

1. Every `var_variable_designation` has a unique target `t`, which is a temporary variable of type `T` inferred from the expression that is assigned to `t`.
   If the `var_variable_designation` is the top level `var_variable_designation` in a `declaration_statement` we assign `expression` to `t`.
   If the `var_variable_designation` is the top level `var_variable_designation` in a `foreach_statement` we assign `enumerator.Current` to `t`.
   Else `t` is defined recursively below.

2. If a `var_variable_designation` defines an `identifier` `i`, we declare a local of type `T?` named `i`, with the same scope as the `declaration_statement`/`foreach_statement`, and assign `t` to `i`.

3. Assuming the `var_variable_designation` has `n` child `variable_designation`s `v0` to `vn - 1`, we produce a set of child temps `t0` to `tn - 1` as follows.
   1. If the `var_variable_designation` is a `parenthesized_variable_designation` we look for a suitable deconstructor on `T` to deconstruct `t` into `t0` to `tn - 1`. See [the spec](https://github.com/dotnet/roslyn/blob/master/docs/features/deconstruction.md#deconstruction-declaration-deconstruction-into-new-variables) for more details.
   2. If the `var_variable_designation` is a `nominal_variable_designation`, for each `named_variable_designation` with identifier `ix`, `t` must have an accessible property or field `ix`, and we assign `t.ix` to `tx` (this should match the spec for [property patterns](https://github.com/dotnet/csharplang/blob/06eec9bf0a5371db842f8e46547d15f64f18c9af/proposals/csharp-8.0/patterns.md#property-pattern)).
   3. If the `var_variable_designation` is a `sequence_variable_designation`, `t` must have an indexer accepting a single parameter of type `int`, and we assign `t[x]` to `tx` (this should match, and be kept up to date with, the spec for [collection patterns](https://github.com/dotnet/csharplang/pull/3245), e.g. we may allow use of `GetEnumerator` here).
   
4. For each child `variable_designation` `vx`
   1. If `vx` is a `var_variable_designation` we lower `vx` as specified here, using `tx` as `t` for `vx`.
   2. If `vx` is a `single_variable_designation` with `identifier` `ix` we declare a local of type `Tx?` named `ix`, with the same scope as the `declaration_statement`/`foreach_statement`, and assign `tx` to `ix`.
   3. If `vx` is a `discard_designation` we do nothing.
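
The `foreach_statement` case of the steps above can be sketched by hand in current C#. This is an illustrative reading of the rules and not a normative lowering. The dictionary and the names `counts`, `keys` and `total` are invented for the example.

```csharp
using System;
using System.Collections.Generic;

var counts = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var keys = new List<string>();
var total = 0;

// Proposed: foreach (var { Key: key, Value: value } in counts) { ... }

foreach (var t in counts)     // step 1: `t` receives `enumerator.Current`
{
    var key = t.Key;          // steps 3.2 and 4.2: read the member, declare a local
    var value = t.Value;

    keys.Add(key);
    total += value;
}

Console.WriteLine($"{keys.Count} keys, total {total}");
```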

#### `deconstruction`

A `deconstruction` is lowered recursively as follows:

1. Every `deconstruction` has a unique target `t`, which is a temporary variable of type `T` inferred from the expression that is assigned to `t`.
   If the `deconstruction` is the top level `deconstruction` in a `declaration_statement` we assign `expression` to `t`.
   Else `t` is defined recursively below.

2. If a `deconstruction` defines an `identifier` `i`
   1. If there is a local in scope with name `i` we assign `t` to `i`.
   2. Else we declare a local of type `T?` named `i`, with the same scope as the `declaration_statement`, and assign `t` to `i`.

3. Assuming the `deconstruction` has `n` child `declaration_target_or_expression`s `d0` to `dn - 1`:
   If this is a top level `deconstruction`:
   For each `declaration_target_or_expression` `dx`
   1. If `dx` is an `expression`, it must be a valid lValue as defined by the [spec](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/expressions#simple-assignment), and we evaluate as much of `dx` as is evaluated before the RHS of an assignment operator as defined by the [spec](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/expressions#simple-assignment). The result of this evaluation is stored in a temp `dtx`.
   2. If `dx` is a `deconstruction` we perform this step recursively to evaluate as much of its child `expression`s as is necessary.
4. We produce a set of child temps `t0` to `tn - 1` as follows.
   1. If `deconstruction` is a `positional_deconstruction` we look for a suitable deconstructor on `T` to deconstruct `t` into `t0` to `tn - 1`. See [the spec](https://github.com/dotnet/roslyn/blob/master/docs/features/deconstruction.md#deconstruction-declaration-deconstruction-into-new-variables) for more details.
   2. If the `deconstruction` is a `nominal_deconstruction`, for each `nominal_deconstruction_element` with identifier `ix`, `t` must have an accessible property or field `ix`, and we assign `t.ix` to `tx` (this should match the spec for [property patterns](https://github.com/dotnet/csharplang/blob/06eec9bf0a5371db842f8e46547d15f64f18c9af/proposals/csharp-8.0/patterns.md#property-pattern)).
   3. If the `deconstruction` is a `sequence_deconstruction`, `t` must have an indexer accepting a single parameter of type `int`, and we assign `t[x]` to `tx` (this should match, and be kept up to date with, the spec for [collection patterns](https://github.com/dotnet/csharplang/pull/3245), e.g. we may allow use of `GetEnumerator` here).

5. For each child `declaration_target_or_expression` `dx`
   1. If `dx` is an `expression` we assign `tx` to `dtx` as specified by the spec on simple [assignment](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/expressions#assignment-operators). The assignment must be valid according to the rules specified there.
   2. If `dx` is a `declaration`
      1. If the `declaration` is a `var_variable_designation` we lower it as specified above, using `tx` as its `t`.
      2. If the `declaration` is a `single_variable_designation` with `identifier` `ix` and `type` `Tx`, we declare a local of type `Tx` named `ix`, with the same scope as the `declaration_statement`, and assign `tx` to `ix`. If `type` is `var`, `Tx` is inferred from `tx`.
      3. If the `declaration` is a `discard_designation` we do nothing.
   3. If `dx` is a `deconstruction` we lower `dx` as specified here, using `tx` as `t` for `dx`.
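
As an illustration of how the steps above fit together when the targets are existing lvalues, here is a hand-written sketch in current C#. It is one possible reading of the rules and not a definitive lowering. The names `e`, `target`, `i` and the temps are invented, and the source expression is a plain local so that the sketch does not depend on when the right-hand side is evaluated relative to step 3.

```csharp
using System;

var e = new { A = 10, B = 20 };
var target = new int[2];
var i = 0;

// Proposed: { A: target[i++], B: target[i++] } = e;

var t = e;                // step 1: temp for the deconstructed value

var array0 = target;      // step 3: evaluate the operands of each lvalue,
var index0 = i++;         //         left to right, before any assignment
var array1 = target;
var index1 = i++;

var t0 = t.A;             // step 4: read each named member into a child temp
var t1 = t.B;

array0[index0] = t0;      // step 5: assign each child temp to its lvalue
array1[index1] = t1;

Console.WriteLine($"{target[0]}, {target[1]}, i = {i}");
```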
   
## Drawbacks
[drawbacks]: #drawbacks

This is a significant set of enhancements to deconstruction. Deconstruction is far less common than pattern matching, so it may be that the benefit from this set of enhancements is not considered sufficient to pay for itself.

### Parsing ambiguities

In order to distinguish between a `nominal_deconstruction` and a `block`, we need to parse until we reach a `,`, a `;`, or the closing brace (at which point we can check whether it's followed by an `=`). This lookahead may be expensive. However, much of the parsed syntax tree can be reused between the two cases.

In order to distinguish between a `sequence_deconstruction` and an attribute on a local function, we need to parse until we reach the closing `]` and check whether it's followed by an `=`. This may also be expensive, although I imagine the most expensive cases will quickly run into something that will disambiguate them, such as expressions that are disallowed in attributes.
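
Both ambiguities come from statement prefixes that are already valid C#. The following sketch, which compiles today, shows one instance of each. The names `a`, `calls` and `Twice` are invented for the example, and the `goto` is there only so that the label is referenced.

```csharp
using System;
using System.Runtime.CompilerServices;

var calls = 0;
void a() => calls++;

// A block whose first statement carries the label `A`. Up to `{ A: a`, this
// is token-for-token the start of the proposed `{ A: a } = e;`.
{ A: a(); if (calls < 2) goto A; }

// An attribute list on a local function. Up to the closing `]`, this has the
// same bracketed shape as the start of the proposed `[x, y] = e;`.
[MethodImpl(MethodImplOptions.NoInlining)] int Twice(int value) => value * 2;

Console.WriteLine($"{calls} {Twice(21)}");
```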

If expression blocks are added in the future, this may lead to genuine ambiguities even at a semantic level. E.g. `{ P : (condition ? ref a : ref b) } = e;` could be a nominal deconstruction, or an assignment to an expression block containing the label `P`. It shouldn't be too difficult to work around this (e.g. by disallowing labels on the final expression of an expression block).

## Alternatives
[alternatives]: #alternatives

There are a number of simplifications to this spec we could consider:
1. Only allow the `var` form of the patterns as the most common.
2. Don't allow mixing the different forms of deconstruction.
3. Don't allow declaring a local as well as a deconstruction.
etc.

There are also many axes along which the exact grammar/semantics could be adjusted. I hope I made clear in my high level overview why I made the decisions I did, but I will not be surprised if others come to different conclusions.

## Unresolved questions
[unresolved]: #unresolved-questions

How do we modify the spec I've given above to allow target typing of literals in the case of tuple deconstruction?

## Design meetings

https://github.com/dotnet/csharplang/blob/master/meetings/2020/LDM-2020-11-16.md#nominal-and-collection-deconstruction
