Description
There are many scenarios where you'd like to group a set of typed values temporarily, without the grouping itself warranting a "concept" or type name of its own.
Other languages use variations on the notion of tuples for this. Maybe C# should too.
This proposal follows up on #98 and addresses #102 and #307.
Background
The most common situation where values need to be grouped temporarily, a list of arguments to (e.g.) a method, has syntactic support in C#. However, what is probably the second-most common, a list of results, does not.
While there are many situations where tuple support could be useful, the most prevalent by far is the ability to return multiple values from an operation.
Your options today include:
Out parameters:
public void Tally(IEnumerable<int> values, out int sum, out int count) { ... }
int s, c;
Tally(myValues, out s, out c);
Console.WriteLine($"Sum: {s}, count: {c}");
This approach cannot be used for async methods, and it is also rather painful to consume: the variables must first be declared (and var is not an option), then passed as out parameters in a separate statement, and only then consumed.
On the bright side, because the results are out parameters, they have names, which help indicate which is which.
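For comparison with the alternatives below, here is the out-parameter approach as a complete program. This is a minimal sketch assuming a C# 9+ file with top-level statements; the sample input `myValues` and the body of `Tally` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

// The result variables must be declared first, then filled in by a separate call.
int s, c;
Tally(myValues, out s, out c);
Console.WriteLine($"Sum: {s}, count: {c}");

// Local function standing in for Tally; both out parameters must be
// definitely assigned before it returns.
void Tally(IEnumerable<int> values, out int sum, out int count)
{
    sum = 0;
    count = 0;
    foreach (var value in values) { sum += value; count++; }
}
```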
System.Tuple:
public Tuple<int, int> Tally(IEnumerable<int> values) { ... }
var t = Tally(myValues);
Console.WriteLine($"Sum: {t.Item1}, count: {t.Item2}");
This works for async methods (you could return Task<Tuple<int, int>>), and you only need two statements to consume it. On the downside, the consuming code is perfectly obscure: there is nothing to indicate that you are talking about a sum and a count. Finally, there's a cost to allocating the Tuple object.
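The System.Tuple version, completed the same way. Again this is only a sketch: `myValues` and the `Tally` body are illustrative assumptions, and `Tally` is written as a local function so the file stays self-contained.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

var t = Tally(myValues);
// Nothing at the call site says which item is the sum and which is the count.
Console.WriteLine($"Sum: {t.Item1}, count: {t.Item2}");

// Tuple<int, int> is a class, so every call allocates a new object on the heap.
Tuple<int, int> Tally(IEnumerable<int> values)
{
    int sum = 0, count = 0;
    foreach (var value in values) { sum += value; count++; }
    return Tuple.Create(sum, count);
}
```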
Declared transport type:
public struct TallyResult { public int Sum; public int Count; }
public TallyResult Tally(IEnumerable<int> values) { ... }
var t = Tally(myValues);
Console.WriteLine($"Sum: {t.Sum}, count: {t.Count}");
This has by far the best consumption experience. It works for async methods, the resulting struct has meaningful field names, and being a struct, it doesn't require heap allocation; it is essentially passed on the stack in the same way as the argument list to a method.
The downside, of course, is the need to declare the transport type. The declaration is meaningless overhead in itself, and since it doesn't represent a clear concept, it is hard to give it a meaningful name. You can name it after the operation that returns it (as I did above), but then you cannot reuse it for other operations.
Tuple syntax
If the most common use case is multiple results, it seems reasonable to strive for symmetry with parameter lists and argument lists. If you can squint and see "things going in" and "things coming out" as two sides of the same coin, then that seems to be a good sign that the feature is well integrated into the existing language, and may in fact improve the symmetry instead of (or at least in addition to) adding conceptual weight.
Tuple types
Tuple types would be introduced with syntax very similar to a parameter list:
public (int sum, int count) Tally(IEnumerable<int> values) { ... }
var t = Tally(myValues);
Console.WriteLine($"Sum: {t.sum}, count: {t.count}");
The syntax (int sum, int count) indicates an anonymous struct type with public fields of the given names and types.
Note that this is different from some notions of tuple, where the members are not given names but only positions. Positional-only members are a common complaint, though, because they essentially degrade the consumption scenario to that of System.Tuple above. For full usefulness, tuple members need to have names.
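Named tuple types along these lines did later ship, in C# 7.0, backed by the `System.ValueTuple` structs. The sketch below uses that shipped syntax rather than anything specified here, so treat it as an approximation of this proposal: in C# 7.0 the member names exist only at compile time, and the underlying fields are still `Item1` and `Item2`. The input and the `Tally` body are illustrative.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

var t = Tally(myValues);
Console.WriteLine($"Sum: {t.sum}, count: {t.count}");

// In C# 7.0+ the name sum is a compile-time alias for the field Item1.
Console.WriteLine(t.Item1 == t.sum);

// The return type is a tuple type with named members, written like a parameter list.
(int sum, int count) Tally(IEnumerable<int> values)
{
    int s = 0, c = 0;
    foreach (var value in values) { s += value; c++; }
    return (s, c);
}
```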
This is fully compatible with async:
public async Task<(int sum, int count)> TallyAsync(IEnumerable<int> values) { ... }
var t = await TallyAsync(myValues);
Console.WriteLine($"Sum: {t.sum}, count: {t.count}");
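An async variant as a runnable sketch, again assuming C# 7.0+ tuple syntax and top-level statements. `Task.Yield()` merely stands in for real asynchronous work, and the body of `TallyAsync` is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

var t = await TallyAsync(myValues);
Console.WriteLine($"Sum: {t.sum}, count: {t.count}");

// A tuple type is an ordinary type argument, so Task<(int sum, int count)> just works.
async Task<(int sum, int count)> TallyAsync(IEnumerable<int> values)
{
    await Task.Yield(); // placeholder for real asynchronous work
    int s = 0, c = 0;
    foreach (var value in values) { s += value; c++; }
    return (s, c);
}
```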
Tuple literals
With no further syntax additions to C#, tuple values could be created as
var t = new (int sum, int count) { sum = 0, count = 0 };
Of course, that's not very convenient. We should have a syntax for tuple literals, and given the principle above, it should closely mirror that of argument lists.
Creating a tuple value of a known target type should allow the member names to be left out:
public (int sum, int count) Tally(IEnumerable<int> values)
{
var s = 0; var c = 0;
foreach (var value in values) { s += value; c++; }
return (s, c); // target typed to (int sum, int count)
}
Using named arguments as a syntactic analogy, it may also be possible to give the names of the tuple fields directly in the literal:
public (int sum, int count) Tally(IEnumerable<int> values)
{
var res = (sum: 0, count: 0); // infer tuple type from names and values
foreach (var value in values) { res.sum += value; res.count++; }
return res;
}
Which syntax you use would depend on whether the context provides a target type.
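Both literal forms side by side, as a sketch in C# 7.0+ syntax, which accepts both. The two method names are hypothetical, chosen only to tell the variants apart; they should produce equal results.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

var a = TallyTargetTyped(myValues);
var b = TallyNamedLiteral(myValues);
Console.WriteLine($"{a.sum}/{a.count} and {b.sum}/{b.count}");

// Target-typed literal: (s, c) is converted to the declared return type.
(int sum, int count) TallyTargetTyped(IEnumerable<int> values)
{
    var s = 0; var c = 0;
    foreach (var value in values) { s += value; c++; }
    return (s, c);
}

// Named literal: the type of res, (int sum, int count), is inferred from names and values.
(int sum, int count) TallyNamedLiteral(IEnumerable<int> values)
{
    var res = (sum: 0, count: 0);
    foreach (var value in values) { res.sum += value; res.count++; }
    return res;
}
```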
Tuple deconstruction
Since the grouping represented by tuples is most often "accidental", the consumer of a tuple is likely not to want to even think of the tuple as a "thing". Instead they want to immediately get at the components of it. Just like you don't first bundle up the arguments to a method into an object and then send the bundle off, you wouldn't want to first receive a bundle of values back from a call and then pick out the pieces.
Languages with tuple features typically use a deconstruction syntax to receive and "split out" a tuple in one fell swoop:
(var sum, var count) = Tally(myValues); // deconstruct result
Console.WriteLine($"Sum: {sum}, count: {count}");
This way there's no evidence in the code that a tuple ever existed.
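A runnable version of the deconstruction example, assuming C# 7.0+ syntax, which also allows the shorter form `var (sum, count) = Tally(myValues);`. The input and the `Tally` body are illustrative.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

// Split the result straight into two fresh locals; the tuple itself is never named.
(var sum, var count) = Tally(myValues);
Console.WriteLine($"Sum: {sum}, count: {count}");

(int sum, int count) Tally(IEnumerable<int> values)
{
    int s = 0, c = 0;
    foreach (var value in values) { s += value; c++; }
    return (s, c);
}
```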
Details
That's the general gist of the proposal. Here are a ton of details to think through in the design process.
Struct or class
As mentioned, I propose to make tuple types structs rather than classes, so that no allocation penalty is associated with them. They should be as lightweight as possible.
Arguably, structs can end up being more costly, because assignment copies a bigger value. So if they are assigned a lot more than they are created, then structs would be a bad choice.
In their very motivation, though, tuples are ephemeral. You would use them when the parts are more important than the whole. So the common pattern would be to construct, return and immediately deconstruct them. In this situation structs are clearly preferable.
Structs also have a number of other benefits, which will become obvious in the following.
Mutability
Should tuples be mutable or immutable? The nice thing about them being structs is that the user can choose. If the variable or field holding the tuple is readonly, then the tuple is readonly.
Now a local variable cannot be readonly, unless we adopt #115 (which is likely), but that isn't too big of a deal, because locals are only used locally, and so it is easier to stick to an immutable discipline if you so choose.
If tuples are used as fields, then those fields can be readonly if desired.
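To make the value behavior concrete: because a tuple is a struct, assigning it copies every member, and mutating the copy leaves the original alone. A minimal sketch in C# 7.0+ syntax, where tuple fields happen to be mutable:

```csharp
using System;

var original = (sum: 10, count: 4);
var copy = original;   // assignment copies both members
copy.sum = 99;         // mutate the copy only

// The original is unaffected, because tuples are values rather than references.
Console.WriteLine($"original.sum = {original.sum}, copy.sum = {copy.sum}");
```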
Value semantics
Structs have built-in value semantics: Equals and GetHashCode are automatically implemented in terms of the struct's fields. This isn't always implemented very efficiently, so we should make sure that the compiler-generated struct does this efficiently where the runtime doesn't.
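A quick check of these value semantics, sketched with C# 7.0+ tuples, whose `System.ValueTuple` types override `Equals` and `GetHashCode` to compare element by element:

```csharp
using System;

var a = (sum: 10, count: 4);
var b = (sum: 10, count: 4);
var c = (sum: 10, count: 5);

// Equality is computed member by member, not by identity.
bool ab = a.Equals(b);
bool ac = a.Equals(c);

// Equal values must hash alike, which is what lets tuples act as dictionary keys.
bool sameHash = a.GetHashCode() == b.GetHashCode();
Console.WriteLine($"{ab} {ac} {sameHash}");
```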
Tuples as fields
While multiple results may be the most common usage, you can certainly imagine tuples showing up as part of the state of objects. A particularly common case might be where generics are involved, and you want to pass a compound of values for one of the type parameters. Think dictionaries with multiple keys and/or multiple values, etc.
Care needs to be taken with mutable structs on the heap: if multiple threads can mutate them, tearing can happen.
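As an illustration of the generic case, a dictionary keyed on two strings at once, with no key type declared. This is a sketch in C# 7.0+ syntax; the element names and the sample entry are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// A dictionary keyed on a compound of two values, with no key type declared.
var ages = new Dictionary<(string first, string last), int>();
ages[("John", "Doe")] = 66;

// A separately constructed key finds the entry, thanks to value equality.
bool found = ages.TryGetValue(("John", "Doe"), out var age);
Console.WriteLine($"{found} {age}");
```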
Conversions
On top of the member-wise conversions implied by target typing, we can certainly allow implicit conversions between tuple types themselves.
Specifically, covariance seems straightforward, because the tuples are value types: As long as each member of the assigned tuple is assignable to the type of the corresponding member of the receiving tuple, things should be good.
You could imagine going a step further, and allowing pointwise conversions between tuples regardless of the member names, as long as the arity and types line up. If you want to "reinterpret" a tuple, why shouldn't you be allowed to? Essentially the view would be that assignment from tuple to tuple is just memberwise assignment by position.
(double sum, long count) weaken = Tally(...); // why not?
(int s, int c) rename = Tally(...); // why not?
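Both conversions above do compile in C# 7.0+, which permits assignment between tuple types of the same arity whenever each element converts implicitly, and does not let member names get in the way. A sketch, with an illustrative input and `Tally` body:

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

// Element-wise implicit conversions: int -> double and int -> long.
(double sum, long count) weaken = Tally(myValues);

// Same element types, different member names: assignment is by position.
(int s, int c) rename = Tally(myValues);

Console.WriteLine($"{weaken.sum} {weaken.count} {rename.s} {rename.c}");

(int sum, int count) Tally(IEnumerable<int> values)
{
    int total = 0, n = 0;
    foreach (var value in values) { total += value; n++; }
    return (total, n);
}
```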
Unification across assemblies
One big question is whether tuple types should unify across assemblies. Currently, compiler generated types don't. As a matter of fact, anonymous types are deliberately kept assembly-local by limitations in the language, such as the fact that there's no type syntax for them!
It might seem obvious that there should be unification of tuple types across assemblies, i.e. that (int sum, int count) is the same type when it occurs in assembly A and assembly B. However, given that tuples aren't expected to be passed around much, you can certainly imagine them still being useful without that.
Even so, it would probably come as a surprise to developers if there was no interoperability between tuples across assembly boundaries. This may range from having implicit conversions between them, supported by the compiler, to having a true unification supported by the runtime, or implemented with very clever tricks. Such tricks might lead to a less straightforward layout in metadata (such as carrying the tuple member names in separate attributes instead of as actual member names on the generated struct).
This needs further investigation. What would it take to implement tuple unification? Is it worth the price? Are tuples worth doing without it?
Deconstruction and declaration
There's a design issue around whether deconstruction syntax is only for declaring new variables for tuple components, or whether it can be used with existing variables:
(var sum, var count) = Tally(myValues); // deconstruct into fresh variables
(sum, count) = Tally(otherValues); // deconstruct into existing variables?
In other words, is the form (_, _, _) = e; a declaration statement, an assignment expression, or something in between?
This discussion intersects meaningfully with #254, declaration expressions.
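C# 7.0 ended up supporting both forms from this example. A sketch of deconstruction into fresh variables, followed by deconstruction into the same, now existing, variables; the inputs and the `Tally` body are illustrative.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 };  // illustrative inputs
int[] otherValues = { 10, 20 };

(var sum, var count) = Tally(myValues);  // declares sum and count
Console.WriteLine($"{sum} {count}");

(sum, count) = Tally(otherValues);       // assigns to the existing sum and count
Console.WriteLine($"{sum} {count}");

(int sum, int count) Tally(IEnumerable<int> values)
{
    int total = 0, n = 0;
    foreach (var value in values) { total += value; n++; }
    return (total, n);
}
```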
Relationship with anonymous types
Since tuples would be compiler generated types just like anonymous types are today, it's useful to consider rationalizing the two with each other as much as possible. With tuples being structs and anonymous types being classes, they won't completely unify, but they could be very similar. Specifically, anonymous types could pick up these properties from tuples:
- There could be a syntax to denote the types! E.g. { string Name, int Age }. If so, we'd also need to figure out the cross-assembly story for them.
- There could be deconstruction syntax for them.
Optional enhancements
Once in the language, there are additional conveniences that you can imagine adding for tuples.
Tuple members in scope in method body
One (the only?) nice aspect of out parameters is that nothing needs to be returned from the method body: the parameters are just assigned to. For the case where a tuple type occurs as the return type of a method, you could imagine a similar shortcut:
public (int sum, int count) Tally(IEnumerable<int> values)
{
sum = 0; count = 0;
foreach (var value in values) { sum += value; count++; }
}
Just like parameters, the names of the tuple members are in scope in the method body, and just like out parameters, the only requirement is that they be definitely assigned at the end of the method.
This is taking the parameter-result analogy one step further. However, it would special-case the tuples-for-multiple-returns scenario over other tuple scenarios, and it would also preclude seeing in one place what gets returned.
Splatting
If a method expects n arguments, we could allow a suitable n-tuple to be passed to it. Just like with params arrays, we would first check if there's a method that takes the tuple directly, and otherwise we would try again with the tuple's members as individual arguments:
public double Avg(int sum, int count) => count == 0 ? 0 : (double)sum / count;
Console.WriteLine($"Avg: {Avg(Tally(myValues))}");
Here, Tally returns a tuple of type (int sum, int count) that gets splatted to the two arguments to Avg.
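C# does not do this kind of splatting, so the closest runnable sketch is the hand-written expansion it would save: deconstruct the tuple, then pass its members individually. It assumes C# 7.0+ tuple syntax; `Avg` here divides in floating point so the average is not truncated, and `myValues` and `Tally` are illustrative.

```csharp
using System;
using System.Collections.Generic;

int[] myValues = { 1, 2, 3, 4 }; // illustrative input

// Hand-written equivalent of the proposed Avg(Tally(myValues)):
// deconstruct the tuple, then pass its members as separate arguments.
var (s, c) = Tally(myValues);
Console.WriteLine($"Avg: {Avg(s, c)}");

// Cast before dividing so the average is not truncated by integer division.
double Avg(int sum, int count) => count == 0 ? 0 : (double)sum / count;

(int sum, int count) Tally(IEnumerable<int> values)
{
    int total = 0, n = 0;
    foreach (var value in values) { total += value; n++; }
    return (total, n);
}
```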
Conversely, if a method expects a tuple we could allow it to be called with individual arguments, having the compiler automatically assemble them to a tuple, provided that no overload was applicable to the individual arguments.
I doubt that a method would commonly be declared directly to just take a tuple. But it may be a method on a generic type that gets instantiated with a tuple type:
var list = new List<(string name, int age)>();
list.Add("John Doe", 66); // "unsplatting" to a tuple
There are probably a lot of details to figure out with the splatting and unsplatting rules.
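Likewise, without unsplatting, the call has to build the tuple explicitly. A sketch in C# 7.0+ syntax; note the double parentheses, one pair for the call and one for the tuple literal.

```csharp
using System;
using System.Collections.Generic;

var list = new List<(string name, int age)>();

// The tuple literal is converted to the list's element type (string name, int age).
list.Add(("John Doe", 66));

Console.WriteLine($"{list[0].name} is {list[0].age}");
```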