Description
Author background
- Would you consider yourself a novice, intermediate, or experienced Go programmer? Experienced.
- What other languages do you have experience with? C, Java, Rust, Python, MATLAB, Haskell, TypeScript, various flavors of assembly
Related proposals
- Has this idea, or one like it, been proposed before? I previously proposed a less refined form as an invested April fools' joke at proposal: Go 2: sigma types #52096; this is intended to be a serious proposal. It resembles sum types (proposal: spec: add sum types / discriminated unions #19412) or typed enums (proposal: spec: add typed enum support #19814, proposal: spec: enum type (revisited) #28438, proposal: spec: enums as an extension to types #28987) in the capability of the type system, although it is more general.
- If so, how does this proposal differ? The types proposed are more general than sum types, which are in turn more general than typed enums. In particular, the mechanisms in this proposal would greatly simplify certain scenarios encountered in marshaling and unmarshaling when compared to sum types.
Proposal
Introduce a restricted version of sigma types into Go. Formally, sigma types are pairs (a: T, b: B(a))
where a
is a value of type T
and B
is a function that constructs the type of b
from the value of a
. Less formally, they can be considered tagged unions where the tag is an accessible value of a type the programmer can select.
Sigma types are a central feature of dependent type systems, which make it possible to statically prove many facts about the safety of programs. Such systems allow type metaprogramming, which allows the type constructor B
to depend on variable values of a
. Because Go does not have metaprogramming, I propose the restriction that a
must be one of a statically enumerated set of values. I believe that sigma types will be useful despite that restriction.
Definitions
Preferring a name more familiar to programmers than logicians, I propose to name these types union types. Formally, there are two categories of union types, depending on the form of T
in the definition (a: T, b: B(a))
. The first category is value-discriminated. A value-discriminated union type has the following:
- A discriminator type, which is any type that can represent constants.
- A discriminator set, which is a fixed set of constant values of the same type (i.e. not untyped) as the discriminator type.
- A fixed set of dependent types.
- A fixed surjective mapping from the discriminator set to the dependent types.
A value of a value-discriminated union type U
has:
- A discriminator, which is a variable of
U
's discriminator type. The discriminator is one of the elements ofU
's discriminator set. - A dependent value, which is a variable of the type to which
U
maps the discriminator.
The second category of union types are type-discriminated. A type-discriminated union type has the following:
- A discriminator set, which is a fixed set of types, plus a nil case. That is, the nil case is not a type, but it always appears in the discriminator set.
- Dependent types, which is a fixed set of types.
- A fixed surjective mapping from the discriminator set to the dependent types.
A value of a type-discriminated union type W
has:
- A discriminator, which is a type in
W
's discriminator set. - A dependent value, which is a variable of the type to which
W
maps the discriminator.
The distinction between value-discriminated and type-discriminated union types is that T
in (a: T, b: B(a))
is a type in the former and a universe of types in the latter. This distinction is necessary because types are not terms in Go.
Syntax
Writing union types
The syntax I propose is as follows. We add a new kind of type literal with the following EBNF:
UnionType = "switch" (Type [Tag] | "type") "{" { UnionTypeElem ";" } "}" .
UnionTypeElem = UnionTypeCase ":" Type [Tag] .
UnionTypeCase = "case" ExpressionList | "default" .
For a value-discriminated union type, the Type
following switch
specifies the discriminator type. Each case
gives a list of values and the dependent type to which those values map. The values in the case
clauses must be constants representable by the discriminator type. These form the discriminator set. The default
clause specifies the dependent type when the discriminator is the zero value of the discriminator type. It is an error to specify the same value or type multiple times, including to both specify the zero value and to include a default
clause. It is also an error to neither specify the zero value nor to include a default
clause. That is, the zero value must appear exactly once in the discriminator set.
If the keyword type
rather than a type name follows switch
, then the union type is instead type-discriminated. Each case
gives a list of types and the dependent types to which those values map. These form the discriminator set. The default
clause specifies the dependent type for the nil case; if it is omitted, then the dependent type is struct{}
. It is an error to specify the same type multiple times.
Union types may have tags on dependent types accessible through reflection, similar to tags on struct fields. Value-discriminated union types may additionally have a tag on the discriminator type.
An additional shorthand syntax can be used to specify type-discriminated union types where the discriminator types are each identical to their dependent types.
ShortUnionType = Type { "|" Type }
With this proposal, the syntax T1 | T2 | ... | Tn
where a type is expected becomes a type-discriminated union type, shorthand for:
switch type {
case T1: T1
case T2: T2
...
case Tn: Tn
}
Union type literals are composite literals containing zero or one keyed elements. If a keyed element appears, the key must be a constant or type in the discriminator set of the union type, or literal nil
to indicate the nil case of a type-discriminated union type, and the element must be a value of the type to which the union type maps the key.
Inspecting
In order to inspect the dynamic state of a union-typed value, we use syntax analogous to the same operations on interface types. These come in three flavors. First, we can use union assertions, a primary expression:
UnionAssertion = "." "(" Expression ")" .
The expression in a UnionAssertion production must be a constant or type in the discriminator set of the union type, or literal nil
to indicate the nil case of a type-discriminated union type. Like type assertions on interfaces, union assertions can be used in contexts expecting one or two values. When there is only one result, the assertion panics if the discriminator is not equal or identical to the expression. When there are two, the first is the dependent value if the discriminator is equal or identical and the zero value of the dependent type otherwise, and the second is a bool indicating whether the assertion succeeded.
Second, we can use switch statements that mirror type switches, called discriminator switches:
DiscrimSwitchStmt = "switch" [ SimpleStmt ";" ] DiscrimSwitchGuard "{" { DiscrimCaseClause } "}" .
DiscrimSwitchGuard = [ identifier ":=" ] PrimaryExpr "." "(" "switch" ")" .
DiscrimCaseClause = DiscrimSwitchCase ":" StatementList .
DiscrimSwitchCase = "case" ExpressionList | "default" .
The expressions in the case clauses in a discriminator switch must each be in the discriminator set of the union type of the primary expression of the guard. Analogous to type switches on interfaces, if an assignment appears in the guard and all expressions in the case which matches the discriminator appear in the same case of the union type definition, then the assigned variable has the type to which that case maps; otherwise, it has the union type.
Lastly, specific to value-discriminated union types, the special form x.(switch)
evaluates to the current value of the discriminator. Note that this is the same syntax as the guard of a discriminator switch. Whenever this syntax appears in a switch guard, the switch statement is a discriminator switch having the semantic requirement that the case values must be members of the discriminator set. If the programmer wishes to relax this requirement, they may wrap the entire switch guard in parentheses.
Syntactic ambiguity
A minor syntactic ambiguity arises with this definition. This occurs where a union type literal without a type name is written in a statement context. Because Go requires that every literal must be used, the only otherwise valid use of a union type literal in statement context is, ostensibly, to call a method or perform a channel send on a value of one of the dependent types:
func anime() {
switch string {
case "anime": int
default: error
}{}.("").Error()
}
Considering the awkwardness and uselessness of this expression, it is unlikely that it would ever arise in practice. Therefore, the parser assumes a switch statement when encountering the switch
keyword in statement context. If it is necessary to write such an expression, the workaround is to wrap at least the type in parentheses.
Properties
The zero value of a value-discriminated union type has the zero value of the discriminator type as its discriminator and the zero value of the dependent type as its dependent value. The zero value of a type-discriminated union type has nil
as its discriminator and the zero value of the dependent type of the nil case as its dependent value. Note that a type-discriminated union type may have a nil discriminator and a nonzero dependent value.
As an additional property, operations which can be used with all dependent types of a union type can additionally be used with the union type itself, mapping dynamically to the corresponding operation on the dependent value. When the operation on the dependent value is not a conversion but creates a result of the same type, the use of the operation on the union type produces a new value of the union type with the same discriminator and the operation result as the dependent value. The definition of "operation" here is as for constraints in generic functions. (This applicative property serves to mirror the semantics of type elements in generic functions.)
Two value-discriminated union types are identical if their discriminator sets are equal and both map each discriminator value to identical dependent types. Two type-discriminated union types are identical if every type in each is identical to one type in the other and both map each discriminator to identical types.
Neither the discriminator nor the dependent value of a union type is addressable.
Union types are comparable when each dependent type is comparable (all types which can represent constants are comparable, so the discriminator of a value-discriminated union type is as well) and are equal when their discriminators are equal or identical or both nil and their dependent values are equal.
The alignment requirement of a union type is the maximum of the alignments of all dependent types and the discriminator type, with an implementation-defined minimum alignment for type-discriminated union types. No guarantees are made about the maximum size or layout of a union type; a compiler may choose to put the discriminator first, last, in the middle, or spread across bits of the dependent types' storage, and it may or may not choose to rearrange or share storage for different dependent types.
Standard library
Updates to package reflect are extensive. Kind needs a new Union value. A new type DependentType has the following definition:
type DependentType {
// Discriminator is the case which maps to this dependent type. If the
// union is type-discriminated, then Discriminator is a Type or nil.
Discriminator any
// Type is the dependent type.
Type Type
// Tag is the tag string on the dependent type.
Tag StructTag
}
On Type, a method NumDependent returns the number of discriminators on the type, and Dependent(int) returns a DependentType according to an implementation-defined enumeration. For value-discriminated union types, a new Discriminator method returns the type of the discriminator, and a DiscriminatorTag method returns the tag of the discriminator. Lastly, on Value, a new Discriminator method returns the current discriminator as a Value | Type
, and the existing Elem method returns the dependent value.
Packages go/ast and go/types will need new types to describe union types.
Examples
Sum types
Before
type Sum interface {
isSum()
}
type AlphaVariant struct{}
func (AlphaVariant) isSum() {}
var _ Sum = AlphaVariant{}
type BetaVariant struct {
err error
}
func (*BetaVariant) isSum() {}
var _ Sum = (*BetaVariant)(nil)
func Make(bad bool) Sum {
if bad {
return &BetaVariant{errors.New("beta")}
}
return AlphaVariant{}
}
After
type Sum switch bool {
case false: struct{}
case true: error
}
func Make(bad bool) Sum {
if bad {
return Sum{true: errors.New("beta")}
}
return Sum{}
}
Sum type with type parameters and methods
Before
type Maybe[T any] struct {
just T
ok bool
}
func (m Maybe[T]) Just() (T, bool) {
return m.just, m.ok
}
func (m Maybe[T]) String() string {
if just, ok := m.Just(); ok {
return fmt.Sprintf("Just(%v)", just)
}
return "Nothing"
}
After
const Just = true
type Maybe[T any] switch bool {
case Just: T
default: struct{}
}
func (m Maybe[T]) String() string {
switch t := m.(switch) {
case Just:
return fmt.Sprintf("Just(%v)", t)
default:
return "Nothing"
}
}
Type element syntax
Before
type Bytesy interface {
string | []byte
}
var b Bytesy // ERROR: cannot use type Bytesy outside a type constraint: interface contains type constraints
After
type Bytesy string | []byte
var b Bytesy // OK: Bytesy is a type-discriminated union type with discriminators string, []byte, and nil
Unmarshaling JSON with a variant record
Before
type variantJSONResponse struct {
Type string `json:"type"`
Data json.RawMessage `json:"data"`
}
type Response interface {
isResponse()
}
type AlphaResponse string
type BetaResponse int
func (AlphaResponse) isResponse()
func (BetaResponse) isResponse()
func Decode() (Response, error) {
var v variantJSONResponse
err := json.Unmarshal(data, &v)
if err != nil {
return nil, err
}
var r Response
switch v.Type {
case "alpha":
r = new(AlphaResponse)
case "beta":
r = new(BetaResponse)
default:
return nil, fmt.Errorf("unknown type %q", v.Type)
}
err = json.Unmarshal(v.Data, r)
if err != nil {
return nil, err
}
return r, nil
}
func Use() {
r, err := Decode()
if err != nil {
panic(err)
}
switch r := r.(type) {
case AlphaResponse: // Needs to be *AlphaResponse, but this can't be detected.
doStringStuff(string(r))
case BetaResponse: // Needs to be *BetaResponse.
doIntStuff(int(r))
default:
panic("forgot a type")
}
}
After
type Response switch string `json:"type,omitempty"` {
case "alpha": string `json:"data"`
case "beta": int `json:"data"`
// We can express a JSON error field that exists if and only if the data
// field is absent.
default: string `json:"error"`
}
func Decode() (Response, error) {
var r Response
err := json.Unmarshal(data, &r) // Assuming encoding/json support for union types.
if err != nil {
return Response{}, err
}
return r, nil
}
func Use() {
r, err := Decode()
if err != nil {
// Note that this can include the case where the JSON specifies a
// variant not in the union type.
panic(err)
}
switch r := Decode().(switch) {
case "alpha":
doStringStuff(r)
case "beta":
doIntStuff(r)
// case "anime": // Invalid because "anime" is not in the discriminator set.
default:
panic(errors.New(r.("")))
}
}
Additional info
- Who does this proposal help, and why? This proposal allows implementing the full behavior of typed enums and discriminated unions, so all of the uses described in proposal: spec: add sum types / discriminated unions #19412 and proposal: spec: add typed enum support #19814 are addressed (except uses depending on exhaustive checking of cases). It expands upon that with the flexibility to describe types varying with arbitrary values, notably strings, which arises from time to time in some JSON APIs. By repurposing the
T1 | T2
syntax, it also takes a large step toward unifying type constraints with the rest of Go's type system. - Please also describe the change informally, as in a class teaching Go. This proposal adds the ability to write types similar to Rust enums that can have values of several different types, but with the added advantage that the dynamic type can be controlled by an arbitrary value that can be any constant.
- Is this change backward compatible? Insofar as changes to the type system can be, yes. A compiler for a version of Go that implements the proposed changes will also compile code written for versions that do not, with no changes in semantics. Other tools will need updates to support the changes to the AST and type system.
- Orthogonality: how does this change interact or overlap with existing features? While it is technically possible to approximate sum types using sealed interfaces, doing so requires a large amount of code to define new types and methods for each variant, the interface zero value of nil is always a variant, and it is impossible to discover the set of variants dynamically. While this proposal addresses all of these problems, the last is, in my opinion, the most significant: reflection-based unmarshaling algorithms cannot decode interface-typed variables in general. Union types in particular would allow reflection to decode a particular variant based on the value of another field, which is helpful for certain APIs which return JSON like
{madoka: "homura", kyuubei: 9}
or{error: "anime"}
from a single endpoint. This is in addition to the benefits of sum types, such as making applications like parsers much simpler to describe in types. - Is the goal of this change a performance improvement? No.
Costs
- Would this change make Go easier or harder to learn, and why? Harder. I don't think dependent types are a particularly easy concept to learn. Typescript has the ability to encode dependent types through conditional types, so advanced users of that language may find the idea of union types accessible. Once users learn them, they may find certain problems easier and more natural to express in safe code, lowering the cost of learning different approaches like sealed interfaces or custom unmarshaling for those situations.
- What is the cost of this proposal? (Every language change has a cost). The main cost is in the addition to the language, which increases the burden associated with learning Go. Supporting union types would also add a significant amount of code to the compiler, runtime, and any external tools which process Go code, as well as any code which uses reflect to handle all possible types.
- How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected? Yes.
- What is the compile time cost? I do not expect that adding a new type kind would dramatically increase compilation costs for any programs, although there may be some small increase in compilation time and object sizes. The cost would arise in the amount of compiler code required to support union types, changes to export and object formats to represent them, &c.
- What is the run time cost? The existence of union types should not substantially impact the execution of most programs, except those which use reflection to handle types broadly, since the proposed implementation of reflection of union types may be expensive. The runtime itself will need substantial updates to account for the new type kind.
Exclusions
This proposal intentionally excludes several related features, especially pattern matching, underlying type terms (~T
), and exhaustive switches. I will explain my rationale for these exclusions.
Pattern matching is orthogonal. It would hypothetically be possible to propose pattern matching which works with the existing types in Go, especially interface types. If we decide to accept union types, and pattern matching is a thing we want specifically for them, it is straightforward to propose it separately.
Underlying type terms are orthogonal. One can envision a proposal to add underlying type terms as proper types, e.g. such that every type which has T
as its underlying type is in subtype relationship with ~T
, allowing a "shortcut" for functions which are generic over a single type per formal parameter. Again, it is straightforward to propose them specifically for union types separately.
Exhaustive switches on sum types have been discussed in #41716, and the conclusion at the time was that it would be a mistake to include them. The rationale there is based on that proposal allowing underlying type terms, which is not proposed here, so perhaps it would be reasonable to include them. However, adding one creates a strong reason not to add the other, and I feel that syntactic unification with type terms of general interfaces is a better goal.