Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: unions as sigma types #54685

Open
zephyrtronium opened this issue Aug 26, 2022 · 52 comments
Open

proposal: spec: unions as sigma types #54685

zephyrtronium opened this issue Aug 26, 2022 · 52 comments
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal
Milestone

Comments

@zephyrtronium
Copy link
Contributor

zephyrtronium commented Aug 26, 2022

Author background

  • Would you consider yourself a novice, intermediate, or experienced Go programmer? Experienced.
  • What other languages do you have experience with? C, Java, Rust, Python, MATLAB, Haskell, TypeScript, various flavors of assembly

Related proposals

Proposal

Introduce a restricted version of sigma types into Go. Formally, sigma types are pairs (a: T, b: B(a)) where a is a value of type T and B is a function that constructs the type of b from the value of a. Less formally, they can be considered tagged unions where the tag is an accessible value of a type the programmer can select.

Sigma types are a central feature of dependent type systems, which make it possible to statically prove many facts about the safety of programs. Such systems allow type metaprogramming, which allows the type constructor B to depend on variable values of a. Because Go does not have metaprogramming, I propose the restriction that a must be one of a statically enumerated set of values. I believe that sigma types will be useful despite that restriction.

Definitions

Preferring a name more familiar to programmers than logicians, I propose to name these types union types. Formally, there are two categories of union types, depending on the form of T in the definition (a: T, b: B(a)). The first category is value-discriminated. A value-discriminated union type has the following:

  • A discriminator type, which is any type that can represent constants.
  • A discriminator set, which is a fixed set of constant values of the same type (i.e. not untyped) as the discriminator type.
  • A fixed set of dependent types.
  • A fixed surjective mapping from the discriminator set to the dependent types.

A value of a value-discriminated union type U has:

  • A discriminator, which is a variable of U's discriminator type. The discriminator is one of the elements of U's discriminator set.
  • A dependent value, which is a variable of the type to which U maps the discriminator.

The second category of union types are type-discriminated. A type-discriminated union type has the following:

  • A discriminator set, which is a fixed set of types, plus a nil case. That is, the nil case is not a type, but it always appears in the discriminator set.
  • Dependent types, which is a fixed set of types.
  • A fixed surjective mapping from the discriminator set to the dependent types.

A value of a type-discriminated union type W has:

  • A discriminator, which is a type in W's discriminator set.
  • A dependent value, which is a variable of the type to which W maps the discriminator.

The distinction between value-discriminated and type-discriminated union types is that T in (a: T, b: B(a)) is a type in the former and a universe of types in the latter. This distinction is necessary because types are not terms in Go.

Syntax

Writing union types

The syntax I propose is as follows. We add a new kind of type literal with the following EBNF:

UnionType     = "switch" (Type [Tag] | "type") "{" { UnionTypeElem ";" } "}" .
UnionTypeElem = UnionTypeCase ":" Type [Tag] .
UnionTypeCase = "case" ExpressionList | "default" .

For a value-discriminated union type, the Type following switch specifies the discriminator type. Each case gives a list of values and the dependent type to which those values map. The values in the case clauses must be constants representable by the discriminator type. These form the discriminator set. The default clause specifies the dependent type when the discriminator is the zero value of the discriminator type. It is an error to specify the same value or type multiple times, including to both specify the zero value and to include a default clause. It is also an error to neither specify the zero value nor to include a default clause. That is, the zero value must appear exactly once in the discriminator set.

If the keyword type rather than a type name follows switch, then the union type is instead type-discriminated. Each case gives a list of types and the dependent types to which those values map. These form the discriminator set. The default clause specifies the dependent type for the nil case; if it is omitted, then the dependent type is struct{}. It is an error to specify the same type multiple times.

Union types may have tags on dependent types accessible through reflection, similar to tags on struct fields. Value-discriminated union types may additionally have a tag on the discriminator type.

An additional shorthand syntax can be used to specify type-discriminated union types where the discriminator types are each identical to their dependent types.

ShortUnionType = Type { "|" Type }

With this proposal, the syntax T1 | T2 | ... | Tn where a type is expected becomes a type-discriminated union type, shorthand for:

switch type {
	case T1: T1
	case T2: T2
	...
	case Tn: Tn
}

Union type literals are composite literals containing zero or one keyed elements. If a keyed element appears, the key must be a constant or type in the discriminator set of the union type, or literal nil to indicate the nil case of a type-discriminated union type, and the element must be a value of the type to which the union type maps the key.

Inspecting

In order to inspect the dynamic state of a union-typed value, we use syntax analogous to the same operations on interface types. These come in three flavors. First, we can use union assertions, a primary expression:

UnionAssertion = "." "(" Expression ")" .

The expression in a UnionAssertion production must be a constant or type in the discriminator set of the union type, or literal nil to indicate the nil case of a type-discriminated union type. Like type assertions on interfaces, union assertions can be used in contexts expecting one or two values. When there is only one result, the assertion panics if the discriminator is not equal or identical to the expression. When there are two, the first is the dependent value if the discriminator is equal or identical and the zero value of the dependent type otherwise, and the second is a bool indicating whether the assertion succeeded.

Second, we can use switch statements that mirror type switches, called discriminator switches:

DiscrimSwitchStmt  = "switch" [ SimpleStmt ";" ] DiscrimSwitchGuard "{" { DiscrimCaseClause } "}" .
DiscrimSwitchGuard = [ identifier ":=" ] PrimaryExpr "." "(" "switch" ")" .
DiscrimCaseClause  = DiscrimSwitchCase ":" StatementList .
DiscrimSwitchCase  = "case" ExpressionList | "default" .

The expressions in the case clauses in a discriminator switch must each be in the discriminator set of the union type of the primary expression of the guard. Analogous to type switches on interfaces, if an assignment appears in the guard and all expressions in the case which matches the discriminator appear in the same case of the union type definition, then the assigned variable has the type to which that case maps; otherwise, it has the union type.

Lastly, specific to value-discriminated union types, the special form x.(switch) evaluates to the current value of the discriminator. Note that this is the same syntax as the guard of a discriminator switch. Whenever this syntax appears in a switch guard, the switch statement is a discriminator switch having the semantic requirement that the case values must be members of the discriminator set. If the programmer wishes to relax this requirement, they may wrap the entire switch guard in parentheses.

Syntactic ambiguity

A minor syntactic ambiguity arises with this definition. This occurs where a union type literal without a type name is written in a statement context. Because Go requires that every literal must be used, the only otherwise valid use of a union type literal in statement context is, ostensibly, to call a method or perform a channel send on a value of one of the dependent types:

func anime() {
	switch string {
		case "anime": int
		default:      error
	}{}.("").Error()
}

Considering the awkwardness and uselessness of this expression, it is unlikely that it would ever arise in practice. Therefore, the parser assumes a switch statement when encountering the switch keyword in statement context. If it is necessary to write such an expression, the workaround is to wrap at least the type in parentheses.

Properties

The zero value of a value-discriminated union type has the zero value of the discriminator type as its discriminator and the zero value of the dependent type as its dependent value. The zero value of a type-discriminated union type has nil as its discriminator and the zero value of the dependent type of the nil case as its dependent value. Note that a type-discriminated union type may have a nil discriminator and a nonzero dependent value.

As an additional property, operations which can be used with all dependent types of a union type can additionally be used with the union type itself, mapping dynamically to the corresponding operation on the dependent value. When the operation on the dependent value is not a conversion but creates a result of the same type, the use of the operation on the union type produces a new value of the union type with the same discriminator and the operation result as the dependent value. The definition of "operation" here is as for constraints in generic functions. (This applicative property serves to mirror the semantics of type elements in generic functions.)

Two value-discriminated union types are identical if their discriminator sets are equal and both map each discriminator value to identical dependent types. Two type-discriminated union types are identical if every type in each is identical to one type in the other and both map each discriminator to identical types.

Neither the discriminator nor the dependent value of a union type is addressable.

Union types are comparable when each dependent type is comparable (all types which can represent constants are comparable, so the discriminator of a value-discriminated union type is as well) and are equal when their discriminators are equal or identical or both nil and their dependent values are equal.

The alignment requirement of a union type is the maximum of the alignments of all dependent types and the discriminator type, with an implementation-defined minimum alignment for type-discriminated union types. No guarantees are made about the maximum size or layout of a union type; a compiler may choose to put the discriminator first, last, in the middle, or spread across bits of the dependent types' storage, and it may or may not choose to rearrange or share storage for different dependent types.

Standard library

Updates to package reflect are extensive. Kind needs a new Union value. A new type DependentType has the following definition:

type DependentType {
	// Discriminator is the case which maps to this dependent type. If the
	// union is type-discriminated, then Discriminator is a Type or nil.
	Discriminator any
	// Type is the dependent type.
	Type Type
	// Tag is the tag string on the dependent type.
	Tag StructTag
}

On Type, a method NumDependent returns the number of discriminators on the type, and Dependent(int) returns a DependentType according to an implementation-defined enumeration. For value-discriminated union types, a new Discriminator method returns the type of the discriminator, and a DiscriminatorTag method returns the tag of the discriminator. Lastly, on Value, a new Discriminator method returns the current discriminator as a Value | Type, and the existing Elem method returns the dependent value.

Packages go/ast and go/types will need new types to describe union types.

Examples

Sum types

Before

type Sum interface {
	isSum()
}

type AlphaVariant struct{}

func (AlphaVariant) isSum() {}

var _ Sum = AlphaVariant{}

type BetaVariant struct {
	err error
}

func (*BetaVariant) isSum() {}

var _ Sum = (*BetaVariant)(nil)

func Make(bad bool) Sum {
	if bad {
		return &BetaVariant{errors.New("beta")}
	}
	return AlphaVariant{}
}

After

type Sum switch bool {
	case false: struct{}
	case true:  error
}

func Make(bad bool) Sum {
	if bad {
		return Sum{true: errors.New("beta")}
	}
	return Sum{}
}
Sum type with type parameters and methods

Before

type Maybe[T any] struct {
	just T
	ok bool
}

func (m Maybe[T]) Just() (T, bool) {
	return m.just, m.ok
}

func (m Maybe[T]) String() string {
	if just, ok := m.Just(); ok {
		return fmt.Sprintf("Just(%v)", just)
	}
	return "Nothing"
}

After

const Just = true

type Maybe[T any] switch bool {
	case Just: T
	default:   struct{}
}

func (m Maybe[T]) String() string {
	switch t := m.(switch) {
	case Just:
		return fmt.Sprintf("Just(%v)", t)
	default:
		return "Nothing"
	}
}
Type element syntax

Before

type Bytesy interface {
	string | []byte
}

var b Bytesy // ERROR: cannot use type Bytesy outside a type constraint: interface contains type constraints

After

type Bytesy string | []byte

var b Bytesy // OK: Bytesy is a type-discriminated union type with discriminators string, []byte, and nil
Unmarshaling JSON with a variant record

Before

type variantJSONResponse struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"`
}

type Response interface {
	isResponse()
}

type AlphaResponse string
type BetaResponse int

func (AlphaResponse) isResponse()
func (BetaResponse) isResponse()

func Decode() (Response, error) {
	var v variantJSONResponse
	err := json.Unmarshal(data, &v)
	if err != nil {
		return nil, err
	}
	var r Response
	switch v.Type {
	case "alpha":
		r = new(AlphaResponse)
	case "beta":
		r = new(BetaResponse)
	default:
		return nil, fmt.Errorf("unknown type %q", v.Type)
	}
	err = json.Unmarshal(v.Data, r)
	if err != nil {
		return nil, err
	}
	return r, nil
}

func Use() {
	r, err := Decode()
	if err != nil {
		panic(err)
	}
	switch r := r.(type) {
	case AlphaResponse: // Needs to be *AlphaResponse, but this can't be detected.
		doStringStuff(string(r))
	case BetaResponse: // Needs to be *BetaResponse.
		doIntStuff(int(r))
	default:
		panic("forgot a type")
	}
}

After

type Response switch string `json:"type,omitempty"` {
	case "alpha": string `json:"data"`
	case "beta":  int    `json:"data"`
	// We can express a JSON error field that exists if and only if the data
	// field is absent.
	default: string `json:"error"`
}

func Decode() (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r) // Assuming encoding/json support for union types.
	if err != nil {
		return Response{}, err
	}
	return r, nil
}

func Use() {
	r, err := Decode()
	if err != nil {
		// Note that this can include the case where the JSON specifies a
		// variant not in the union type.
		panic(err)
	}
	switch r := Decode().(switch) {
	case "alpha":
		doStringStuff(r)
	case "beta":
		doIntStuff(r)
	// case "anime": // Invalid because "anime" is not in the discriminator set.
	default:
		panic(errors.New(r.("")))
	}
}

Additional info

  • Who does this proposal help, and why? This proposal allows implementing the full behavior of typed enums and discriminated unions, so all of the uses described in proposal: spec: add sum types / discriminated unions #19412 and proposal: spec: add typed enum support #19814 are addressed (except uses depending on exhaustive checking of cases). It expands upon that with the flexibility to describe types varying with arbitrary values, notably strings, which arises from time to time in some JSON APIs. By repurposing the T1 | T2 syntax, it also takes a large step toward unifying type constraints with the rest of Go's type system.
  • Please also describe the change informally, as in a class teaching Go. This proposal adds the ability to write types similar to Rust enums that can have values of several different types, but with the added advantage that the dynamic type can be controlled by an arbitrary value that can be any constant.
  • Is this change backward compatible? Insofar as changes to the type system can be, yes. A compiler for a version of Go that implements the proposed changes will also compile code written for versions that do not, with no changes in semantics. Other tools will need updates to support the changes to the AST and type system.
  • Orthogonality: how does this change interact or overlap with existing features? While it is technically possible to approximate sum types using sealed interfaces, doing so requires a large amount of code to define new types and methods for each variant, the interface zero value of nil is always a variant, and it is impossible to discover the set of variants dynamically. While this proposal addresses all of these problems, the last is, in my opinion, the most significant: reflection-based unmarshaling algorithms cannot decode interface-typed variables in general. Union types in particular would allow reflection to decode a particular variant based on the value of another field, which is helpful for certain APIs which return JSON like {madoka: "homura", kyuubei: 9} or {error: "anime"} from a single endpoint. This is in addition to the benefits of sum types, such as making applications like parsers much simpler to describe in types.
  • Is the goal of this change a performance improvement? No.

Costs

  • Would this change make Go easier or harder to learn, and why? Harder. I don't think dependent types are a particularly easy concept to learn. Typescript has the ability to encode dependent types through conditional types, so advanced users of that language may find the idea of union types accessible. Once users learn them, they may find certain problems easier and more natural to express in safe code, lowering the cost of learning different approaches like sealed interfaces or custom unmarshaling for those situations.
  • What is the cost of this proposal? (Every language change has a cost). The main cost is in the addition to the language, which increases the burden associated with learning Go. Supporting union types would also add a significant amount of code to the compiler, runtime, and any external tools which process Go code, as well as any code which uses reflect to handle all possible types.
  • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected? Yes.
  • What is the compile time cost? I do not expect that adding a new type kind would dramatically increase compilation costs for any programs, although there may be some small increase in compilation time and object sizes. The cost would arise in the amount of compiler code required to support union types, changes to export and object formats to represent them, &c.
  • What is the run time cost? The existence of union types should not substantially impact the execution of most programs, except those which use reflection to handle types broadly, since the proposed implementation of reflection of union types may be expensive. The runtime itself will need substantial updates to account for the new type kind.

Exclusions

This proposal intentionally excludes several related features, especially pattern matching, underlying type terms (~T), and exhaustive switches. I will explain my rationale for these exclusions.

Pattern matching is orthogonal. It would hypothetically be possible to propose pattern matching which works with the existing types in Go, especially interface types. If we decide to accept union types, and pattern matching is a thing we want specifically for them, it is straightforward to propose it separately.

Underlying type terms are orthogonal. One can envision a proposal to add underlying type terms as proper types, e.g. such that every type which has T as its underlying type is in subtype relationship with ~T, allowing a "shortcut" for functions which are generic over a single type per formal parameter. Again, it is straightforward to propose them specifically for union types separately.

Exhaustive switches on sum types have been discussed in #41716, and the conclusion at the time was that it would be a mistake to include them. The rationale there is based on that proposal allowing underlying type terms, which is not proposed here, so perhaps it would be reasonable to include them. However, adding one creates a strong reason not to add the other, and I feel that syntactic unification with type terms of general interfaces is a better goal.

@gopherbot gopherbot added this to the Proposal milestone Aug 26, 2022
@thediveo
Copy link

Preferring a name more familiar to programmers than logicians, I propose to name these types union types.

Shouldn't this be reflected in the issue title, such as "proposal: Go 2: sigma (union) types"?

@seankhliao seankhliao added LanguageChange Suggested changes to the Go language v2 An incompatible library change labels Aug 26, 2022
@zephyrtronium zephyrtronium changed the title proposal: Go 2: sigma types proposal: Go 2: unions as sigma types Aug 26, 2022
@beoran
Copy link

beoran commented Sep 5, 2022

I think that even with generics, Go is sorely lacking some kind of union types. This proposal seems one of the better ones I have seen so far.

@atdiar
Copy link

atdiar commented Sep 5, 2022

Is "union" the right name for this feature? It looks like a dependent type i.e. the type of a variable depends on its value.

If so, isn't it something that is more complex and the theory is still making progress/ in flux?

As opposed to making constraint interfaces usable as regular interfaces which is more akin to discriminated unions where the discriminant would be the type pointer.

It's interesting nonetheless. (and I will come back to it once I have more time to delve into it.)

@beoran
Copy link

beoran commented Sep 5, 2022

Especially for modeling JSON data a value discriminated union would be extremely useful as well. So i like that this proposal covers both cases.

@zephyrtronium
Copy link
Contributor Author

@atdiar

Is "union" the right name for this feature? It looks like a dependent type i.e. the type of a variable depends on its value.

If so, isn't it something that is more complex and the theory is still making progress/ in flux?

Correct. The types I am proposing are exactly sigma types, a major feature of dependent types, with the limitation that types can only vary with static terms. I don't know whether dependent types are substantially more "in flux" than classical types – I have less time than I'd like to read literature on type systems – but dependent types are at least less common.

That difference of frequency is why I am proposing to call these union types rather than sigma types, even though formally they are the latter. The number of people I can say "sigma types where the dependent term varies only with static terms" to without substantially more explanation is very small. On the other hand, I can say "tagged unions where the tag is an enum" and communicate fairly precisely what is going on to many more people, probably including all of those who would understand "sigma types." So, it seems pragmatic to choose a name which evokes unions.

@ianlancetaylor
Copy link
Contributor

Is it possible to access fields of a sigma type without using a switch? Does that question even make sense?

One of the main reasons that we rejected union types long ago was that there seemed to be too much overlap with interface types. An interface type seems fairly similar to the sigma types described here, except that the key must itself be a type, not a value. Does this add sufficient power to the language to justify including it?

@zephyrtronium
Copy link
Contributor Author

@ianlancetaylor

Is it possible to access fields of a sigma type without using a switch? Does that question even make sense?

switch and assertions are the only ways to access the dependent type of a sigma type. This is the intent: in order to use a variant of a sigma type, you must first verify that that is the variant that was originally assigned to it. Interfaces are similar in this respect.

(The proposal does currently describe an applicative property that would allow "operations" on sigma types when all dependent types support that operation, intended to mirror the semantics of type parameters. In retrospect, it is the wrong property, and it is hard to implement in a compiler, and it is out of scope for what I intend this proposal to be. I will retract that paragraph.)

One of the main reasons that we rejected union types long ago was that there seemed to be too much overlap with interface types. An interface type seems fairly similar to the sigma types described here, except that the key must itself be a type, not a value. Does this add sufficient power to the language to justify including it?

The formal expressive power that sigma types would add lies primarily in reflection. Having a fixed set of types enumerable through reflection is especially beneficial in dynamic marshaling and unmarshaling; the "Unmarshaling JSON with a variant record" example in the proposal shows how sigma types would enable decoding JSON efficiently in one shot where it currently requires substantial additional code using json.RawMessage or entirely custom unmarshaling. This is in addition to the ability to handle individual fields which are e.g. either string or number, which classical union types address. The proposal's value-discriminated union types also provide a way to express a choice between types without necessarily including nil, which was a sticking point in #19412 and #41716.

Logically, sigma types are entirely distinct from interfaces. Rather than an open type describing a set of polymorphic behaviors, sigma types as proposed describe a fixed relationship between a finite set of constants and corresponding types. It is possible to implement a finite set of types using interfaces with unexported methods, as long as you can define those unexported methods (particularly requiring the types to be in the same package). It is much more natural to me to describe a list of AST node types or a list of assembly instruction types as a list of types.

Sigma types have additional practical benefits over interfaces because they can be implemented without implicit boxing. If every (discriminator and dependent) type in a given sigma type is scalar, it is feasible for a compiler to treat the entire sigma type as scalar. There is no way to achieve this with interfaces, barring some optimization that changes the entire representation of interfaces with provably finite implementations. So, sigma types could be admitted even in performance-sensitive contexts where interfaces are usually avoided.

In addition to all of this, perhaps the most important aspect of sigma types is not the code it enables us to write, but the code it prevents. Since the compiler rejects switches and assertions not in the discriminator set, there is no risk of mistaking the dynamic type, especially where sigma types could replace any or where T implements an interface but a programmer can check for *T. Sigma types can prevent bugs, and this comes with all the other advantages of static types for the developer experience.

So, I feel that the addition is justifiable.

@beoran
Copy link

beoran commented Dec 9, 2022

@ianlancetaylor After ten years of experience with Go, it seems to me the decision made then to not include unions into Go because of interfaces was wrong.

While it is possible to use all sorts of workarounds using interfaces to get something similar to unions, these workarounds are annoying and require a lot of extra work and boilerplate. I end up needing to simulate union types in almost any program Go I write.

As @zephyrtronium says, json serialization is one point where this proposal would help a lot, but also for GUI applications, or anything where unions are traditionally used in other programming languages.

@merykitty
Copy link

Logically, sigma types are entirely distinct from interfaces. Rather than an open type describing a set of polymorphic behaviors, sigma types as proposed describe a fixed relationship between a finite set of constants and corresponding types.

Currently type parameter constraint allows union of types, it seems to be a natural evolution for me to allow this syntax to specify a usable interface type. Additionally, it allows the interface to naturally have common behaviours across all implementations with methods.

Sigma types have additional practical benefits over interfaces because they can be implemented without implicit boxing.

I don't see any reason why this would not be possible with simple sealed interface, if the set of implementation is known, the compiler can inline the implementations directly into the interface instance.

Additionally, sealed interface allows more flexible hierarchies with some branches being cut off and some branches open for implementation. For example, consider a function that takes a list-like object:

type ListLike[T any] interface {
    []T | List[T]
}

type List[T any] interface {
    // get, append, etc
}

@zephyrtronium
Copy link
Contributor Author

zephyrtronium commented Dec 21, 2022

@merykitty

Currently type parameter constraint allows union of types, it seems to be a natural evolution for me to allow this syntax to specify a usable interface type. Additionally, it allows the interface to naturally have common behaviours across all implementations with methods.

That was #41716, for the old type list syntax for constraint interfaces. I do not believe another proposal exists for using the current type union syntax for sum types, but it would have many of the same problems, and it would require defining what ~T means outside a constraint. That said, one of my goals for this proposal is to make it possible to one day rewrite the entire section describing general interfaces as something like, "An interface may embed a non-interface type, in which case only types assignable to that type are assignable to the interface type."

Sigma types have additional practical benefits over interfaces because they can be implemented without implicit boxing.

I don't see any reason why this would not be possible with simple sealed interface, if the set of implementation is known, the compiler can inline the implementations directly into the interface instance.

It is possible in principle, but it has substantial disadvantages. The compiler would need to run this check for every interface (although, to be fair, a check of "has an unexported method or a method with an unexported parameter type, or embeds such an interface" shouldn't be too expensive) and contain additional code paths to generate unboxed interfaces. Moreover, the runtime and any unsafe code aware of the layout of interfaces would need to be aware that sometimes interfaces are special, if the compiler has chosen to use the unboxed representation. That was already a bar that excluded changes to interfaces in the past, in #8405.

Additionally, sealed interface allows more flexible hierarchies with some branches being cut off and some branches open for implementation. For example, consider a function that takes a list-like object:

I don't understand this. If you add any method to List, your example does not compile: https://go.dev/play/p/AgCfvZIho_i. Neither of the interfaces are sealed. It is strictly less flexible than what union types would enable, since you could easily define a union type with []T and List[T] as dependent types.

@DeedleFake
Copy link

DeedleFake commented Dec 21, 2022

I don't understand this. If you add any method to List, your example does not compile: https://go.dev/play/p/AgCfvZIho_i. Neither of the interfaces are sealed. It is strictly less flexible than what union types would enable, since you could easily define a union type with []T and List[T] as dependent types.

They're saying that they want to be able to do that. They're essentially talking about using the type list syntax to turn interfaces into union types so that you could define an interface that could be used to accept any of a list of types. In their example, I assume it would work something like

type ListLike[T any] interface {
  []T | List[T] // This would become legal.
}

type List[T any] interface {
  Len() int
  At(i int) T
}

func At[T any](list ListLike[T], i int) T { // This would also become legal.
  // The compiler could now enforce only legal branches and go vet or something could check completeness.
  switch list := list.(type) {
  case []T:
    return list[i]
  case List[T]:
    return list.At(i)
  case int: // Compile-time error, presumably.
    return *new(T)
  }
}

@merykitty
Copy link

@zephyrtronium Thanks for your reply,

That was #41716, for the old type list syntax for constraint interfaces. I do not believe another proposal exists for using the current type union syntax for sum types, but it would have many of the same problems, and it would require defining what ~T means outside a constraint.

We don't need to unlock the whole syntax, it is entirely possible to just say that interface { A | B | ... } is a type (as opposed to being only a constraint) only if all of A, B, ... are also types. I don't see this having any difference in comparison with your proposal. In fact, an interface value is a discriminated union that contains a type and a pointer to the dependent value already, so creating an entirely new construct seems confusing to me.

It is possible in principle, but it has substantial disadvantages. The compiler would need to run this check for every interface (although, to be fair, a check of "has an unexported method or a method with an unexported parameter type, or embeds such an interface" shouldn't be too expensive) and contain additional code paths to generate unboxed interfaces.

The compiler just needs to do it for unions only, for non-sealed interfaces this optimisation may be added later. Adding another construct to the language also needs a huge amount of additional code paths so I don't find that reason so compelling.

I don't understand this. If you add any method to List, your example does not compile: https://go.dev/play/p/AgCfvZIho_i. Neither of the interfaces are sealed.

The decision to disallow behavioural interfaces in unions is not fundamental and can be lifted if the need arises (#49054). The union ListLike[T] is sealed in the mean that it can be implemented only by a []T or a List[T].

@zephyrtronium
Copy link
Contributor Author

@merykitty My mistake, I misunderstood a few parts of your previous comment.

We don't need to unlock the whole syntax, it is entirely possible to just say that interface { A | B | ... } is a type (as opposed to being only a constraint) only if all of A, B, ... are also types. I don't see this having any difference in comparison with your proposal.

That's true, and it is exactly the approach I'm taking with this proposal: suggest a meaning for A | B now and leave the question of ~A for later. One difference with my proposal is that I'm proposing a strictly more powerful mechanism, which includes types dependent on arbitrary constants or types rather than just sum types. Another difference is that I am explicitly trying to address the concerns raised in #19412 and #41716, especially with regard to the zero value, whereas your suggestion seems like it either does not address those concerns or is making a choice that people disagreed with in those issues.

In fact, an interface value is a discriminated union that contains a type and a pointer to the dependent value already, so creating an entirely new construct seems confusing to me.

I think that's a very narrow view of interfaces, or perhaps better described as an implementation detail. In terms of the type system, interface types have subtypes that can substitute for the interface. That's the point of interfaces, but it is exactly the opposite of what I expect a discriminated union to do. That's why I said above that "sigma types are entirely distinct from interfaces." I want an explicit mechanism to express the idea I want, rather than having to exploit the interaction with an orthogonal concept to get less than half the advantages of that idea.

The compiler just needs to do it for unions only, for non-sealed interfaces this optimisation may be added later. Adding another construct to the language also needs a huge amount of additional code paths so I don't find that reason so compelling.

There is a substantial difference between adding a new representation for an existing language element and adding a new language element, and I think the former is more subtle in projects as large as Go compilers. That's especially informed by my attempts to implement #45494 which deals with the representation of interfaces. I could be wrong, though.

In general, I think you are talking about a different proposal than this one, although that proposal hasn't been written yet. I think it would be helpful to have it written, so that we can explore its advantages and implications in a focused place. I would also support such a proposal if it includes the ability to enumerate the variants through reflection – what I'm proposing is more powerful, but what you describe still does most of what I want.

@ianlancetaylor
Copy link
Contributor

As discussed in the last few comments, it seems that we can get a similar effect by permitting ordinary interfaces to include a union of embedded types (basically, allow constraint types as ordinary types, perhaps excluding the ~T notation). That would have the significant advantage of making the language slightly more orthogonal. If we do that, then is there is a strong reason to also have this proposal? And is there a strong reason to have this proposal instead of doing that?

Note that such an interface could perhaps be implemented without boxing, without affecting how the language behaves. Though perhaps in some cases an implicit type switch would be required when assigning to any or some other basic interface type.

@mrg0lden
Copy link

mrg0lden commented Jan 5, 2023

@ianlancetaylor
I know this might not be the best place to discuss this nor is it well established to be discussed, but I got curious. (IDK where to ask about this TBH, if there's a better place, please let me know.) Regarding what's between the parenthesis "perhaps excluding the ~T notation", so if a type constraint includes a ~T, it can still function as intended when used as a constraint, but is the ~ (only the base type is permitted) part only excluded or the whole ~T (the base type is excluded as well). This also brings many questions, like what actual type constraints are useful as both generic constraints and union types.

@ianlancetaylor
Copy link
Contributor

@mrg0lden I'm sorry, I don't understand the question.

Using ~T is fine in a type constraint, but in a union type it would permit any type whose underlying type is T. That would mean that type switches wouldn't be useful, and it would be effectively impossible to pull the value out of the union type. So we could only permit ~T if we added additional language mechanisms, such as permitting ~T as a case in a type switch.

@zephyrtronium
Copy link
Contributor Author

@ianlancetaylor

As discussed in the last few comments, it seems that we can get a similar effect by permitting ordinary interfaces to include a union of embedded types (basically, allow constraint types as ordinary types, perhaps excluding the ~T notation). That would have the significant advantage of making the language slightly more orthogonal. If we do that, then is there is a strong reason to also have this proposal? And is there a strong reason to have this proposal instead of doing that?

As I mentioned in my previous comment, it's difficult to evaluate this proposal against just allowing values of non-basic interface types because there is no formal proposal for the latter. Per #19412 and #41716, that proposal would need to make the decision about whether nil is a value of such types, or else find a way to obviate the decision. This proposal does so by having union types explicitly describe their zero value.

Deferring the zero value question, there are practical patterns enabled by the dependence of a type on constants. The union types I propose are roughly equivalent to a pattern I've seen in TypeScript whereby one combines literal types (e.g. false or 0 or '' as a type rather than as a value) with unions to restrict the types of other properties of an object depending on which arm of the union has a matching literal. The fact that I've seen this in TypeScript implies to me that we would see it in Go as well if it were possible. (I think narrowing by the discriminator would even work better with this proposal than it does in TS.) Whether that's "a strong reason to have this proposal" is subjective, but I like being able to reject incorrect programs without having to run them.

@DeedleFake
Copy link

This comment was originally about how I don't like that using interfaces with type lists as unions would require new top-level types in order to name the options, but after I wrote out examples of code I didn't like it wound up actually being O.K. Here's the examples to show how it would work in case someone else is having the same hesitations:

type Option[T any] interface {
  Some[T] | None
}

// This is necessary because a type parameter can't be used directly in a type list. It also
// fixes a potential issue from someone trying to do, for example, Option[None].
type Some[T any] struct {
  Val T
}

type None struct{}

func Example(arg string) Option[fs.File] {
  // ...
  if (condition) {
    return None{} // This feels strange to me. It's got no real connection to `Option[T]`.
  }
  // ...
}

type Result[T any] interface {
  Ok[T] | Err
}

type Ok[T any] struct {
  Val T
}

// This is necessary because an interface with methods can't be used directly in a type list.
type Err struct {
  Err error
}

func (err Err) Error() string { return err.Err.Error() }
func (err Err) Unwrap() error { return err.Err }

I would still prefer that the types in the union be more directly attached to the union type itself, but with proper package naming it should be fine most of the time, probably.

This mechanism also makes it possible to attach nicely specialized methods to the union by adding them to the interface:

type Option[T any] interface {
  Some[T] | None

  Or(T) Option[T]
}

// Same type definitions as before.

func (s Some[T]) Or(v T) Option[T] {
  return s
}

func (n None) Or(v T) Option[T] {
  return Some[T]{Val: v}
}

@Merovius
Copy link
Contributor

Merovius commented Jan 5, 2023

FWIW personally I very much like this proposal. I think it is one of the most well-designed proposals to add sum types to Go I've seen and IMO the result would fit pretty nicely into the language. And in a world where we didn't have union-elements for interfaces for constraints, I would probably be in favor of this proposal. But given that we do have union-elements, it just seems too confusing to have two so similar, but distinct mechanisms in the language. I think this is a better mechanism for sum/union types in themselves, but we should only have one and there's one which is already in.

@zephyrtronium
Copy link
Contributor Author

@DeedleFake
With that example, what is the value of var o Option[fs.File]? Is it nil? Then your option type really has Some[T], None, and nil as options, and I have to check for nil before I can safely call Or. If it's non-nil, which variant is it? How does the compiler initialize it; will the zero value be all zeros in memory like it is for every other type? Is it simply an illegal statement, i.e. there is no zero value? That would imply that it can't be the element type of a pointer, slice, map, or channel, and that reflect would have to introduce new panics in currently fine methods.

I really think a separate proposal for values of non-basic interface types would be helpful.

@DeedleFake
Copy link

The zero value of interfaces as unions has been discussed elsewhere, especially in #41716. I don't think that a good solution was ever decided on. I believe the main options it got narrowed down to were either that yes, it can be nil, or that it defaults to the first type listed. The latter option would be less surprising from a usage standpoint, I think, but it wouldn't correspond to all zeros in memory like other types so it feels a bit cheaty. It makes sense with most hypothetical types I can think of, though:

type Option[T any] interface {
  None | Some[T] // None as a default is good.
}

type JSONType interface {
  Null | String | Number | ... // Null makes sense as a default.
}

type Result[T any] interface {
  // I'm less sure about this one, but I guess that a nil error as a default isn't a terrible option.
  Err | Ok[T]
}

@zephyrtronium
Copy link
Contributor Author

@Merovius
One of my goals with this proposal, in particular with defining a meaning for T1 | ... | Tn outside interfaces, is to allow one day rewriting the spec on general interfaces exclusively in terms of other orthogonal language elements. In particular, I'd love to see a sequence like:

  1. define T1 | ... | Tn to be a type-discriminated union type (what I feel is an important part of this proposal);
  2. define ~T to be a new kind of type representing a supertype of T that acts like an unboxed interface but supports all the operations of T;
  3. add a clause to assignability of union types that implicitly chooses the right discriminator when exactly one dependent type is identical to the assigned value's type;
  4. say that a type is assignable to the interface type when it has all the interface's methods and is assignable to all the non-interface types embedded in it;
  5. remove everything about different kinds of interfaces, because now we can just say "interfaces can list methods and embed types" and "to instantiate a type parameter P with a type T, T must be assignable to P."

I'm not certain how viable all of those steps are, and most of them are out of scope for this proposal. I'm choosing to keep the proposal close to minimal, especially considering it is already very long.

Allowing values of non-basic interfaces is probably a shorter path to the goal of "unifying" interfaces at least when the measure is word count, but I also feel the disagreements in #19412 and #41716, so I'm not convinced that path can be taken at all.

@zephyrtronium
Copy link
Contributor Author

zephyrtronium commented Jan 5, 2023

@DeedleFake

The zero value of interfaces as unions has been discussed elsewhere, especially in #41716. I don't think that a good solution was ever decided on.

That is my point. Hence why it would be helpful to have a separate proposal instead of continuing to discuss a somewhat vague idea in (I hope) a very specific and concrete proposal about a related but different thing.

@DeedleFake
Copy link

DeedleFake commented Jan 10, 2023

It seems to me that the main draw of #57644 is that it attempts to reuse the constraint type lists for something that looks very similar, but it's actually quite different. The constraint type lists exist primarily to allow for a way to deal with operators in constraints. That usage doesn't apply to union types because there would be no way to guarantee at compile-time that the underlying types are the same. For example:

// Works fine because v1 and v2 are the same type.
func add[T ~int | ~uint](v1, v2 T) T { return v1 + v2 }

// Who knows what types v1 and v2 actually are relative to each other?
func add(v1, v2 interface{ int | uint }) (interface{ int | uint }) { return v1 + v2 }

I think it's a mistake to try to cram union behavior into type lists just because it looks similar. If a way can be found to make it work the way unions actually should with a similar syntax that isn't confusing, that would be great, but I'd rather have a completely new type and leave type lists only for constraints than wind up with a half-implemented union type that isn't very useful in practice just for the sake of what looks like consistency.

@DmitriyMV
Copy link
Contributor

@zephyrtronium

Since this proposal chooses to make the zero value explicit, it largely avoids that problem; a zero value is required, but it can be made something meaningful and useful.

Your Response type, specifically

type Response switch string `json:"type"` {
	case "alpha": string `json:"data"`
	case "beta": int    `json:"data"`
}

Does look like it have an implicit nil value.


Also, I think the last example is wrong and should be:

func Use() {
	r, err := Decode()
	if err != nil {
		// Note that this can include the case where the JSON specifies a
		// variant not in the union type.
		panic(err)
	}
	switch r := r.(switch) {
	case "alpha":
		doStringStuff(r)
	case "beta":
		doIntStuff(r)
	// case "anime": // Invalid because "anime" is not in the discriminator set.
	default:
		panic("r is nil")
	}
}

@zephyrtronium
Copy link
Contributor Author

zephyrtronium commented Jan 11, 2023

@DmitriyMV Good catch. The type in that example is invalid according to the proposal's own rules, because it needs to specify either default: or case "":. I'll update that when I get a chance. Note that it is not an implicit nil, though; the zero value is Response{}. Unlike a nil interface, you can call methods on it and use the dependent value of the zero variant.

Edit: Updated.

@yordis
Copy link

yordis commented Apr 21, 2024

In my work with event sourcing and event-driven systems, I frequently encounter the necessity of defining numerous empty interfaces to represent union types.

Each stream involves a repetitive process of declaring these interfaces to manage the polymorphism inherent in Commands and Events.

All Decide, Evolve, and Event Processors consequently rely heavily on type-casting. In some cases, it isn't a big deal; in others, it is high-risk and error-prone.

//Every stream in the system will do this!
type state struct{}
type Event interface{}
type Command interface{}

func decide(s state, command Command) (Event[], error) {
  switch c := command.(type) {
    // ...
  }
}
func evolve(s state, event Event) state {
  switch e := event.(type) {
    // ...
  }
}
//Every event handler
func handleEvent(ctx context.Context, event any) error {
 switch e := event.(type) {
    // ...
 }
}

Although I appreciate the simplicity of Go, the repetitive nature of this pattern in professional practice can be frustrating. The frequent need to handle type-casting adds complexity, especially the nuances between pointers, values, and actual registered types. Since Event Processors interact with generic interface{} types rather than concrete ones, there's an inherent risk of runtime type errors.

Event-driven systems typically feature loosely coupled components; the teams/people responsible for defining events often differ from those implementing the Processors. Therefore, Event Processors can not dictate how to implement a given interface to deal with polymorphism, practically speaking, and extremely high risk due to coupling; Conway's Law is not in our favor here.


A possible syntax improvement inspired by Rust might involve using brackets from Generics Syntax, which could offer a more intuitive and error-reducing approach:

enum Foo {
    Stringy[String],
    Numerical[u32]
}

@zephyrtronium
Copy link
Contributor Author

A possible syntax improvement inspired by Rust might involve using brackets from Generics Syntax, which could offer a more intuitive and error-reducing approach

Any new keyword added to Go is inherently not backward compatible. It isn't viable to add an enum keyword because that would cause any code using enum as a variable name to cease to compile. This proposal specifically serves as a way to reconcile unions with the keywords and syntax already used in Go. I don't think my proposed syntax is especially amazing, so alternative suggestions are good, but those alternatives need to be viable.

@Merovius
Copy link
Contributor

Merovius commented Apr 22, 2024

@zephyrtronium We now have a mechanism to add new keywords, if we want to: They can be only a keyword, if the go directive in go.mod has the right version. Of course not any added keyword is possible - in particular, we can't allow code to be valid before and after its introduction, with different meanings. I'm not sure whether the given suggestion is such a case (I don't think so, but there might be non-obvious ambiguities).

(Also, a proposal requiring a keyword will still have a higher cost, so a correspondingly higher burden of proof that this cost is justified. So it'll still be better not to need a new keyword. Just pointing out, that it's no longer categorically true that we can't have new keywords).

@atdiar
Copy link

atdiar commented Apr 26, 2024

@zephyrtronium I've been rereading the proposal again.

I seem to understand that the whole point is to introduce type constructors parametrizable by value.
Somewhat as opposed to typed value constructors which create a value of a given type. (dual)

This is the same need that is expressed in proposals that tackle the same issue, e.g. #67064 and #65555.

However, I think that implementing what you are describing may introduce complexity that can be avoided. For instance, it adds a new, non-obvious type kind.
I don't think it is necessary.
I'd rather see some additional form of subtyping/refinement types that would help us tackle the issue of const value/types.
With the preexisting facilities added when introducing parametric polymorphism in Go, I believe that this would integrate more seamlessly into the language to allow for a form of dependent type (we already have phantom types).

For example:

type Deux int={2}
Should be a subtype of int.

I think it should require less changes at the compiler level (to be checked). And then could work as a type argument to generic functions, essentially taking care of the dependent typing needs.

That might be a better route for typed enums, unions and heterogeneous collections.

[edit] to be even simpler, we could use a specific kind of phantom type (parameter) which could appear as a union term that would be constrained to any member of the union although I'd rather think about how to deal with nil as untyped nil so that we iron out the corners around nilability.

// Still approximative syntax, zero has specific semantics here which impact the allocation behavior.
// zero is being used here to define which type's zero value will be used as the zero value for the union.
type SomeType[zero  int32 | float32 | string]  interface {
    int32 | float32 | string | zero
}

In any case, that is an interesting take.

@ianlancetaylor ianlancetaylor added LanguageChangeReview Discussed by language change review committee and removed v2 An incompatible library change labels Aug 6, 2024
@seankhliao seankhliao changed the title proposal: Go 2: unions as sigma types proposal: spec: unions as sigma types Aug 7, 2024
@alecgargett
Copy link

alecgargett commented Oct 5, 2024

@zephyrtronium We now have a mechanism to add new keywords, if we want to: They can be only a keyword, if the go directive in go.mod has the right version. Of course not any added keyword is possible - in particular, we can't allow code to be valid before and after its introduction, with different meanings. I'm not sure whether the given suggestion is such a case (I don't think so, but there might be non-obvious ambiguities).

(Also, a proposal requiring a keyword will still have a higher cost, so a correspondingly higher burden of proof that this cost is justified. So it'll still be better not to need a new keyword. Just pointing out, that it's no longer categorically true that we can't have new keywords).

Interesting. Could you elaborate on what kind of proof would be adequate? Because all I've got is the impression that people (even Gophers) love the way enums are done in Rust/ML & Rust/Ocaml fans regarding the lack of anything similar in GO as its biggest drawback, with perhaps the most central implementation being the Option --> None/Some and Result --> Ok/Err in Rust, which certainly looks very elegant and useful to me too even as a non-user.

@Merovius
Copy link
Contributor

Merovius commented Oct 5, 2024

Could you elaborate on what kind of proof would be adequate?

To acknowledge this: You have the disadvantage here. Both because you are arguing in favor of a change and the easy default is "do nothing". And because language design is ultimately subjective, so while "we like it better this way" is an effective argument from the deciders on the Go team, the community needs to provide some more objective evidence. So you need to overcome both inertia and personal preferences.

I don't think anyone can really say what exactly would be sufficient evidence to do that. This proposal provides good theoretical arguments (it demonstrates thoroughly how sum types could fit into Go). I think to move it forward, empirical arguments would be useful (e.g. quantifying a reduction in lines of code or production outages or engineering hours that this proposal would enable).

Though I'll caveat that by saying that, in my personal opinion, #57644 has a significantly higher chance of being accepted, so spending time on collecting data for this proposal might be wasteful.

Because all I've got is the impression that people (even Gophers) love the way enums are done in Rust/ML

This is not universal. There are also people who feel differently. The main argument against them is the impossibility of gradual code repair, should an expansion or reduction of the set of valid cases be needed. As a particular example, a natural use case for sum types would be use in go/ast. But if we did that, we could never introduce new AST nodes into Go - ironically making this proposal itself impossible. Similar for reflect.Kind, which would make it impossible to introduce int128.

with perhaps the most central implementation being the Option --> None/Some and Result --> Ok/Err in Rust, which certainly looks very elegant and useful to me too even as a non-user.

This isn't universal either. There are certainly people who prefer the current way Go does these, or at least do not perceive the difference to be as radical as the advocates for sum types. As one argument, AIUI Rust in particular relies pretty heavily on additional features, to make these types work well. For example, without the ?-operator, Result doesn't really reduce boilerplate.

I think in this case, the impression is even less reliable, because "from scratch", Go's current way of doing things is harder to argue for (which does not necessarily imply that it's wrong) so you are less likely to hear people make that argument, so the proponents for change are relatively more visible.

I don't expect any of these arguments to be convincing to anyone, to be clear. I'm just trying to demonstrate that it's possible to disagree. And thus, that the pure impression that "people think this is a good idea" is at least questionable. If you want to use that as an argument, it should be quantified. Again, though, the caveat that pure popularity of ideas is a relatively weak input into the decision process of the Go project. So I'm not sure how worthwhile the effort would be. And also, that the methodology would be difficult. E.g. "do you want sum types?" is not a useful question, as there are many different ways to implement them, with different tradeoffs. "do you want sum types, as Rust does them?" also isn't good. Because as demonstrated above, Rust relies on other, unrelated features too. So it's not obvious what a good question to ask would be.

@alecgargett
Copy link

I see. Thanks.

@joneshf
Copy link

joneshf commented Oct 8, 2024

This is not universal. There are also people who feel differently. The main argument against them is the impossibility of gradual code repair, should an expansion or reduction of the set of valid cases be needed. As a particular example, a natural use case for sum types would be use in go/ast. But if we did that, we could never introduce new AST nodes into Go - ironically making this proposal itself impossible. Similar for reflect.Kind, which would make it impossible to introduce int128.

Sorry I'm not really involved in this discussion, just following along as an interested party. So forgive me coming out of left field, but I don't really understand this point.

I thought I got the point of the linked article. But I'm not seeing how there's something inherent in sum types that prevents gradual code repair. If you use sum types in the most straight-forward way possible, and let them propagate throughout the entire code base, you get to the point where you do have to make atomic changes across the entire code base. And that does fly in the face of gradual code repair. But, nothing requires letting sum types propagate throughout the entire code base.

If I'm understanding the proposal, one option would be to have something like this in the ast package:

package ast

…

type Decl switch string {
    case "func": FuncDecl
    case "gen": GenDecl
    case "bad": BadDecl
}

type DeclHandler[ReturnValue any] struct {
    Func func(FuncDecl) ReturnValue
    Gen func(GenDecl) ReturnValue
    Bad func(BadDecl) ReturnValue
}

func HandleDecl[ReturnValue any](
    decl Decl,
    handler DeclHandler[ReturnValue],
) ReturnValue {
    switch decl := decl.(switch) {
    case "func":
        if handler.Func == nil {
            return *new(ReturnValue)
        }

        return handler.Func(decl)

    case "gen":
        if handler.Gen == nil {
            return *new(ReturnValue)
        }

        return handler.Gen(decl)

    case "bad":
        if handler.Bad == nil {
            return *new(ReturnValue)
        }

        return handler.Bad(decl)

    default:
        return *new(ReturnValue)
    }
}

…

For callers that don't want to be broken whenever ast.Decl changes, they can use ast.HandleDecl whenever they want to destructure an ast.Decl. For callers that don't care, they can use ast.Decl directly. Again, if everything uses ast.Decl directly, then yes you'll have to do a large atomic change if you add or remove a case from ast.Decl. But for callers that use ast.HandleDecl, you shouldn't have to make large atomic changes.

If you add another case to Decl, the caller doesn't have to update their Handler immediately to deal with that new case because structs have zero values.

If you remove a case from Decl, it's a similar situation to how interfaces work now (meaning that nothing is better or worse than now):

  • Either you explicitly break callers by removing the corresponding case in Handler, and also the case in HandleDecl.
    • This is similar to removing the type entirely when using an interface.
  • Or you implicitly break callers by keeping the corresponding case in Handler, and only removing the case inHandleDecl.
    • This is similar to keeping the type and removing its declNode method.

If the expansion side is too implicit, you can define a new pair of Handler/HandleDecl that have the expansion in them, and slowly migrate to using the new pair. I don't know how you get away from breaking things explicitly or implicitly with the reduction when using either interfaces or sum types, though. So maybe that's the bit I'm missing? But I also feel like given Go's backwards compatibility promise, reduction that might not apply to this specific example, but more relevant in other examples?

In any case, I feel like I'm missing something about the article and how sum types break gradual code repair. Can you help me (and other folks following along) know what it is that sum types prevent with gradual code repair?

@DragonDev1906
Copy link

(also following this as an interested party)
I assume it's because of the (implicit) assumption that these unions are not non_exhaustive (term from Rust):
If the compiler checks that you have handled all possible cases/variants (which many people probably want) you do get an error/warning if someone adds a change that isn't aware of the newly added variant. But that can be entirely avoided through multiple means, thus making this a non-problem for gradual restructures / gradual code repair:

  • If it is just a warning you haven't broken other peoples code by adding new variants, as @joneshf mentioned. Only the initial change may be relevant, unless it behaves similar to interfaces in its use.
  • If these types could be marked as exhaustive or non_exhaustive you could do changes in multiple steps: (1) add default-cases (2) add non_exhaustive (3) add new variant (4) gradually add the new variant to all places (5) remove non_exhaustive

In my opinion you have the same issue with interfaces and not doing a hard error for variants that are not handled will not negatively impact gradual code repair any more than interfaces do.

@bjorndm
Copy link

bjorndm commented Oct 8, 2024

Go Interfaces are already non-exhaustive sum types, so no need to add those.

The benefit is in exhaustive sum types is that it allows static type checking. But as mentioned exhaustive sum types do have the downside that changes to the sum type cannot be done gradually. I would say that is a feature in stead of a problem since static type checking type switches at compile time cannot be done now in Go.

@Merovius
Copy link
Contributor

Merovius commented Oct 8, 2024

@joneshf Note that I was responding to "Because all I've got is the impression that people (even Gophers) love the way enums are done in Rust/ML". There might be ways to do them differently, but I understood this comment to include exhaustiveness checking. It's that specific component (absent from this proposal) that is inherently incompatible with gradual code repair.

Again, if everything uses ast.Decl directly, then yes you'll have to do a large atomic change if you add or remove a case from ast.Decl. But for callers that use ast.HandleDecl, you shouldn't have to make large atomic changes.

If I may be a bit snippy: Yes, if your callers do not actually use the sum types you declare, I agree that sum types do not conflict with gradual code repair. But I would argue the point of having sum types is to use them.

Note that a breaking change is a breaking change. You can't bump a major version number only for "some callers".

If you add another case to Decl, the caller doesn't have to update their Handler immediately to deal with that new case because structs have zero values.

Which is another reason why "the way enums are done in Rust/ML" doesn't translate particularly well to Go, by the way. In Rust/ML, there is no such thing as a zero value and only specific constructors are allowed. That's an important part of the concept. But that's another story.

@yordis
Copy link

yordis commented Oct 8, 2024

Is there anything we could take here to move the situation forward?

@ianlancetaylor
Copy link
Contributor

My purely personal take on this proposal is that it is well written and it addresses the problem that it sets out to solve. But I don't see the compelling use cases. It doesn't permit us to do anything we can't already do it, it just permits doing it more concisely and in some cases more efficiently. And it's a fairly complex language change, introducing a new category of type that is significantly unlike any existing type. Go aims to be a simple language, and this doesn't feel simple. Go aims to be a language in which you write code rather than design types, and this feels like a fairly sophisticated kind of type, more complex than any of Go's existing types.

I'm not saying there are no benefits to this change, because I do see benefits. And I'm not saying that the costs are prohibitive, because we could live with them if the benefits were large enough. But at this point I'm not seeing the benefits outweigh the costs.

@yordis
Copy link

yordis commented Oct 8, 2024

Wouldn’t it be great to focus on what we can achieve together rather than getting caught up in personal opinions or concerns that might not have complete support?

I’ve noticed that emotions and personal experiences often play a big role in shaping how we think, especially when it comes to software. I'd much rather collaborate with those who are invested in making progress, even if it means stepping back from my own perspective and letting them take the lead or simply doing nothing.

Generics is an excellent example of a very similar situation.

@ianlancetaylor
Copy link
Contributor

Sure. But we can't let our desire to make progress bias us in favor of saying yes. Go won't succeed if it doesn't change with the times. It also won't succeed if it changes to become too hard to learn.

@Merovius
Copy link
Contributor

Merovius commented Oct 8, 2024

@yordis It would be possible to push for this proposal to go through the formal process and review meeting. That is how to get past personal opinions. I'm pretty confident that the result would be, that it will be rejected after a few weeks. I don't think that would improve the situation for anyone. Sometimes, it's better to just keep the cat in the box.

You seem to look for a way to make this proposal happen. See the first and last paragraph of what I wrote here if you want to invest actual energy into it. It mentions some ideas for additional data that could be added. But if I'm completely honest, I don't think it's a good use of time. I don't think the material reality is unknown to the relevant people. But who knows, maybe you find some surprisingly compelling arguments.

Generics is an excellent example of a very similar situation.

I do agree. And there are reasons why it took ten years in pretty much exactly the same kind of limbo that variant types are in right now. And the reason was not "people didn't ask for them enough".

@yordis
Copy link

yordis commented Oct 8, 2024

@Merovius, I agree with you. What I’m suggesting is that we focus our efforts on figuring out how we can help make this successful before going through the formal review process.

Could we take that approach?

I trust the Go team to take ownership and make decisions that are in the best interest of the community. That's their responsibility.

But right now, it’s exhausting to navigate through this situation. Eventually, these threads become less about the proposal itself and more about everything around it. I think we can do better by collaborating upfront.

@zephyrtronium
Copy link
Contributor Author

Re: exhaustive case checking and gradual code repair, the proposed union types are always non-exhaustive. As described in the original proposal:

Exhaustive switches on sum types have been discussed in #41716, and the conclusion at the time was that it would be a mistake to include them. The rationale there is based on that proposal allowing underlying type terms, which is not proposed here, so perhaps it would be reasonable to include them. However, adding one creates a strong reason not to add the other, and I feel that syntactic unification with type terms of general interfaces is a better goal.

While not mentioned there, gradual code repair is yet another reason not to require exhaustive checking. Yet despite being non-exhaustive, I believe that the union types proposed here have benefits on par with type parameters – which, of course, were judged to outweigh their costs.

Along with that, I disagree with the claim in #54685 (comment) that the proposed union types don't do anything we can't do already, specifically because of their appearance in reflection. Whether you use iota constants or fancier techniques with sealed interfaces, the only reasonably safe way to enumerate an enumeration in Go today is to use code generation. Even if you go through that effort, general-purpose reflective packages like encoding/json each need a special-purpose mechanism à la json.Unmarshaler to handle your type, which you need to address individually as part of your code gen. Not to mention this entire process is easy to get wrong and can explode in arbitrary directions when you do (historically #36422, #36068). I've gone through this effort; it was a major motivation for me to write this proposal.

To answer your question directly, @yordis, the best way to improve the position of this proposal is to show real code examples – not hypotheticals, and not descriptions – of things we can do with it that we cannot reasonably do without it. If those examples highlight problems with the proposed design, that's arguably even better. If you have anecdotes of actual situations where the additional safety of the proposed (non-exhaustive) union types would have prevented serious problems in production, that might be helpful as well, though certainly weaker. It's also important to show why this proposal, and not e.g. #57644, is the right solution. (My arguments in #54685 (comment) might get you started there.)

The important thing is to add new information to the discussion. A well-informed participant will have read the entirety of the discussions here and in #19412, #57644, #41716, #19814, and other issues linked to those. That's why, as you say, these threads tend to become more about everything around them. There is simply a lot of context that matters.

@yordis
Copy link

yordis commented Oct 9, 2024

@zephyrtronium related to

The only reasonably safe way to enumerate an enumeration in Go today is to use code generation.

That is precisely why we are enforcing all the events today to be defined in the protobuf files, primarily to codegen, even though we prefer to decouple Domain data types from the serialization of the messages (as we do in other stacks, except Go at the moment) Dealing with pointer issues due to the usage of interfaces or developers forgetting to reference or dereference ... Simultaneously, the frustration of defining union types by hand and having to be paranoid by nil checks all over the place.

Which is also related to the actual production code that I posted above: #54685 (comment)

Here is an example, copied from literal production:

var ErrPlanNotFound = errors.New("plan not found")
var ErrPlanArchived = errors.New("plan already archived")

var Decider = trogon.NewDecider(decide, evolve)

type State struct {
	PlanId     *string
	IsArchived bool
}

func decide(state State, command *planproto.ArchivePlan) ([]*planproto.Event, error) {
	if state.PlanId == nil {
		return nil, ErrPlanNotFound
	}
	if state.IsArchived {
		return nil, ErrPlanArchived
	}

	return []*planproto.Event{
		{
			Event: &planproto.Event_PlanArchived{
				PlanArchived: &planproto.PlanArchived{
					PlanId:     command.PlanId,
					ArchivedBy: command.ArchivedBy,
					ArchivedAt: command.ArchivedAt,
				},
			},
		},
	}, nil
}

func evolve(state State, event *planproto.Event) State {
	switch e := event.Event.(type) {
	case *planproto.Event_PlanCreated:
		state.PlanId = &e.PlanCreated.PlanId
		return state
	case *planproto.Event_PlanArchived:
		state.IsArchived = true
		return state
	default:
		return state
	}
}

Multiply this for at least 200 command handlers in one single application. This is a nightmare.

Again, we are already enforcing things using proto buffer even though we use JSON format in the event store, or we technically do not need such a package at all in that layer; just there.

Even more of these in the Event Handler processors are connected to the Event Store. That is even worse since the Message Type Provider packages could combine types from multiple proto buffers.

@DragonDev1906
Copy link

DragonDev1906 commented Oct 9, 2024

It doesn't permit us to do anything we can't already do it, it just permits doing it more concisely and in some cases more efficiently. And it's a fairly complex language change, introducing a new category of type that is significantly unlike any existing type. [...]

But at this point I'm not seeing the benefits outweigh the costs.

The only reasonably safe way to enumerate an enumeration in Go today is to use code generation.

Maybe this comparison goes too far, but what is a compiler? A compiler is a tool that generates code (assembly/machine code) to make it easier to develop (correct) code. Allowing you to write more concise code, allowing code to be more efficient (in some cases). At the same time a compiler tries to simplify the underlying system (we're not using goto for example) and use more complex (in terms of implementation) things like if, for, ... to make writing (correct) code easier. Compared to gotos they are also a "new category [...] that is significantly unlike any existing [...]". Granted, those where (usually) added to new languages and not to existing ones, but the same applies for a bunch of other constructs we use everywhere: OOP, Interfaces, Classes, Methods.

Isn't the fact that people use code generation, a separate tool that produces code for the compiler an indicator that this is something that is missing in the language itself? In my opinion code generation (from some other higher abstraction to Go code) adds even more overhead and complexity for something that (from the standpoint of the hardware) should be even simpler than interfaces.

@joneshf
Copy link

joneshf commented Oct 9, 2024

@joneshf Note that I was responding to "Because all I've got is the impression that people (even Gophers) love the way enums are done in Rust/ML". There might be ways to do them differently, but I understood this comment to include exhaustiveness checking. It's that specific component (absent from this proposal) that is inherently incompatible with gradual code repair.

Again, if everything uses ast.Decl directly, then yes you'll have to do a large atomic change if you add or remove a case from ast.Decl. But for callers that use ast.HandleDecl, you shouldn't have to make large atomic changes.

If I may be a bit snippy: Yes, if your callers do not actually use the sum types you declare, I agree that sum types do not conflict with gradual code repair. But I would argue the point of having sum types is to use them.

Note that a breaking change is a breaking change. You can't bump a major version number only for "some callers".

Ah, okay. I think I get the point you're trying to make. Thanks for explaining that!

I'll go back to observing 😄.

@torbenschinke
Copy link
Contributor

Our experience is, that we actually need exhaustive and non-exhaustive checks of enum-like types in practice: When writing libraries or internal frameworks, we need a design, which supports gradual code repair and compatibility. Breaking changes would be very expensive for all projects which import them and outweigh any cost. And this works great with what the language offers today. This is solved good enough.

However, when modelling domain models, we want the exact opposite of this behavior. There are no dependencies outside of the customer project. Any change in the domain must be exact and complete. Any deployed unseen cases are more expensive, then the maintenance costs of enforced incompatible changes. We may write unit-tests, but that is unreasonable expensive as a workaround for a "trivial" compiler check.

Today, we ensure this by maintaining our own macro and code generator system, which is a pain to maintenance. We (mis-)use the interface generic notation to create tagged union-like boxes mixed with marker interface methods where applicable. Our generated solution has all the already mentioned short comings like (no) matching for polymorphic types, double boxings, unneeded heap escapes etc. And still we see, that the benefits outweigh all those shortcomings in practice. It saves us probably hundreds of hours, if we had a build-in alternative. It would be more, if we wouldn't need to maintain our own generators.
Also, just a side note: we currently investigate into a linter to enforce constructor usage of annotated types, because we have seen that things are used repeatedly the wrong way, despite of extensive comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal
Projects
None yet
Development

No branches or pull requests