Skip to content

An issue with the accepted definition of Union #22

Open
@mniip

Description

@mniip

The text in PEP 483 that describes how unions behave actually doesn't do a very good job as a definition of a union type.

  • Union[t1, t2, …]. Types that are subtype of at least one of t1 etc. are subtypes of this.
    • Unions whose components are all subtypes of t1 etc. are subtypes of this. Example: Union[int, str] is a subtype of Union[int, float, str].
    • The order of the arguments doesn’t matter. Example: Union[int, str] == Union[str, int].
    • If ti is itself a Union the result is flattened. Example: Union[int, Union[float, str]] == Union[int, float, str].
    • If ti and tj have a subtype relationship, the less specific type survives. Example: Union[Employee, Manager] == Union[Employee].
    • Union[t1] returns just t1. Union[] is illegal, so is Union[()]
    • Corollary: Union[..., object, ...] returns object.

This lists some facts that are indeed true if we were to interpret Union[A, B] as a static type that stands for a set of values that is a set-theoretic union of the sets that A and B stand for. However this does not actually define Union[A, B] to be that set.

Many have interpreted the first line: "Types that are subtype of at least one of t1 etc. are subtypes of this." as a definition:

C <: A | B if and only if C <: A or C <: B

However this is now how set-theoretic unions work, it is not in general true that $C \subseteq A \cup B \iff (C \subseteq A) \lor (C \subseteq B)$

Counterexample:
>>> A = {1}
>>> B = {2}
>>> C = {1, 2}
>>> C.issubset(A | B) == (C.issubset(A) or C.issubset(B))
False

What people probably are thinking of is the formula $c \in A \cup B \iff (c \in A \lor c \in B)$, which is actually true. But we cannot use this as a definition because that involves checking potentially infinitely many $c$. Instead we want a definition in terms of the <: aka $\subseteq$ relation.

This is a solved problem and I'll cut to the chase: the formulas that define set-theoretic unions and intersections are as follows:
$$C \subseteq A \cap B \iff (C \subseteq A \land C \subseteq B)$$
$$C \supseteq A \cup B \iff (C \supseteq A \land C \supseteq B)$$

The duality between intersection and union lies in reversing the subset relation, not in replacing an "and" with an "or". In fact it is a well understood idea that transitive relations play extremely poorly with "or", and extremely well with "and". Hence the above definitions both using "and".

What about $A \cap B \subseteq ...$ and $... \subseteq A \cup B$? We have these formulas, as corollaries of the definitions above:
$$(A \subseteq D \lor B \subseteq D) \implies A \cap B \subseteq D$$
$$(D \subseteq A \lor D \subseteq B) \implies D \subseteq A \cup B$$

Proof
  • Substitute $C = A \cap B$ into the first equation. LHS is true by reflexivity, so RHS must be true too. If we have $A \subseteq D$ then by transitivity $A \cap B \subseteq A \subseteq D$. Likewise if we have $B \subseteq D$. So in total if we have either $A \subseteq D$ or $B \subseteq D$ then we have $A \cap B \subseteq D$.
  • Same proof but with all relations reversed. Substitute $C = A \cup B$ into the second equation, RHS must be true. If $D \subseteq A$ then by reflexivity $D \subseteq A \subseteq A \cup B$. Same for $B$.
Note that these are one-way implications. Sufficient but not necessary conditions. There's once again a symmetry with reversing the subset relation, but no symmetry between "and" and "or".

In #1, here, @erictraut lists all 4 of these on equal footing, labelled "non-controversial". I want to emphasize that two of these involving "and" are actually definitions by universal property, and the two involving "or" are special cases/corollaries.

What does the PEP-483 paragraph actually define?

Most of the bullet list is actually describing a reduction procedure for type expressions involving class literals and unions. The proposed algorithm of checking C | D <: A | B with (C <: A or C <: B) and (D <: A or D <: B) only works exactly because each of A, B, C, D can only be a class, as the only other possible form - union - gets flattened first. However in presence of intersections this algorithm already fails:

A & (B | C) <: (A & B) | (A & C) vs A & (B | C) <: A & B or A & (B | C) <: A & C

I'll also cut to the chase here: you want to reduce the left hand side to a disjunctive normal form, and the right hand side to a conjunctive normal form. The | on the left and the & on the right can then be taken apart. But the & on the left and the | on the right cannot. Problems of the kind t1 & ... <: s1 | ... where t1, ..., s1, ... are class literals -- are in general irreducible and will require knowledge of the inheritance tree and whether a multiple inheritance child class exists/can be constructed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions