Description
The text in PEP 483 that describes how unions behave doesn't actually do a very good job of defining the union type.
- Union[t1, t2, …]. Types that are subtype of at least one of t1 etc. are subtypes of this.
- Unions whose components are all subtypes of t1 etc. are subtypes of this. Example: Union[int, str] is a subtype of Union[int, float, str].
- The order of the arguments doesn’t matter. Example: Union[int, str] == Union[str, int].
- If ti is itself a Union the result is flattened. Example: Union[int, Union[float, str]] == Union[int, float, str].
- If ti and tj have a subtype relationship, the less specific type survives. Example: Union[Employee, Manager] == Union[Employee].
- Union[t1] returns just t1. Union[] is illegal, so is Union[()]
- Corollary: Union[..., object, ...] returns object.
This lists some facts that are indeed true if we interpret Union[A, B] as a static type that stands for the set-theoretic union of the sets of values that A and B stand for. However, it does not actually define Union[A, B] to be that set.
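As an aside, the runtime typing module implements several of the reductions from the bullet list above (flattening, order-insensitive equality, single-argument collapse), so they are easy to check interactively. A quick sketch:

```python
from typing import Union

# Flattening: nested unions are merged into one.
assert Union[int, Union[float, str]] == Union[int, float, str]

# Order of the arguments doesn't matter for equality.
assert Union[int, str] == Union[str, int]

# A single-argument Union collapses to the argument itself.
assert Union[int] is int

# Note: the subtype-collapsing rule (Union[Employee, Manager] == Employee)
# is NOT applied at runtime; only exact duplicates are removed. It is a
# static-checker-level simplification.
```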
Many have interpreted the first line, "Types that are subtype of at least one of t1 etc. are subtypes of this.", as a definition:

C <: A | B  if and only if  (C <: A or C <: B)
However, this is not how set-theoretic unions work; it is not in general true that $C \subseteq A \cup B \iff C \subseteq A \text{ or } C \subseteq B$.

Counterexample:
```python
>>> A = {1}
>>> B = {2}
>>> C = {1, 2}
>>> C.issubset(A | B) == (C.issubset(A) or C.issubset(B))
False
```
What people probably are thinking of is the one-way implication (C <: A or C <: B) implies C <: A | B, aka $C \subseteq A \text{ or } C \subseteq B \implies C \subseteq A \cup B$, which does hold.
This is a solved problem, and I'll cut to the chase: the formulas that define set-theoretic intersections and unions are as follows:

$C \subseteq A \cap B \iff C \subseteq A \text{ and } C \subseteq B$

$A \cup B \subseteq C \iff A \subseteq C \text{ and } B \subseteq C$
The duality between intersection and union lies in reversing the subset relation, not in replacing an "and" with an "or". In fact, it is a well-understood idea that transitive relations play extremely poorly with "or" and extremely well with "and"; hence both of the definitions above use "and".
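These two defining equivalences are easy to verify exhaustively in the set model, e.g. over all subsets of a small universe (a quick sanity-check sketch, not part of the original argument):

```python
from itertools import combinations

def subsets(universe):
    """All subsets of `universe`, as sets."""
    return [set(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]

U = {0, 1, 2}
for A in subsets(U):
    for B in subsets(U):
        for C in subsets(U):
            # Defining formula for intersection: uses "and".
            assert (C <= A & B) == (C <= A and C <= B)
            # Defining formula for union: subset relation reversed, still "and".
            assert (A | B <= C) == (A <= C and B <= C)
```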
What about the "or" versions? They hold, but only as one-way implications:

$A \subseteq D \text{ or } B \subseteq D \implies A \cap B \subseteq D$

$D \subseteq A \text{ or } D \subseteq B \implies D \subseteq A \cup B$
Proof

- Substitute $C = A \cap B$ into the first equation. The LHS is true by reflexivity, so the RHS must be true too. If we have $A \subseteq D$, then by transitivity $A \cap B \subseteq A \subseteq D$; likewise if we have $B \subseteq D$. So in total, if we have either $A \subseteq D$ or $B \subseteq D$, then we have $A \cap B \subseteq D$.
- Same proof, but with all relations reversed. Substitute $C = A \cup B$ into the second equation; the RHS must be true. If $D \subseteq A$, then by transitivity $D \subseteq A \subseteq A \cup B$. Same for $B$.
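In the set model one can also check directly that the "or" versions are only one-way: the implications hold, but the converses fail (a sketch; the concrete sets are mine):

```python
A, B = {1, 3}, {2, 3}

# Forward directions (the corollaries) on concrete examples:
D = {1, 2, 3}
assert A <= D and A & B <= D          # A ⊆ D  implies  A ∩ B ⊆ D
assert {1} <= A and {1} <= A | B      # D ⊆ A  implies  D ⊆ A ∪ B

# The converses fail:
# A ∩ B = {3} ⊆ {3}, yet neither A ⊆ {3} nor B ⊆ {3}.
assert A & B <= {3} and not (A <= {3} or B <= {3})
# {1, 2} ⊆ A ∪ B, yet neither {1, 2} ⊆ A nor {1, 2} ⊆ B.
assert {1, 2} <= A | B and not ({1, 2} <= A or {1, 2} <= B)
```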
In #1, here, @erictraut lists all four of these on equal footing, labelled "non-controversial". I want to emphasize that the two involving "and" are actually definitions by universal property, while the two involving "or" are special cases/corollaries.
What does the PEP-483 paragraph actually define?
Most of the bullet list actually describes a reduction procedure for type expressions involving class literals and unions. The proposed algorithm of checking C | D <: A | B via (C <: A or C <: B) and (D <: A or D <: B) only works because each of A, B, C, D can only be a class: the only other possible form, a union, gets flattened first. However, in the presence of intersections this algorithm already fails:
A & (B | C) <: (A & B) | (A & C)

vs.

(A & (B | C) <: A & B) or (A & (B | C) <: A & C)
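The failure is easy to see in the set model (a sketch, with concrete sets standing in for the class literals):

```python
# Concrete sets standing in for the class literals A, B, C:
A, B, C = {1, 2}, {1}, {2}

lhs = A & (B | C)                 # = {1, 2}
rhs = (A & B) | (A & C)           # = {1} | {2} = {1, 2}

# The subtyping claim A & (B | C) <: (A & B) | (A & C) is true...
assert lhs <= rhs

# ...but the naive pairwise check answers "no":
assert not (lhs <= (A & B) or lhs <= (A & C))
```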
I'll also cut to the chase here: you want to reduce the left-hand side to disjunctive normal form and the right-hand side to conjunctive normal form. The | on the left and the & on the right can then be taken apart, but the & on the left and the | on the right cannot. Problems of the kind t1 & ... <: s1 | ... -- where t1, ..., s1, ... are class literals -- are in general irreducible and require knowledge of the inheritance tree, and of whether a multiple-inheritance child class exists/can be constructed.
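A minimal sketch of that decomposition, in the same set model as above (the representation and names are mine, not from any real checker): only the | on the left and the & on the right are taken apart structurally; the residual atom-level problems fall through to an "oracle", which here is plain subset testing but in a real type checker would have to consult the inheritance tree:

```python
# Type expressions: an "atom" (class literal) is a frozenset of values;
# ('or', t1, ..., tn) and ('and', t1, ..., tn) build unions/intersections.

def denote(t):
    """The set of values a type expression stands for."""
    if isinstance(t, frozenset):
        return t
    op, *args = t
    sets = [denote(a) for a in args]
    result = sets[0]
    for s in sets[1:]:
        result = (result & s) if op == 'and' else (result | s)
    return result

def subtype(t, s):
    # '|' on the LEFT decomposes: Or(t1..tn) <: s  iff  every ti <: s.
    if isinstance(t, tuple) and t[0] == 'or':
        return all(subtype(ti, s) for ti in t[1:])
    # '&' on the RIGHT decomposes: t <: And(s1..sn)  iff  t <: every si.
    if isinstance(s, tuple) and s[0] == 'and':
        return all(subtype(t, si) for si in s[1:])
    # Residual problem t1 & ... <: s1 | ... is irreducible by these rules;
    # here the "oracle" is subset testing on the set denotations.
    return denote(t) <= denote(s)

A, B, C = frozenset({1, 2}), frozenset({1}), frozenset({2})
lhs = ('and', A, ('or', B, C))
rhs = ('or', ('and', A, B), ('and', A, C))
assert subtype(lhs, rhs)   # decided by the oracle, not by decomposition
```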