Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An argument against intersections. Or, having our cake, and eating it too. #38

Closed
mikeshardmind opened this issue Jan 15, 2024 · 55 comments

Comments

@mikeshardmind
Copy link
Collaborator

mikeshardmind commented Jan 15, 2024

Note: This is An argument for the intentional non-implementation of pure intersections, and instead use all of what we worked out to implement an alternative option that satisfies all currently supportable use-cases, while leaving the door open to full intersections. This is not arguing to throw away everything we've worked on, but to use what we worked out to help our collective use cases, productively. I am actively inviting any constructive detraction to this, as if there is something these cannot do that are "must haves" to go this route would require determining if intersections are must-have to support those uses.

(edited) Note 2: Where you see type[*Ts] and object[*Ts]it may make more sense to haveOrderedIntersection` There's more details at the end on this now

With Thanks to @DiscordLiz , @erictraut , @gvanrossum, @mark-todd, and @NeilGirdhar in alphabetical order (and no other order of particularity) for presenting each extremely strong, and opinionated arguments as well as some feedback that led to some self-reflection on why I felt so strongly about particular outcomes of this, I believe that there is a strong argument that without other constructs (such as negation, pick, and omit), there is not a good argument for the inclusion of the theoretical intersection, and that even with those, it might only make sense for that to work with structural subtyping.

However, that leaves an awkward question, how can we best handle the use cases that are certainly valid without them such that I can say we don’t need intersections? Have we really just wasted months of work? No.

The pitch: The order matters to us and to the behavior we want, so lets go with an ordered construct that's simpler and still does what we need.

type[*Ts] and object[*Ts]

First of all, my apologies to @mark-todd as when this was originally floated as an idea in a slightly different form by him in Discord, I stated the non-equivalence to intersections and that that should be separate, and we went back to working on intersections, but there’s a real question here I think we need to answer: Do these not solve every case we care about sufficiently? As originally stated, I would say that they didn't, but I believe I made a mistake in viewing these as something we should consider separately instead of as something which might do enough while being simpler. Exploring the idea further led to a slight tweak in which they appeared to me to solve all presented use cases that intersections could in the context of the current type system.

If it does, we can use the existing explorations to justify this construct instead and leave notes on rationale for anyone who wants to explore the difference between this and “true intersections” at a later date.

Background

The primary difference between all interpretations of Intersection which anyone involved has stated support for (For this I'm mostly looking at 4, 5, and the not fully fleshed out, hypothetical 6 from #31, #37) comes into play in things that are defined on multiple operands. Put another way, the unordered nature of Intersection as a parallel construct to Unions has some very unpleasant consequences. This became especially contentious when looking at what happens in non-disjoint cases and (implicitly non-disjoin) cases involving Any.

I iterated on this several times trying to find a solution that satisfied all the complaints from multiple parties, but every time there was still some corner case that had to favor one use over another, and given that we wanted intersections to also take up an operator in the type system & the consequences for needing two different constructs to express one version or the other wasn’t sitting right with me. With some time spent reflecting on it, I realized my primary issue with favoring any solution over another was that by promoting one to use & and another being relegated to requiring an import, we were necessarily putting some needs above others when there was no consensus on which behavior would be more common, or how this would reconcile with Any.

Enter, an alternative: type[*Ts]

This is ordered. People want option 4 for type[Any & T]? Write type[Any, T]. Want option 5? Write type[T, Any]. This reflects existing behavior, including the ordered nature with subclassing of Any.

Semantics

type[*Ts] does not create a new type at runtime. It creates a virtual type that has the bases *Ts. The order is not semantically considered for nominal subtyping, but should be considered when handling diamond patterns. As a return type or annotation, the type returned must have the bases *Ts. The bases do not need to be in the same order, but the types must resolve as if they were. object[*Ts] is the equivalent formulation for an instance of something of that type, reflecting the duality between type and object at both runtime and in the type system.

This allows the following class definitions (and many non-trivial others)

class Foo(A, B):
    pass

#and 

class Bar(B, A):
    pass 

# but also

class Baz(A, B, C):
    pass

to each work for type[A, B] or instances of these classes for object[A, B].

This also allows all of the current assumptions of nominal subtyping to apply to this construct.

In the case of protocols, each must be satisfied.

Because of this, it is enough at assignment to check that each type in Ts for type[*Ts] is satisfied, as any real object being passed would be checked when defined for LSP. LSP does not need to be checked for at time of assignment. At time of use, lookup should be done in order. By deferring to the fact that LSP violations would be detected elsewhere, the ordered search should be fine.

Examples:

class X:
    def foo(self) -> int:
        ...

class Y:
    def bar(self) -> int:
        ...

class Z:
    def foo(self, arg: bool = True) -> int:
       ...

def goodcase1(x: object[X, Y]) -> tuple[int, int]:
    return x.foo(), x.bar() 


def goodcase2(x: object[Any, X]) -> Any:
    return x.foo("bar")  # Any has precedence

def goodcase3(x: object[X, Any]) -> int:
    ret = x.foo()
    reveal_type(ret)  # int, not Any, X has precedence
    x.bar()  # not on x, but allowed, see Any
    return x.foo() 

def badcase1(x: object[X, Z]) -> int:
    return x.foo(True)  # definition from X takes precedence for differing type of method foo
    # And any attempt to create this type would cause an LSP violation,
    # you can't subclass from both and have X's definition have precedence, so the error surfaces in multiple places.

def goodcase4(x: object[Z, X]) -> int:  
     return x.foo(True)  # definition from Z takes precedence for differing type of method foo
    # It's possible to create this type as well.

Does this really cover all of our use cases?

It covers all of mine, including potential future cases with type negation as a separate construct. I've given it a lot of thought, and it seems to cover every case I've seen floated for intersections, both in this repo and elsewhere when discussing intersections for python, but the trick with type[A, B] allowing the mro order to be B, A is mandatory for that to be the case

There are potential cases that I would be blind to here, but we’ve limited the scope of this to non-disjoint overlap of types with differing types involved in resolution, and this allows handling of even that scope to a very large degree.

There are potentially other cases that can come up with higher kinded types. I have not explored in depth what this would mean with the inclusion of higher kinded types in python, because there is no strong prevailing push for these to use as a framework to explore. The temptation to view HKTs as in one of the languages that has them exists, but there’s no guarantee that any of these would reflect what is added to python’s type system. I believe that this composes properly with any reasonable future interpretation of higher kinded types, but will refrain from saying that HKTs could not re-introduce the need for a fully featured intersection.

What about type safety vs pragmatism and warnings?

  • The ordering gives the user control in the only place it matters.
  • It would be reasonable to implement a check that Any did not precede non-Any types in the order, or even to preclude it all together in a check, but this would be an optional additional constraint provided by a specific tool, not something that needs to be forbidden by the type system (And which would have negative effects on gradual typing to forbid outright)

What about ergonomics and obviousness of behavior?

In the most commonly anticpated case, of disjoint types, the ordering won’t matter. The definition above only cares about the order when mro would change a resolved type. The complex cases where the ordering will matter should be obvious, as it will be the same as the required mro order when defining the type.

What about backwards compatibility with type[T] (singular type)?

Behavior is unchanged, the definition here is compatible with and does not change the behavior of the current use of type[T]

What about not using type and object here?

It's probably better to use a new typing.OrderedIntersection instead of object for this, and allow it to compose with type with the existing semantics.

That would make one of the examples using decorators look like this:

# I believe this is in here, but a protocol indicating supporting rich comparison if not
from useful_types import RichComparable  

def total_ordering(typ: type[T]) -> type[OrderedIntersection[RichComparable, T]]:  # see functools.total_ordering
    ...

# Any type wrapped in this is now known to be both a nominal subtype of the original type
# while also being a structural subtype of the protocol RichComparison.

@total_ordering
class Point:
    def __eq__(self, other) -> bool: ...
    def __lt__(self, other) -> bool:  ...

Having our cake and eating it too

It may be less satisfying for some here to have done all this work and to walk away not implementing intersections, but if we use this work to implement something that handles as many cases as this would, and which leaves the edge cases in the user's hands to pick the behavior they want, I think we walk away accomplishing everything we set out to do by adding intersections, by not adding intersections. If this works for everyone here, the work wasn't a waste, but informative of what the issues would be with an unordered intersection, and an exploration of what the scope of difference would be with different solutions. This remains available as a reference to those revisiting and (if we go with this) we should absolutely document for history "why not a true intersection".

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 15, 2024

Firstly, thanks @mikeshardmind for formulating this - it's really cool to see a little idea I had being rigorously explored! Just have a syntax question really, which I mentioned before but I think it's worth discussing again - how do we differentiate instances vs classes?

# In current python
class A:
   pass

x: A = A()
y: type[A] = A

In the new syntax

class A:
   pass

class B:
   pass

class C(B,A):
   pass

x: type[B,A] = C
x = C() # How do we express type of this?

# Would it instead be:

x: type[B,A] = C()
x: type[type[B,A]] = C # This syntax seems kind of strange

The term type is used to say, "this follows the metaclass of type" I believe, so I'm not sure we can use object for instances. Am I missing something?

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Jan 15, 2024

x = C() # How do we express type of this?

This would be fine:

typ_x: type[B, A] = C 
x: object[B, A] = C()
# also fine:
x: object[B, A] = typ_x()

But I believe that in that particular case, it should just be left alone (To inference) or typed as C.

it becomes more clearly needed in a function signature

def some_constructor(typ: type[*Ts]) -> object[*Ts]:
    return typ()

leaving out some details there of course

@DiscordLiz
Copy link
Collaborator

Thanks for raising my attention to this on discord.

I don't have much to add to this. I would be happy with this as an outcome. +0 on using object this way, it's better that it's a builtin and it doesn't need to be an import, but I don't feel as strongly as you do on it being a natural choice.

Allowing reordering when trivial to the type structurally seems pragmatic to nominal typing use, but feels like this is just a clever formulation that hides the differences the type checker cares about being different from what runtime cares about. I think that's okay here since the behavior appears to be fully consistent with runtime and describes everything I think matters at static analysis time as well, but this needs to be something people seeing this don't gloss over.

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 15, 2024

@mikeshardmind

This would be fine:

typ_x: type[B, A] = C 
x: object[B, A] = C()
# also fine:
x: object[B, A] = typ_x()

I follow your idea here - I like the idea of saving an import, but there are I think some points to consider:

Why is type[X] interpreted to mean what it does in python?

In general I think (correct me if I'm wrong here), that type[X] implies a class inheriting from X made from the metaclass type, so I guess this means we'll need to consider expanding the interface of other metaclasses too:

class B(type):
   pass

class C(metaclass=B):
   pass

class D(metaclass=B):
   pass
class E(C,D):
   pass

x: B[C,D] = E

Why do I care about metaclasses?

My point is not so much that I think metaclasses are a super common use case, but just that we should make sure the new syntax aligns with the way python already works.

Equivalent expressions

Under this syntax the following two are equivalent: type[object[A,B]] vs type[A,B]. Is this a problem? Not sure - just wanted to point it out.

Also, object[T] is now the same as T.

Implications for object

One thing I'm not sure of is what the new syntax might mean for object. At first object being an instance feels quite logical, but I'm wondering if we're missing something here. object is the base of all classes in the mro, so if you wanted to say "this is a class", it's quite natural to say:
type[object]
is a class that inherits from anything. But suddenly we could say:
type[A, object] (must inherit from A and object)
or even object[A,object] (instance of something that inherits from A and object).

What to do?

It might be that there are easy solutions to these, or things I'm missing - but it seems that a separate keyword would be a lot less complex. We could always go the route of Union and make it use builtins later down the line if something was worked out.

@mikeshardmind
Copy link
Collaborator Author

@mark-todd

Why is it type[X]?

type[T] already works this way today. Code sample in pyright playground

And also:

>>> class X: 
...     pass
...
>>> isinstance(X, type)
True

The only new things here are expanding this to allow multiple type parameters / typevartuples, defining what that means, and providing a way to specify when you want the instance behavior by expanding object.

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 15, 2024

@mark-todd

Why is it type[X]?

type[T] already works this way today. Code sample in pyright playground

And also:

>>> class X: 
...     pass
...
>>> isinstance(X, type)
True

The only new things here are expanding this to allow multiple type parameters / typevartuples, defining what that means, and providing a way to specify when you want the instance behavior by expanding object.

@mikeshardmind

Sorry I realise now my heading was slightly misleading, but want I meant was "why is type[X] interpreted to mean what it does in python?", and the answer is that I think this is about the metaclass of X.

Update: I've investigated further and found it does not depend on the metaclass at all, so we're alright on that front - the below works:

class B(type):
   pass

class C(metaclass=B):
   pass

x: type[C] = C

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Jan 15, 2024

Ah okay, then I should clarify in a bit more detail on a couple things you said.

Yes, object[T] would be equivalent to T, there's really no way around this (even if we used a different name than object) that doesn't have special casing implications, we need a way to spell SomeName[T, U] as one type, and it should work with TypeVarTuples ideally as well to compose properly with generics.

My point is not so much that I think metaclasses are a super common use case, but just that we should make sure the new syntax aligns with the way python already works.

The relationship between type and object is actually a little different, I showed a little bit of it above, but classes (Rather than instances of them) are type objects and are usually often valid as a metaclass. I'm not sure where this is specified for the type system in a pep at this point in time, but it is mentioned here https://docs.python.org/3/library/typing.html#the-type-of-class-objects

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 15, 2024

Yes, object[T] would be equivalent to T, there's really no way around this (even if we used a different name than object) that doesn't have special casing implications, we need a way to spell SomeName[T, U] as one type, and it should work with TypeVarTuples ideally as well to compose properly with generics.

I agree this is unavoidable whichever name we choose - I guess I was wondering if giving specifically object this property causes issues or lack of clarity, as it means:

object[object] == object
while: Inherit[object] == object seems clearer (not wedded to this name, just using as a placeholder)

Also Inherit[A, object] seems clearer than object[A, object].

Also by using a different name it has the nice property that, as with all other classes, type[X] can be the class while X is the instance.

Consider:

T = TypeVar("T")
def f(o: T) -> type[T]:
    ...
# Now suppose we have:
x: object[A,B] = C()

y = f(x)

The type of y here would be type[A,B] - but I'm not sure about this. For any other class x's type just gets wrapped in type - why should this behave differently?

If instead:

x: Inherit[A,B] = C()

y = f(x)

The type of y can be type[Inherit[A,B]] and it still has this property.

You could also consider the inverse function of this which I think has a similar issue.

@mikeshardmind
Copy link
Collaborator Author

I agree this is unavoidable whichever name we choose - I guess I was wondering if giving specifically object this property causes issues or lack of clarity, as it means:

object[object] == object

Seems fine to me. When I had the idea to use object, it was from the relation between type and object.

even the example you posed that mirrors with existing behavior of type, for instance, type (runtime) has a type (type checking) of type[type]

x: type[type] = type  # This type checks, and is correct today
y: object = object()  # this type checks, and is correct today
z: object[object] = object()  # in theory, this should be fine too.

It'll only look strange in cases where people have the single form, in which we can say that people should prefer not wrapping in object[] when un-needed.

I wouldn't want to special case and preclude that though since the equivalence is trivial.

This could very easily be a linting rule in something that is opinionated about style including a code action in IDE or other autofix (see stuff like ruff), but we do need a corresponding way to refer to an instance, and requiring an import (and keep in mind that typing is considered expensive to import, and many people avoid doing so) for this when object is right here and a reasonable parallel seems like it makes this worse.

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 15, 2024

@mikeshardmind I'm pretty convinced by your points here:

x: type[type] = type  # This type checks, and is correct today
y: object = object()  # this type checks, and is correct today
z: object[object] = object()  # in theory, this should be fine too.

The only case that still bothers me is this one:

The type of y here would be type[A,B] - but I'm not sure about this. For any other class x's type just gets wrapped in type - why should this behave differently?

and it's inverse func as below:

def f(o: type[T]) -> T:
    ...

x: type[A,B] = C()
y = f(x)

Type of y here is object[A, B], which seems troubling to me - maybe it's just troubling because it's unfamiliar

@erictraut
Copy link
Collaborator

This is an interesting idea. @mikeshardmind, thanks for working out the details and presenting the proposal.

Here are a few follow-up questions to help me clarify some points that I'm unclear on...

  1. Are you proposing that all of the type arguments used in this construct must be concrete? Or are type variables allowed? For example, is object[int, S, list[T]] allowed if S and T are TypeVars? If type variables are allowed, how do you see that working? How would a constraint solver solve for them?

  2. How does this construct differ from a type defined with the existing class statement? Am I correct in understanding that class AB(A, B): ... is effectively describing the same type as type AB = object[A, B] in your proposal? Is object[A, B] just a shorthand way to express the same thing? If so, what problem does it solve over the existing class definition syntax? The class definition admittedly has some limitations and runtime overhead, but it has the added benefit of already existing in the language, and type checkers already know a lot about class statements, providing checks for things like metaclass conflicts, ambiguous MRO ordering, incompatible overrides, attempts to subclass final classes, etc.

From a naming perspective, I think that object and type would probably be pretty confusing, but we can figure out a different name if we move forward with this construct.

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Jan 15, 2024

Are you proposing that all of the type arguments used in this construct must be concrete? Or are type variables allowed? For example, is object[int, S, list[T]] allowed if S and T are TypeVars? If type variables are allowed, how do you see that working? How would a constraint solver solve for them?

Type Variables and typevar tuples are allowed, but type checkers are allowed to say they don't know that they represent if there isn't an appropriate corresponding use. I have not written out all of the places I would expect this to be solvable yet, but this is definitely a question that needs at least a minimum set of detectable specified even if we leave an upper limit wider in case of type checker advances. I can work on better specifying this with places I expect it to work and not, but I think it's fine to leave this with similar rules to existing type vars: there needs to be a clear place the types enter the system at some point to tie it all together. I'd really prefer to leave this as much in the ball court of type checkers as possible with how much they deem they can infer. I would state though that because this is intended to flexibly work with other non-specified bases, it's probably not reasonably inferable from "there are no other bases" I'll see about firming this up.

At assignment, it is sufficient to check that each type is individually satisifed because creating the type will handle checking that creating the type is valid. From the use side, it's "safe to use" what's provided by the minimum bound.

object[int, S, list[T]] ... How would a constraint solver solve for them?

quick example:

def foo(some_list: list, x: object[int, S, list[T]]):
    # type checker may choose to error or warn that it does not know what S or T, nothing provides that information here
    # may alternatively chose to use the lowest known bound, object.
    some_list[x]  # this is fine, looking for `.__index__` we find it on int
    for item in x:    # `__iter__` is not found on int, not found on S, but found on list[T] without needing to specialize for T
        print(item)  # And the type being Iterator[T], and all types being printable allows this.
    

class Foo[S, T]:
    def __init__(self, stype: S, ttype: T): ...

    def foo(self, some_list: list, x: object[int, S, list[T]]):
        # S and T are now resolvable in relation to something else (the class)
        # but inside the body prior to specialization, may still only be known to the minimum bound, rest same as above

Other Examples:

class InplaceSortable(Protocol):
    def sort(self):
        ...

def foo[T](x: object[Sequence[int], InplaceSortable, T]) -> object[Sequence[int], InplaceSortable, T]:
    # type checker may error here, there's no appropriate way to infer what T is
    # may indicate that the next example is what should actually be used

def foo_fixed[T: object[Sequence[int], InplaceSortable]](x: T) -> T:
    ...

class FooFineHere[T: SupportsIndex]:
    def __init__(self, emitted_type: type[T]):
        self.typ: T = emitted_type
        self.some_list: list[T]
    
    def foo(self, x: object[InplaceSortable, Sequence[T]]) 
        # T fine here, inferable from class
        x.sort()  # this is fine
        x.do_something()  # not fine, nothing provides this
        self.some_list[x[1]]  # This works fine, even if it's nonsense for use as an example, bound of T 

How does this construct differ from a type defined with the existing class statement? Am I correct in understanding that class AB(A, B): ... is effectively describing the same type as type AB = object[A, B] in your proposal? Is object[A, B] just a shorthand way to express the same thing? If so, what problem does it solve over the existing class definition syntax? The class definition admittedly has some limitations and runtime overhead, but it has the added benefit of already existing in the language, and type checkers already know a lot about class statements, providing checks for things like metaclass conflicts, ambiguous MRO ordering, incompatible overrides, attempts to subclass final classes, etc.

It is not the same.

class AB(A, B): ... would be a valid choice for type[A, B], but this is the simplest case possible. class BA(B, A): ... also works. it works with types that aren't known by the function itself (like with type variable), and it also should work with protocols.

This is entirely a type system side feature with the other checks you mention existing when someone tries to make the type that will be used here, so users don't lose out on those checks, but they will be raised to them in other places.

The lookups are intentionally defined to be ordered in the same way mro would, but to allow levels of indirection not specified to keep both reasoning about this simple, as well as implementation.

Some examples showing a few of the ways this is different:

class A:
   ...

class B:
    ...

class ABRealized1(A, B):
    ...

class ABRealized2(B, A):
    ...

class C(B):
    ...

class ABRealized3(A, C):
    ...


type NeedsAB = object[A, B]

# all of these are fine. 
# These are all instances of types which are all nominal subclasses of all of what is specified, LSP verified in respective definitions.
x: NeedsAB
x = ABRealized1()
x = ABRealized2()
x = ABRealized3()

# I believe this is in here, but a protocol indicating supporting rich comparison if not
from useful_types import RichComparable  

def total_ordering(typ: type[T]) -> type[RichComparable, T]:  # see functools.total_ordering
    ...

# Any type wrapped in this is now known to be both a nominal subtype of the original type
# while also being a structural subtype of the protocol RichComparison.

@total_ordering
class Point:
    def __eq__(self, other) -> bool: ...
    def __lt__(self, other) -> bool:  ...

p1: Point
p2: Point

p1 >= p2  # fine

@erictraut
Copy link
Collaborator

OK, I think I understand. To summarize: A class statement defines a specific class, but the type[A, B] construct describes any class that is a subtype of A and B regardless of the class's name or base class ordering. Is that an accurate summary?

I think that implies... If either A or B are nominal classes (other than object), then any class compatible with type[A, B] must also be nominal. If both A and B are structural types (or one or both are object), then a class compatible with type[A, B] could be structural or nominal.

...users don't lose out on those checks, but they will be raised to them in other places.

That's a great point. I think that's really compelling because it eliminates the need to duplicate those complex checks and rules within the intersection type itself. Those errors will be easier to understand if they are surfaced at the point where such a type is defined (e.g. a class statement that uses a @final type as a base class).

Your answer about TypeVar usage makes sense, but I don't think you fully answered my question about constraint solving. I was asking about the case where a call expression targets a generic function or constructor whose signature includes a TypeVar within an object[A, T] type. Here's an attempt to answer my own question. Does this sample look right to you?

class A: ...
class B: ...
class C: ...
class D(A, B): ...
class E(A, C): ...

def call_target[T](val: object[A, T]) -> T:
    return val

def caller(d: D, de: D | E):
    reveal_type(call_target(d)) # B ?
    reveal_type(call_target(de)) # B | C ?

I can think of other more complex constraint solver examples that involve structural types, overlapping types, etc. Some of these don't have such obvious solutions, but similar complex (and unspecified) situations arise when solving TypeVars that are included in unions, so maybe it's OK to leave some of these details up to individual type checkers to figure out. For all the cases I can think of, I'm able to come up with some answer that is reasonable.

As for nomenclature, is there a reason we wouldn't want to use the term Intersection for this concept? I realize that it's not identical to the set-theoretic intersection concept that you've been exploring in other discussion threads, but it is effectively an intersection type, right? That would allow us to spell it as Intersection[A, B] and type[Intersection[A, B]] (or A & B and type[A & B]).

What other limitations are there on types that can be included in this construct? You've mentioned that any class (nominal or structural) is allowed, as are Any and TypeVars. What about unions, literal types, callables, Never, None, type[X]? Are any (or all) of these disallowed?

@mikeshardmind
Copy link
Collaborator Author

OK, I think I understand. To summarize: A class statement defines a specific class, but the type[A, B] construct describes any class that is a subtype of A and B regardless of the class's name or base class ordering. Is that an accurate summary?

Mostly, yes. There's an example with non-disjoint types to consider though, and part of the reason the ordering does matter

class A
    def foo(self) -> int:
        ...

class B:
    def foo(self, some_arg: bool = True) -> int:
        ...

def foo(x: object[A, B]):
    x.foo()  # ok
    x.foo(True)  # not okay

we've specified A as first in order when resolving lookup at static analysis time. In this trivial case we can figure out that anything that subclasses A and B validly must allow this, but we're specifically leaving ordering to the user so that the more complcex cases where a type checker wouldn't be able to make a decision function properly for multiple use cases.

Type checkers may decide to note that swapping the order would fix it to give a helpful message, but should error here ( I mentioned this off-handedly in the part about obviousness of behavior and ergonomics)

I can think of other more complex constraint solver examples that involve structural types, overlapping types, etc. Some of these don't have such obvious solutions, but similar complex (and unspecified) situations arise when solving TypeVars that are included in unions, so maybe it's OK to leave some of these details up to individual type checkers to figure out. For all the cases I can think of, I'm able to come up with some answer that is reasonable.

Actually, that's a compelling reason to not use type and object for this. I wasn't actually anticipating that type checkers would infer type variables in this case, and I don't think your sample there presents something that type checkers should be left to try and make sense of.

obviously in the case of

def foo[T](typ: type[T]) -> T:
    ...

we want the current behavior

def foo[*Ts](typ: type[*Ts]) ->object[*Ts]:
    ...

In this case, is Ts the full mro of the type? Reasonable to do so, but is it useful? idk

def foo[T, U](typ: type[T, U]) -> object[T, U]:
    ...

This case looks fine at first, it's resolvable by having T resolve to the type passed in and U as object (or any other base in the mro)

But it invites the case in your sample you have here where there's no reasonable choice as any choice would inherently be lossy on type information, and there's no singular correct pick that doesn't require understanding the intent of the developer outside of the type system itself.

Here's an attempt to answer my own question. Does this sample look right to you? [Sample left out]

Not really, no. I'd say in both cases here there's not a reasonable choice for the type checker to go with here, the type checker isn't provided enough information to make the right decision unambiguously here, and I don't think that this should be destructuring the mro partially to make sense of what the user is trying to do, the user should need to find a less ambiguous way to write this. If the specific call site shown is the intent, there's no clear reason why this wouldn't be a typevar bound, on a singular type var.

From what the type checker should do, There's multiple answers which won't error, and at least one of these involves understanding all of the call sites, which may not be possible, especially if we consider libraries.

this could present some undesirable cases where the lack of inference doesn't feel consistent, even if there's a consistent underlying rule for the type checker, which may make this harder to teach, learn, and/ or otherwise adopt. I would say from this that it's safe to infer only from call site in the singular case, and in the plural case, there should probably be some other external information here.

As for nomenclature, is there a reason we wouldn't want to use the term Intersection for this concept? I realize that it's not identical to the set-theoretic intersection concept that you've been exploring in other discussion threads, but it is effectively an intersection type, right? That would allow us to spell it as Intersection[A, B] and type[Intersection[A, B]] (or A & B and type[A & B]).

A preference on avoiding the same split in the future that is currently happening with TypeGuards, and a preference to avoid inviting use of & based on the background here.

I would be fine with OrderedIntersection to leave Intersection open for any potential future. People can import it as whatever name they want, but this feels like a forseeable case down the line if something prompts adding the full version we had previously been discussing, and especially with the prior section of this, I now think it's better than using type and object here and will add a footnote to the original post.

What other limitations are there on types that can be included in this construct? You've mentioned that any class (nominal or structural) is allowed, as are Any and TypeVars. What about unions, literal types, callables, Never, None, type[X]? Are any (or all) of these disallowed?

I have no reason to disallow any prior existing type construct here. The fun part about "it just needs to be all of these, and we have an order that the type checker should check for things in", is any error where it's impossible to make it will arise elsewhere when someone tries to make it and can't satisfy LSP or some other constraint of the type system. Literal types are the flimsiest of these, but since I can't see a reason to outright forbid it, I have to wonder if there is someone who will find a use. I also think that since users are responsible for sensible ordering in the case of any overlaps here, there's no additional complexity for type checkers. This should scale linearly with the number of elements in the ordered intersection.

@erictraut
Copy link
Collaborator

There's an example with non-disjoint types to consider though

This is an interesting example because there would be no legal way to define a type with the order A then B.

class Good(B, A): # OK
    ...

class Bad(A, B): # Type error: A.foo overrides B.foo in an incompatible way
    ...

I think that any non-disjoint types are either illegal to combine (if the disjoint member is invariant) or demand a specific ordering (if the disjoint member is covariant).

Just brainstorming... Here are some other terms that might connote "intersection-like" without using the term "Intersection": Overlap, Shared, Joint, Conjunction, Common, Meet, Mutual.

I'm pretty bullish on this proposal. It strikes a good balance between functionality and complexity, and it appears to address most (all?) of the major use cases that were identified for intersections.

Thoughts on next steps? Maybe start writing a more complete spec that could serve as a first draft of a PEP?

@mikeshardmind
Copy link
Collaborator Author

I think that any non-disjoint types are either illegal to combine (if the disjoint member is invariant) or demand a specific ordering (if the disjoint member is covariant).

Hmm. I'm not sure I want this to be specified behavior as mandatory, as this may increase complexity for type checkers when it would already be detected when you try to create something that is that type, but I do think that's a reasonable way type checkers could go above and beyond for users and detect it earlier. Not sure here. If you think this is reasonable to detect at the site of the ordered intersection as well, I can include detection of that in writing up a more formalized draft, but this was something there wasn't good consensus on in the unordered form, and while much of the disagreement should go away with the ordering being part of the construct, I'm unsure if it's necessary.

Thoughts on next steps? Maybe start writing a more complete spec that could serve as a first draft of a PEP?

I should be able to spend time on that soon if there are no other major things to consider here. If we're mostly in agreement on the construct not having major discrepancies across use-cases, then we're down to defining it, specifying it well enough, potentially bikeshedding on a name, and presenting the argument for it to the wider python community as part of the PEP process.

@erictraut
Copy link
Collaborator

If you think this is reasonable to detect at the site of the ordered intersection as well

No, that's not what I meant. Sorry for any confusion. I definitely don't think that it's reasonable to detect that at the site of the type annotation, and I wouldn't want to see that appear in the spec.

My point is that only some base class orders are legal in a real class definition. Both classes Good and Bad are type compatible with OrderedIntersection[A, B], but Bad is not a legal class because it violates the LSP. Do I have that right?

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Jan 16, 2024

My point is that only some base class orders are legal in a real class definition. Both classes Good and Bad are type compatible with OrderedIntersection[A, B], but Bad is not a legal class because it violates the LSP. Do I have that right?

Short answer, yes.

Slightly longer: it's still possible to resolve it even with that ordering (one of the cases against intersection detecting this in the unordered form from before, complexity in determining what the type would result in minimally)

class NoLongerBad(A, B): 
    def foo(self, some_arg: bool = True) -> int:
        if some_arg:
            return super().foo()
        else:
            return 1


def foo(x: OrderedIntersection[A, B]):
    # but we still use the order we told static analysis to look for it here, 
    # so while x supports an arg, we can't use it here unless we reorder here.
    x.foo(True)  # still an error
    x.foo()


foo(NoLongerBad())  # fine

Defering to user provided ordering can in some cases result in wider or narrower cases. The clear reason why deferring to that ordering isn't just to make type checking simpler shows up with Any

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 16, 2024

I think implementation wise I'm convinced - I agree with Eric that a separate name would be better than type and object, what that name is I have less strong opinions about.

Re names, I'm more in favour of single word names for brevity's sake - also I think OrderedIntersection suggests the existence of Intersection (like OrderedDict), which may or may not actually ever exist. That said maybe avoiding using Intersection is a good point.

Overlap, Shared, Joint, Conjunction, Common, Meet, Mutual.

I think Overlap seems decent

Other thoughts - I know I originally suggested Inherit which seems unpopular, but I would also suggest other names that follow the idea that these are things that appear in this order in the mro. So you could also have:

Mixin[A,B], Mixed[A,B], Mixing[A,B], MRO[A,B]

Inherit[A,B], Inherits[A,B], Inheritance[A,B], InheritsFrom[A,B]

@mark-todd
Copy link
Collaborator

Other related naming ideas based on the same concept:

Extends[A,B] (this is my current favourite)
Derives[A,B], Derived[A,B], DerivesFrom[A,B]

@CarliJoy
Copy link
Owner

CarliJoy commented Jan 16, 2024

Disclosure: I haven't read understood everything in detail yet.
But from a teaching/learning perspective using type[] and object[] seems to be a lot easier because it is much more similar in how type[T] or list[str] is used already.

I got a bit lost in why Problems with resolving somehting like def call_target[T](val: object[A, T]) -> T: ... are related to a naming this type or object .

Wouldn't the problem not also exist for set based intersections ala def call_target[T](val: A&T) -> type[T]: ... ?

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 16, 2024

I got a bit lost in why Problems with resolving somehting like def call_target[T](val: object[A, T]) -> T: ... are related to a naming this type or object .

@CarliJoy Just to clarify, there are really two distinct options presented in terms of names:

1. Using a separate name:

For the instance: Extends[A, B]
For the class: type[Extends[A,B]]

What that name is matters less - hence various options discussed - as it doesn't affect behaviour in any way, and you can do import as whatever you want. I've used Extends as a placeholder but many other names are available :)

2. Using type and object

For the instance: object[A, B]
For the class: type[A, B]

This presents a quite difference interface - we've been discussing if this is a good interface or not mainly.

But from a teaching/learning perspective using type[] and object[] seems to be a lot easier because it is much more similar in how type[T] or list[str] is used already.

The type part of this I completely agree on - it follows naturally from the existing use of type. I would argue though that object[A,B] is less natural as it has no current equivalent in python, which is why I prefer the separate name option.

Also we've discussed equivalences that come out of this, and whether these matter such as:

type[object[A,B]] == type[A,B]
object[A] == A

One thing I like about option 1 is that only one of these exists: Extends[A] == A

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 18, 2024

@mikeshardmind I've been pondering a little thought experiment following our discussion on object[A] or Extends[A] being equivalent to A.

It'll only look strange in cases where people have the single form, in which we can say that people should prefer not wrapping in object[] when un-needed.
I wouldn't want to special case and preclude that though since the equivalence is trivial.

This set me thinking - is there any precedent in python for banning single item type wrappers when multiple args is allowed? I discovered there is:

from typing import Union

class A:
   pass

Union[A]

The above returns in pyright Union requires two or more type arguments - even though a single item union could be trivially resolved to A. So I think saying say Extends must take two of more arguments would if anything follow more closely with what python already does, and we remove the equivalence Extends[A] == A.

@tomasr8
Copy link
Collaborator

tomasr8 commented Jan 18, 2024

Pyright might not like it, but the Union itself allows it and in fact Union[A] == A e.g. Union[int] == int

@mark-todd
Copy link
Collaborator

mark-todd commented Jan 18, 2024

@tomasr8

Pyright might not like it, but the Union itself allows it and in fact Union[A] == A e.g. Union[int] == int

Oh really? That's interesting - so the type checker is more restrictive than the PEP. If the PEP for Union says this is fine then I think similarly we can leave it up to the type checkers to ban it if they want to.

@mikeshardmind
Copy link
Collaborator Author

mikeshardmind commented Jan 23, 2024

There are a couple issues being discussed on discourse that may effect the specific language required, specifically around Annotated and Literal. I'm advocating what I believe are sensible interpretations with those, but there are other interpretations which could cause needing special behavior for intersections with Annotated or Literal. If those come to pass before this is accepted, I'll need to do some extra work with specification here.

I'll try and wrap the draft specifying this up by the end of the week for review since there have been no major detractions from it in a week. I'm a little behind on this compared to my intent to have the draft ready today, my apologies to the couple people who asked about my timeline on this out of band.

I'm going to use the name OrderedIntersection in the draft, we can bikeshed on a better name once it's the last thing left if people have concerns about the name, but I'd rather focus on getting the behavior specified clearly and make sure there are no issues with that

@vergenzt
Copy link

vergenzt commented Jan 31, 2024

As someone who hasn't followed the nuanced / in-the-weeds discussion too well:

Would anyone be willing to summarize what might be wanted from a typing.Intersection that is not addressed by this proposal? From my perspective, MRO seems like an essential part of typing in Python, so an Intersection that doesn't take into account ordering seems like it would be incomplete. 🙂 I may just be missing something though.

@vergenzt
Copy link

vergenzt commented Jan 31, 2024

... requiring an import (and keep in mind that typing is considered expensive to import, and many people avoid doing so) for this when object is right here and a reasonable parallel seems like it makes this worse.

Also, my 2¢ on the import issue: IMO if folks are sensitive to typing import costs, then they should wrap type imports in if TYPE_CHECKING: and annotate using strings.

@vergenzt
Copy link

vergenzt commented Jan 31, 2024

Given the relative rarity of expected use cases for this (as evidenced by the fact that they don't exist yet 🙂), I'm 👎 on overloading the term object for this. It just feels very counter-intuitive to me when so many other features of the type system so frequently rely on imports.

I vote typing.Intersection, perhaps also with the & operator.

@vergenzt
Copy link

vergenzt commented Feb 1, 2024

There's also the fact that this would cause X & Y to be a different type from Y & X. While we desire the ability to express these as separate concepts so that users can determine the correct behavior in case the type checker cannot know by definition (such as with Any as an operand), the ordering here and why it is needed is subtle. Calling attention to it in the name of the type form itself helps prevent this from becoming a gotcha.

This was super helpful for understanding the issue. Thanks. I see now why you don't want to call it an "Intersection": because the word "Intersection" might give a mistaken expectation of commutativity of the arguments when they don't in fact commute.

@vergenzt
Copy link

vergenzt commented Feb 1, 2024

This contrasting commutativity between Unions vs the proposed spec for Intersections is making me think about a parallel to sets and dicts.

E.g. set(*a, *b, *c) is commutative (a, b, and c can be in any order and the result is the same), while dict(**a, **b, **c) is not commutative if they share any keys. (dict(**a, **b, **c) may give you a different result than dict(**c, **b, **a).)

The similarities may end there (I'm not sure), but seems like interesting food for thought.

@vergenzt
Copy link

vergenzt commented Feb 1, 2024

What about typing.All[*Ts] as a name for this concept? I.e. indicating a type that must satisfy all of the named types? Seems to me to convey less of an expectation of commutativity than "Intersection" might, given that the boolean equivalent isn't necessarily commutative due to short-circuiting. (Note: I know the result of a boolean all call is the same regardless of order, but side-effects could be different since the predicate may be invoked on a different set of instances, so it's not the same.)

@mark-todd
Copy link
Collaborator

dict(**a, **b, **c) is not commutative if they share any keys. (dict(**a, **b, **c) may give you a different result than dict(**c, **b, **a).)

It's an interesting comparison - I think there's a good reason for the second point if we consider the annotations accessible in each:

class A:
   foo: int
   bar: str
class B:
   bar: float

class C(B,A):
    pass

class D(A,B):
    pass

print(A.__annotations__)
print(B.__annotations__)
print({**A.__annotations__, **B.__annotations__}) # C
print({**B.__annotations__, **A.__annotations__}) # D

Here the results are representative of the attributes available in C and D

@mark-todd
Copy link
Collaborator

What about typing.All[*Ts] as a name for this concept? I.e. indicating a type that must satisfy all of the named types? Seems to me to convey less of an expectation of commutativity than "Intersection" might, given that the boolean equivalent isn't necessarily commutative due to short-circuiting. (Note: I know the result of a boolean all call is the same regardless of order, but side-effects could be different since the predicate may be invoked on a different set of instances, so it's not the same.)

I see where you're going with this idea - I think maybe it's better to avoid overlap with the existing all. Personally I think whatever name we go for should probably be clearly distinct from existing language points. That said, I did wonder about extending the use of the existing Concatenate, but similarly I think it might make it too confusing.

Also any and all are kind of related - this would of imply that typing.Any and typing.All are somehow related which may be also unwise...

Extends is still my favourite

@mark-todd
Copy link
Collaborator

I'm now starting to wonder if I dismissed the Concatenate idea too quickly - at the moment it's used to combine ParamSpec variables with Arguments - is there any reason why it couldn't be extended to become "Intersection"?
https://docs.python.org/3/library/typing.html#typing.Concatenate

@mark-todd
Copy link
Collaborator

What about typing.All[*Ts] as a name for this concept? I.e. indicating a type that must satisfy all of the named types? Seems to me to convey less of an expectation of commutativity than "Intersection" might, given that the boolean equivalent isn't necessarily commutative due to short-circuiting. (Note: I know the result of a boolean all call is the same regardless of order, but side-effects could be different since the predicate may be invoked on a different set of instances, so it's not the same.)

Ok another idea bouncing off @vergenzt 's thought on typing.All. If we're going to conceptually relate all with intersection, we could go the whole way and say:

class: type[all[A, B]]
instance: all[A, B]

This does suffer this issue I mentioned earlier of being a "harder sell". Also just noticed something:

a: list[A] = list([A()])
a: all[A, B] /= all([A(), B()])

I think this probably means we don't want to go down this road - just wanted to follow the thought through to the end.

@mikeshardmind
Copy link
Collaborator Author

Had a slightly radical idea re syntax, what about:

Class: x: type[A,B]
Instance: x: [A,B]

This has the advantage that it follows the same pattern as everything else, with the type keyword.

I don't like this for stylistic reasons, but even beyond that, this probably wouldn't work, as this would get very muddy with runtime use of the type system. That expression is a list literal with types as it's items.

Other quick name notes:

Extends: When I think about extends, I actually am reminded that unlike python's MRO, the languages I know that use this, the order isn't allowed to matter or even to extend from multiple. For instance Java's use of extends only allows direct subclassing, Kotlin then has syntax sugar to allow multiple, but multiple class inheritance is prohibited while multiple interfaces may be implemented by a single class. They don't have the diamond pattern we're avoiding here because they just don't allow that kind of subclassing. I'm not sure using it would be a problem, but I wonder if using this is going to lead to initial confusion for people.

All: While I see the parallel, I don't think it's a good name for this and would prefer not to use it when there could later be other things in the type system that would use this name more effectively. If refinement types are ever added to the type system, All and Any might be desired to chain together statically analyzable predicates. We can pick a name more conceptually close to what we are doing.

OrderedIntersection: It's more verbose (good) but takes up more horizontal space. I think this is "okay" with type alias syntax. I'm not stuck with this as the only option, but it's effective and descriptive of what we want the behavior to be.

@mikeshardmind
Copy link
Collaborator Author

I'm now starting to wonder if I dismissed the Concatenate idea too quickly - at the moment it's used to combine ParamSpec variables with Arguments - is there any reason why it couldn't be extended to become "Intersection"?

There's very little friction to adding a new type form, and not enough reason to overload the behavior of concatenate IMO. This could lead to hard to read type signatures if an intersection involves multiple callables that use concatenate themselves.

@mark-todd
Copy link
Collaborator

Extends: When I think about extends, I actually am reminded that unlike python's MRO, the languages I know that use this, the order isn't allowed to matter or even to extend from multiple. For instance Java's use of extends only allows direct subclassing, Kotlin then has syntax sugar to allow multiple, but multiple class inheritance is prohibited while multiple interfaces may be implemented by a single class. They don't have the diamond pattern we're avoiding here because they just don't allow that kind of subclassing. I'm not sure using it would be a problem, but I wonder if using this is going to lead to initial confusion for people.

I think I got the idea for this from a document on diagram structures, that I used it for multiple inheritance. It's also used by scss for multiple inheritance I believe.

OrderedIntersection: It's more verbose (good) but takes up more horizontal space. I think this is "okay" with type alias syntax. I'm not stuck with this as the only option, but it's effective and descriptive of what we want the behavior to be.

I'm fine with this as a name but I do think it's a bit long - I wondered about OIntersection.

I think it'd be worth having a call with anyone who wants in to brainstorm and pick a name if that's the only outstanding thing - it's not worth discussing for aeons, but I think a quick discussion wouldn't hurt. If OrderedIntersection is the most descriptive name we can come up with after that I think let's just go with that.

@mark-todd
Copy link
Collaborator

Also I just went through the thread and gathered up all the possible names that have been discussed - I'm excluding any that overlap with any existing names here:
https://docs.google.com/spreadsheets/d/1rJlt5UfSOkVey2g5p634mKa0NdsCdpuZDwDfs4CnsiY/edit?usp=sharing

Feel free to add more if people have more thoughts

@CarliJoy
Copy link
Owner

CarliJoy commented Feb 2, 2024

Thank you for the list mark.
I would propose the following:

We use OrderedIntersection in our PEP to get it done finally and we don't introduce the & Operator for this, as we might need it for the "proper" Intersecion.
This way we reduce the bikeshedding.

Once the the PEP is handed in, we can create a survey in Typing on discuss python. During the PEP discussion process we still can change the name of the operator.
But the title "Ordered Intersection" for the PEP itself seems just to be exact and catchy.

As mark pointed out, shortcuts like type[*Ts] can still be introduced later one. I would suggest to rather not put energy into it yet.

Please react with 👍🏻 or 👎🏻 if you like/dislike this proposal on the progress.

@mark-todd
Copy link
Collaborator

Please react with 👍🏻 or 👎🏻 if you like/dislike this proposal on the progress.

Yeah, I think it's worth moving ahead with this in the way you describe - couple of points on the name I just wanted to mention, but happy to park the discussion.

As a little thought experiment, I took a look at the existing names in the typing module, to see what the longest ones were. Longest I've found were LiteralString and TypeAliasType both at 13 characters. OrderedIntersection is 19 characters, which is quite a bit longer, but also it's not crazy long.

I did also think maybe it's worth noting how likely it is the regular unordered intersection could exist. They could just use UnorderedIntersection for this one if it was ever figured out, and then we can use Intersection. It's a little bit first come first serve in my mind, and a hypothetical where someone magically solves all the problems we've discovered seems extremely unlikely to me.

I also think the majority of people would just start their script from typing import OrderedIntersection as Intersection which although not a huge deal seems a bit of a waste of time.

As I say though, if OrderedIntersection is the best we can get for now I think it's fine as a placeholder.

@mikeshardmind
Copy link
Collaborator Author

This approach will not be possible to implement in a consistent manner without larger changes to the type system.

Those same changes would allow just not having an ordered intersection. I'll be following up with both a summary of my findings on this, as well as details and references to other related discussions at a later point in time.

@mark-todd
Copy link
Collaborator

mark-todd commented Apr 8, 2024

@mikeshardmind Is there no way we can make a specification that allows for the special cases as they are currently? I'm judging from the other discussion that making the proposed changes to the way Any subclassing works won't be possible, but I don't think that necessarily means we have to scrap intersections altogether. So long as there is a defined way that it behaves can't we just make a special case in intersections for now? Perhaps I'm missing something more fundamental, but it seems to me that the above approach works for almost all the cases we've considered which seems like something still worth pursuing.

@mark-todd
Copy link
Collaborator

Another thing I've been considering is that it seems we're trying to resolve a lot of cases, some of which may not come up very much in practice. If we can determine exactly what these cases are, is there a way we can deliberately exclude them from the specification? While it may not be the most satisfying theoretically, I think perhaps an approach where we create an intersection that satisfies 99% of use cases is definitely the most practical. Once we have this "minimum viable intersection", there's no reason why this couldn't be expanded by some future pep to be more complete when other issues in the typing system are resolved.

@mikeshardmind
Copy link
Collaborator Author

There is no useful to users way to reconcile Any in an intersection with the current behavior type checkers have, and I consider the behavior of having Any as a base class to be fundamentally broken right now. I'm unwilling to put my name behind a proposal that makes this situation worse and harder to untangle in the future by layering on top of something that is in this state, and I'm unwilling to expend more effort on this proposal until a time when it can be done without these problems.

The "best case" for accepting intersections in the current state with what we explored and with how this needs to be done in a way that the underlying problems can be fixed without people saying "but we're relying on the current behavior" (see TypeGuard vs TypeIs for a simpler example of how fixing behavior isn't trivial) right now would be to have intersections that were T & Any is T & Any, unordered, not special-cased, and that type checkers should know that T places a theoretical lower bound on T & Any, but for now they are free to treat it as if it were Any.

@mark-todd
Copy link
Collaborator

mark-todd commented Apr 8, 2024

That's fair enough, I respect your position on this. You've certainly put in a lot of effort already into trying to figure out a path forward with it - I'm quite keen for all our efforts not to go to waste, so I might at some point try and continue this forward in some form. Would you be happy if I use your draft for ideas on some parts in drafting a new PEP? I think there's a lot of good ideas here, and I like to think that there might be a way forward with it.

@mikeshardmind
Copy link
Collaborator Author

@mark-todd

As it stands, I believe that intersections are likely to cause more harm than good if accepted without addressing the underlying issues around how we handle gradual types within the type system. I won't go as far to say that you shouldn't feel free to use what was discussed intentionally in the open to try and steer a way to salvage the effort, but I think we're at a point where attempts to do so are potentially misguided and a product of the sunk cost fallacy, so I would urge caution in actually doing so.

I think other productive things could be worked on that wouldn't be intersections, but which came about throughout these discussions and would help those who want them, such as type[*Ts] with fewer associated problems attached (see the semantics of types.new_class for what the behavior of that should be, and why type checkers can just assume it to be identical to a class declaration)


I'm burnt out on this. I was hopeful we could find some sort of compromise. I've worked for that compromise and balancing of various concerns through everything from people telling me I was too concerned with theory, to not enough on theory, to that I was unwilling to compromise*, only to conclude that right now there seems to be too much working against this feature without other things that are at the heart of why compromise seemed needed in the first place being addressed first.

* The only thing I've been unwilling to compromise on, is that I think we needed to be able to show that this would be a net positive for users, and would not make other typing features worse for users as a side effect. I think that's a pretty tame constraint when discussing being confident enough in a feature to propose it for standard inclusion.


I won't be detailing all of the fatal flaws you'd need to overcome right now (see above about burnout, I don't want to retread everything to go pick out every single minor detail that must be addressed), but a brief handful of major ones until I've had time to just let things be for a while are below.

  • Type-checking allowing gradual typing is directly opposed to the needs of those who only use type-checking indirectly via language server completions powered by type information, and other similar tooling. The latter don't want to be told it's "valid" to do anything to an object, they want to know what is known about the value. While type-checking without gradual typing does not have this conflict, python does not always have a way to denote things that are valid and even some of the staunchest typing advocates still have uses for Any in Python.

  • __getattr__ being defined on a known type directly conflicts with the dynamic behavior of Any

  • construction is special-cased in Python's type system to allow LSP violations for constructors (I believe this is a misfeature, but it's far too late to change that without a Python 4). type[Intersection] is incredibly fragile, and attempts to work around the prior issues via special casing rather than generally applicable rules may cause irreconcilable issues for construction.

  • Banning Any from an intersection was ruled out as violating the rule that you should be able to take any fully typed expression, and replace a part of it with Any without introducing new type errors. You'll find that many other attempts to special case intersections will violate this rule as well.

If after reading that, you still think that's worth pursuing right now rather than in a future where some of the issues have been resolved ahead of intersections, I wish you the best of luck with it.

@mark-todd
Copy link
Collaborator

@mikeshardmind

Thanks for the summary, that's very useful. I actually had a slightly new idea last night based on a lot of what we've discussed and the PEP that's already been formulated. I have a feeling it would address a lot of these issues, but it's good to have some of them gathered in one place so I can potentially address previous discussions in the PEP.

It'll take a bit of time for me to flesh out this PEP, but I'm hopeful that there's a way forward - I wish you all the best.

I'll close this thread as it's getting quite long and I think I'll probably make a fresh start with the new one.

@hoelzeli
Copy link

@mark-todd Hi, do you have any updates/are you currently working on this? I stumbled across https://github.com/python/typing/issues/213 today and went down the rabbithole into this repository, and while I do not have a solution to the issues presented here, @mikeshardmind's alternate approach made me think that something like a HasMRO[T1, T2, ..., TN] type annotation (with obviously a more elegant name) could solve a few common issues like annotating mixins (cls type in classes could be HasMRO[Self, OtherMixinType] or similar for mixin classes) and annotating class decorators that add additional members could be solved in a similar way (Callable[[T], HasMRO[Added, T]). I do not have a full overview over all issues yet and basically no formal training in type theory, so I might miss obvious things.

I would expect HasMRO to specify that the MRO of an object or a type is similar to the MRO of class HasMROEquivalent(T1, T2, ..., TN). This means that T1...TN can be replaced by sound subclasses of each type (sound in the sense that the signature of all methods stay unchanged) and an object can inherit from more than T1...TN, but only if the additional classes in the inheritance hierarchy do not change the member types in class HasMROEquivalent(T1, T2, ..., TN). Additionally, changing the order of the superclasses should be ok as long as the resulting "interface" does not change. I would explicitely ignore constructors, as their signature are allowed to change in subclasses.

By tieing the meaning of the annotation to the actual inheritance mechanism in python, we could mostly get around any additional definitions (maybe there is some benefit in allowing for example class HasMROEquivalent(int, str); although we would never be able to construct this, we could still use the "interface" definition from the type ordering). We would not have to worry about the return values or accepted argument types as all of this should already be figured out from "normal" class inheritance. I think that this addition should not prevent a later addition of full-fledged intersections, but would solve a small subset of the issues intersections would solve.

Any should be relatively trivial as well: HasMRO(T1, ... Any, TN, TN + 1, ...) is equivalent to HasMRO(T1, ... Any), as Any would not 'forward' __getattr__ calls to superclasses, but return Any instead.

I currently have no idea about the internals of typecheckers; therefore I cannot judge whether this would be heavy on hot paths, but I think that checking whether a given object complies with a HasMRO annotation should be relatively straightforward: check whether each member exists in the type-checked object and whether it has the correct type. This should also work if there are TypeVar(s) in the inheritance list (I can write up a small example algorithm for this if this idea is worth pursuing).

In my mind, it would make sense to allow the expression class[T1, T2, ..., TN] to be used for this annotation, as it would mean exactly that: a type that behaves like a class inheriting from T1 ... TN (although the keyword "class" might become too convoluted with this addition).

I realize that this issue is probably the wrong place to discuss this further, so what would be an appropriate place for this? Should I open a new issue on the python typing repository? I saw at least one similar suggestion there, but as far as I remember, it did not yield any significant results/pointed to the future implementation of intersections.
In the long term, I would be willing to work on a PEP for this, but I also realize that I probably missed some obvious issues with this and there are still lots of corner cases to be discussed, hence the question.

@mark-todd
Copy link
Collaborator

Hi @hoelzeli - thanks for taking an interest in intersection!

Just to provide some context, not much has happened on this since around April (although later than the current thread). Essentially we had a lot of discussions, went back and forth on a lot of the trade offs, but couldn't agree how to proceed in a coherent way.

Personally, I was very much in favour of an "MROEquivalent" version of intersection - I had a go at making a PEP for essentially this here: #49 (There were several debates on the name - in the end people were in favour of calling it Intersection, but honestly I think if another name was less controversial that might be better).

...however, I understand there was one particular edge case involving __getattr__ I believe where essentially it didn't follow the usual MRO logic, and things kind of fell apart.

A lot of the reasoning against taking the name Intersection was that someone might eventually come up with a formulation that would not depend on the MRO in this, and would be essentially be Unordered.

It's also worth noting that for the Protocol case it isn't quite as straight forward as following the MRO, but I think we can use this to guide the formulation.

In terms of how we proceed - I'd be keen to develop this into a feature people could use - perhaps we start off with the draft PEP I had before, but essentially "rebrand" it away from being an intersection as such?

@mark-todd
Copy link
Collaborator

Oh also perhaps let's discuss here: #48

This is the later issue on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants