Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out story about diamond inheritance #1065

Open
gnprice opened this issue Dec 11, 2015 · 14 comments
Open

Figure out story about diamond inheritance #1065

gnprice opened this issue Dec 11, 2015 · 14 comments

Comments

@gnprice
Copy link
Collaborator

gnprice commented Dec 11, 2015

In mypy/maptype.py, we have a function class_derivation_paths that finds all the paths up the hierarchy from a derived class to a given ancestor class. The code implicitly assumes that this function will only ever return a single path, for example by silently taking the first element of a list derived from its result and dropping the rest:

    return map_instance_to_supertypes(instance, superclass)[0]

And indeed there are a couple of assertions in comments in the code that this function should only ever find a single path:

    # FIX: Currently we should only have one supertype per interface, so no
    #      need to return an array
[...]
    # FIX: Currently we might only ever have a single path, so this could be
    #      simplified

But this doesn't seem to be true! In fact, if we set up the classic diamond inheritance pattern:

  A
 / \
B   C
 \ /
  D

then we do in fact get two paths, one through B and one through C. Demonstrated by adding an assert and a triggering test case here: gnprice@a85876521
which produces this exception:

AssertionError: [[<TypeInfo __main__.B>, <TypeInfo __main__.A>], [<TypeInfo __main__.C>, <TypeInfo __main__.A>]]

What should our approach to this situation be?

In roughly increasing order of semantic and implementation complexity, some options include

  • forbid this situation -- require that there's only a single inheritance path from a given derived class to a given ancestor, and say that code that uses diamond inheritance can't be typed
  • allow diamond inheritance only if the ancestor is not generic
  • allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived
  • allow diamond inheritance only if the actual instances of the derived class that we encounter each result in a single instance of the ancestor across all inheritance paths (this is uncomfortably reminiscent of C++ templates)
  • something even more complicated that allows inheriting from several different instances of the same ancestor

I'm inclined to start with option 1, as it's super simple to understand and to implement, and see if that ever poses a problem. Maybe nobody actually uses diamond inheritance in code they want to try to type; it's a pretty tricky pattern in the first place. If we want to do this, we should detect the situation up front (like in verify_base_classes in semanal.py) and give an appropriate error.

Option 2 is also super simple to implement, if people use diamond inheritance but they don't need generics with it. Somewhat less satisfying as a rule for the user.

I'd be kind of concerned if we find ourselves starting to think we need one of the more complex options.

@gvanrossum
Copy link
Member

I haven't looked at the code, but why is this not using Python's standard MRO algorithm? In Python 3 that's always used and in Python 2 it's used for new-style classes (those deriving from object).

@gnprice
Copy link
Collaborator Author

gnprice commented Dec 11, 2015

I think of the MRO algorithm as providing an answer to the question "which of my ancestors do I get method foo from", when several ancestors provide a method with the same name. Which it does by providing a total ordering of the ancestors compatible with the partial ordering of the inheritance hierarchy.

Does the MRO algorithm also provide an answer to the question "which path of intermediate ancestors do I get to ancestor Foo through"? It's not obvious to me how it does that, though maybe I'd just need to go back and reread how it works and think more.

@gvanrossum
Copy link
Member

gvanrossum commented Dec 11, 2015 via email

@JukkaL
Copy link
Collaborator

JukkaL commented Dec 13, 2015

This is my favorite (assuming we tweak this to also consider Any base classes):

allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived

However, initially just not doing anything about it is probably okay, as this is not very likely to cause much trouble.

@JukkaL JukkaL added the feature label Dec 13, 2015
@gvanrossum
Copy link
Member

gvanrossum commented Dec 13, 2015 via email

@JukkaL
Copy link
Collaborator

JukkaL commented Dec 13, 2015

The writeup uses internal mypy terminology which makes it difficult to follow.

Here is an example about where this would make a difference:

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[str]): pass
class D(B, C): 
    # This would be a subtype of both A[int] and A[str]!
    pass

The rule I quoted would disallow inheritance hierarchies such as the above because the different ways of getting from D to A would result in different base types (A[int] and A[str]).

This would be okay, though:

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[int]): pass
class D(B, C): 
    pass

@gvanrossum
Copy link
Member

Ah, I see. OTOH inheriting from two different parameterized generic classes
should still work right?

class A(Generic[T]): pass
class B(Generic[T]): pass
class C(A[int], B[str]): pass

Also, I'm still unsure why the code needs to look for paths rather than
check the MRO for some condition (like occurrences of the same generic
class -- but not Generic itself -- parameterized differently).

@JukkaL
Copy link
Collaborator

JukkaL commented Dec 13, 2015

Yeah, your example would be no problem because A and B are not related.

Paths are needed because you can have some really weird, complex inheritance hierarchies. Here is an example:

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[T]): pass
class CC(C[List[T]]): pass
class D(B, CC[int]): 
    # This would be a subtype of both A[int] and A[List[int]]!
    pass

Now we'd look at the paths D -> CC -> C -> A and D -> B -> A to determine if things are compatible.

@gvanrossum
Copy link
Member

Actually Python itself (actually typing.py in the stdlib) doesn't currently
like that definition of CC: I get

TypeError: Cannot inherit from a generic class parameterized with
non-type-variable typing.List[~T]

Strangely, mypy gives an error on C:

/Users/guido/mypy_tests/mro.py:5: error: Invalid type "mro.T"

The source:

from typing import TypeVar, Generic, List
T = TypeVar('T')
class A(Generic[T]): pass
## class B(A[int]): pass
class C(A[T]): pass

This is an area where mypy, PEP 484 and typing.py all seem to have
different rules (and the PEP is pretty vague).

@JukkaL
Copy link
Collaborator

JukkaL commented Dec 13, 2015

Mypy wants you to do it like this (because of #302):

class A(Generic[T]): pass
class B(A[int]): pass
class C(A[T], Generic[T]): pass
class CC(C[List[T]], Generic[T]): pass
class D(B, CC[int]): 
    # This would be a subtype of both A[int] and A[List[int]]!
    pass

The code is still reasonable from a type checking point of view, other than having the incoherent generic base class problem. Maybe typing.py should not reject it?

@gvanrossum
Copy link
Member

gvanrossum commented Dec 13, 2015 via email

@gnprice
Copy link
Collaborator Author

gnprice commented Dec 15, 2015

This is my favorite (assuming we tweak this to also consider Any base classes):

allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived

However, initially just not doing anything about it is probably okay, as this is not very likely to cause much trouble.

Yeah, I think this one is probably the right semantics in the end. It's just a little complicated to implement. Either this or either of the simpler options mean that the maptype code really can just take one path and not worry about the possibility of several, which will also allow simplifying that code.

I'd rather do a little more than make no change to the code -- I'd like to bring the comments in line with the code's behavior one way or another. The simple way would be to do, say, my second option for now:

allow diamond inheritance only if the ancestor is not generic

enforcing it somewhere like verify_base_classes, and then adjust the maptype comments to describe the actual situation which is that all paths are equivalent. (This might be what you meant already.) Sound good?

gnprice added a commit to gnprice/mypy that referenced this issue Dec 22, 2015
@gnprice
Copy link
Collaborator Author

gnprice commented Dec 22, 2015

Hmm. I implemented that simple second option:

allow diamond inheritance only if the ancestor is not generic

Pushed it as master...gnprice:diamond for reference.

But it turns out we actually already use diamond inheritance with generics! The unit tests tell me this -- I produce errors in typeshed/builtins/3/builtins.pyi on bytearray, list, and range, which diamond-inherit from Sequence[int], Reversible[_T], and Reversible[int] respectively.

The latter two seem to be down to a mismatch with the typing module -- we have list and range derive directly from Reversible as well as Sequence, but the typing.pyi in typeshed/stdlib/3/ has Sequence itself derive from Reversible[_T_co], which makes that redundant. Our own two typing.py files in lib-typing/{2.7,3.2}/ do no such thing, and probably a previous version of typing.pyi didn't either. builtins.pyi should just be updated to reflect that.

The other one is more real, though:

class bytearray(MutableSequence[int], ByteString):

while versions of both typing.py and typing.pyi consistently have both MutableSequence and ByteString derive from Sequence.

So maybe we actually do need option 3:

allow diamond inheritance only if all the inheritance paths can be verified, just from the class definitions, to always result in the same instance of the ancestor for any instance of the derived

Or can we do without some of this hierarchy over bytearray?

@gnprice
Copy link
Collaborator Author

gnprice commented Dec 22, 2015

(Fixing that mismatch in python/typeshed#32 , but that still leaves bytearray.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants