
Descriptors #244

Closed
JukkaL opened this issue Jul 16, 2013 · 21 comments

JukkaL commented Jul 16, 2013

Support descriptors in the type checker.

@JukkaL JukkaL removed the front end label Jul 25, 2014
@JukkaL JukkaL added the priority label Dec 8, 2014
@JukkaL JukkaL removed the priority label Mar 17, 2015

NYKevin commented Jan 20, 2016

Since properties are now largely functional (see #220), this can be worked around by writing the descriptor as a property in a stub file. This does, however, require a fair amount of code, since you need to repeat the same property declaration everywhere the descriptor is instantiated (and a property declaration is considerably longer than a typical descriptor instantiation). The whole point of writing your own descriptors is to avoid exactly that duplication, so this workaround is a pretty serious loss in terms of duplicated code.

As a stopgap, it would be nice if there were a way to make type aliases of properties, or if we had some kind of typing.Property[get, set] construct, but I'm not even sure that makes sense.

@gvanrossum (Member)

Jukka may understand what this is about -- but for my and others' benefit, can you post some examples of what the work-around looks like and what you think should be supported instead?


NYKevin commented Jan 20, 2016

@gvanrossum Suppose you have a class like this (a descriptor class):

class Foo:
    def __get__(self, obj: Optional[SomeType], klass: type) -> GetType:
        ...
    def __set__(self, obj: SomeType, value: SetType) -> None:
        ...

In most cases, GetType and SetType are identical. You might also have a __delete__(), but I'm leaving that out for the moment for simplicity.

Then you probably (read: in my use case) have a bunch of declarations like this:

class Bar(SomeType):
    foo = Foo(...)

In the stub file, you can write this:

class Bar(SomeType):
    @property
    def foo(self) -> GetType:
        ...
    @foo.setter
    def foo(self, value: SetType) -> None:
        ...

But that's a lot of code, and it defeats the purpose of creating Foo (the class) in the first place: you now have to write out a property declaration everywhere you use Foo (or else live with uncheckable code). I don't want to write that every time; avoiding it is why I wrote Foo in the first place. This is also a lie, since Bar.foo is not a property at all; in fact, Bar.foo is annotated to be of type GetType* (which is probably wrong, but I didn't want to involve @overload because it would make the example harder to read, and a union would be imprecise). So far as I'm aware, though, it should work in most realistic use cases.

As a stop-gap measure, it would be nice if we could just write this:

class Bar(SomeType):
    foo = Foo(...) # type: Property[GetType, SetType]

This is still a lie, but at least it's not a lie that takes six lines to express.

*: Bar.foo is equivalent to Bar.__dict__['foo'].__get__(None, Bar), and __get__ is annotated to always return GetType.
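The class-versus-instance lookup the footnote describes can be checked at runtime. A minimal sketch (the Foo/Bar names follow the example above; the value 42 is just a stand-in for a real GetType value):

```python
class Foo:
    def __get__(self, obj, klass):
        # Conventional behavior: class access (obj is None) returns the
        # descriptor itself; instance access returns the computed value.
        if obj is None:
            return self
        return 42

class Bar:
    foo = Foo()

# Bar.foo triggers Foo.__get__(descriptor, None, Bar)...
assert Bar.foo is Bar.__dict__['foo'].__get__(None, Bar)
# ...while Bar().foo triggers Foo.__get__(descriptor, instance, Bar).
assert Bar().foo == 42
```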

@gvanrossum (Member)

OK, I think I see it now. Is there any way you could write this Property thing without changing mypy? (Then we could talk about adding it to typing.py or something like that.) BTW I think it should not be called Property -- maybe Descriptor?


NYKevin commented Jan 22, 2016

@gvanrossum: I was thinking of this as a stopgap (precisely equivalent to the @property declaration), but if we're looking at adding a whole abstract Descriptor class, I wouldn't object.

Having said that, if we're doing this, we need to get the semantics right. I would interpret it something like this:

G = TypeVar('G')
S = TypeVar('S')

class Descriptor(Generic[G, S]):
    def __get__(self, obj: object, owner: type) -> G:
        ...
    @overload
    def __get__(self, obj: None, owner: type) -> ???:
        ...
    def __set__(self, obj: object, value: S) -> None:
        ...
    def __delete__(self, obj: object) -> None:
        ...

The ??? indicates that I'm not sure what to write for that case. The convention is to return self when obj is None. ISTM that type(self) ought to be known at compile time, and would be more specific than Descriptor[G, S]. But maybe I'm overcomplicating this. I am quite certain, though, that we do need the @overload; you can't properly express this with Unions, and the casting would get very tiresome if we tried.

The difficulty here is that all three methods are independent of each other, meaning we arguably have three separate abstract classes. OTOH, it is uncommon to implement __set__ without __get__, and __delete__ is rarely implemented at all. My interpretation* of del foo.bar is "make foo.bar raise an AttributeError next time it's used," which is seriously problematic from a type safety perspective (now you can't rely on instances of a given class having a given attribute). So maybe we just need Descriptor and DataDescriptor; the former provides __get__, while the latter is a subclass which also provides __set__. We probably still want mypy to support __delete__, but the user can write that class by themself IMHO, and I would put it at a lower priority than the other two methods.

This is a rather opinionated idea, so feel free to tell me I'm wrong about everything.

* The __delete__ method can do anything, of course, but this behavior seems closest to Python's out of the box data model.
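The data/non-data distinction that motivates splitting Descriptor and DataDescriptor has observable runtime consequences, which a checker would eventually need to model. A short sketch (illustrative names):

```python
# A data descriptor (defines __set__) takes precedence over the instance
# __dict__; a non-data descriptor (only __get__) is shadowed by it.

class NonData:
    def __get__(self, obj, owner):
        return 'from descriptor'

class Data(NonData):
    def __set__(self, obj, value):
        raise AttributeError('read-only')

class C:
    a = NonData()
    b = Data()

c = C()
c.__dict__['a'] = 'shadowed'   # wins over the non-data descriptor
c.__dict__['b'] = 'ignored'    # loses to the data descriptor
assert c.a == 'shadowed'
assert c.b == 'from descriptor'
```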


If we just want to write a @property shortcut that mypy will understand now, well, I'm not sure how to do that. This does not work:

def make_property(G: type, S: type) -> ???:
    @property
    def result(self) -> G:
        ...
    @result.setter
    def result(self, value: S) -> None:
        ...
    return result

The return type is precisely the type we are trying to write in the first place, so this is a non-starter.

We could subclass property and Generic[G, S], but mypy still has to know about __get__ and __set__ to see our overrides and their type annotations. I'm assuming that it doesn't, and it just special-cases the @property decorator, but feel free to correct me on that point.

@JukkaL Any other ideas on how to effectively convince mypy that something is a property without writing out a full property?


JukkaL commented Jan 22, 2016

The @property implementation in mypy is not complete. You can only define read-only properties, and even then accessing the property via a class doesn't produce the right type. Mypy currently doesn't understand __get__ or __set__ at all -- properties are special-cased.

If you just have a read-write descriptor that is always accessed via an instance, you can kind of fake it by declaring it as a regular attribute:

class A:
    x = ... # type: int     # Actually a descriptor, but the value type is int so it's reasonably close

Also, it's not obvious how mypy should support statically typed descriptors outside stubs. The @overload trick doesn't work outside stubs. In stubs the None variant of the method should come first textually, as otherwise the object variant would take precedence.
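The ordering constraint Jukka describes can be sketched like this (illustrative only; in an actual .pyi stub the implementation body would be omitted, and later mypy versions also accept @overload outside stubs):

```python
from typing import overload

class Descr:
    # The obj=None (class access) variant must come first textually,
    # or the broader object variant would match class access too.
    @overload
    def __get__(self, obj: None, owner: type) -> 'Descr': ...
    @overload
    def __get__(self, obj: object, owner: type) -> int: ...
    def __get__(self, obj, owner=None):
        return self if obj is None else 42

class HasDescr:
    d = Descr()

assert isinstance(HasDescr.d, Descr)   # class access: the descriptor
assert HasDescr().d == 42              # instance access: the value
```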


NYKevin commented Jan 22, 2016

Also, it's not obvious how mypy should support statically typed descriptors outside stubs...

This may be a good argument in favor of the typing.Descriptor approach... if we can work out the details. That way, you just need one or two stubs that live in typing.pyi.

@gvanrossum (Member)

@NYKevin, do you need help coming up with the actual definition of the generic class that we can add to typing.py and to typing.pyi (in typeshed, for mypy)? You should be able to prototype this without any changes to typeshed or mypy, once we've seen the work we can discuss how to add it.

(I'm trying not to work on solving this problem directly myself, I need to focus on fewer things.)


NYKevin commented Jan 23, 2016

@gvanrossum I'd appreciate it if someone could give my code a quick once-over, but I'm pretty sure it's mostly correct. See below.

But first, I want to be very clear about one thing: this code is not a workaround for this bug (mypy not understanding descriptors). So far as I can tell, this bug cannot be worked around any better than the clumsy @property declaration I described above (and according to Jukka, that currently works even less well than I had thought). Instead, this code is an abstract class that might be useful to have in typing.pyi after we have full descriptor support in mypy (and other type checkers). So if you were just hoping for a quick mypy workaround, this is not the class you are looking for (and said class does not exist, so far as I am aware).

G = TypeVar('G', covariant=True)
S = TypeVar('S', contravariant=True)
T = TypeVar('T')

class Descriptor(Generic[G, T]):
    @abc.abstractmethod
    def __get__(self, obj: None, owner: type) -> 'Descriptor':
        ...

    @overload
    # XXX: @abc.abstractmethod?
    def __get__(self, obj: T, owner: type) -> G:
        ...


class DataDescriptor(Descriptor[G, T], Generic[S]):
    # XXX: It's legal to inherit from Generic multiple times like this, right?
    # Also, what's the order of our generic arguments?  Is it
    # DataDescriptor[G, T, S], or something else?
    # A quick fix would be putting S into Descriptor, but I don't like that.
    @abc.abstractmethod
    def __set__(self, obj: T, value: S) -> None:
        ...

    @abc.abstractmethod
    def __get__(self, obj: None, owner: type) -> 'DataDescriptor':
        ...

    @overload
    # XXX: @abc.abstractmethod?
    def __get__(self, obj: T, owner: type) -> G:
        # XXX: Do we need to redeclare this method?
        ...

Some notes about the code:

  • In case it isn't obvious, G is the type of the getter, S the type of the setter, and T the class in which the descriptor is instantiated.
  • @overload is stub-only, and would need to be removed for the abstract class in typing.py. You'd also need to change all of the ellipses to (something like) raise NotImplementedError().
  • There is intentionally no __delete__() support because the semantics are sufficiently unobvious and/or type unsafe that I think we'd be better off leaving it out. User-defined descriptors will still be able to provide it if they really want to. Feel free to overrule me on this one.
  • The return type of __get__(self, None, owner) is in my opinion pessimistic, but I can't figure out how to spell type(self) in a way that's actually legal.
  • If and when this feature lands, we should replace all instances of type with Type[T]. Until then, type checkers could special case this to infer that isinstance(obj, owner) is always true in __get__(), but it may not be worth the trouble.
  • Some user-defined descriptors will want to replace T with a bounded type variable (e.g. Django might say "You can only declare fields in subclasses of Model" and have that enforced by the type checker). I believe this is already possible via the normal subclassing mechanism, so no extra work has to be done to support it.

And some things that Jukka might actually care about (or may have already figured out by himself a long time ago):

  • The semantics for non-data descriptors are rather weird from a type safety perspective. It's probably best to treat setting a non-data descriptor as a type error, even though it's perfectly legal at runtime. The resulting behavior is to shadow the descriptor, which may be difficult to reason about at compile time. But it might be perfectly safe if you can prove the shadowing object is a G.
  • It's desirable that mypy and other type checkers are able to infer T without explicit type hinting. Most user-defined descriptors will want to remain generic in T while specifying G and S (otherwise, you could only instantiate the descriptor in some finite set of classes, which is of limited utility), and we don't want client code to be littered with bracketed forward references to the containing class (imagine if every model field in Django had to be written as foo = BarField['ModelClass'](...) with ['ModelClass'] repeated for each field in ModelClass).
  • It's also desirable (but this is extra credit) that checkers report an error at the point of instantiation (rather than, say, inferring T=Any) if T is bounded or constrained and cannot be unified with the containing class. if there is no containing class, I would suggest giving up and using T=Any (with a warning or error in strict mode?), but it might still be possible to infer T in some cases. The problem is that when you start playing that game, it becomes possible that the same descriptor object has different T values depending on how it's invoked, and that level of insanity is problematic even at runtime.
  • If all of this inference is too hard, it may be possible to special-case our way out. T is rather unusual in a number of ways, and checkers may be able to play fast and loose with the official rules here.

@gvanrossum (Member)

Sorry, my brain is not set up any more for understanding this kind of stuff, and I got a headache trying to understand this. :-( (This worries me, since I invented the descriptor concept -- many years ago.) I would have to whiteboard it out with someone who also knows how descriptors work and knows enough about mypy. All I can tell right now is that (1) @overload needs to be present on the first occurrence of the overloaded function too, and (2) we're probably going to allow it outside of stubs too (python/typing#175). But none of that addresses the core issue. I do get that mypy needs to be fixed for this.

@gvanrossum (Member)

Here's a tutorial example that may help my future self understand this.

This is much more concrete -- I'm assuming that we're implementing a database and descriptors are used to declare fields in records. I'm also combining your GetType and SetType into a single FieldType variable (I understand why you're doing that, but it makes the example more complex without adding extra light -- just like you already omitted __delete__). I've introduced factory functions that construct Field objects of the right type; and (to work around an implementation issue with this kind of descriptor usage) I'm passing the field name into the descriptor, so you have to write the somewhat repetitive

    age = Integer('age')

Anyway, here goes:

from typing import *

FieldType = TypeVar('FieldType')

class Database:
    def get(self, name: str, typename: str) -> Any: ...
    def set(self, name: str, typename: str, val: Any) -> None: ...

class Record:
    def __init__(self, db: Database) -> None:
        self._db = db

class Field(Generic[FieldType]):
    def __init__(self, name: str, typename: str) -> None:
        self.name = name
        self.typename = typename

    @overload
    def __get__(self, obj: None, cls: type) -> 'Field': ...
    @overload
    def __get__(self, obj: Record) -> FieldType: ...

    def __get__(self, obj, cls=None):
        if obj is None:
            assert cls is not None
            return self
        if cls is None:
            cls = obj.__class__
        else:
            assert isinstance(obj, cls)
        return obj._db.get(self.name, self.typename)

    def __set__(self, obj: Record, value: FieldType) -> None:
        obj._db.set(self.name, self.typename, value)

def String(name: str) -> Field[str]:
    return Field(name, 'string')

def Integer(name: str) -> Field[int]:
    return Field(name, 'integer')

class Employee(Record):
    first = String('first')
    last = String('last')
    age = Integer('age')

This currently doesn't pass mypy because python/typing#175 hasn't been implemented yet (heck, I cowardly haven't posted the PEP update to the peps repo yet :-). And here's the part that shows that mypy doesn't support descriptors yet:

r = Employee(Database())
r.age + 1 # E: Unsupported left operand type for +  (Field[int] and "int")

The error we get here shows that it doesn't make the leap from Employee.age being a Field[int] to Employee().age being an int. And that's what we need to add to mypy. @NYKevin, does this summarize your position?

@gvanrossum (Member)

PS. I think the overloading of __get__ is also a red herring. As first approximation I think it would be totally fine if mypy only recognized the second variant:

def __get__(self, obj: Record) -> FieldType: ...

or perhaps

def __get__(self, obj: Record, cls: type = None) -> FieldType: ...

As you remarked, the convention is that if obj is None __get__ returns self, and we might initially just hardcode that into mypy. That variant is only ever used when writing e.g. Employee.age -- and honestly I wish I had defined the protocol simpler and just declared that in this case the interpreter shouldn't call the descriptor at all.

So the key thing here is that if a class attribute a is Field[T] then the type of obj.a should be T, not Field[T], given that Field[T] is a class with a __get__ method returning a T.

The distinction between the get-type and the set-type, like so many other things, doesn't really change much fundamentally. (Do you have an actual use case where G and S differ? I'd like to hear about it.)


NYKevin commented Jan 31, 2016

I'm also combining your GetType and SetType into a single FieldType variable (I understand why you're doing that, but it makes the example more complex without adding extra light -- just like you already omitted __delete__)

No objection there. At least 90% of typical use cases will have the same types there, so it makes sense to combine them at the implementation level. Nevertheless, if we really are putting this in typing.py, we may want to think about flexibility vs. sanity here. Should descriptors always hew close to the standard Python object model, or do we want to support using them in weirder ways? I like the former from a "let's not make Python into C++" perspective, but I'm not sure how opinionated we want typing.py to be.

I've introduced factory functions that construct Field objects of the right type

Personally, I used inheritance + template method pattern to create different types of Fields, which is particularly handy when your different types have heterogeneous implementations, but this is a minor quibble. Factory functions work just as well for something like this.

I'm passing the field name into the descriptor

You don't actually have to do that; a metaclass can do it for you. It's very hairy code and I wouldn't recommend it for a simple descriptor example. In particular, I had to override the metaclass __setattr__ and __delattr__ methods to keep this information up-to-date when the class is dynamically modified (e.g. FooClass.bar = Field(...)). Don't do that. That way lies madness. But if you want to drive yourself insane for some reason, the code is behind that link.

@overload
def __get__(self, obj: Record) -> FieldType: ...

This looks wrong to me. I don't think __get__ is ever called with two arguments (incl. self). In my understanding, you always get three. But maybe I've missed an edge case?

As you remarked, the convention is that if obj is None __get__ returns self, and we might initially just hardcode that into mypy.

Again, no objection, but in the long run it might be nice to consider this more carefully. Proxy objects may want to return the thing they are proxying (You have two descriptors, one of which is a proxy for the other, and the first has its __get__(self, None, klass) method called. Does it return itself, or the object it is proxying? I'm doing the latter, but I'm not sure if this is actually good design. It does seem to produce better pydoc output.).

The distinction between the get-type and the set-type, like so many other things, doesn't really change much fundamentally. (Do you have an actual use case where G and S differ? I'd like to hear about it.)

No, I don't. I mostly put it in for completeness.

@gvanrossum (Member)

Nevertheless, if we really are putting this in typing.py, we may want to think about flexibility vs. sanity here. Should descriptors always hew close to the standard Python object model, or do we want to support using them in weirder ways? I like the former from a "let's not make Python into C++" perspective, but I'm not sure how opinionated we want typing.py to be.

I think mypy itself should just look for __get__ and __set__ methods with matching signatures and derive the type from there. For typing.py we may have something more opinionated, if it's simpler to use (e.g. not too many type variables).

Personally, I used inheritance + template method pattern to create different types of Fields

To mypy it should be all the same -- it just needs to know the ultimate type of the class variable so it can check whether it has a __get__ method (etc.).

You don't actually have to do that; a metaclass can do it for you.

You don't have to tell me that. :-) I just left it out because it would be a distraction in this example (a bigger distraction than passing the name redundantly).

I don't think __get__ is ever called with two arguments (incl. self).

True, it's another detail I tried to sweep under the rug as irrelevant to understanding the example.

Proxy objects may want to return the thing they are proxying [...]

You still haven't said a thing about your use case. You have no idea how frustrating that is. I can't participate in a discussion about an API design if it's all about hypotheticals and abstractions. The abstractions have to follow from use cases we care about.

No, I don't. I mostly put it in for completeness.

Case in point.


NYKevin commented Jan 31, 2016

You still haven't said a thing about your use case. You have no idea how frustrating that is. I can't participate in a discussion about an API design if it's all about hypotheticals and abstractions. The abstractions have to follow from use cases we care about.

My apologies.

The wrapped descriptor is a read-write data descriptor. The system of which it is a part generally assumes that it is OK to call __set__ and do other things which are morally equivalent on these fields, since they are supposed to be read-write. Sometimes, I want a read-only descriptor (mostly so that I can use it in __hash__), but the system isn't smart enough to cope with this (and making it smarter would mean more special cases, more metaclass code, bleagh!). So I wrap a writeable field in a proxy object that raises AttributeError on __set__ and __delete__, and which does not participate in the inheritance hierarchy. That makes sure the system doesn't try to do anything clever with it like call __set__ automatically in the containing class's __init__. The proxy forwards __get__ to the wrapped descriptor in the obvious way, without special casing obj=None. This means you get the docstring and repr of the wrapped object instead of the wrapper, which is (usually) the Right Thing.
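A rough sketch of the proxy described above (all names here are illustrative, not taken from the actual codebase):

```python
class Field:
    """Toy read-write data descriptor standing in for the real fields."""
    def __init__(self, name):
        self.name = name
    def __get__(self, obj, owner):
        if obj is None:
            return self
        return obj.__dict__[self.name]
    def __set__(self, obj, value):
        obj.__dict__[self.name] = value

class ReadOnlyProxy:
    """Wraps a data descriptor; forwards reads, rejects writes.

    __get__ forwards unconditionally (no special case for obj=None), so
    class access surfaces the wrapped descriptor's repr and docstring
    rather than the proxy's.
    """
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __get__(self, obj, owner):
        return self.wrapped.__get__(obj, owner)
    def __set__(self, obj, value):
        raise AttributeError('read-only field')
    def __delete__(self, obj):
        raise AttributeError('read-only field')

class Thing:
    x = Field('x')
    y = ReadOnlyProxy(Field('y'))

t = Thing()
t.x = 1
assert t.x == 1
try:
    t.y = 2                    # rejected by the proxy
except AttributeError:
    pass
```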


gvanrossum commented Jan 31, 2016 via email


NYKevin commented Feb 1, 2016

It is open source. I've been linking to it in my comments above, but I suppose I wasn't very obvious about it. Here are the main components:

  • The fields themselves live in this file. I recommend focusing on the first few classes; those are the abstracts that everything else is built out of. Also, please note that MutableField is a red herring: it relates to the mutability of the GetType, not the mutability of the field itself.
  • The proxy is this thing. You will notice that unlike most of the nonsense in these files, this class is extremely simple. This is quite deliberate since I don't want to write more code than I have to. It is just a wrapper for the existing system that makes it read only. It does not do anything remotely interesting, unlike the standard fields which it wraps.
  • The base class which you need to extend when you use fields lives here. Unfortunately, this class is too clever and I'm trying to avoid new complexity here. Most of the brains actually live in the fields; the class is only as complex as it is because the fields need various bits of information which Python by default does not give them. In other words, this is mostly just doing dependency injection for the fields and invoking their hook methods as needed.

The read-write fields are significantly older than the read-only wrapper. The latter was essentially bolted on when I needed __hash__ support. Rejected alternatives include:

  • Add a readonly attribute to the existing descriptors: NBTObject.__init__() assumes fields are mutable and I already have doubly-nested if statements inside a for loop in that method. I don't want to make them triply-nested. While the method probably needs refactoring, the fact remains that its task is already rather complicated and I don't want to make it any worse. If this code were stable enough to have outside users, we'd also have to worry about people running into read-only fields when doing field-by-field introspection, or manually exclude the read only fields from said introspection (more special cases). By using a wrapper, the read only fields are excluded automatically since they're not AbstractFields.
  • Make a read-only mixin: Arguably violates LSP since the mixin is read-only while the superclass is read-write. Plus the issues from the previous bullet.
  • Write NBTObject.__getattr__(): Less amenable to code reuse. I have a large number of read-write descriptor classes, and I would have to either reinvent all of those wheels, or I would need to re-implement the descriptor protocol (i.e. manually call __get__ and __set__ from __getattr__). The former is obviously a non-starter and the latter is needless complexity since Python already provides a perfectly good implementation of the descriptor protocol.
  • Write NBTObject.__getattribute__: Same problem, plus now we have to take care not to break the standard Python object model.
  • Make a read-only superclass which has AbstractField as a subclass: Now we're just coming at the circle-ellipse problem from the other direction. In this model, how can we be sure that a field is read only and safe for __hash__? This also moves the root of the fields type hierarchy, which means a lot of code changes. Finally, we're running into the same code reuse problem again, but this time it's much harder to work around.

I would like to note that composition is often preferred to inheritance, and I perhaps could have taken that advice when I was initially designing this system. If I had gone down that road, I believe there would be a lot more of this kind of method call forwarding between descriptor classes, but I have no evidence of this and could be entirely mistaken. If anyone knows of a good example of composition-over-inheritance descriptor design, it would be very interesting to look at.
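For what it's worth, one composition-leaning design (purely illustrative, not from any of the codebases discussed here) keeps the descriptor itself as a single generic class and injects the type-specific behavior as small strategy callables rather than subclasses:

```python
class ComposedField:
    """Descriptor whose conversion and validation are injected,
    rather than provided by a subclass hierarchy."""
    def __init__(self, name, convert=lambda v: v, validate=lambda v: True):
        self.name = name
        self.convert = convert
        self.validate = validate
    def __get__(self, obj, owner):
        if obj is None:
            return self
        return obj.__dict__[self.name]
    def __set__(self, obj, value):
        value = self.convert(value)
        if not self.validate(value):
            raise ValueError('bad value for %s: %r' % (self.name, value))
        obj.__dict__[self.name] = value

class Sample:
    age = ComposedField('age', convert=int, validate=lambda v: v >= 0)

s = Sample()
s.age = '42'          # converted by the injected strategy
assert s.age == 42
```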


gvanrossum commented Feb 1, 2016 via email


NYKevin commented Feb 2, 2016

OK, now I understand what your code is doing -- though I wouldn't recommend this architecture to anyone doing a similar thing from scratch, and it sounds like you agree. :-)

Meh... Perhaps starting over from scratch would yield a better design, but I'm not entirely sure there is a "good" way to do this kind of thing. There's going to be some metaclass and descriptor magic when you try to write code that does this (compare the protobuf library), and that's always going to be somewhat hard to read.

But this is irrelevant bikeshedding (though I do have to wonder whether mypy has any chance whatsoever of understanding protobuf message classes without stubs that would get outdated w.r.t. the .proto file... but that's a different topic).

But those aren't useful until mypy has support for descriptors built in, and I don't like to put the cart before the horse. Once that's in mypy, it'll be easier to experiment with different approaches before we decide to add something to typing.py.

SGTM.

EDIT: At the risk of reopening a rather lengthy and (IMHO) not entirely fruitful discussion, I'd like to provide another example of an object that overrides __get__() with obj=None to return something other than itself. It is a far more straightforward proxy object, and might serve as a more useful example. I'm also considering removing the example discussed above from my codebase in favor of an entirely different approach. None of this should affect the direction of this discussion; I'm just linking it so Guido and Jukka have an additional data point.

@gvanrossum (Member)

Looks like this got fixed by #2266 but we forgot to close this issue.


eric-wieser commented Apr 27, 2020

We're back to the design of the Descriptor and DataDescriptor classes that you posted way earlier. But those aren't useful until mypy has support for descriptors built in, and I don't like to put the cart before the horse.

Now that mypy has support for these, would it make sense to add these as typing.Descriptor or perhaps collections.abc.Descriptor?

(the classes being mentioned are from #244 (comment))
