Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Dataclass abstract base class for instance checking and type annotations #102699

Open
NeilGirdhar opened this issue Mar 14, 2023 · 15 comments
Labels
3.13 bugs and security fixes stdlib Python modules in the Lib dir topic-dataclasses topic-typing type-feature A feature request or enhancement

Comments

@NeilGirdhar
Copy link

NeilGirdhar commented Mar 14, 2023

Feature or enhancement

Add the class dataclasses.Dataclass that

  • supports instance and subclass checks using similar logic as dataclasses.is_dataclass, and
  • blocks inheritance in __init_subclass__.

(Edit: removed "has an attribute __dataclass_fields__ for the purpose of type checking")

Pitch

This class would simplify runtime type-checking of dataclasses:

not isinstance(x, type) and dataclasses.is_dataclass(x)  # Previously
isinstance(x, dataclasses.Dataclass)  # With this feature.

and similarly

isinstance(x, type) and dataclasses.is_dataclass(x)  # Previously.
issubclass(x, dataclasses.Dataclass)  # With this feature.

Besides being simpler, it would also allow combined instance checks like isinstance(x, Dataclass | SomeOtherBase), which is arguably more Pythonic than the combined function calls above (marked "Previously").

It would also allow users to annotate dataclasses using Dataclass or type[Dataclass], which is currently impossible without reaching into protected areas of the typeshed.

Inheritance could be blocked for now with a friendly error message (that perhaps tells you to use the decorator either directly or in a base class's __init_subclass__).

Previous discussion

There's some discussion here: python/typing_extensions#115
@AlexWaygood
@JelleZijlstra
@henryiii

Linked PRs

@NeilGirdhar NeilGirdhar added the type-feature A feature request or enhancement label Mar 14, 2023
@AlexWaygood AlexWaygood added stdlib Python modules in the Lib dir topic-typing 3.12 bugs and security fixes labels Mar 14, 2023
@AlexWaygood AlexWaygood changed the title Add a Dataclass base class for instance checking and type annotations Add a Dataclass abstract base class for instance checking and type annotations Mar 14, 2023
@AlexWaygood
Copy link
Member

AlexWaygood commented Mar 14, 2023

Thanks for opening this issue. I'm supportive of this idea -- I agree it would be quite helpful for typing.

To give context for those who don't follow stuff that goes on at typeshed/typing_extensions: we recently changed how we annotate some dataclasses-module functions over at typeshed. We now define a DataclassInstance protocol, which we use for our annotations for dataclasses.asdict, dataclasses.astuple, and dataclasses.replace. These are much more precise than our previous annotations, but the feedback we've received from users is that it's quite painful for users of type checkers not to be able to access our DataclassInstance protocol at runtime -- it's currently "private to typeshed", so can't easily be used in user code.

I think dataclasses.Dataclass would be a bad name for this class. dataclasses works through decorators, and I don't think anybody wants to change that, but the name dataclasses.Dataclass implies (to me) a concrete class that can be inherited from, similar to pydantic.BaseModel. To reflect that this is an abstract class, that users shouldn't even inherit from (since dataclasses works via decorators, not inheritance), maybe dataclasses.DataclassLike would be a better name? This would be a similar name to os.PathLike, which has a similar role. It would also make it slightly clearer that objects created by "dataclass-esque" third-party libraries (perhaps ones that use the dataclass_transform mechanism) could also be considered "instances of" this abstract base class if they have the __dataclass_fields__ attributes.

Cc. @ericvsmith and @carljm for dataclasses.

@NeilGirdhar
Copy link
Author

the name dataclasses.Dataclass implies a concrete class that can be inherited from, similar to pydantic.BaseModel.

I see your point about DataclassLike, but just to provide some reasoning for the simpler name: it's reflective of all the other ABCs like those in numbers and collections.abc. We don't have RealLike, but just Real; we don't have SequenceLike, but just Sequence.

And the function that currently checks dataclasses is called is_dataclass--not is_dataclass_like. So, from a natural language standpoint, I thought that since we say that that a class X is a dataclass, it seemed natural to me to spell that isinstance(X, Dataclass).

But DataclassLike would be fine too.

@AlexWaygood
Copy link
Member

I see your point about DataclassLike, but just to provide some reasoning for the simpler name: it's reflective of all the other ABCs like those in numbers and collections.abc. We don't have RealLike, but just Real; we don't have SequenceLike, but just Sequence.

Sure, but numbers and collections.abc are modules which are nothing but abstract base classes -- it would be surprising if anything from those modules wasn't an abstract base class. Whether it's put in typing or dataclasses, however, that won't hold true for this proposed class.

@carljm
Copy link
Member

carljm commented Mar 14, 2023

I don't have an issue with this. I think to some extent the dataclass design originally pushed in the direction of "this is just a utility to create ordinary classes, nobody should care that they were created by this particular utility." But in practice when you have a utility to create classes with a structured set of field metadata, people are going to find useful things to do with that field metadata, and in order to do that, they'll need to know which types have it. And this ship has long sailed, with the existence of free functions like dataclasses.fields, dataclasses.astuple, dataclasses.asdict that can only operate on instances of dataclasses, and the existence of dataclasses.is_dataclass to detect them.

So given that we already clearly support all of this, I don't see any reason we shouldn't make it a bit more ergonomic (and static type-checking friendly.)

Would like to know what @ericvsmith thinks.

@NeilGirdhar
Copy link
Author

I think to some extent the dataclass design originally pushed in the direction of "this is just a utility to create ordinary classes, nobody should care that they were created by this particular utility."

If you're interested in the history of this, from my discussion with the attrs creator, he was strongly opposed to inheritance in Python. Like you say, that ship has long sailed now 😄

@carljm
Copy link
Member

carljm commented Mar 14, 2023

I agree with Hynek about inheritance :) IMO this proposal is mostly orthogonal to inheritance; it's about ergonomically detecting whether a type has dataclass-style field metadata, which is relevant regardless of whether anyone is inheriting anything.

@NeilGirdhar
Copy link
Author

IMO this proposal is mostly orthogonal to inheritance; it's about ergonomically detecting whether a type has dataclass-style field metadata, which is relevant regardless of whether anyone is inheriting anything.

Yup, I was just providing some interesting history 😄 Glad to see there's so much agreement about having a base class.

@sobolevn
Copy link
Member

I think that adding such an interface would make concrete assumptions about the internals of a dataclass. Right now __dataclass_fields__ is not even mentioned in https://docs.python.org/3/library/dataclasses.html

As far as I understand, dataclass module tries to hide this API: this is why we have dataclass.fields function and dataclass.is_dataclass (they are abstract enough to not expose any API).

@NeilGirdhar
Copy link
Author

I think that adding such an interface would make concrete assumptions about the internals of a dataclass. Right now __dataclass_fields__ is

Maybe we could just remove that from the proposal? In retrospect it was a mistake to add it.

@AlexWaygood
Copy link
Member

Typeshed's stub for the class will have to have the __dataclass_fields__ attribute in order for type checkers to understand it properly, and I think it makes sense for the runtime to have that attribute as well. But I don't think that means that we need to document that it has that attribute or expose it as a public attribute in any way. I don't think this proposal would change the nature of this attribute for dataclasses at all, in fact -- it would still be an implementation detail. Users will only care that isinstance(<dataclass_instance>, DataclassLike) returns True; they won't care that it returns True because the isinstance machinery is checking to see if <dataclass_instance> has a __dataclass_fields__ attribute.

AlexWaygood added a commit to AlexWaygood/cpython that referenced this issue Mar 22, 2023
AlexWaygood added a commit to AlexWaygood/cpython that referenced this issue Mar 22, 2023
@AlexWaygood
Copy link
Member

Okay, to put some flesh on the bones of this proposal: here's a PR showing what this could look like:

@randolf-scholz
Copy link
Contributor

I don't really see the point of reserving the Dataclass name. Calling it DataclassLike rather makes me wonder what, besides @dataclass decorated classes, would be caught by isinstance(cls, DataclassLike).

Also, why use an abstract base class here, instead of a Protocol? It seems that in the current PR it doesn't have any @abstractmethod.

@AlexWaygood
Copy link
Member

I don't really see the point of reserving the Dataclass name. Calling it DataclassLike rather makes me wonder what, besides @dataclass decorated classes, would be caught by isinstance(cls, DataclassLike).

Other classes from third-party libraries that have dataclass-like semantics, and also synthesize __dataclass_fields__ attributes. Many of these third-party libraries make use of the @dataclass_transform decorator, so they are treated by type checkers as though they have the same semantics as dataclasses.

As a concrete example:

>>> from pydantic.dataclasses import dataclass
>>> @dataclass
... class Foo: ...
...
>>> Foo.__dataclass_fields__
{}
>>> import dataclasses
>>> dataclasses.is_dataclass(Foo)
True

I think a dataclasses.Dataclass class also sounds like you could inherit from it to create a dataclass (similar to enum.Enum or typing.Protocol), and that's not the intent of the new ABC. Dataclasses works via decorators, and we don't want to change that.

Also, why use an abstract base class here, instead of a Protocol?

What would using a protocol get us here? The runtime implementation for typing.Protocol is much more complicated and often has unexpected performance issues. dataclasses currently takes pains to avoid importing typing (so that it has a reasonably fast import time, and doesn't need to depend on typing), and I don't see a reason to use typing.Protocol here, when we can do everything we need with ABCs.

For os.PathLike and contextlib.AbstractContextManager, they're implemented as ABCs (so that they're fast) at runtime, but in typeshed, we pretend they're protocols, and type checkers treat them as though they're protocols. It works great; I imagine we'd do a very similar thing with this new ABC.

It seems that in the current PR it doesn't have any @abstractmethod.

Correct, but we need the ABCMeta metaclass in order for the custom __subclasshook__ method to work as designed.

@antonagestam
Copy link

antonagestam commented Oct 3, 2023

What would using a protocol get us here?

Using a protocol and allowing subclassing would allow composition with other protocols, which in my opinion would be useful. It allows creating a function that takes as argument a value that both can be passed to asdict, and has some special attribute or method.

(Keeping in mind that Python doesn't have intersections, and that the only way to compose protocols is by using subclassing).

@antonagestam
Copy link

Would DataclassType make sense for name?

@erlend-aasland erlend-aasland added 3.13 bugs and security fixes and removed 3.12 bugs and security fixes labels Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes stdlib Python modules in the Lib dir topic-dataclasses topic-typing type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

8 participants