Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom dataclasses are not recognized as dataclasses #15843

Closed
NeilGirdhar opened this issue Aug 10, 2023 · 20 comments · Fixed by #15915
Closed

Custom dataclasses are not recognized as dataclasses #15843

NeilGirdhar opened this issue Aug 10, 2023 · 20 comments · Fixed by #15915
Labels
bug mypy got something wrong topic-dataclasses

Comments

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Aug 10, 2023

Feel free to skip to the bottom of the code. The reset is custom dataclass specification.

import dataclasses
from dataclasses import replace
from typing import (Any, Callable, ClassVar, Protocol, dataclass_transform, overload,
                    runtime_checkable)


@runtime_checkable
class DataclassInstance(Protocol):
    __dataclass_fields__: ClassVar[dict[str, dataclasses.Field[Any]]]


@overload
@dataclass_transform()
def dataclass(*, init: bool = True, repr_: bool = True, eq: bool = True,
              order: bool = False) -> Callable[[type[Any]], type[DataclassInstance]]:
    ...


@overload
@dataclass_transform()
def dataclass(cls: type[Any], /, *, init: bool = True, repr_: bool = True, eq: bool = True,
              order: bool = False) -> type[DataclassInstance]:
    ...


@dataclass_transform()
def dataclass(cls: type[Any] | None = None, /, *, init: bool = True, repr_: bool = True,
              eq: bool = True, order: bool = False
              ) -> type[DataclassInstance] | Callable[[type[Any]], type[DataclassInstance]]:
    if cls is None:
        def f(x: type[Any], /) -> type[DataclassInstance]:
            return dataclass(x, init=init, repr_=repr_, eq=eq, order=order)
        return f  # Type checking support partial is poor.
    return dataclasses.dataclass(init=init, repr=repr_, eq=eq, order=order, frozen=True)(cls)

@dataclass
class X:
    a: int

def f(x: X) -> X:
    return replace(x, a=1)  # [Error](error: Argument 1 to "replace" has incompatible type "X"; expected a dataclass

It appears that MyPy recognizes that something is a dataclass through some means other than the protocol. See here for background: python/cpython#102699

@NeilGirdhar NeilGirdhar added the bug mypy got something wrong label Aug 10, 2023
@AlexWaygood
Copy link
Member

AlexWaygood commented Aug 11, 2023

It appears that MyPy recognizes that something is a dataclass through some means other than the protocol.

Yes, following #14849, mypy now infers a much more precise signature for dataclasses.replace() if it sees it being called on an instance of a class that it knows to have been constructed using @dataclass. But it looks like an unfortunate side effect is that it now only accepts @dataclass instances being passed to dataclasses.replace(), whereas the runtime function dataclasses.replace() accepts anything with a __dataclass_fields__ dictionary ClassVar.

@ikonst, would it be possible to just fallback to the signature typeshed gives if mypy sees an instance of a class that hasn't been constructed with @dataclass being passed to dataclasses.replace()? This issue is also causing one of typeshed's tests to fail: see the commented-out test in python/typeshed#10559.

(It's not a huge issue for typeshed, though, we can just comment out the test for now and that's not the end of the world.)

@NeilGirdhar
Copy link
Contributor Author

(Although, It would be cool to have the typed replace function for custom dataclasses 😄)

@AlexWaygood
Copy link
Member

(Although, It would be cool to have the typed replace function for custom dataclasses 😄)

Well sure, but I'm guessing that would be significantly more complex :)

@NeilGirdhar
Copy link
Contributor Author

Yup, fair enough. Thanks for taking a look at this so quickly.

@ikonst
Copy link
Contributor

ikonst commented Aug 20, 2023

BTW, we also have a false positive problem, assuming every dataclass_transform'd class has __dataclass_fields__. For example:

import typing

_T = typing.TypeVar("_T")


@typing.dataclass_transform()
def create_model(cls: type[_T]) -> type[_T]:
    cls.__init__ = lambda *args, **kwargs: None  # type: ignore
    return cls


@create_model
class C:
    a: int


c = C(a=42)
typing.reveal_type(c.__dataclass_fields__)

Runtime:

    typing.reveal_type(c.__dataclass_fields__)
                       ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'C' object has no attribute '__dataclass_fields__'

mypy:

file.py:18: note: Revealed type is "builtins.dict[builtins.str, Any]"
Success: no issues found in 1 source file

@ikonst
Copy link
Contributor

ikonst commented Aug 20, 2023

To eliminate the "Argument 1 to "replace" has incompatible type "X"; expected a dataclass" error in the issue's code, we can make a simple change:

-if self._spec is _TRANSFORM_SPEC_FOR_DATACLASSES:
-    self._add_internal_replace_method(attributes)
+self._add_internal_replace_method(attributes)

If we want to go ahead with it, see #15915.

This would mean that you can call dataclasses.replace on any @dataclass_transform'd class's instance. This is of course not necessarily so (though, as Alex said here, seems to be common among third party libs). Similarly, is_dataclass isn't necessarily true for a @dataclass_transform'd class.

Right now @dataclass_transform doesn't have a feature specifying whether __dataclass_fields__ is synthesized. Perhaps the return type of the decorated function could allude to that, but at least in Pydantic it wouldn't correctly work (the decorated function returns some sort of "DataclassProxy" that doesn't have __dataclass_fields__).

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented Aug 20, 2023

Similarly, is_dataclass isn't necessarily true for a @dataclass_transform'd class.

Would it be worth exploring saying (in a PEP I guess) that it should be guaranteed that:

  • custom dataclasses return true for dataclasses.is_dataclass, and
  • support dataclasses.replace?

Are there many other examples of custom dataclasses that don't delegate construction to the dataclass decorator?

Could the Pydantic custom dataclass be easily made to conform to this idea? If not by delegating to dataclasses.dataclass, then by some other means like having is_dataclass and replace be singledispatch functions so that custom dataclasses can register hooks if they want? Or if DataclassLike (python/cpython#102933) is merged, then by registering with it.

This whole mess would have been avoided if dataclasses had been a base class from the start. Instead they're implemented using a decorator that acts like a base class (provides functionality and makes interface promises). I see custom dataclasses as virtual subclasses of this dataclass ABC, so I think it might make sense to have these virtual subclasses make the same promises.

@ikonst
Copy link
Contributor

ikonst commented Aug 20, 2023

Pydantic predates dataclasses so whatever approach they took - decorators, base classes, meta classes - is not a consequence of any dataclass design decisions, whether good or bad.

typing.dataclass_transform simply didn't include a dataclass_compatible: bool flag. I don't see how a Dataclass base class would change anything here: even if we'd have that base class to inject into the MRO (of a transformed class), we'd need to know when/whether.

@NeilGirdhar
Copy link
Contributor Author

Pydantic predates dataclasses so whatever approach they took - decorators, base classes, meta classes - is not a consequence of any dataclass design decisions, whether good or bad.

Okay. My main point was that I see the dataclasses decorator as analogous to a base class. And I see custom dataclass decorators as analogous to subclasses of that base class. So in my personal view, dataclass_compatible would always be true.

And that's really my question. Can we make it always true? Are there many examples of custom dataclasses that aren't compatible with dataclasses? And if so, how easily can they be made compatible?

To make them compatible, I propsed using dispatch, but registration of a replace/field method might be another alternative.

@ikonst
Copy link
Contributor

ikonst commented Aug 28, 2023

If I understand correctly, the Data Class Transforms mechanism was introduced not to create "custom data classes" (a runtime concept) but to let 3rd party libs tell mypy & pyright how to understand a custom type without having to write a plugin (a typing concept). That's why it's @typing.dataclass_transform and not @dataclasses.transform.

It's possible that PEP-681 was misunderstood and/or its use evolves differently than planned. Using Sourcegraph, I found them split into multiple categories:

  • as PEP-681 intended: attrs, pydantic, flax
  • wraps dataclasses: pyserde, strawberry, ibis
  • custom impl dataclass: n/a (couldn't find any)

By "wraps dataclasses", I mean that they make a my_dataclass decorator calling into actual dataclasses so what they produce follows the dataclasses semantics, but since it's no longer decorated with @dataclasses.dataclass, naive type checkers wouldn't treat it specially. And then @typing.dataclass_transform comes to the rescue. PEP-681 doesn't seem to foresee this really common use case.

re pydantic, unlike what's been said in the thread, I didn't see it produce a __dataclass_fields__ in runtime:

import dataclasses
import pydantic
from typing import reveal_type


class User(pydantic.BaseModel):
    name: str
    age: int


user = User(name='John', age=42)
print(f'{dataclasses.is_dataclass(user)=}')
reveal_type(user.__dataclass_fields__)

results in

% python main.py
dataclasses.is_dataclass(user)=False
Traceback (most recent call last):
  File "main.py", line 13, in <module>
    reveal_type(user.__dataclass_fields__)
                ^^^^^^^^^^^^^^^^^^^^^^^^^

My summary

  • There's fewer folks than we expected wanting to implement something custom that doesn't borrow from dataclasses yet compatible with dataclassses.replace etc.
  • There's many more people that simply want such constructs to preserve type-checking special treatment, and this is probably the main behavior that we should avoid breaking, even though it wasn't foreseen in PEP-681.
    @typing.dataclass_transform()
    def my_dataclass(kls):
      kls.foo = 'bar'
      return dataclasses.dataclass(kls)
  • We have a false positive in pydantic etc. (see Data Class Transforms add __dataclass_fields__ #15974)

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented Aug 28, 2023

Just FYI: The flax dataclass also wraps dataclasses though. Same with chex.

but to let 3rd party libs tell mypy & pyright how to understand a custom type without having to write a plugin

Right. The question is what are we telling mypy and pyright? Are we promising only fields or the dataclass interface (replace, fields, etc.).

What is the best way to make custom dataclasses support the dataclass interface? If a custom base class like this were added (see here), and if it were made to support inheritance, then we could do:

class MyCustomDataclass(MyDataclassInstance):
    def __init_subclass__(cls):
        super().__init_subclass__()
        # do your custom behaviour here

class SomeDataclass(CustomDataclass):
    x: int
    y: float

reveal_type(SomeDataclass.__dataclass_fields__)  # dict[str, Field[Any]]

If you're going to provide type checking for replace, it would probably be nice to support adding to the fields (which is done by flax for example). Although that might need more careful thought.

@ikonst
Copy link
Contributor

ikonst commented Aug 28, 2023

The flax dataclass also wraps dataclasses though.

You're right, flax has two cases - kw_only_dataclasses.dataclass that clearly produces a dataclass, and ModuleBase is a hack for something that'd always be a base class of a dataclass in runtime.

The question is what are we telling mypy and pyright? Are we promising only fields or the dataclass interface (replace, fields, etc.).

Right now, it promises that, quoting PEP-681, "certain decorator functions, classes, and metaclasses provide behaviors similar to dataclasses".

What is the best way to make custom dataclasses support the dataclass interface?

I'm not sure I'd start off with "custom dataclasses" since there's scant evidence anyone is doing it. For example, pydantic is very close to what I'd call a "custom dataclass" and it never bothered emulating dataclasses for runtime purposes. Focusing on "Wraps dataclasses" makes for a stronger case.

If a custom base class...

The path of least resistance is to simply add a parameter to dataclass_transform. PEP-681 says it could be done without another PEP: "In the future, we may add additional parameters to dataclass_transform as needed to support common behaviors in user code. These additions will be made after reaching consensus on typing-sig rather than via additional PEPs."

Then the question of whether dataclasses should offer a base class becomes totally orthogonal to this discussion. (PEP-681 is fine with both decorators, base classes and meta classes.)

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented Aug 28, 2023

The path of least resistance is to simply add a parameter to dataclass_transform.

That would perfectly satisfy me. For motivation, all of the Jax-ecosystem dataclasses would benefit (flax, chex, my tjax library). It would allow type checking of fields and replace. If you post it, I'll chime in with support for whatever it's worth 😄

I'm not sure I'd start off with "custom dataclasses" since there's scant evidence anyone is doing it. For example, pydantic is very close to what I'd call a "custom dataclass"

By custom dataclasses, I just mean users of the dataclass_transform decorator, e.g., I would call flax.struct.dataclass a "custom dataclass". But I see how you're using the word now.

@ikonst
Copy link
Contributor

ikonst commented Aug 28, 2023

👉 python/typing#1456

@NeilGirdhar
Copy link
Contributor Author

Really well-written! Thanks for taking the time to post that.

khaeru added a commit to iiasa/message-ix-models that referenced this issue Aug 30, 2023
khaeru added a commit to iiasa/message-ix-models that referenced this issue Aug 31, 2023
khaeru added a commit to iiasa/message-ix-models that referenced this issue Aug 31, 2023
khaeru added a commit to iiasa/message-ix-models that referenced this issue Aug 31, 2023
khaeru added a commit to iiasa/message-ix-models that referenced this issue Aug 31, 2023
ilevkivskyi pushed a commit that referenced this issue Sep 17, 2023
We [already
synthesize](#15843 (comment))
`__dataclass_fields__` for all classes including
`@dataclass_transform`'d ones, thus assume more than PEP-681 says. We
might as well assume `dataclasses.replace` applies on all same classes.
This way we risk false positive since it'll raise in runtime.

Fixes #15843.
@NeilGirdhar
Copy link
Contributor Author

I'm looking at this again, and there's something I don't understand: the function that I decorated with dataclass-transform (here called dataclass) has return type DataclassInstance, so why doesn't MyPy know that X is an ordinary dataclass?

@AlexWaygood
Copy link
Member

I'm looking at this again, and there's something I don't understand: the function that I decorated with dataclass-transform (here called dataclass) has return type DataclassInstance, so why doesn't MyPy know that X is an ordinary dataclass?

Mypy heavily special-cases dataclasses via a custom plugin that means it can have a far more precise understanding of dataclasses than could be achieved if it just looked at the stubs alone. (I believe all other major static type checkers take a similar approach.) Whether or not a type is annotated as DataclassInstance isn't relevant to whether mypy's dataclasses plugin considers something a dataclass or not.

@NeilGirdhar
Copy link
Contributor Author

NeilGirdhar commented Sep 27, 2023

Whether or not a type is annotated as DataclassInstance isn't relevant to whether mypy's dataclasses plugin considers something a dataclass or not.

Sorry, I do understand that. I'm just confused about this issue in particular.

MyPy has figured out that dataclass is a dataclass-creator since it's decorated with dataclass_transform. And I agree that not all dataclasses created by dataclass-creators can be passed to replace. However, this particular dataclass creator produces classes of type DataclassInstance whose instances are passable to replace.

Essentially, it appears that MyPy is ignoring the return type annotation of dataclass-creators (call it X), and instead making a set of assumptions (as you say) about what it returns (call it Y). What I'm suggesting is to set the type of the dataclass decorated by the creator to be X & Y. That would provide the missing information that allows it to be passed to replace.

@AlexWaygood
Copy link
Member

Can you open a new issue with a minimal repro of what you're trying to do, that doesn't currently work when you're using mypy's master branch?

@NeilGirdhar
Copy link
Contributor Author

@AlexWaygood Done: #16241

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong topic-dataclasses
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants