Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Class Transforms add __dataclass_fields__ #15974

Open
ikonst opened this issue Aug 27, 2023 · 10 comments
Open

Data Class Transforms add __dataclass_fields__ #15974

ikonst opened this issue Aug 27, 2023 · 10 comments

Comments

@ikonst
Copy link
Contributor

ikonst commented Aug 27, 2023

Per PEP 681, data class transforms are defined in terms of dataclasses, but it doesn't mean they make dataclasses.

In particular, transformed classes don't necessarily have __dataclass_fields__ or have is_dataclass return True, but per mypy - they do.

@erictraut
Copy link

Where is __dataclass_fields__ documented? I don't see any reference to it in PEP 557 (which introduced dataclass) or in the official Python documentation for dataclasses. I presume that it's an internal implementation detail of the stdlib dataclass. If so, then I'd argue that it shouldn't appear in typeshed stubs, and nothing in the typing specs (including PEP 681) should depend on it.

The only reference to __dataclass_fields__ that I can find is in this typeshed issue and this referenced mypy issue. I don't think I agree with the conclusions that were reached in those threads.

@ikonst
Copy link
Contributor Author

ikonst commented Aug 28, 2023

Yeah, you've got it. Nominally your approach is sound: if it's not in the spec, typeshed shouldn't build a protocol on it. Rather, #14263 should've been the way to go. The typeshed solution leverages a "plugin" behavior anyway (the existence of this undocumented attr) so might as well go "plugin" on this one.

In practice, libs like pydantic have been synthesizing this attr to make their models act as dataclasses, so there's interest in having is_dataclass serve as an effective (and correct!) type guard for using the other dataclasses utility functions. (correction: pydantic actually produces a false positive and would improve if this issue is fixed)

@ikonst
Copy link
Contributor Author

ikonst commented Aug 28, 2023

FWIW, pyright also synthesizes this symbol, and it could be similarly argued that since it's an implementation detail of dataclasses, then pyright should not synthesize it, and then typeshed wouldn't "take advantage" of it (as they did in python/typeshed#9345).

Or we can accept that we've grown to rely on it. 😔

This issue is about whether we should synthesize it indiscriminately for Data Class Transforms. Currently we do, and the following is a false positive:

import typing

_T = typing.TypeVar("_T")


@typing.dataclass_transform()
def create_model(cls: type[_T]) -> type[_T]:
    cls.__init__ = lambda *args, **kwargs: None  # type: ignore
    return cls


@create_model
class C:
    a: int


c = C(a=42)
typing.reveal_type(c.__dataclass_fields__)

Runtime:

    typing.reveal_type(c.__dataclass_fields__)
                       ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'C' object has no attribute '__dataclass_fields__'

mypy:

file.py:18: note: Revealed type is "builtins.dict[builtins.str, Any]"
Success: no issues found in 1 source file

@JelleZijlstra
Copy link
Member

I don't think we should. PEP 681 is a specification for abstractions that work like dataclasses in their public interface; it doesn't require that abstractions that use dataclass_transform mirror private implementation details of dataclasses.

@ikonst
Copy link
Contributor Author

ikonst commented Aug 28, 2023

@JelleZijlstra I agree, and this issue about removing this synthesized attribute in that case, basically

-self._add_dataclass_fields_magic_attribute()
+if self._spec is _TRANSFORM_SPEC_FOR_DATACLASSES:
+    self._add_dataclass_fields_magic_attribute()

At the same time, I did some sleuthing for a related issue, and it would appear that doing that would break many users that have grown to rely on @dataclass_transform to create alternative decorators that ultimately do call into dataclasses.dataclass and thus produce true dataclasses (example). It's not what PEP-681 was intended for, but I found copious examples. So now I don't know (insert "two buttons" meme).

@erictraut
Copy link

FWIW, pyright also synthesizes this symbol

I added a synthesized __dataclass_fields__ in pyright because mypy added it and typeshed added a dependency on it. I should have done more research and asked questions before doing this.

Here are some questions that may help inform the best way forward...

  1. Why was the __dataclass_fields__ hack added to typeshed? I presume it was to provide better type checking for replace, asdict, astuple, fields, and is_dataclass.
  2. Was there evidence that developers are frequently passing non-dataclass objects to these functions? In other words, is this a common source of bugs?
  3. Which of these functions (if any) apply to dataclass-like classes? I presume the answer is none.
  4. How frequently are these functions used in real-world code? Is there a way for us to measure this?

Thinking about potential solutions...

  1. We could delete DataclassInstance and all of its uses from typeshed, replacing all instances with object. Both mypy and pyright could then also delete the synthesized __dataclass_fields__. This would decrease type safety for the functions I mentioned above. If they're infrequently used and/or not a common source of bugs, this probably doesn't matter.
  2. We could augment dataclass_transform to support a new parameter that indicates whether a __dataclass_fields__ attribute is synthesized. In other words, it could be specified by the user explicitly.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Aug 28, 2023

As mentioned here, pyserde, strawberry, ibis would use the parameter mentioned in option 2. In fact, so would flax I believe.

@ikonst
Copy link
Contributor Author

ikonst commented Aug 28, 2023

We could augment dataclass_transform to support a new parameter that indicates whether a dataclass_fields attribute is synthesized. In other words, it could be specified by the user explicitly.

PEP-681 says: "In the future, we may add additional parameters to dataclass_transform as needed to support common behaviors in user code. These additions will be made after reaching consensus on typing-sig rather than via additional PEPs."

Should we start a topic in typing-sig on adding a makes_dataclass parameter?

@erictraut
Copy link

Typing-sig is pretty much defunct at this point. I recommend using python/typing instead. But yes, that sounds like a good next step. I'm still not convinced that this is a problem that needs solving. It would be useful for you to present the motivation for a change in the discussion thread.

@ikonst
Copy link
Contributor Author

ikonst commented Aug 28, 2023

👉 python/typing#1456

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants