REF/EA-API: EA constructor without dtype specified #56430
Comments
In my mind, I understand there are multiple use cases, but that can be served by a single method depending on whether a dtype is passed or not? That feels quite clear to me: when a
In |
Correct. My point is that MaskedArray subclasses use a different pattern to achieve the same result. The datetimelike EAs have their own special-casing. If it is feasible (which I'm not ready to claim), then it would be preferable to have a single shared pattern for these.
Certainly possible. On the margin I'd prefer the cases where we intentionally want dtype inference to be more explicit. I'm spending some time this week tracking down just where those cases are.
I've spent some time tracking down the places where we don't pass a dtype to from_sequence:
Also tracking down the various patterns we use for flavor-preserving-partial-inference:
Other places where we have special-casing for Masked/Arrow dtypes related to flavor-retention:
I expect there are more that I have missed; I will update here as I find them.
Re-reading, I think I missed an important point: a big part of the relevant use case is having a BooleanArray method that returns a FloatingArray/IntegerArray etc. (this example could also be addressed by condensing these classes down to just MaskedArray). xref #58258
TLDR: we should make dtype required in EA._from_sequence and implement a new EA constructor for flavor-preserving inference.

ATM dtype is not required in EA._from_sequence. The behavior -- and more importantly the usage -- when it is not specified is not standardized. In many cases it does some kind of inference, but how much inference varies.
Most of the places where we don't pass a dtype are aimed at some type of dtype-flavor-retention. For example, we performed an operation starting with a pyarrow/masked/sparse dtype and we want the result.dtype to still be pyarrow/masked/sparse, but not necessarily the exact same dtype. The main examples that come to mind are maybe_cast_pointwise_result and MaskedArray._maybe_mask_result.
The main other place where we call _from_sequence without a dtype is pd.array. With a little bit of effort I'm pretty sure we can start passing dtypes there.
cc @jorisvandenbossche