Skip to content

Update type for PeriodDtype / DatetimeTZDtype / IntervalDtype #22938

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 4, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Fixed
  • Loading branch information
TomAugspurger committed Oct 2, 2018
commit d7a8e1b8686024b3c891d01ce267ccbfd1beabb9
6 changes: 3 additions & 3 deletions pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@

from pandas.core.dtypes.dtypes import (
registry, CategoricalDtype, CategoricalDtypeType, DatetimeTZDtype,
DatetimeTZDtypeType, PeriodDtype, IntervalDtype,
PeriodDtype, IntervalDtype,
PandasExtensionDtype, ExtensionDtype,
_pandas_registry)
from pandas.core.dtypes.generic import (
Expand Down Expand Up @@ -1906,7 +1906,7 @@ def _get_dtype_type(arr_or_dtype):
elif isinstance(arr_or_dtype, CategoricalDtype):
return CategoricalDtypeType
elif isinstance(arr_or_dtype, DatetimeTZDtype):
return DatetimeTZDtypeType
return Timestamp
elif isinstance(arr_or_dtype, IntervalDtype):
return Interval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really changing the semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which semantics?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the other dtypes return the meta class
now you are returning the actual class (and not even the type)

this is not right

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we have been doing for all EAs.

Now in practice, I don't if we ever rely on the value, though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that am not sure this is actually used. But If you are chaning, try changing for all the dtypes? (or don't change). that's what I mean by semantics, we neeed to be consistent.

Copy link
Contributor Author

@TomAugspurger TomAugspurger Oct 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the ExtensionArray docs

        """The scalar type for the array, e.g. ``int``

        It's expected ``ExtensionArray[item]`` returns an instance
        of ``ExtensionDtype.type`` for scalar ``item``.
        """

Period is the scalar type for arrays with PeriodDtype, in the sense that isinstance(periodarray[0], Period) is true. What's the issue?

now you are returning the actual class (and not even the type)

Classes are types (they can be used as the second argument to isinstance).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But If you are chaning, try changing for all the dtypes? (or don't change)

We discussed earlier change CategoricalDtype.type to be object, but IIRC that was rejected (I don't recall the specifics).

Happy to change DatetimeTZDtypeType (to Timestamp).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you see if we have tests that hit any of this? (if not pls add). happy to change to either way, but should just change all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/dtypes/test_common.py::test__get_dtype_type hits this. Otherwise, the only path to here seems to be _get_dtype_or_type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And even then, for EA's it is actually never used in _get_dtype_or_type (I mean, there is no check function relying on that value, only for numpy dtypes)

elif isinstance(arr_or_dtype, PeriodDtype):
Expand All @@ -1915,7 +1915,7 @@ def _get_dtype_type(arr_or_dtype):
if is_categorical_dtype(arr_or_dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was rationale for not making this object for Categorical here? otherwise this is the odd man out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the previous discussion, but: categorical can basically be everything (so object would still be generic, that is true), but object is actually not a proper callable like the other types (you cannot do object(1), like you can do int(1), np.int64(1), or Period(..) ), so it doesn't follow the scheme.

In principle we could pass through the type of the categories, but the problem is that those can be None.

Since the type is actually never used (for the extension dtypes), I would also not bother too much about this one inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joris is correct.

The only additional comment is that having a .type of object doesn't really tell you anything useful. But something like Period, Interval, or Timestamp actually is meaningful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so why don't we return None then? I think it is going to be very confusing that only Categorical as a DtypeType, but nothing else does. If you are getting rid of it, then remove it completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None has the same drawbacks of object (cannot be called, is overly generic, does not hold any useful information). The CategoricalDtypeType at least has in its name it is the type of the CategoricalDtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with Joris.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then

return CategoricalDtypeType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

elif is_datetime64tz_dtype(arr_or_dtype):
return Timestmap
return Timestamp
elif is_period_dtype(arr_or_dtype):
return Period
elif is_interval_dtype(arr_or_dtype):
Expand Down