Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress incorrect warning in nargsort for timezone-aware DatetimeIndex #25629

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Suppress incorrect warning in nargsort for timezone-aware DatetimeIndex
  • Loading branch information
jorisvandenbossche committed Mar 10, 2019
commit cd11341b1441150721e7e4823b32bbcb0fae4ef7
9 changes: 8 additions & 1 deletion pandas/core/sorting.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
""" miscellaneous sorting / groupby utilities """
import warnings

import numpy as np

Expand Down Expand Up @@ -254,7 +255,13 @@ def nargsort(items, kind='quicksort', ascending=True, na_position='last'):
sorted_idx = np.roll(sorted_idx, cnt_null)
return sorted_idx

items = np.asanyarray(items)
with warnings.catch_warnings():
# https://github.com/pandas-dev/pandas/issues/25439
# can be removed once ExtensionArrays are properly handled by nargsort
warnings.filterwarnings(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we call .to_numpy() here instead? (I think think always is an index), or is this a performance issue? (until we have real EA support)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not opposed to this as an interim solution, just want to understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, to_numpy has the future behaviour (i.e. converting to object dtype), while here we want to have the current M8[ns]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But of course, with to_numpy(dtype='datetime64[ns]'), you can get that.

In that case, we have to do a check here for datetimetz data; something like:

if is_datetimetz_dtype(items):
    items = items.to_numpy(dtype='datetime64[ns]')
else:
    items = np.asanyarray(items)

It's also custom code for this case, but if the above is preferable, happy to change (personally don't have a strong favor for either of them).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think this needs to be sorted, but ok for now with your existing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however, shall we leave this or a follow-on open to fix this/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to expand PR #25595 to include a proper fix (for 0.25.0), involving a clean-up of EA.argsort, which will be the follow-up on this.
And apparently, I already opened an issue about the argsort API problem a few months ago: #21801 (will update that issue with the new information)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok cool.

"ignore", category=FutureWarning,
message="Converting timezone-aware DatetimeArray to")
items = np.asanyarray(items)
idx = np.arange(len(items))
mask = isna(items)
non_nans = items[~mask]
Expand Down
10 changes: 9 additions & 1 deletion pandas/tests/test_sorting.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@

from pandas.compat import PY2

from pandas import DataFrame, MultiIndex, Series, compat, concat, merge
from pandas import (
DataFrame, MultiIndex, Series, compat, concat, merge, to_datetime)
from pandas.core import common as com
from pandas.core.sorting import (
decons_group_index, get_group_index, is_int64_overflow_possible,
Expand Down Expand Up @@ -183,6 +184,13 @@ def test_nargsort(self):
exp = list(range(5)) + list(range(105, 110)) + list(range(104, 4, -1))
tm.assert_numpy_array_equal(result, np.array(exp), check_dtype=False)

def test_nargsort_datetimearray_warning(self):
# https://github.com/pandas-dev/pandas/issues/25439
# can be removed once the FutureWarning for np.array(DTA) is removed
data = to_datetime([0, 2, 0, 1]).tz_localize('Europe/Brussels')
with tm.assert_produces_warning(None):
nargsort(data)


class TestMerge(object):

Expand Down