-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: ExtensionArray.argsort places the missing value at the end #27137
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@makbigc can you add a test in the base extension array tests?
diff --git a/pandas/tests/extension/base/methods.py b/pandas/tests/extension/base/methods.py
index d9e61e6a2..17315a52e 100644
--- a/pandas/tests/extension/base/methods.py
+++ b/pandas/tests/extension/base/methods.py
@@ -52,6 +52,11 @@ class BaseMethodsTests(BaseExtensionTests):
expected = pd.Series(np.array([1, -1, 0], dtype=np.int64))
self.assert_series_equal(result, expected)
+ def test_argsort_missing_array(self, data_missing_for_sorting):
+ result = np.argsort(data_missing_for_sorting)
+ expected = data_missing_for_sorting.take([2, 0, 1])
+ self.assert_extension_array_equal(result, expected)
+
@pytest.mark.parametrize('na_position, expected', [
('last', np.array([2, 0, 1], dtype=np.dtype('intp'))),
('first', np.array([1, 2, 0], dtype=np.dtype('intp')))
If you don't have time today, LMK. I may push some changes to get this in.
FYI @jorisvandenbossche just doing the breaking Categorical change here. |
As expected, a few failures in our test suite. Going through them next. I think https://travis-ci.org/pandas-dev/pandas/jobs/553767675#L2498 is buggy / defined strangely. (DataFrame.sort_index with a CategoricalIndex and ascending=False) # 0.24
In [2]: df = pd.DataFrame({"A": list(range(6))}, index=pd.Categorical(['a', 'a', 'b', 'b', 'c', 'a'], categories=['c', 'a', 'b']))
In [3]: df.sort_index(ascending=False)
Out[3]:
A
b 3
b 2
a 5
a 1
a 0
c 4
In [4]: df.set_index(df.index.codes).sort_index(ascending=False)
Out[4]:
A
2 2
2 3
1 0
1 1
1 5
0 4 # this PR
In [3]: df.sort_index()
Out[3]:
A
c 4
a 0
a 1
a 5
b 2
b 3
In [4]: df.set_index(df.index.codes).sort_index()
Out[4]:
A
0 4
1 0
1 1
1 5
2 2
2 3 I think that sorting by a CategoricalIndex should be the same as sorting by its codes. 0.24 (and master, before the PR) doesn't have that property. |
It's probably more a comment on the changes in #26854, but it seems that So the current indirection seems a bit wrong. At least depending on where (but that is maybe not blocking for the RC, most important is to get the actual behaviour change in) |
Hmm, so we would want to tell users that they should always override |
Not fully sure. We could also pass |
I'm also having trouble with the indirection. AFAICT, with this PR we end up in a situation where |
All green. I had to slightly adjust the test in 47c6b57 I don't have a 32-bit machine to test on, but it seems like we may be inconsistent with whether |
I believe this will break on the 32-bit builds; argsort is supposed to return platform ints. So I guess our current interface is kind wrong. But let's merge this and consider that. |
@@ -409,7 +410,8 @@ def argsort(self, ascending=True, kind='quicksort', *args, **kwargs): | |||
Returns | |||
------- | |||
index_array : ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we aren't specific that these are actually intp (and NOT int64's)
Let's open an issue about that to fix after the RC before final 0.25.0 ? |
Writing one now.
…On Wed, Jul 3, 2019 at 3:48 PM Joris Van den Bossche < ***@***.***> wrote:
AFAICT, with this PR we end up in a situation where ExtensionArray.argsort
could differ from other places in pandas that do nargsort(ExtensionArray).
Since nargsort Does pandas' usual sorting on the
ExtensionArray._values_for_argsort...
Let's open an issue about that to fix after the RC before final 0.25.0 ?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#27137?email_source=notifications&email_token=AAKAOIW4K5LZLPQF6565JFLP5UGDHA5CNFSM4H4LZM7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZFUMEQ#issuecomment-508249618>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIQWYYAC45IW5HCW3W3P5UGDHANCNFSM4H4LZM7A>
.
|
#27218, but my thinking on this
may be a bit unclear.
On Wed, Jul 3, 2019 at 3:49 PM Tom Augspurger <tom.augspurger88@gmail.com>
wrote:
… Writing one now.
On Wed, Jul 3, 2019 at 3:48 PM Joris Van den Bossche <
***@***.***> wrote:
> AFAICT, with this PR we end up in a situation where
> ExtensionArray.argsort could differ from other places in pandas that do
> nargsort(ExtensionArray). Since nargsort Does pandas' usual sorting on the
> ExtensionArray._values_for_argsort...
>
> Let's open an issue about that to fix after the RC before final 0.25.0 ?
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#27137?email_source=notifications&email_token=AAKAOIW4K5LZLPQF6565JFLP5UGDHA5CNFSM4H4LZM7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZFUMEQ#issuecomment-508249618>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AAKAOIQWYYAC45IW5HCW3W3P5UGDHANCNFSM4H4LZM7A>
> .
>
|
continue #26354
xref #21801 (doesn't close; Need to update existing EAs)