-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: GroupBy.get_group raises ValueError when group key contains NaT #6996
Conversation
@@ -1809,7 +1810,13 @@ def _make_labels(self): | |||
@property | |||
def groups(self): | |||
if self._groups is None: | |||
self._groups = self.index.groupby(self.grouper) | |||
if isinstance(self.grouper, DatetimeIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better to add an argument to Index.groupby
and DatetimeIndex.groupby
, maybe exclude_missing=False
as the default (and you will pass True
here), then you can leave groupby alone and do the exlusions inside the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've changed the logic and would like to ask about keyword.
Considering #3729, I preferred to dropna
as a keyword, and now it uses dropna
. But looks better to use exclude_missing
because pivot_table
already has dropna
for different purpose (to exclude index/columns from result which values are all NaN).
Is it OK to add exclude_missing
to Dataframe.groupby
and pivot_table
as new keyword?
Also, I think default should be dropna=True
based on current behavior which exclude NaT
as nan
.
you can't have nan in the int groupby nor datetime the cython routines are very strict about types so u don't need to check for a non existsnt type furthermore the datetimes won't be passed directly but as a i8 dtype |
I understand Existing cython logic doesn't care about index instance, but check |
@@ -1924,7 +1925,7 @@ def groupby_indices(ndarray values): | |||
k = labels[i] | |||
|
|||
# was NaN | |||
if k == -1: | |||
if k == -1 or k is tslib.NaT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do something like
cdef Py_ssize_t NaT = tslib.NaT
# ...
k == NaT
no need for any overhead of attribute access.
In above example to group by datetime column,
|
hmm... |
@jreback The behavior seems to be caused by
|
@sinhrks that doesn't seem a problem, its when its actually passed to cython that needs conversion (to and from), can you investigate? |
@sinhrks can you rebase and address comments? |
@jreback You mean to convert Conversion looks possible if |
I think u should convert to i8 if possible (and reverse after) - evn though I guess it's handled as object now |
#7560 does this conversion now (actually these are converted to float then coerced back). So rebase after I merge that and let's take another look |
can you show groupby benches after this change..thanks |
Thanks, but #7560 doesn't convert group keys, correct? Thus, I think dtype should be converted in |
Ahh, yes was thinking of something else.
|
OK, let me try. Maybe there is no side-effect, because currently |
@jreback I made it work when group-key has datelike dtypes, is this what you mean? Have to consider object dtype case and better way to coercing dtype back. |
can you rebase and revisit this...thxs |
pushing if you get 2 this in next day or 2 lmk |
@sinhrks let's do this for 0.17.0 |
1731852
to
ca22e12
Compare
Yes, fixed to work on the latest master. |
@sinhrks don't add ANOTHER nat checker, use https://github.com/pydata/pandas/blob/master/pandas/src/inference.pyx#L228 |
f91a28c
to
fe8860e
Compare
Thanks, I haven't noticed that. |
@@ -2010,7 +2012,7 @@ def groupby_indices(ndarray values): | |||
k = labels[i] | |||
|
|||
# was NaN | |||
if k == -1: | |||
if k == -1 or k is NaT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't look right. these are indices, not actual objects, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, actually no need to change algos.pyx. Fixed.
c423fcd
to
6e79756
Compare
@jreback pls check if there is anything I should do? |
lgtm....go ahead and merged |
Thanks, and sorry I found the doc header alignment is incorrect. Will fix it then merge if nothing else. |
BUG: GroupBy.get_group raises ValueError when group key contains NaT
Closes #6992. Made
GroupBy.get_group
works even if the key containsNaT
.NOTE: One issue is that
GroupBy.groups
returns incorrect key in numpy 1.6. This seems to be caused by_convert_grouper
usesgrouper.reindex(axis).values
to return value. This looks doesn't affect to main functionalities, but is there any expected result?