-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
repr string for pd.Grouper #17727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
repr string for pd.Grouper #17727
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17727 +/- ##
==========================================
- Coverage 91.25% 91.24% -0.02%
==========================================
Files 163 163
Lines 50130 49876 -254
==========================================
- Hits 45748 45509 -239
+ Misses 4382 4367 -15
Continue to review full report at Codecov.
|
76ee87b
to
c399edf
Compare
@topper-123 : The |
Yeah, I just started doing this for a personal use case and thought it could be nice to have in pandas itself. so I maybe took two steps when only one was all that was needed. I felt silly to post a issue when the idea already was coded... My main concern was if you dislike caching the repr in pandas and/or the if the approach should be less complex. |
why are you doing this? this just makes the code confusing |
Timegrouper is deprecated |
@topper-123 : The point of opening an issue is to get a consensus on whether we actually think it's a good idea to do this in the first place. |
@topper-123 note that I DO like the idea of a nice repr for |
c399edf
to
7704774
Compare
Hello @topper-123! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 06, 2017 at 22:21 Hours UTC |
7704774
to
6841c62
Compare
pandas/core/groupby.py
Outdated
@@ -333,6 +333,16 @@ def _set_grouper(self, obj, sort=False): | |||
def groups(self): | |||
return self.grouper.groups | |||
|
|||
_init_defaults = dict(key=None, level=None, freq=None, axis=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in other context we call this _attributes
(I think we do this kind of signature generation already in resample). pls try to find examples and make consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, changed
6841c62
to
d4067fc
Compare
I've changed the code:
|
021921b
to
c52b439
Compare
Tests added. I've finished up, unless more comments . EDIT: Moved the test for TimeGrouper to test_resample.py. |
e36d2c3
to
da051ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some issues
pandas/core/groupby.py
Outdated
@@ -333,6 +333,17 @@ def _set_grouper(self, obj, sort=False): | |||
def groups(self): | |||
return self.grouper.groups | |||
|
|||
_attributes = dict(key=None, level=None, freq=None, axis=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to the top of the class (_attributes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be an OrderedDict. Though I wouldn't actually make this a dict at all, rather a list and just introspect the current values (which picks up the defaults, no need to have them in multiple places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Used OrderDict.
The TimeGrouper in particular is even uglier now, but there are quite a bit of special cases there, so cant see a better way.
pandas/core/resample.py
Outdated
@@ -1286,6 +1286,29 @@ def _get_period_bins(self, ax): | |||
|
|||
return binner, bins, labels | |||
|
|||
# _attributes is used in __repr__below | |||
_attributes = Grouper._attributes.copy() | |||
_attributes.update(freq='Min', how='mean', nperiods=None, axis=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reset them to what they should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are overwriting an already set _attributes value (see the top of the class)
pandas/core/resample.py
Outdated
@@ -1028,7 +1028,7 @@ def __init__(self, freq='Min', closed=None, label=None, how='mean', | |||
convention=None, base=0, **kwargs): | |||
freq = to_offset(freq) | |||
|
|||
end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W']) | |||
self._end_types = end_types = {'M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, can see this is confusing. Moved it to class attribute.
93a2d42
to
236b12d
Compare
pandas/core/groupby.py
Outdated
@@ -233,6 +233,10 @@ class Grouper(object): | |||
|
|||
>>> df.groupby(Grouper(level='date', freq='60s', axis=1)) | |||
""" | |||
_attributes = collections.OrderedDict((('key', None), ('level', None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as i said before these should just be a list
and _attributes is already defined IIRC you are overwriting it i think
2a8f098
to
967512d
Compare
I've redone this. It's almost impossible to set By setting |
@topper-123 again, not sure why you want to repeat code for the default values. you simply need to introspect them, NOT redefine the default. |
pandas/core/groupby.py
Outdated
@@ -252,6 +252,11 @@ def __init__(self, key=None, level=None, freq=None, axis=0, sort=False): | |||
self.indexer = None | |||
self.binner = None | |||
|
|||
# _attributes is used in __repr__below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just not needed at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_attributes should be a list of the actual attributes, NOT attached to the default values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already done for the resampler to produce the signature, just follow the example.
13e1125
to
33c4a81
Compare
The travis error seems to be unrelated to my PR. I've made some changes that I think correspond to your comments. Note that this makes >>> Grouper(key='key', freq='50Min', label='right')
TimeGrouper(key='key', freq=<50 * Minutes>, axis=0, sort=True, closed='left', label='right', how='mean', loffset=None) I've made a version where only input is displayed, but is that code quite complex, because there are so many special cases in Is this ok? |
33c4a81
to
62f2de8
Compare
@@ -333,6 +335,17 @@ def _set_grouper(self, obj, sort=False): | |||
def groups(self): | |||
return self.grouper.groups | |||
|
|||
def __repr__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why you need to introspect at all. when repr is called all of the values are set, simply iterate thru them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is to get the difference between the default values and the actual value, and only use attrubutes that deviate from the default values.
superseded by #18203 |
git diff upstream/master -u -- "*.py" | flake8 --diff
Edit: I've made a full proposal (sans some discussion points below)
Currently the repr for
Grouper
andTimeGrouper
is not pretty:I propose adding a
Grouper.__repr__
, so the repr will be like:The repr shows the instantiation form, so users could copy the repr and paste it in to use it, which is always nice.
See attached PR. Tests are still missing. Comments welcome.
Two points:
TimeGrouper
deprecated?