repr string for pd.Grouper #17727

topper-123 · 2017-09-30T12:57:07Z

closes #xxxx
[x] tests added / passed
[x] passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Edit: I've made a full proposal (sans some discussion points below)

Currently the repr for Grouper and TimeGrouper is not pretty:

>>> pd.Grouper(key='key')
<pandas.core.groupby.Grouper at 0x248d5ebfd30>
>>> pd.Grouper(key='key', freq='50Min')
<pandas.core.resample.TimeGrouper at 0x248d68d95c0>

I propose adding a Grouper.__repr__, so the repr will be like:

>>> pd.Grouper(key='key')
Grouper(key='key')
>>> pd.Grouper(key='key', freq='50Min')
TimeGrouper(key='key', freq='50T')
>>> pd.Grouper(key='key', freq='50Min', label='right')
TimeGrouper(label='right', key='key', freq='50T')

The repr shows the instantiation form, so users could copy the repr and paste it in to use it, which is always nice.
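
A minimal sketch of the copy-and-paste round trip described above. `MiniGrouper` is a hypothetical stand-in, not the pandas class, and treating `None` as "not set" is an assumption made only for this illustration.

```python
class MiniGrouper:
    """Hypothetical stand-in for pd.Grouper, used only to illustrate the repr."""

    def __init__(self, key=None, level=None, freq=None):
        self.key = key
        self.level = level
        self.freq = freq

    def __repr__(self):
        # Show only the arguments that were actually set (non-None here).
        pairs = [("key", self.key), ("level", self.level), ("freq", self.freq)]
        args = ", ".join(f"{name}={value!r}" for name, value in pairs
                         if value is not None)
        return f"{type(self).__name__}({args})"


g = MiniGrouper(key="key", freq="50Min")
print(repr(g))  # MiniGrouper(key='key', freq='50Min')

# Because the repr is valid constructor syntax, it can be evaluated back.
clone = eval(repr(g), {"MiniGrouper": MiniGrouper})
assert (clone.key, clone.level, clone.freq) == (g.key, g.level, g.freq)
```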

See attached PR. Tests are still missing. Comments welcome.

Two points:

The repr calculation is a bit heavy, so I've cached it. Don't know if that is going overboard?
Is TimeGrouper deprecated?

codecov · 2017-09-30T13:34:12Z

Codecov Report

Merging #17727 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17727      +/-   ##
==========================================
- Coverage   91.25%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       50130    49876     -254     
==========================================
- Hits        45748    45509     -239     
+ Misses       4382     4367      -15

Flag	Coverage Δ
#multiple	`89.04% <100%> (-0.01%)`	⬇️
#single	`40.27% <75%> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.04% <100%> (ø)`	⬆️
pandas/core/resample.py	`96.18% <100%> (+0.02%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/compat/pickle_compat.py	`69.51% <0%> (-6.1%)`	⬇️
pandas/core/indexes/range.py	`92.83% <0%> (-2.83%)`	⬇️
pandas/util/_decorators.py	`78% <0%> (-2.71%)`	⬇️
pandas/util/_validators.py	`93.75% <0%> (-2.6%)`	⬇️
pandas/io/html.py	`84.85% <0%> (-1.13%)`	⬇️
pandas/core/dtypes/concat.py	`98.26% <0%> (-0.89%)`	⬇️
pandas/io/common.py	`68.64% <0%> (-0.85%)`	⬇️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c176a3c...62f2de8. Read the comment docs.

gfyoung · 2017-10-01T06:56:36Z

@topper-123 : The __repr__ looks reasonable to me, though in the future, would have been good to create an issue first to get consensus before proceeding to a PR.

topper-123 · 2017-10-01T14:26:24Z

Yeah, I just started doing this for a personal use case and thought it could be nice to have in pandas itself, so I maybe took two steps when only one was needed. I felt silly posting an issue when the idea was already coded... My main concern was whether you dislike caching the repr in pandas and/or whether the approach should be less complex.

jreback · 2017-10-01T14:30:23Z

why are you doing this?
if it actually is expensive to compute the repr we have the cache_readonly decorator for that

this just makes the code confusing
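
To illustrate the trade-off under discussion: a cached repr is computed once, so it can go stale if the object's attributes change afterwards. This sketch uses the standard library's `functools.cached_property` (Python 3.8+) as a stand-in for the pandas-internal decorator; `CachedReprGrouper` is a hypothetical class, not pandas code.

```python
from functools import cached_property


class CachedReprGrouper:
    """Hypothetical class; not part of pandas."""

    def __init__(self, key=None):
        self.key = key

    @cached_property
    def _repr_string(self):
        # Computed on first access, then stored on the instance.
        return f"{type(self).__name__}(key={self.key!r})"

    def __repr__(self):
        return self._repr_string


g = CachedReprGrouper(key="a")
first = repr(g)
g.key = "b"           # mutate after the repr has been cached
second = repr(g)
print(first, second)  # both still show key='a'
```

The stale second value is the cost of caching; rebuilding the string on every call avoids it.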

jreback · 2017-10-01T14:31:45Z

TimeGrouper is deprecated

gfyoung · 2017-10-01T17:54:12Z

> I felt silly to post a issue when the idea already was coded

@topper-123 : The point of opening an issue is to get a consensus on whether we actually think it's a good idea to do this in the first place.

jreback · 2017-10-01T17:58:29Z

@topper-123 note that I DO like the idea of a nice repr for Grouper, just don't think the caching is needed.

pep8speaks · 2017-10-01T20:42:09Z

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 06, 2017 at 22:21 Hours UTC

jreback · 2017-10-01T20:46:04Z

pandas/core/groupby.py

@@ -333,6 +333,16 @@ def _set_grouper(self, obj, sort=False):
    def groups(self):
        return self.grouper.groups

+    _init_defaults = dict(key=None, level=None, freq=None, axis=0,


in other context we call this _attributes (I think we do this kind of signature generation already in resample). pls try to find examples and make consistent.

Ok, changed

topper-123 · 2017-10-01T20:56:16Z

I've changed the code:

no more caching, always show current values
the init default parameters are in a class attribute; improves subclassability (user can change them upon subclassing)
default closed and label for TimeGrouper are calculated at call time, so they will also always be correct now
end_types has one location, which is also better.
TimeGrouper still got a repr string: It looks to me that the top level location has been deprecated, but it will still be called through Grouper.__new__?
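
A rough sketch of the "calculated at call time" idea for `closed` and `label`. The rule used here (rule codes in `end_types` default to `'right'`, everything else to `'left'`) is an assumption inferred from the `end_types` set in the diff and from the `closed='left'` shown for a minute frequency in this thread; the function name and the plain-string rule codes are hypothetical simplifications.

```python
# Rule codes that anchor to the end of a period (copied from the diff).
END_TYPES = {"M", "A", "Q", "BM", "BA", "BQ", "W"}


def default_closed_and_label(rule_code, closed=None, label=None):
    """Fill in closed/label only when the caller left them as None."""
    # Assumption: anchored codes such as 'W-SUN' count via their prefix.
    base = rule_code.split("-")[0]
    fallback = "right" if base in END_TYPES else "left"
    return (closed if closed is not None else fallback,
            label if label is not None else fallback)


print(default_closed_and_label("T"))                 # ('left', 'left')
print(default_closed_and_label("M"))                 # ('right', 'right')
print(default_closed_and_label("T", label="right"))  # ('left', 'right')
```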

topper-123 · 2017-10-02T08:43:00Z

Tests added. I've finished up, unless there are more comments.

EDIT: Moved the test for TimeGrouper to test_resample.py.

jreback

some issues

jreback · 2017-10-02T12:38:32Z

pandas/core/groupby.py

@@ -333,6 +333,17 @@ def _set_grouper(self, obj, sort=False):
    def groups(self):
        return self.grouper.groups

+    _attributes = dict(key=None, level=None, freq=None, axis=0,


move this to the top of the class (_attributes)

should be an OrderedDict. Though I wouldn't actually make this a dict at all, rather a list and just introspect the current values (which picks up the defaults, no need to have them in multiple places)

Ok. Used OrderedDict.

The TimeGrouper in particular is even uglier now, but there are quite a few special cases there, so I can't see a better way.

jreback · 2017-10-02T12:39:52Z

pandas/core/resample.py

@@ -1286,6 +1286,29 @@ def _get_period_bins(self, ax):

        return binner, bins, labels

+    # _attributes is used in __repr__below
+    _attributes = Grouper._attributes.copy()
+    _attributes.update(freq='Min', how='mean', nperiods=None, axis=0,


just reset them to what they should be

you are overwriting an already set _attributes value (see the top of the class)

jreback · 2017-10-02T12:40:43Z

pandas/core/resample.py

@@ -1028,7 +1028,7 @@ def __init__(self, freq='Min', closed=None, label=None, how='mean',
                 convention=None, base=0, **kwargs):
        freq = to_offset(freq)

-        end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])
+        self._end_types = end_types = {'M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'}


Ok, can see this is confusing. Moved it to class attribute.

jreback · 2017-10-03T08:12:53Z

pandas/core/groupby.py

@@ -233,6 +233,10 @@ class Grouper(object):

    >>> df.groupby(Grouper(level='date', freq='60s', axis=1))
    """
+    _attributes = collections.OrderedDict((('key', None), ('level', None),


as I said before, these should just be a list
and _attributes is already defined; IIRC you are overwriting it, I think

topper-123 · 2017-10-03T15:47:12Z

I've redone this.

It's almost impossible to set _attributes as a list before __init__ for TimeGrouper because there's quite a lot of logic in TimeGrouper.__init__ for what should count as a default value.

By setting _attributes right after __init__ and setting it to an OrderedDict, the implementation stays relatively simple, also for TimeGrouper.

jreback · 2017-10-03T16:44:15Z

@topper-123 again, not sure why you want to repeat code for the default values. you simply need to introspect them, NOT redefine the default.

jreback · 2017-10-03T16:44:29Z

pandas/core/groupby.py

@@ -252,6 +252,11 @@ def __init__(self, key=None, level=None, freq=None, axis=0, sort=False):
        self.indexer = None
        self.binner = None

+    # _attributes is used in __repr__below


this is just not needed at all

_attributes should be a list of the actual attributes, NOT attached to the default values.

this is already done for the resampler to produce the signature, just follow the example.
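
One possible reading of this suggestion, as a sketch: keep `_attributes` as a plain list of names and read the current values off the instance. `ListAttrGrouper` is a hypothetical class and does not reproduce the resampler code referred to above.

```python
class ListAttrGrouper:
    """Hypothetical class; not the pandas implementation."""

    # Names only; no default values are repeated here.
    _attributes = ["key", "level", "freq", "axis", "sort"]

    def __init__(self, key=None, level=None, freq=None, axis=0, sort=False):
        self.key = key
        self.level = level
        self.freq = freq
        self.axis = axis
        self.sort = sort

    def __repr__(self):
        # Read whatever is currently set on the instance.
        args = ", ".join(f"{name}={getattr(self, name)!r}"
                         for name in self._attributes)
        return f"{type(self).__name__}({args})"


print(ListAttrGrouper(key="key"))
# ListAttrGrouper(key='key', level=None, freq=None, axis=0, sort=False)
```

Written this way, the repr lists every attribute, including those still at their defaults.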

topper-123 · 2017-10-12T11:13:00Z

The travis error seems to be unrelated to my PR.

I've made some changes that I think correspond to your comments. Note that this makes TimeGrouper more verbose:

>>> Grouper(key='key', freq='50Min', label='right')
TimeGrouper(key='key', freq=<50 * Minutes>, axis=0, sort=True, closed='left', label='right', how='mean', loffset=None)

I've made a version where only the input is displayed, but that code is quite complex, because there are so many special cases in TimeGrouper.__init__.

Is this ok?

jreback · 2017-11-07T13:25:07Z

pandas/core/groupby.py

@@ -333,6 +335,17 @@ def _set_grouper(self, obj, sort=False):
    def groups(self):
        return self.grouper.groups

+    def __repr__(self):


not sure why you need to introspect at all. when repr is called all of the values are set, simply iterate thru them.

It is to get the difference between the default values and the actual values, and only use attributes that deviate from the default values.
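
A sketch of what "only attributes that deviate from the defaults" could look like, using `inspect.signature` to read the defaults from `__init__` instead of restating them. `SignatureGrouper` is a hypothetical class, and it assumes every `__init__` parameter is stored under the same attribute name, which the thread suggests does not hold cleanly for TimeGrouper.

```python
import inspect


class SignatureGrouper:
    """Hypothetical class; not the pandas implementation."""

    def __init__(self, key=None, level=None, freq=None, axis=0, sort=False):
        self.key = key
        self.level = level
        self.freq = freq
        self.axis = axis
        self.sort = sort

    def __repr__(self):
        # Defaults come from the signature, so they live in one place only.
        params = inspect.signature(type(self).__init__).parameters
        shown = []
        for name, param in params.items():
            if name == "self":
                continue
            value = getattr(self, name)
            if value != param.default:
                shown.append(f"{name}={value!r}")
        return f"{type(self).__name__}({', '.join(shown)})"


print(SignatureGrouper(key="key", axis=1))  # SignatureGrouper(key='key', axis=1)
```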

jreback · 2017-11-09T22:31:19Z

superseded by #18203

topper-123 force-pushed the Grouper_repr branch from 76ee87b to c399edf Compare September 30, 2017 14:09

topper-123 changed the title ~~WIP: repr string for pd.Grouper~~ repr string for pd.Grouper Sep 30, 2017

gfyoung added Enhancement Groupby labels Oct 1, 2017

topper-123 force-pushed the Grouper_repr branch from c399edf to 7704774 Compare October 1, 2017 20:42

topper-123 force-pushed the Grouper_repr branch from 7704774 to 6841c62 Compare October 1, 2017 20:44

jreback reviewed Oct 1, 2017

topper-123 force-pushed the Grouper_repr branch from 6841c62 to d4067fc Compare October 1, 2017 20:49

topper-123 force-pushed the Grouper_repr branch 3 times, most recently from 021921b to c52b439 Compare October 2, 2017 08:42

topper-123 force-pushed the Grouper_repr branch 2 times, most recently from e36d2c3 to da051ce Compare October 2, 2017 12:37

jreback requested changes Oct 2, 2017

jreback added the Output-Formatting __repr__ of pandas objects, to_string label Oct 2, 2017

topper-123 force-pushed the Grouper_repr branch 2 times, most recently from 93a2d42 to 236b12d Compare October 2, 2017 23:58

jreback requested changes Oct 3, 2017

topper-123 force-pushed the Grouper_repr branch from 2a8f098 to 967512d Compare October 3, 2017 15:40

jreback requested changes Oct 3, 2017

topper-123 force-pushed the Grouper_repr branch from 13e1125 to 33c4a81 Compare October 12, 2017 09:27

tp added 3 commits November 6, 2017 22:19

Added repr string for Grouper and TimeGrouper

63bbbde

Changed _attributes to use signature + changed tests to have ordered …

5595156

…arguments for TimeGrouper

New attempt at Grouper.__repr__

62f2de8

topper-123 force-pushed the Grouper_repr branch from 33c4a81 to 62f2de8 Compare November 6, 2017 22:21

jreback requested changes Nov 7, 2017

topper-123 mentioned this pull request Nov 9, 2017

Added repr string for Grouper and TimeGrouper #18203

Merged

3 tasks

jreback closed this Nov 9, 2017

topper-123 deleted the Grouper_repr branch December 11, 2017 08:09
