
BUG: RecursionError using agg on a resampled SeriesGroupBy #42905

Closed
@manoelpqueiroz

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.


When you combine resample with groupby and use the agg method to supply multiple functions to either a DataFrameGroupBy or a SeriesGroupBy, Python suddenly exits without even raising an error.

I first thought I was running into this because I was supplying a single column expecting a DataFrame with multiple columns, but I can confirm this happens to me whether I provide a column (variable b) or apply the method to the entire GroupBy (variable c):

Code Sample

import pandas as pd

a = pd.DataFrame({
    'class': {
        0: 'beta', 1: 'alpha', 2: 'alpha', 3: 'gaga', 4: 'beta', 5: 'gaga',
        6: 'beta', 7: 'gaga', 8: 'beta', 9: 'gaga', 10: 'alpha', 11: 'beta',
        12: 'alpha', 13: 'gaga', 14: 'alpha'},
    'value': {
        0: 69, 1: 33, 2: 40, 3: 2, 4: 36, 5: 40, 6: 48, 7: 84, 8: 77, 9: 22,
        10: 55, 11: 82, 12: 37, 13: 88, 14: 41},
    'date': {
        0: pd.Timestamp('2021-02-28 00:00:00'),
        1: pd.Timestamp('2021-11-30 00:00:00'),
        2: pd.Timestamp('2021-02-28 00:00:00'),
        3: pd.Timestamp('2021-04-30 00:00:00'),
        4: pd.Timestamp('2021-02-28 00:00:00'),
        5: pd.Timestamp('2021-04-30 00:00:00'),
        6: pd.Timestamp('2021-07-31 00:00:00'),
        7: pd.Timestamp('2021-01-31 00:00:00'),
        8: pd.Timestamp('2021-01-31 00:00:00'),
        9: pd.Timestamp('2021-01-31 00:00:00'),
        10: pd.Timestamp('2021-04-30 00:00:00'),
        11: pd.Timestamp('2021-10-31 00:00:00'),
        12: pd.Timestamp('2021-09-30 00:00:00'),
        13: pd.Timestamp('2021-04-30 00:00:00'),
        14: pd.Timestamp('2021-05-31 00:00:00')}})

# This will exit Python
b = a\
    .set_index('date')\
    .groupby('class')\
    .resample('M')['value']\
    .agg(['sum', 'size'])

# Not selecting a column will ALSO make Python exit
c = a\
    .set_index('date')\
    .groupby('class')\
    .resample('M')\
    .agg(['sum', 'size'])
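
Because the interpreter exits instead of raising, it may help to run the failing snippet in a child process and inspect its exit code and stderr. The following is only a sketch: `run_isolated` is a hypothetical helper name, not part of pandas, and `-X faulthandler` is used on the assumption that the exit is a fatal interpreter error that faulthandler can report.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter so a hard crash cannot kill this session.

    Hypothetical helper for illustration only.
    """
    return subprocess.run(
        # -X faulthandler asks CPython to dump a traceback on fatal errors.
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Harmless demonstration: the child exits with a non-zero status,
# and the parent session survives and can read that status.
result = run_isolated("import sys; sys.exit(3)")
print(result.returncode)
```

Passing the code sample above as a string to `run_isolated` and printing `result.returncode` and `result.stderr` should show how the child process ended without taking the parent session down with it.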

Problem description

I'm not sure whether this method is supported on DatetimeIndexResamplerGroupby objects, but the attribute does exist. Referencing agg without calling it gives:

<bound method Resampler.aggregate of <pandas.core.resample.DatetimeIndexResamplerGroupby object at 0x00000163B22B0100>>

Also, while the problem arises with either a Series or a DataFrame, given that using agg with multiple functions on a SeriesGroupBy will correctly create a DataFrame, I would expect the same to happen when resampling with timestamps:

In [1]: a.groupby('class')['value'].agg(['sum', 'size'])
Out[1]:
       sum  size
class
alpha  206     5
beta   312     5
gaga   236     5

Expected Output

                  sum  size
class date
alpha 2021-02-28   40     1
      2021-03-31    0     0
      2021-04-30   55     1
      2021-05-31   41     1
      2021-06-30    0     0
      2021-07-31    0     0
      2021-08-31    0     0
      2021-09-30   37     1
      2021-10-31    0     0
      2021-11-30   33     1
beta  2021-01-31   77     1
      2021-02-28  105     2
      2021-03-31    0     0
      2021-04-30    0     0
      2021-05-31    0     0
      2021-06-30    0     0
      2021-07-31   48     1
      2021-08-31    0     0
      2021-09-30    0     0
      2021-10-31   82     1
gaga  2021-01-31  106     2
      2021-02-28    0     0
      2021-03-31    0     0
      2021-04-30  130     3
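
A possible workaround, sketched under a few assumptions: it swaps resample for a `pd.Grouper` inside a plain groupby, and it passes a `pd.offsets.MonthEnd()` object instead of the `'M'` alias (the alias spelling differs between pandas versions). As far as I can tell, this only returns the (class, month) combinations that actually contain data, so the zero-filled rows in the expected output would still need a separate reindex. It is not a fix for the bug itself.

```python
import pandas as pd

# Same 15 rows as the code sample, written compactly.
classes = ['beta', 'alpha', 'alpha', 'gaga', 'beta', 'gaga', 'beta', 'gaga',
           'beta', 'gaga', 'alpha', 'beta', 'alpha', 'gaga', 'alpha']
values = [69, 33, 40, 2, 36, 40, 48, 84, 77, 22, 55, 82, 37, 88, 41]
dates = ['2021-02-28', '2021-11-30', '2021-02-28', '2021-04-30', '2021-02-28',
         '2021-04-30', '2021-07-31', '2021-01-31', '2021-01-31', '2021-01-31',
         '2021-04-30', '2021-10-31', '2021-09-30', '2021-04-30', '2021-05-31']
a = pd.DataFrame({'class': classes, 'value': values,
                  'date': pd.to_datetime(dates)})

# Assumption: a month-end offset object stands in for the 'M' alias.
month_end = pd.offsets.MonthEnd()

# Group by class and by month-end bin in one groupby, avoiding resample.
workaround = (
    a.groupby(['class', pd.Grouper(key='date', freq=month_end)])['value']
     .agg(['sum', 'size'])
)
print(workaround)
```

The non-empty rows should agree with the expected output above, for example a sum of 105 and a size of 2 for beta on 2021-02-28.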

Output of pd.show_versions()

INSTALLED VERSIONS

commit : c7f7443
python : 3.9.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : pt_BR.cp1252

pandas : 1.3.1
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.1
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : 3.5.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.24.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : 1.0.8
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


Labels

Apply (Apply, Aggregate, Transform, Map), Bug, Groupby, Regression (Functionality that used to work in a prior pandas version), Resample (resample method)
