
Major Performance regression of df.groupby(..).indices #38495

Closed
@bordingj

Description


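I'm experiencing a major performance regression in `df.groupby(..).indices` with pandas 1.1.5 compared with 1.1.3. On the same 10-million-row frame, the call takes about 0.46 s on 1.1.3 and about 57 s on 1.1.5, roughly 125 times slower.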
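For context, `GroupBy.indices` returns a dict that maps each group key to a NumPy array of the integer row positions belonging to that group. A minimal illustration on a small made-up frame (the data is illustrative only, not from the report):

```python
import numpy as np
import pandas as pd

# Toy frame: column 'b' holds the group keys
df = pd.DataFrame({"a": [0.1, 0.2, 0.3, 0.4, 0.5], "b": [2, 0, 2, 1, 0]})

# .indices maps each distinct value of 'b' to the positions of its rows
indices = df.groupby("b").indices
for key in sorted(indices):
    print(key, indices[key].tolist())
# 0 [1, 4]
# 1 [3]
# 2 [0, 2]
```

With 4000 distinct keys and 10 million rows, as in the report, the dict has up to 4000 entries whose arrays together cover all 10 million positions.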

Version 1.1.3:

Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)] on win32
In[2]: import time
 ... : import numpy as np
 ... : import pandas as pd
 ... : pd.__version__
Out[2]: '1.1.3'
In[3]: numel = 10000000
 ... : df = pd.DataFrame(dict(a=np.random.rand(numel), b=np.random.randint(0,4000, numel)))
 ... : start = time.time()
 ... : groupby_indices = df.groupby('b').indices
 ... : time.time() - start
Out[3]: 0.46085023880004883

Version 1.1.5:

Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)] on win32
In[2]: import time
 ... : import numpy as np
 ... : import pandas as pd
 ... : pd.__version__
Out[2]: '1.1.5'
In[3]: numel = 10000000
 ... : df = pd.DataFrame(dict(a=np.random.rand(numel), b=np.random.randint(0,4000, numel)))
 ... : start = time.time()
 ... : groupby_indices = df.groupby('b').indices
 ... : time.time() - start
Out[3]: 57.36550998687744
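A standalone script along the same lines may make it easier to compare versions. This is a sketch, not part of the original report: it uses `time.perf_counter` instead of `time.time`, seeds NumPy's `default_rng` so runs are repeatable, and adds a hypothetical NumPy-only helper, `reference_indices`, to check that the contents of `.indices` are unchanged. Absolute timings depend on hardware.

```python
import time

import numpy as np
import pandas as pd


def time_groupby_indices(numel: int, n_groups: int = 4000, seed: int = 0) -> float:
    """Time df.groupby('b').indices on random data shaped like the report."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"a": rng.random(numel), "b": rng.integers(0, n_groups, numel)})
    start = time.perf_counter()
    _ = df.groupby("b").indices
    return time.perf_counter() - start


def reference_indices(keys: np.ndarray) -> dict:
    """Hypothetical NumPy-only equivalent of .indices, for sanity checks."""
    # A stable sort keeps row positions ascending within each key
    order = np.argsort(keys, kind="stable")
    # First position of each distinct key in the sorted order marks a group start
    uniques, starts = np.unique(keys[order], return_index=True)
    return dict(zip(uniques.tolist(), np.split(order, starts[1:])))


if __name__ == "__main__":
    print("pandas", pd.__version__)
    # The report used 10_000_000 rows; add that size to reproduce the full case
    for numel in (10_000, 100_000, 1_000_000):
        print(f"{numel:>10,d} rows: {time_groupby_indices(numel):.3f} s")
```

Running it once under each pandas version and comparing how the time grows with `numel` should show whether the slowdown is systematic rather than noise.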

Metadata


Labels: Performance (Memory or execution speed performance), Regression (Functionality that used to work in a prior pandas version)
