Description
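I'm seeing a major performance regression in pandas 1.1.5 compared with 1.1.3. On the same Python 3.7.9 (conda-forge, Windows) setup, computing df.groupby('b').indices on a 10-million-row DataFrame takes about 57 s instead of about 0.46 s, roughly 125× slower.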
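The two console sessions below can also be run as a standalone script under each pandas version (for example, in two separate conda environments). The following is a minimal sketch of that. The function name time_groupby_indices and the optional command-line row count are hypothetical additions. It uses time.perf_counter() instead of time.time() because perf_counter is the clock intended for timing short intervals.

```python
import sys
import time

import numpy as np
import pandas as pd


def time_groupby_indices(numel: int, n_keys: int = 4000, seed: int = 0) -> float:
    """Build the frame from the report and time df.groupby('b').indices.

    Hypothetical helper: same data shape as the report (float column 'a',
    integer key column 'b' with values in [0, n_keys)).
    """
    rng = np.random.RandomState(seed)  # RandomState exists in old and new NumPy
    df = pd.DataFrame(dict(a=rng.rand(numel), b=rng.randint(0, n_keys, numel)))

    start = time.perf_counter()  # monotonic, high-resolution timer
    indices = df.groupby("b").indices
    elapsed = time.perf_counter() - start

    # Sanity check: every row position appears in exactly one group.
    assert sum(len(v) for v in indices.values()) == numel
    return elapsed


if __name__ == "__main__":
    # Default matches the 10,000,000 rows used in the report.
    numel = int(sys.argv[1]) if len(sys.argv) > 1 else 10_000_000
    print(f"pandas {pd.__version__}: {time_groupby_indices(numel):.3f} s")
```

Running the script once per environment and comparing the printed times should reproduce the gap between the two sessions.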
Version 1.1.3:
Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)] on win32
In[2]: import time
... : import numpy as np
... : import pandas as pd
... : pd.__version__
Out[2]: '1.1.3'
In[3]: numel = 10000000
... : df = pd.DataFrame(dict(a=np.random.rand(numel), b=np.random.randint(0,4000, numel)))
... : start = time.time()
... : groupby_indices = df.groupby('b').indices
... : time.time() - start
Out[3]: 0.46085023880004883
Version 1.1.5:
Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 7.19.0
Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)] on win32
In[2]: import time
... : import numpy as np
... : import pandas as pd
... : pd.__version__
Out[2]: '1.1.5'
In[3]: numel = 10000000
... : df = pd.DataFrame(dict(a=np.random.rand(numel), b=np.random.randint(0,4000, numel)))
... : start = time.time()
... : groupby_indices = df.groupby('b').indices
... : time.time() - start
Out[3]: 57.36550998687744
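Until the regression is fixed, one possible workaround applies if you only need the positional row indices for each value of an integer key column with no missing values. In that case you can build the same key-to-positions mapping directly with NumPy. This is a swapped-in technique (a stable argsort followed by np.split), not how pandas implements .indices, and group_indices is a hypothetical helper name. Unlike pandas, which drops NaN keys by default, it does nothing special for missing values.

```python
import numpy as np


def group_indices(keys):
    """Map each unique key to the ascending row positions where it occurs.

    Hypothetical stand-in for df.groupby(col).indices, assuming integer keys
    with no missing values.
    """
    keys = np.asarray(keys)
    # A stable sort keeps the positions within each group in ascending order.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    # On sorted data, return_index gives the start offset of each group.
    uniques, starts = np.unique(sorted_keys, return_index=True)
    # Cut the sorted positions at every group boundary except the first.
    groups = np.split(order, starts[1:])
    return dict(zip(uniques.tolist(), groups))
```

With the DataFrame from the sessions above, this would be called as group_indices(df['b'].to_numpy()). Its cost is dominated by one O(n log n) sort of the key column.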