
Categorical column fails in groupby + transform. #29037

Closed
@stefansimik

Description


Code Sample, a copy-pastable example if possible

import pandas as pd

# Create demo data for package delivery
df = pd.DataFrame({'package_id': [1, 1, 1, 2, 2, 3],
                   'status': ['Waiting', 'OnTheWay', 'Delivered', 'Waiting', 'OnTheWay', 'Waiting']})

# Status column: Make ordinal
delivery_status_type = pd.CategoricalDtype(categories=['Waiting', 'OnTheWay', 'Delivered'], ordered=True)
df['status'] = df['status'].astype(delivery_status_type)


# CALCULATE LAST-STATUS FOR EACH PACKAGE

# Way 1: (works OK)
# df['last_status'] = df.groupby('package_id')['status'].transform(lambda x: x.max())

# Way 2: fails with the AttributeError shown below
df['last_status'] = df.groupby('package_id')['status'].transform(max)
df
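Way 1 gives the right answer because the column is an ordered categorical: `max` then follows the declared category order rather than alphabetical string order. The following is a minimal sketch using the same status values to show the difference.

```python
import pandas as pd

statuses = ['Waiting', 'OnTheWay', 'Delivered', 'Waiting', 'OnTheWay', 'Waiting']

# Plain strings: max() compares alphabetically, so 'Waiting' wins
plain = pd.Series(statuses)

# Ordered categorical: max() follows the declared category order
delivery_status_type = pd.CategoricalDtype(
    categories=['Waiting', 'OnTheWay', 'Delivered'], ordered=True)
ordinal = plain.astype(delivery_status_type)

print(plain.max())    # Waiting
print(ordinal.max())  # Delivered
```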

Problem description

The problem is that the code above fails with the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-26-174c91615d45> in <module>
----> 1 df['last_status'] = df.groupby('package_id')['delivery_status'].transform(max)
      2 df

~\Miniconda3\lib\site-packages\pandas\core\groupby\generic.py in transform(self, func, *args, **kwargs)
   1015                 # cythonized aggregation and merge
   1016                 return self._transform_fast(
-> 1017                     lambda: getattr(self, func)(*args, **kwargs), func
   1018                 )
   1019 

~\Miniconda3\lib\site-packages\pandas\core\groupby\generic.py in _transform_fast(self, func, func_nm)
   1062         ids, _, ngroup = self.grouper.group_info
   1063         cast = self._transform_should_cast(func_nm)
-> 1064         out = algorithms.take_1d(func()._values, ids)
   1065         if cast:
   1066             out = self._try_cast(out, self.obj)

AttributeError: 'Categorical' object has no attribute '_values'
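The traceback suggests that `transform(max)` is not applied group by group. Instead, the builtin `max` appears to be translated to the name `'max'` and sent to `_transform_fast`, which computes one result per group and then broadcasts it back to the rows with `take_1d(..., ids)`. The sketch below reproduces that aggregate-then-broadcast idea by hand. It uses only public API (`GroupBy.ngroup`, `Series.cat.codes`, `pd.Categorical.from_codes`) and works on the integer category codes, so it illustrates the pattern rather than pandas' internal code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'package_id': [1, 1, 1, 2, 2, 3],
                   'status': ['Waiting', 'OnTheWay', 'Delivered', 'Waiting', 'OnTheWay', 'Waiting']})
dtype = pd.CategoricalDtype(categories=['Waiting', 'OnTheWay', 'Delivered'], ordered=True)
df['status'] = df['status'].astype(dtype)

# Group number of every row (0 .. ngroups-1), playing the role of `ids` in the traceback
ids = df.groupby('package_id').ngroup().to_numpy()
ngroups = ids.max() + 1

# Category codes follow the ordered categories; missing values are -1
codes = df['status'].cat.codes.to_numpy()

# Step 1: aggregate once per group (maximum code per package)
per_group = np.full(ngroups, -1, dtype=codes.dtype)
np.maximum.at(per_group, ids, codes)

# Step 2: broadcast the per-group result back to every row via the group ids
row_codes = per_group[ids]
df['last_status'] = pd.Categorical.from_codes(row_codes, dtype=dtype)
print(df)
```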

Expected Output

Both ways should run without error, and Way 2 should produce the same result as Way 1.
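A possible workaround on affected versions (a sketch, not the fix adopted in pandas) is to run the `'max'` transform on the integer category codes and rebuild the categorical afterwards. Because the dtype is ordered, code order matches category order, so the largest code is the last status; missing values have code -1 and never beat a real status.

```python
import pandas as pd

df = pd.DataFrame({'package_id': [1, 1, 1, 2, 2, 3],
                   'status': ['Waiting', 'OnTheWay', 'Delivered', 'Waiting', 'OnTheWay', 'Waiting']})
delivery_status_type = pd.CategoricalDtype(categories=['Waiting', 'OnTheWay', 'Delivered'], ordered=True)
df['status'] = df['status'].astype(delivery_status_type)

# Integer codes follow the category order: Waiting=0, OnTheWay=1, Delivered=2 (NaN=-1)
codes = df['status'].cat.codes

# Per-package maximum code, broadcast back to every row
last_codes = codes.groupby(df['package_id']).transform('max')

# Rebuild an ordered categorical with the original dtype
df['last_status'] = pd.Categorical.from_codes(last_codes, dtype=delivery_status_type)
print(df)
```

The `dtype` argument of `Categorical.from_codes` requires pandas 0.24 or later, so it is available in the 0.25.1 install reported below.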

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.5
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : None
pytest : 5.0.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
