List min/max operations on List[Enum] are returning Strings #19269

etrotta · 2024-10-16T23:07:33Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
en = pl.Enum(['X', 'Z', 'Y'])
df = pl.DataFrame({'test': pl.Series(['X', 'Y', 'Z'], dtype=en), 'group': [1, 2, 2]})
print(df.group_by('group').agg(pl.col('test').mode()).select(pl.col('test').list.max()))

# The call above returns the wrong dtype. Other list operations, such as `.first()` instead of `.max()`, return the right dtype:
print(df.group_by('group').agg(pl.col('test').mode()).select(pl.col('test').list.first()))
# It does use the correct Enum ordering for the max() operation, though, which you can verify by comparing the result against the same data stored as plain strings:
df = pl.DataFrame({'test': pl.Series(['X', 'Y', 'Z'], dtype=str), 'group': [1, 2, 2]})

print(df.group_by('group').agg(pl.col('test').mode()).select(pl.col('test').list.max()))

Log output

stderr:
keys/aggregates are not partitionable: running default HASH AGGREGATION
keys/aggregates are not partitionable: running default HASH AGGREGATION
keys/aggregates are not partitionable: running default HASH AGGREGATION

stdout:
# list[enum] .list.max() ; correct values, wrong dtype
shape: (2, 1)
┌──────┐
│ test │
│ ---  │
│ str  │
╞══════╡
│ Y    │
│ X    │
└──────┘
# list[enum] .list.first() ; correct dtype for reference
shape: (2, 1)
┌──────┐
│ test │
│ ---  │
│ enum │
╞══════╡
│ X    │
│ Y    │
└──────┘
# list[str] list.max() ; just for reference to check that it is actually different from list[enum]'s .max()
shape: (2, 1)
┌──────┐
│ test │
│ ---  │
│ str  │
╞══════╡
│ X    │
│ Z    │
└──────┘
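
The difference between the first and third tables comes down to ordering: the Enum is declared as ['X', 'Z', 'Y'], so Y ranks highest, while plain strings compare alphabetically and Z wins. A minimal pure-Python sketch of that comparison follows. It does not use Polars, and the `rank` and `enum_max` names are hypothetical helpers introduced only for illustration.

```python
# Category order taken from the reproducible example: X < Z < Y.
categories = ["X", "Z", "Y"]

# Hypothetical helper mapping: category name -> position in the Enum.
rank = {name: position for position, name in enumerate(categories)}


def enum_max(values):
    """Return the largest value by category position (Enum-style ordering)."""
    return max(values, key=rank.__getitem__)


group_2 = ["Y", "Z"]  # both values are modes of group 2 in the example

print(enum_max(group_2))  # Enum ordering picks Y
print(max(group_2))       # lexicographic ordering picks Z
```

This matches the values in the logs above: group 2 yields Y for list[enum] and Z for list[str].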

Issue description

In lazy mode, collect_schema() also ends up with a result different from collect().schema:

>>> df.lazy().group_by('group').agg(pl.col('test').mode()).select(pl.col('test').list.max()).collect_schema()
Schema({'test': Enum(categories=['X', 'Z', 'Y'])})
>>> df.lazy().group_by('group').agg(pl.col('test').mode()).select(pl.col('test').list.max()).collect().schema
keys/aggregates are not partitionable: running default HASH AGGREGATION
Schema({'test': String})

Might be related to #18394

Expected behavior

The .list.max() operation applied to a list[Enum] column should return a Series of dtype Enum rather than str.

Installed versions

--------Version info---------
Polars:              1.9.0
Index type:          UInt32
Platform:            Windows-11-10.0.22631-SP0
Python:              3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.9.0
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.8.2
nest_asyncio         <not installed>
numpy                1.26.4
openpyxl             3.1.2
pandas               2.2.2
pyarrow              15.0.0
pydantic             2.5.2
pyiceberg            <not installed>
sqlalchemy           2.0.29
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

etrotta added the bug, needs triage, and python labels Oct 16, 2024

ritchie46 mentioned this issue Oct 18, 2024

fix: Fix enum scalar output #19301

Merged

ritchie46 closed this as completed in #19301 Oct 18, 2024

cmdlineluser mentioned this issue Oct 18, 2024

min and max return all nulls in pl.Enum #18394

Open

c-peters added the accepted label Oct 21, 2024

c-peters assigned ritchie46 Oct 21, 2024
