Groupby dictionary aggregation works incorrectly when 'by' and the columns to aggregate intersect #3376

@gshimansky

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):

Ubuntu 20.04.3 LTS

  • Modin version (modin.__version__):

0.10.2+2.g253ecd98.dirty

  • Python version:

Python 3.8.6

  • Code we can use to reproduce:
import modin.pandas as pd

df = pd.DataFrame({
    "col0": [ 1,  2,  1,  2],
    "col1": [11, 12, 13, 14],
    "col2": [ 4,  4,  4,  4],
    "col3": [ 1,  1,  1,  1]
})

print(df.shape)
gb = df.groupby("col0")
result = gb.agg({"col0": "count"})
string = "%s" % repr(result)
print(string)

Describe the problem

The code above works on pandas but raises ValueError: No objects to concatenate on Modin with both the Python and Ray engines (the ones I tried). I traced the problem to these lines:

        if drop and len(df.columns.intersection(by_part)) > 0:
            df.drop(columns=by_part, errors="ignore", inplace=True)

They are located inside the reduce function. As a result, df, which contains only the single column col0, becomes empty because that column is dropped. I am not sure whether this is intentional, but it doesn't look right to me. @dchigarev, please comment.
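To make the effect of those two lines concrete, here is a minimal sketch in plain pandas. It is not Modin's actual reduce function: the frame, the value of by_part, and the drop flag are assumptions chosen to mirror the reproducer, where the aggregation dictionary leaves col0 as the only column. The second part only shows that pandas itself emits this error message when concat receives nothing; whether Modin reaches the error by exactly this route is also an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the partition seen inside the reduce function:
# the aggregation dict {"col0": "count"} leaves "col0" as the only column.
df = pd.DataFrame({"col0": [1, 2, 1, 2]})
by_part = ["col0"]  # assumed: the 'by' labels for this partition
drop = True         # assumed: the flag is set for this call

# Same condition as in the quoted lines (non-inplace drop for clarity).
if drop and len(df.columns.intersection(by_part)) > 0:
    df = df.drop(columns=by_part, errors="ignore")

print(df.shape)  # (4, 0): the only column to aggregate is gone

# pandas raises the reported message when there is nothing to concatenate.
try:
    pd.concat([])
    message = None
except ValueError as err:
    message = str(err)
print(message)  # No objects to concatenate
```

If this matches what happens in Modin, one possible direction would be to skip the drop for 'by' columns that also appear as keys of the aggregation dictionary, but that is only a suggestion and not a verified fix.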

Labels

Backport 🔙 (issues that need to be backported to previous release(s)), bug 🦗 (something isn't working)