Open
Labels: Narwhals (issue discovered by Narwhals integration tests), Python (affects Python cuDF API), bug (something isn't working)
Description
Describe the bug
There are a few issues here:
- the return type of transform('size') is DataFrame, whereas in pandas it would be Series
- transform('size') raises if there are string columns in the dataframe (even if they're not being grouped by)
Steps/Code to reproduce bug
import cudf
df = cudf.DataFrame({"a": [1, 2, 2], "b": [4,5,6]})
print(df.groupby('a').transform('size'))
outputs
   b
0  1
1  2
2  2
Expected behavior
pandas returns a Series:
0 1
1 2
2 2
dtype: int64
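For reference, here is a minimal pandas-only sketch of the expected behavior for both points above. The extra string column "c" is an assumption added for illustration; it is not part of the original reproducer.

```python
import pandas as pd

# Same frame as in the cuDF reproducer above.
df = pd.DataFrame({"a": [1, 2, 2], "b": [4, 5, 6]})

# pandas broadcasts the per-group size back onto the original index
# and returns a Series rather than a DataFrame.
result = df.groupby("a").transform("size")
print(type(result).__name__)  # Series
print(result.tolist())        # [1, 2, 2]

# Hypothetical extra string column "c" (illustration only): group sizes
# do not depend on the non-key columns, so pandas does not raise here.
df_str = df.assign(c=["x", "y", "z"])
result_str = df_str.groupby("a").transform("size")
print(result_str.tolist())
```

Swapping pandas for cudf in this snippet should, per the report above, show the two discrepancies on cuDF 25.04.00.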
Environment overview
cudf version: '25.04.00'
pandas version: 2.2.3
Additional context
Spotted in Narwhals.