[BUG] pandas mismatch for groupby(...).transform('size') #18491

@MarcoGorelli

Describe the bug

There are two issues here:

  • transform('size') returns a DataFrame, whereas pandas returns a Series
  • transform('size') raises if the dataframe contains string columns, even when they are not among the grouping keys

Steps/Code to reproduce bug

import cudf

df = cudf.DataFrame({"a": [1, 2, 2], "b": [4, 5, 6]})
print(df.groupby("a").transform("size"))

outputs

   b
0  1
1  2
2  2

Expected behavior
pandas returns a Series:

0    1
1    2
2    2
dtype: int64
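
For reference, here is a minimal pandas sketch of the expected behavior for both points in the description. It assumes pandas 2.2.x; the extra string column `c` is made up for illustration and is not part of the original reproducer.

```python
import pandas as pd

# Same data as the cuDF reproducer, plus a hypothetical string column "c"
# that is not used as a grouping key.
pdf = pd.DataFrame({"a": [1, 2, 2], "b": [4, 5, 6], "c": ["x", "y", "z"]})

result = pdf.groupby("a").transform("size")

# pandas returns a Series of per-row group sizes, aligned to the original
# index, and the string column does not cause an error.
print(type(result).__name__)  # Series
print(result.tolist())        # [1, 2, 2]
```

Matching this in cuDF would mean returning a Series rather than a one-column DataFrame, and not raising when non-key string columns are present.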

Environment overview (please complete the following information)

cudf version: '25.04.00'
pandas version: 2.2.3

Additional context

Spotted in Narwhals.

Labels

  • Narwhals: Issue discovered by Narwhals integration tests
  • Python: Affects Python cuDF API.
  • bug: Something isn't working
