Allow any length-preserving Expr in group_by(*keys) #3151

@dhirschfeld

Describe the bug

Note: I think this is more of a missing feature than a bug.

I have code that works fine with polars but raises a ComputeError when the same expression is built with narwhals and passed to group_by.

Steps or code to reproduce the bug

from collections.abc import Generator
import narwhals as nw
import numpy as np
import pandas as pd
import polars as pl
import pyarrow as pa
from narwhals.typing import IntoDataFrameT


def partition_fn(
    data: IntoDataFrameT,
) -> Generator[tuple[str, pa.Table], None, None]:
    """Group by unique quarters in `value_timestamp` and yield `(quarter, group)`
    tuples where the quarters are formatted as `{year}Q{quarter}`.
    """
    ns = nw  # It works if we use `ns = pl` and call `.to_native()` on the df below
    df = nw.from_arrow(data, backend=pl)
    year_expr = ns.col("value_timestamp").dt.year()
    month_expr = ns.col("value_timestamp").dt.month()
    quarter_expr = ((month_expr - 1) // 3 + 1)
    expr = (
        year_expr.cast(ns.String)
        + ns.lit("Q")
        + quarter_expr.cast(ns.String)
    ).alias("_quarter")
    for key, group in df.group_by(expr):
        quarter = key[0] if isinstance(key, tuple) else key
        yield quarter, group.to_arrow()


identifiers = list("ABCDE")
start_date = pd.Timestamp.now().date() + pd.DateOffset(days=1)
end_date = start_date + pd.DateOffset(years=1)
index = pd.date_range(start_date, end_date, freq="D", name="value_timestamp")
rng = np.random.default_rng()
data = rng.standard_normal((len(index), len(identifiers))).cumsum(axis=0)
data += rng.uniform(40, 60, size=(1, len(identifiers)))
data = pd.DataFrame(data, index=index, columns=identifiers).reset_index()


quarters = {
    quarter: group
    for quarter, group
    in partition_fn(data)
}
quarters
Cell In[25], line 26, in partition_fn(data)
     20 quarter_expr = ((month_expr - 1) // 3 + 1)
     21 expr = (
     22     year_expr.cast(ns.String)
     23     + ns.lit("Q")
     24     + quarter_expr.cast(ns.String)
     25 ).alias("_quarter")
---> 26 for key, group in df.group_by(expr):
     27     quarter = key[0] if isinstance(key, tuple) else key
     28     yield quarter, group.to_arrow()

File .pixi/envs/py312/lib/python3.12/site-packages/narwhals/dataframe.py:1726, in DataFrame.group_by(self, drop_null_keys, *keys)
   1721     from narwhals.exceptions import ComputeError
   1723     msg = (
   1724         "Group by is not supported with keys that are not elementwise expressions"
   1725     )
-> 1726     raise ComputeError(msg)
   1728 return GroupBy(self, expr_flat_keys, drop_null_keys=drop_null_keys)

ComputeError: Group by is not supported with keys that are not elementwise expressions
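
For reference, the key that `expr` is meant to produce can be sketched in plain Python. This is only an illustration of the arithmetic in `quarter_expr`; the helper name `quarter_label` is hypothetical and not part of the code above.

```python
from datetime import date


def quarter_label(day: date) -> str:
    """Return the `{year}Q{quarter}` label for a date (hypothetical helper)."""
    # Same arithmetic as `quarter_expr`: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}Q{quarter}"


labels = [
    quarter_label(d)
    for d in (date(2025, 1, 31), date(2025, 6, 1), date(2025, 12, 31))
]
print(labels)  # ['2025Q1', '2025Q2', '2025Q4']
```

Every row maps to exactly one label, so the key has the same length as the frame.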

Expected results

It would be great if this also worked with narwhals, i.e. if group_by accepted any length-preserving expression as a key, as polars does.

Actual results

See the ComputeError traceback above.

Please run narwhals.show_versions() and enter the output below.

System:
    python: 3.12.11 | packaged by conda-forge | (main, Jun  4 2025, 14:45:31) [GCC 13.3.0]
executable: ~/.pixi/envs/py312/bin/python
   machine: Linux-6.14.0-1012-aws-x86_64-with-glibc2.39

Python dependencies:
     narwhals: 2.5.0
        numpy: 1.26.4
       pandas: 2.2.3
        modin: 
         cudf: 
      pyarrow: 21.0.0
      pyspark: 
       polars: 1.33.1
         dask: 2025.9.1
       duckdb: 1.3.2+g2063dda
         ibis: 
     sqlframe:

Labels: enhancement (New feature or request)
