Description
There is a dedicated list.agg method for aggregations, and a separate list.eval method that can do both element-wise mapping and aggregation. It's currently not clear when to use which, or what advantage one has over the other. Even in the linked documentation of the two methods, every example yields the same result when one method is replaced with the other.
Even the docstring for list.agg presents the following example, which arguably isn't an aggregation (that is, a reduction of many values to a single value) but a per-element filter:
>>> df.with_columns(no_nulls=pl.col.a.list.agg(pl.element().drop_nulls()))
shape: (3, 2)
┌──────────────┬───────────┐
│ a ┆ no_nulls │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞══════════════╪═══════════╡
│ [1, null] ┆ [1] │
│ [42, 13] ┆ [42, 13] │
│ [null, null] ┆ [] │
└──────────────┴───────────┘
...so it would arguably be more accurate to use list.eval instead:
>>> df.with_columns(no_nulls=pl.col.a.list.eval(pl.element().drop_nulls()))
shape: (3, 2)
┌──────────────┬───────────┐
│ a ┆ no_nulls │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞══════════════╪═══════════╡
│ [1, null] ┆ [1] │
│ [42, 13] ┆ [42, 13] │
│ [null, null] ┆ [] │
└──────────────┴───────────┘
The only obvious difference I can see is that list.eval has a parallel=True|False parameter, while list.agg doesn't. The confusion is compounded by the fact that for top-level mapping or aggregation there is only a single DataFrame.select method, not separate agg and eval methods.