Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose null_on_oob on gather. #16842

Open
ritchie46 opened this issue Jun 10, 2024 · 6 comments
Open

expose null_on_oob on gather. #16842

ritchie46 opened this issue Jun 10, 2024 · 6 comments
Labels
A-api Area: changes to the public API A-ops Area: operations enhancement New feature or an improvement of an existing feature P-medium Priority: medium

Comments

@ritchie46
Copy link
Member

ritchie46 commented Jun 10, 2024

Description

Similar to other get and list.gather, but then on Expr/Series.

Ref #15240

@ritchie46 ritchie46 added enhancement New feature or an improvement of an existing feature P-low Priority: low labels Jun 10, 2024
@stinodego stinodego added A-ops Area: operations A-api Area: changes to the public API labels Jun 10, 2024
@cmdlineluser
Copy link
Contributor

Will this issue also cover Expr.get - or does it need to be opened separately?

There have been a couple of questions in the past few days 12 with people trying to use .list.get() as a workaround and running into confusion.

Footnotes

  1. https://stackoverflow.com/questions/78878210/polars-get-nth-position-from-expression-type

  2. https://stackoverflow.com/questions/78868961/how-to-get-the-value-of-a-specified-index-number-from-the-sorting-of-a-column-an

@ddouglas87
Copy link

Request: Have expr.get() default to null_on_oob=True. Most real world use cases would want null_on_oob=True. Always having to type that in becomes tedious. I know this breaks backwards compatibility but in this case it probably will not cause issues in preexisting code bases.

@deanm0000
Copy link
Collaborator

It doesn't just break backwards compatibility it goes against the fail early, fail loudly principle. It would also be inconsistent with cast strict=True default.

@ddouglas87
Copy link

ddouglas87 commented Aug 19, 2024

Using group_by_dynamic at the end of the rolling aggregate there is always a list size of 1 so .get() will always fail, so one would have to type in null_on_oob=True for all use cases to get it to work. It will not not break preexisting code bases because you have to do null_on_oob=True or it doesn't work at all.

Right now because null_on_oob=True doesn't exist the only way to get an nth element in a rolling aggregate is to do .shift(nth).last(). Shift allows nulls by default.

An alternative solution is to allow rolling aggregates to stop ahead of time instead of unwinding at the end. When I use group_by_dynamic I always chop off the ending. imo the root cause here isn't a lack of null_on_oob=True but that rolling aggregates have no option of stopping early.

@dalejung
Copy link

dalejung commented Aug 19, 2024

I think the oob-ing on non-null indexes makes sense. The issue is with contexts like aggregations where you can get nulls.

Example:
#15163

I don't think raising OOB there makes sense.

But that might be discussion for another ticket. Prio should be to at least get an opt out with Expr.get(null_on_oob=False).

@orlp orlp added P-medium Priority: medium and removed P-low Priority: low labels Oct 17, 2024
@cmdlineluser
Copy link
Contributor

Found a couple of other older issues which can probably be closed in favour of this:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-api Area: changes to the public API A-ops Area: operations enhancement New feature or an improvement of an existing feature P-medium Priority: medium
Projects
Status: Ready
Development

No branches or pull requests

7 participants