feat: Add having to group_by context #23550

Open
wants to merge 15 commits into
base: main

borchero
Contributor

Fixes #23290.

A few comments:

  • This implementation is still missing a specialized implementation for the streaming engine as I had a little bit of a hard time figuring out how to achieve this.
  • I only added one test case so far as I first wanted to get feedback on the current implementation

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jul 13, 2025
codecov bot commented Jul 13, 2025

Codecov Report

Attention: Patch coverage is 44.92754% with 38 lines in your changes missing coverage. Please review.

Project coverage is 81.28%. Comparing base (e99abdc) to head (6378db6).

Files with missing lines                                  Patch %   Lines
py-polars/polars/dataframe/group_by.py                    50.00%    14 Missing and 5 partials ⚠️
crates/polars-python/src/lazygroupby.rs                   0.00%     8 Missing ⚠️
crates/polars-lazy/src/frame/mod.rs                       61.53%    5 Missing ⚠️
.../polars-plan/src/plans/conversion/dsl_to_ir/mod.rs     0.00%     3 Missing ⚠️
py-polars/polars/lazyframe/group_by.py                    25.00%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #23550      +/-   ##
==========================================
- Coverage   81.28%   81.28%   -0.01%     
==========================================
  Files        1644     1644              
  Lines      223249   223312      +63     
  Branches     2841     2847       +6     
==========================================
+ Hits       181479   181511      +32     
- Misses      41071    41097      +26     
- Partials      699      704       +5     

@borchero borchero marked this pull request as ready for review July 13, 2025 19:19
@ritchie46
Member

I am not sure we should implement this as opposed to lowering this to a group-by with a post filter. I think it should be the latter.

@borchero
Contributor Author

> I am not sure we should implement this as opposed to lowering this to a group-by with a post filter. I think it should be the latter.

Fair, the only problem I see is that I now need to choose column names for the predicates, and these must not clash with each other or with any keys/aggregations. How do you typically solve such an issue here?

@ritchie46
Member

You can use _POLARS_HAVING_ as a temporary name or as a prefix.
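
A minimal sketch of how such prefixed temporary names could be generated without clashing with existing columns. The helper `having_temp_names` and its counter-based suffix are hypothetical and only illustrate the idea; whether the actual implementation checks for collisions at all is not stated in this thread.

```python
def having_temp_names(
    n_predicates: int,
    taken: set[str],
    prefix: str = "_POLARS_HAVING_",
) -> list[str]:
    """Return n_predicates names that start with `prefix` and avoid `taken`.

    Hypothetical helper: the names are `prefix` plus an increasing counter,
    skipping any candidate that is already used by a key or aggregation.
    """
    used = set(taken)  # copy so the caller's set is not mutated
    names: list[str] = []
    counter = 0
    while len(names) < n_predicates:
        candidate = f"{prefix}{counter}"
        counter += 1
        if candidate in used:
            continue  # clash with an existing column, try the next suffix
        used.add(candidate)
        names.append(candidate)
    return names


# Two predicates; one candidate name is already taken by an output column.
print(having_temp_names(2, {"key", "value_sum", "_POLARS_HAVING_0"}))
```

Using the prefix with a suffix rather than a single fixed name keeps several predicates distinct from each other as well as from the keys and aggregations.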

@borchero borchero marked this pull request as draft July 14, 2025 09:23
@borchero
Contributor Author

borchero commented Jul 14, 2025

One more question @ritchie46: would you want to do the

group_by(key, predicate, agg)
-> group_by(key, predicate | agg) + filter(predicate) + drop(predicate)

translation when building the DslPlan (i.e. from LazyGroupBy.agg) or when going from DslPlan to IR (i.e. in to_alp_impl) or some place I haven't identified yet (😄)?

@ritchie46
Member

Unless absolutely trivial, I think it should be when converting from DslPlan -> IR.

@borchero
Contributor Author

I removed the predicate field from the IR and do the translation in to_alp_impl now. This generally seems much more ergonomic 😄

@borchero borchero marked this pull request as ready for review July 15, 2025 20:28
@borchero
Contributor Author

@ritchie46 do you have any thoughts on the updated implementation? :)


Examples
--------
Filter groups that contain only one element.
Contributor

Isn't it "that contain more than one element" instead? ("filter" is equivalent to "keep" for me)

Or "Keep groups that contain more than one element" ?

Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add having or filter to GroupBy
3 participants