-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
combine/select/transform on empty GroupedDataFrame
#2297
Comments
Option 1 |
Option 2 |
Option 3 |
Is this similar to #1837? |
No it is orthogonal. The question is what should happen when you work on a |
Option 3 is probably the best in the long term, but Option 2 is good in the short term as it allows doing anything after 1.0. |
Actually we have to go for option 3 to fix a bug in |
OK, if we go for option 3 (which I agree) then the rules would be:
|
2 is actually consistent with 3 AFAICT, it's just that For 1, it's tricky since functions have to be called on an empty pseudo group to find out return types. Is that what you have in mind? |
You have summarized it correctly - I just wanted to be explicit. So now regarding implementation:
To be clear where the challenge is if we accept these rules:
will produce
will produce I am spelling this example out to double check that we all agree this is how things should work. |
|
The key thing in what we discuss is a difference between So technically you could say that |
Maybe we'd better consider a 0-row In doubt, we could throw an error when returning a non-zero number of rows from |
This is the point of this issue to do exactly what you write 😄.
It would not reduce the complexity of the code (I anyway have to "hardcode" a case "0-row |
Currently
combine
/select
onGroupedDataFrame
with 0 groups just executes no operation and returns a result with grouping columns only (or empty). This is a bit inconsistent as the output shape of the operation depends on the number of groups.We have several options here:
length(::GroupedDataFrame)
is0
(this is a safest option and probably is not a problem in practice as probably almost no real use casescombine
empty groups)Point 3. is related to a more general issue of allowing empty groups (we currently disallow them). They naturally happen if e.g. we would like to do aggregation over all levels of a categorical column, where not all levels are present.
Option 2. (throw an error) is a safe choice for 1.0, as later we would have an option to switch to any behaviour we like.
The text was updated successfully, but these errors were encountered: