Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format: allow arbitrary expressions for grouping keys in AggregateRel #92

Closed
cpcloud opened this issue Nov 29, 2021 · 3 comments · Fixed by #100
Closed

Format: allow arbitrary expressions for grouping keys in AggregateRel #92

cpcloud opened this issue Nov 29, 2021 · 3 comments · Fixed by #100

Comments

@cpcloud
Copy link
Contributor

cpcloud commented Nov 29, 2021

Currently, AggregateRel requires its grouping field to reference all fields by name:

message AggregateRel {
  // ...
  repeated Grouping groupings = 3;
  repeated Measure measures = 4;
  // ...
  message Grouping { repeated int32 input_fields = 1; }
  // ...
}

This forces producers to insert a projection that contains:

  1. all grouping expressions
  2. the unfurled arguments of every aggregation function call

The second is extremely onerous for producers. Any tree-like producer will have to be able to reconsitute every aggregate expression.

This leads to a huge difference in the amount of code needed for producing AggregateRels versus that needed for producing every other Rel variant: https://github.com/cpcloud/ibis/blob/substrait/ibis/backends/substrait/compiler.py#L687-L771 and makes producer code for AggregateRels extremely fragile.

To me, this suggests that the AggregateRel grouping keys are likely at the wrong level of abstraction for the goal of producing a would-be-logical-plan (I understand the line is blurry between logical and physical, perhaps we can call this a "level 0 plan"?).

I propose that we allow arbitrary expressions for grouping.

@cpcloud cpcloud changed the title Use case: allow arbitrary expressions for grouping keys in AggregateRel Format: allow arbitrary expressions for grouping keys in AggregateRel Nov 29, 2021
@jacques-n
Copy link
Contributor

I don't understand point two. Can you expound how changing grouping inner field to expression affects point 2?

@cpcloud
Copy link
Contributor Author

cpcloud commented Nov 29, 2021

I don't understand point two. Can you expound how changing grouping inner field to expression affects point 2?

Sure.

Any aggregate expressions containing existing references to the table on which the new projection occurs are invalid, and so these aggregates need to be reprojected.

@jacques-n
Copy link
Contributor

Any aggregate expressions containing existing references to the table on which the new projection occurs are invalid

Unless you remap the output with emit, this is not true. The direct output mapping of a project is all input columns in their original positions followed by the new projected expressions in the order declared (see direct output order here). As such, you should be able to simply add a project, leave the aggregate expressions alone and reference the new additional projected fields for grouping.

That being said, this has been a stumbling block with pretty much everyone I've interacted with so I'm in support of changing the inner field type of grouping to expression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants