-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial aggregation is missing when filter reduces the data to a single group #10971
Comments
This is because |
CC @sopel39 |
I'm seeing that
becomes
Then |
Streaming aggregation over repartitioning of gather exchange implies that all grouping keys are constant. One option is to rewrite it as a global aggregation. That would be my preference, but I think global aggregations currently don't support
becomes
We can drop streaming from the partial aggregation for simplicity. |
Maybe we could simply add partial streaming aggregation always, e.g: rewrite:
into
Is there a case when such rewrite is invalid? As you noted streaming aggregation above exchange implies constant pre-grouped symbols. Alternatively we could rewrite to:
Global aggregation produce default value, which we don't want here. |
That's right. Thanks for pointing it out.
That should work. It should be a pretty simple change. |
@sopel39 @martint Martin and I discussed this some more. We believe |
Still, I think that sources of In any case, we would catch a regression via:
Traits would be a useful abstraction, but not required. In
That would add slower hash exchange when grouping symbols are constant. Do we want that? |
Karol, thanks for sharing your thoughts. My thinking on this evolved a bit since yesterday.
The question here is whether streaming aggregation can be split over exchange. I believe it is always safe to drop
This does "add slower hash exchange when grouping symbols are constant" and we'd rather avoid it. It is also safe to not split the aggregation, but the loss of performance in this case is not acceptable. To keep streaming aggregation for final, partial or both we need to either derive properties or consider all possible cases. It is a given that exchange output is pre-grouped. There are two possibilities: (1) exchange input is pre-grouped and exchange preserve that grouping; (2) exchange itself enforces grouping (e.g. merging exchange enforces order). In case of (1) it is safe to split the aggregation into streaming partial and streaming final. In case of (2) we can't safely add streaming partial aggregation, hence, we'll add hash aggregation. Since we don't really know how exchange enforces grouping and whether it relies of some properties of the input stream to do that, we can't assume that it will be able to enforce grouping on output of partial hash aggregation. Therefore, we can't assume that exchange output will be grouped and can't safely use streaming final aggregation. To summarize, in case of (2) it is only safe to drop streaming when splitting. Without deriving properties, we can't know whether we have (1) or (2), hence, we have to drop streaming. That said, I prefer this change over updating At the same time, I think it would be valuable to avoid "slower hash exchange when grouping symbols are constant". I see a couple of options to achieve that:
My preference would be the following: (1) submit a PR to always drop streaming when splitting to fix the regression in the upcoming release; Martin, Karol, what do you think? |
That seems reasonable, but it "(2) submit a follow-up PR to derive properties" may be harder than it seems or introduce things we want to move away from. The rule-based optimizer is not set up to make it easy for rules to arbitrarily walk the plan tree (to derive properties), which is why I've mentioned we might need to have support for traits and first-class properties support in the rule-based optimization framework before we can do that. As to "slower hash exchange when grouping symbols are constant", we might want to measure the actual impact. It may turn out to be small enough that adding short-term hacks or unwanted complexity is not warranted. |
Just to note:
Merging exchange does preserve grouping property as sources themselves are ordered |
Streaming aggregation requires pre-grouped input. Splitting the aggregation and pushing partial aggregation through the exchange may or may not preserve grouping properties of the input. See prestodb#10971 for more details.
Streaming aggregation requires pre-grouped input. Splitting the aggregation and pushing partial aggregation through the exchange may or may not preserve grouping properties of the input. See #10971 for more details.
The plan for the following query is missing partial aggregation making it so that aggregation happens very slowly on a single node. This looks like a regression caused by #10731
The text was updated successfully, but these errors were encountered: