-
Notifications
You must be signed in to change notification settings - Fork 28.6k
showcase, DO NOT MERGE #14876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
showcase, DO NOT MERGE #14876
Conversation
Test build #64660 has finished for PR 14876 at commit
|
Thank you for your concrete example! I'll check in hours. |
// Normal partial aggregate pair | ||
case outer @ HashAggregateExec(_, _, _, _, _, _, inner: HashAggregateExec) | ||
if outer.aggregateExpressions.forall(_.mode == Final) && | ||
inner.aggregateExpressions.forall(_.mode == Partial) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to have more strict conditions to make sure these two operators are for the same group by clause (Although I do not have a case showing this will break, it is better to list the condition in a more specific way).
I found that we need to push-down partial aggregation below exchange operators instead of merging them? For example, in the spark v2.0 branch,
This prints like:
In this case, I think it is more natural to push-down the partial aggregation below the exchange. |
On the other hand, when caching the already-partitioned input table, we cannot push-down them;
ISTM all we can do is merge the aggregations into one in this case. |
yea, pushing down partial aggregate below exchange is a good idea, but I think it's out of the scope of SPARK-12978, which is aim to remove unnecessary partial aggregate right? |
yea, I thinks so. I like the approach in this pr. |
closing, @maropu will take over |
## What changes were proposed in this pull request? according to the discussion in the original PR #10896 and the new approach PR #14876 , we decided to revert these 2 PRs and go with the new approach. ## How was this patch tested? N/A Author: Wenchen Fan <wenchen@databricks.com> Closes #14909 from cloud-fan/revert.
What changes were proposed in this pull request?
this is another approach to fix SPARK-12978, which was done in #10896
How was this patch tested?
new test in PlannerSuite