-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec #27019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #115814 has finished for PR 27019 at commit
|
@@ -153,9 +153,7 @@ case class HashAggregateExec( | |||
|
|||
override def supportCodegen: Boolean = { | |||
// ImperativeAggregate and filter predicate are not supported right now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Let's also update the comment here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks.
Test build #115878 has finished for PR 27019 at commit
|
cc @rednaxelafx as well |
@@ -91,60 +91,39 @@ case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan) | |||
} | |||
} | |||
|
|||
|
|||
/** Physical plan for Filter. */ | |||
case class FilterExec(condition: Expression, child: SparkPlan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu I don't understand why we need change FilterExec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for sharing code to process predicates between aggregates and filters.
retest this please |
Test build #116458 has finished for PR 27019 at commit
|
retest this please |
Test build #116467 has finished for PR 27019 at commit
|
retest this please |
Test build #116759 has finished for PR 27019 at commit
|
retest this please |
Test build #116780 has finished for PR 27019 at commit
|
Would it be possible to add benchmark result? |
| ${generatePredicateCode(ctx, condition, inputAttrs, input)} | ||
| $aggCode | ||
|} while(false); | ||
""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation problem?
| ${generatePredicateCode(ctx, condition, inputAttrs, input)} | ||
| $aggCode | ||
|} while(false); | ||
""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| ${generatePredicateCode(ctx, condition, inputAttrs, input)} | ||
| $aggCode | ||
|} while(false); | ||
""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@kiszk I added the performance numbers in the PR description. I think the codegen for hash-aggregates can have performance gains in most queries. But, aggregate filters (recently merged in the master) forcibly disable the codegen. So, I think this fix has a good effect on performance. |
Test build #117048 has finished for PR 27019 at commit
|
@@ -375,16 +373,30 @@ case class HashAggregateExec( | |||
""".stripMargin | |||
} | |||
|
|||
val codeToEvalAggFunc = if (conf.codegenSplitAggregateFunc && | |||
val codeToEvalAggFunc = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor this and that to use one common function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'll update.
5a2114f
to
ba48342
Compare
Test build #117150 has finished for PR 27019 at commit
|
retest this please |
Test build #117144 has finished for PR 27019 at commit
|
Test build #117154 has finished for PR 27019 at commit
|
retest this please |
Test build #117174 has finished for PR 27019 at commit
|
ba48342
to
2796407
Compare
retest this please |
Test build #126907 has finished for PR 27019 at commit
|
retest this please |
Test build #127912 has finished for PR 27019 at commit
|
464fbaa
to
a43aa25
Compare
Test build #127941 has finished for PR 27019 at commit
|
9b1aea0
to
26ce9b2
Compare
Test build #128517 has finished for PR 27019 at commit
|
Test build #128519 has finished for PR 27019 at commit
|
26ce9b2
to
009fe4a
Compare
Test build #128525 has finished for PR 27019 at commit
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
@maropu Looks like this PR is not properly tracked. I've removed the Stale tag assuming you want to move forward, but wanted to check again, and curious how you handle this. |
Thanks for taking care of it, @HeartSaVioR. If there are no reviewer who is against this PR and one has no more comment, I will merge this for the v3.2.0 release in a few weeks. cc: @cloud-fan @viirya @dongjoon-hyun |
009fe4a
to
86d89ba
Compare
Kubernetes integration test starting |
Test build #133120 has finished for PR 27019 at commit
|
Kubernetes integration test status success |
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #133188 has finished for PR 27019 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Sorry for the delay and thank you, @maropu .
Merged to master for Apache Spark 3.2.0.
Merry Christmas and Happy New Year!
Thanks for your review, Dongjoon ! Yea, you, too. Happy Merry Christmas! |
What changes were proposed in this pull request?
This pr intends to support code generation for
HashAggregateExec
with filters.Quick benchmark results:
The query above is compiled into code below;
Why are the changes needed?
For high performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.