-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-38832][SQL][FOLLOWUP] Support propagate empty expression set for distinct key #36281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctKeyVisitor.scala
Outdated
Show resolved
Hide resolved
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/DistinctKeyVisitor.scala
Outdated
Show resolved
Hide resolved
f2ceed8
to
1f0185f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we filter out empty distinctKey for safe?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
Lines 442 to 443 in 21e48b7
agg.child.distinctKeys.exists( | |
_.subsetOf(ExpressionSet(ae.aggregateFunction.children.filterNot(_.foldable)))) => |
@wangyum what's wrong with that code? |
For example: Set[ExpressionSet](ExpressionSet()).exists(_.subsetOf(ExpressionSet(ae.aggregateFunction.children.filterNot(_.foldable)))) It is always true, it may have potential problems. |
if we get an unexpected empty ExpressionSet it should be a bug .. shall we trust the framework ? |
I don't see why empty expression set is special. If the framework has bugs and produces incorrect distinct keys, we will have query correctness bugs. |
I'm ok if you do not think it is needed. |
My point is empty expression set is informative and we shouldn't skip it. It means the data is distinct by Nil, so it has at most 1 row and can override any other distinct keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thanks, merging to master! |
thank you @wangyum @cloud-fan |
What changes were proposed in this pull request?
DistinctKeyVisitor
that support propagate empty setWhy are the changes needed?
Make distinct keys can be used to optimize more case, see comment #36117 (comment)
Does this PR introduce any user-facing change?
Improve performance
How was this patch tested?
add test