[SPARK-7269] [SQL] Incorrect analysis for aggregation(use semanticEquals) #6173
Merged build triggered.
Merged build started.
Test build #32779 has started for PR 6173 at commit
def semanticEquals(other: Expression): Boolean = this.getClass == other.getClass &&
  this.productIterator.zip(other.asInstanceOf[Product].productIterator).forall {
    case (e1: Expression, e2: Expression) => e1 semanticEquals e2
    case (i1, i2) => i1 == i2
What about `case class Coalesce(children: Seq[Expression])`?
The difficulty here is that we can probably never define `semanticEquals` in a general way; that's why I said we would need to re-implement it for lots of expressions if we added this.
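The concern about `Coalesce` can be illustrated with a minimal sketch. It uses a hypothetical toy tree (`Expr`, `Attr`, `Add`, and the wrapper object `NaiveSketch` are illustrative names, not Spark classes) and assumes that two attributes are semantically the same when their ids match, regardless of name capitalization. The default method has the same shape as the snippet under review: it recurses only into fields that are themselves expressions, so a `Seq` field falls through to plain `==`.

```scala
object NaiveSketch {
  // Hypothetical toy tree, not Spark's Expression hierarchy.
  sealed trait Expr extends Product {
    // Same shape as the reviewed snippet: recurse into expression-typed
    // fields, fall back to == for everything else (including Seq fields).
    def semanticEquals(other: Expr): Boolean =
      this.getClass == other.getClass &&
        this.productIterator.zip(other.productIterator).forall {
          case (e1: Expr, e2: Expr) => e1.semanticEquals(e2)
          case (i1, i2) => i1 == i2
        }
  }

  // Assumption for this sketch: attributes match when their ids match,
  // whatever the capitalization of the name.
  case class Attr(name: String, id: Long) extends Expr {
    override def semanticEquals(other: Expr): Boolean = other match {
      case Attr(_, otherId) => id == otherId
      case _ => false
    }
  }

  case class Add(left: Expr, right: Expr) extends Expr
  case class Coalesce(children: Seq[Expr]) extends Expr
}
```

In this sketch, `Add(Attr("a", 1), Attr("b", 2))` is semantically equal to `Add(Attr("A", 1), Attr("B", 2))`, but `Coalesce(Seq(Attr("a", 1)))` is not semantically equal to `Coalesce(Seq(Attr("A", 1)))`, because the two `Seq` values are compared structurally.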
Merged build triggered.
Merged build started.
Test build #32785 has started for PR 6173 at commit
Hi @chenghao-intel, as far as I know, we can only instantiate leaf expressions, which are all case classes. So probably we can design the
Test build #32779 has finished for PR 6173 at commit
Merged build finished. Test PASSed.
Test PASSed.
Test build #32785 has finished for PR 6173 at commit
Merged build finished. Test PASSed.
Test PASSed.
@@ -76,6 +76,15 @@ abstract class Expression extends TreeNode[Expression] {
case u: UnresolvedAttribute => PrettyAttribute(u.name)
}.toString
}

def semanticEquals(other: Expression): Boolean = this.getClass == other.getClass && {
scala doc please.
Merged build triggered.
Merged build started.
Test build #32882 has started for PR 6173 at commit
Test build #32882 has finished for PR 6173 at commit
Merged build finished. Test PASSed.
Test PASSed.
Thank you @cloud-fan for doing this, and I think this can work as a workaround for cases like
@@ -76,6 +76,19 @@ abstract class Expression extends TreeNode[Expression] {
case u: UnresolvedAttribute => PrettyAttribute(u.name)
}.toString
}

/**
 * Returns true if 2 expressions are equal in semantic, which is similar to equals method
Returns true when two expressions will always compute the same result, even if they differ cosmetically (i.e. capitalization of names in attributes may be different).
@chenghao-intel, a single method seems easier to maintain than full Map/Set implementations. I'll add that it's not clear to me that the use in patterns is actually invalid. You can compare expressions with equality when you are only trying to find the exact same expression. I'm merging this to master and 1.4. I'll improve the wording of the Scaladoc while merging.
…als)
A modified version of #6110, use `semanticEquals` to make it more efficient.
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #6173 from cloud-fan/7269 and squashes the following commits:
e4a3cc7 [Wenchen Fan] address comments
cc02045 [Wenchen Fan] consider elements length equal
d7ff8f4 [Wenchen Fan] fix 7269
(cherry picked from commit 103c863)
Signed-off-by: Michael Armbrust <michael@databricks.com>
I am not sure why we use the `Map[Expression, NamedExpression]` in
This is a follow-up to #6173. For expressions like `Coalesce` that have a `Seq[Expression]` field, the semantic equality check must also be applied to every child in that sequence. We can also just use `Seq[(Expression, NamedExpression)]` instead of `Map[Expression, NamedExpression]`, as we only search it with `find`.
chenghao-intel, I agree that we can probably never define `semanticEquals` in a fully general way, but I think we have already done something similar in `TreeNode`, so we can use the same logic. Then we can handle something like `Coalesce(children: Seq[Expression])` correctly.
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #6261 from cloud-fan/tmp and squashes the following commits:
4daef88 [Wenchen Fan] address comments
dd8fbd9 [Wenchen Fan] correct semanticEquals
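The two changes described in this follow-up can be sketched roughly as below. This is an illustration under stated assumptions, not the code merged in #6261: `Expr`, `Attr`, `lookup`, `fieldsMatch`, and the wrapper object `SeqAwareSketch` are hypothetical names, attributes are assumed to match by id only, and a plain `String` stands in for the `NamedExpression` side of each pair.

```scala
object SeqAwareSketch {
  // Hypothetical toy tree, not Spark's Expression hierarchy.
  sealed trait Expr extends Product {
    // Same class, same number of fields, and every field pair matches.
    def semanticEquals(other: Expr): Boolean =
      this.getClass == other.getClass && {
        val mine = this.productIterator.toSeq
        val theirs = other.productIterator.toSeq
        mine.length == theirs.length &&
          mine.zip(theirs).forall(p => fieldsMatch(p))
      }
  }

  // Compares one pair of fields: recurse into expressions and into
  // sequences (element by element), otherwise fall back to ==.
  private def fieldsMatch(pair: (Any, Any)): Boolean = pair match {
    case (e1: Expr, e2: Expr) => e1.semanticEquals(e2)
    case (s1: Seq[_], s2: Seq[_]) =>
      s1.length == s2.length && s1.zip(s2).forall(p => fieldsMatch(p))
    case (a, b) => a == b
  }

  // Assumption for this sketch: attributes match when their ids match.
  case class Attr(name: String, id: Long) extends Expr {
    override def semanticEquals(other: Expr): Boolean = other match {
      case Attr(_, otherId) => id == otherId
      case _ => false
    }
  }

  case class Coalesce(children: Seq[Expr]) extends Expr

  // A Seq of pairs searched with find; a Map would look keys up with
  // equals/hashCode and miss cosmetically different expressions.
  def lookup(table: Seq[(Expr, String)], e: Expr): Option[String] =
    table.find { case (key, _) => key.semanticEquals(e) }.map(_._2)
}
```

With these definitions, `Coalesce(Seq(Attr("a", 1)))` and `Coalesce(Seq(Attr("A", 1)))` are not equal under `==`, so a `Map` keyed on one would not find the other, while `lookup` over a `Seq` of pairs does find it. The trade-off is a linear scan per lookup, which seems acceptable when the table only holds a handful of grouping expressions.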
A modified version of #6110, use `semanticEquals` to make it more efficient.