Skip to content

Conversation

cloud-fan
Copy link
Contributor

A modified version of #6110, use semanticEquals to make it more efficient.

@cloud-fan
Copy link
Contributor Author

cc @chenghao-intel @marmbrus

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32779 has started for PR 6173 at commit d7ff8f4.

def semanticEquals(other: Expression): Boolean = this.getClass == other.getClass &&
this.productIterator.zip(other.asInstanceOf[Product].productIterator).forall {
case (e1: Expression, e2: Expression) => e1 semanticEquals e2
case (i1, i2) => i1 == i2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about case class Coalesce(children: Seq[Expression])?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difficulty here is we probably never knows semanticEquals in a general way, that's why I said we need to re-implemented for lots of expressions if we added this.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32785 has started for PR 6173 at commit cc02045.

@cloud-fan
Copy link
Contributor Author

Hi @chenghao-intel , as far as I know, we can only instantiate leaf expressions which are all case classes. So probably we can design the semanticEquals as how case class implement equals but use semanticEquals for expression elements.

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32779 has finished for PR 6173 at commit d7ff8f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32779/
Test PASSed.

@SparkQA
Copy link

SparkQA commented May 15, 2015

Test build #32785 has finished for PR 6173 at commit cc02045.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32785/
Test PASSed.

@@ -76,6 +76,15 @@ abstract class Expression extends TreeNode[Expression] {
case u: UnresolvedAttribute => PrettyAttribute(u.name)
}.toString
}

def semanticEquals(other: Expression): Boolean = this.getClass == other.getClass && {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala doc please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 16, 2015

Test build #32882 has started for PR 6173 at commit e4a3cc7.

@SparkQA
Copy link

SparkQA commented May 16, 2015

Test build #32882 has finished for PR 6173 at commit e4a3cc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32882/
Test PASSed.

@chenghao-intel
Copy link
Contributor

Thank you @cloud-fan for doing this, and I think this can work as a workaround for cases like Seq[Expression].find, however, there are still cases like Map[Expression, Expression] e.g. in patterns.scala as I did in #5798.
Adding new method for Expression, probably impact all of Expression system (increasing the overhead for maintenance for future?), instead of doing this, I still insist to provide specific collection like ExpressionMap or ExpressionSet.

@@ -76,6 +76,19 @@ abstract class Expression extends TreeNode[Expression] {
case u: UnresolvedAttribute => PrettyAttribute(u.name)
}.toString
}

/**
* Returns true if 2 expressions are equal in semantic, which is similar to equals method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns true when two expressions will always compute the same result, even if they differ cosmetically (i.e. capitalization of names in attributes may be different).

@marmbrus
Copy link
Contributor

@chenghao-intel, a single method seems easier to maintain then full Map/Set implementations. I'll add that its not clear to me the use in patterns is actually invalid. You can compare expression with equality when you are only trying to find the exact same expression.

I'm merging this to master and 1.4. I'll improve the wording of the scala doc while merging.

asfgit pushed a commit that referenced this pull request May 18, 2015
…als)

A modified version of #6110, use `semanticEquals` to make it more efficient.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #6173 from cloud-fan/7269 and squashes the following commits:

e4a3cc7 [Wenchen Fan] address comments
cc02045 [Wenchen Fan] consider elements length equal
d7ff8f4 [Wenchen Fan] fix 7269

(cherry picked from commit 103c863)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@asfgit asfgit closed this in 103c863 May 18, 2015
@chenghao-intel
Copy link
Contributor

I am not sure why we use the Map[Expression, NamedExpression] in patterns.scala, seems Seq[(Expression, NamedExpression)] is more reasonable, which is definitely confusing for people why we couldn't using Set/Map for expressions or should use them in a very ugly way.

@cloud-fan cloud-fan deleted the 7269 branch May 19, 2015 07:47
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
…als)

A modified version of apache#6110, use `semanticEquals` to make it more efficient.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6173 from cloud-fan/7269 and squashes the following commits:

e4a3cc7 [Wenchen Fan] address comments
cc02045 [Wenchen Fan] consider elements length equal
d7ff8f4 [Wenchen Fan] fix 7269
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…als)

A modified version of apache#6110, use `semanticEquals` to make it more efficient.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6173 from cloud-fan/7269 and squashes the following commits:

e4a3cc7 [Wenchen Fan] address comments
cc02045 [Wenchen Fan] consider elements length equal
d7ff8f4 [Wenchen Fan] fix 7269
asfgit pushed a commit that referenced this pull request Jun 12, 2015
It's a follow up of #6173, for expressions like `Coalesce` that have a `Seq[Expression]`, when we do semantic equal check for it, we need to do semantic equal check for all of its children.
Also we can just use `Seq[(Expression, NamedExpression)]` instead of `Map[Expression, NamedExpression]` as we only search it with `find`.

chenghao-intel, I agree that we probably never knows `semanticEquals` in a general way, but I think we have done that in `TreeNode`, so we can use similar logic. Then we can handle something like `Coalesce(children: Seq[Expression])` correctly.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes #6261 from cloud-fan/tmp and squashes the following commits:

4daef88 [Wenchen Fan] address comments
dd8fbd9 [Wenchen Fan] correct semanticEquals
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…als)

A modified version of apache#6110, use `semanticEquals` to make it more efficient.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6173 from cloud-fan/7269 and squashes the following commits:

e4a3cc7 [Wenchen Fan] address comments
cc02045 [Wenchen Fan] consider elements length equal
d7ff8f4 [Wenchen Fan] fix 7269
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
It's a follow up of apache#6173, for expressions like `Coalesce` that have a `Seq[Expression]`, when we do semantic equal check for it, we need to do semantic equal check for all of its children.
Also we can just use `Seq[(Expression, NamedExpression)]` instead of `Map[Expression, NamedExpression]` as we only search it with `find`.

chenghao-intel, I agree that we probably never knows `semanticEquals` in a general way, but I think we have done that in `TreeNode`, so we can use similar logic. Then we can handle something like `Coalesce(children: Seq[Expression])` correctly.

Author: Wenchen Fan <cloud0fan@outlook.com>

Closes apache#6261 from cloud-fan/tmp and squashes the following commits:

4daef88 [Wenchen Fan] address comments
dd8fbd9 [Wenchen Fan] correct semanticEquals
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants