[SPARK-28189][SQL] Use semanticEquals in Dataset drop method for attributes comparison #25055
@@ -2322,7 +2322,7 @@ class Dataset[T] private[sql](
     }
     val attrs = this.logicalPlan.output
     val colsAfterDrop = attrs.filter { attr =>
-      attr != expression
+      !attr.semanticEquals(expression)
     }.map(attr => Column(attr))
     select(colsAfterDrop : _*)
   }

Review comments on the line !attr.semanticEquals(expression):
nice
Why should the comparison depend on whether the expression is deterministic when all we want is to drop it?
nvm. The output only contains Attribute
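To make the difference between the two comparisons in the hunk above concrete, here is a minimal sketch using Catalyst's `AttributeReference` directly. It assumes `spark-catalyst` is on the classpath and that a case-insensitive lookup such as `testData("KEY")` resolves to the same attribute (same `exprId`) carrying the requested name; `SemanticEqualsSketch`, `key`, `upper`, and `other` are illustrative names, not part of the PR.

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.IntegerType

object SemanticEqualsSketch {
  // Stand-in for an attribute found in logicalPlan.output.
  val key: AttributeReference = AttributeReference("key", IntegerType)()

  // Assumption: stand-in for what a case-insensitive lookup resolves to,
  // i.e. the same attribute (same exprId) renamed to the requested spelling.
  val upper: AttributeReference = key.withName("KEY")

  // A freshly created attribute gets its own exprId, even with the same name.
  val other: AttributeReference = AttributeReference("key", IntegerType)()

  def main(args: Array[String]): Unit = {
    // Structural equality also compares names, so the old filter
    // (attr != expression) would keep the column instead of dropping it.
    assert(key != upper)

    // semanticEquals compares canonicalized forms, which ignore the name.
    assert(key.semanticEquals(upper))

    // Two unrelated attributes remain distinct despite sharing a name.
    assert(!key.semanticEquals(other))
  }
}
```

The last assertion is why the change does not make `drop` more permissive than intended: only attributes with the same expression ID are treated as the same column.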
@@ -572,6 +572,29 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     assert(df.schema.map(_.name) === Seq("value"))
   }
 
+  test("SPARK-28189 drop column using drop with column reference with case-insensitive names") {
+    // With SQL config caseSensitive OFF, case insensitive column name should work
+    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+      val col1 = testData("KEY")
+      val df1 = testData.drop(col1)
+      checkAnswer(df1, testData.selectExpr("value"))
+      assert(df1.schema.map(_.name) === Seq("value"))
+
+      val col2 = testData("Key")
+      val df2 = testData.drop(col2)
+      checkAnswer(df2, testData.selectExpr("value"))
+      assert(df2.schema.map(_.name) === Seq("value"))
+    }
+
+    // With SQL config caseSensitive ON, AnalysisException should be thrown
+    withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
+      val e = intercept[AnalysisException] {
+        testData("KEY")
+      }.getMessage
+      assert(e.contains("Cannot resolve column name"))
+    }
+  }
+
   test("drop unknown column (no-op) with column reference") {
     val col = Column("random")
     val df = testData.drop(col)

Review comments on the line testData("KEY"):
wait, it seems this test is not related to this pr?
? @maropu . In your example, the following will fail in the same way because
For me, the test case looked okay because
oh, sorry and my bad. That line wasn't needed and I updated the example above.
ok, thanks for the check!
Do you have any suggestion for your concern? Then, please share it with us~ You're welcome.
NVM. I'm just a bit worried about the test coverage, not about the use cases.
Yes. The second part of this test case (SQLConf.CASE_SENSITIVE.key -> "true") looks odd to me.
Anyway, we can keep it unchanged. Ideally, we do not need the second part.
ok, I made a pr to drop that part: #25216
Is there no other place that has the same issue?
Went through Dataset.scala and didn't find a similar issue. However, there might be the same problem in other places in our SQL code.
There are other places. Please see #21449.
ok, thanks. I think it's ok to only target Dataset.drop in this pr.
+1 for @maropu's opinion.