-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19887][SQL] dynamic partition keys can be null or empty string #17277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,7 +22,7 @@ import java.io.File | |
| import org.apache.hadoop.fs.Path | ||
|
|
||
| import org.apache.spark.metrics.source.HiveCatalogMetrics | ||
| import org.apache.spark.sql.{AnalysisException, QueryTest} | ||
| import org.apache.spark.sql.{AnalysisException, QueryTest, Row} | ||
| import org.apache.spark.sql.catalyst.TableIdentifier | ||
| import org.apache.spark.sql.hive.test.TestHiveSingleton | ||
| import org.apache.spark.sql.internal.SQLConf | ||
|
|
@@ -316,6 +316,28 @@ class PartitionProviderCompatibilitySuite | |
| } | ||
| } | ||
| } | ||
|
|
||
| test(s"SPARK-19887 partition value is null - partition management $enabled") { | ||
| withTable("test") { | ||
| Seq((1, "p", 1), (2, null, 2)).toDF("a", "b", "c") | ||
| .write.partitionBy("b", "c").saveAsTable("test") | ||
| checkAnswer(spark.table("test"), | ||
| Row(1, "p", 1) :: Row(2, null, 2) :: Nil) | ||
|
|
||
| Seq((3, null: String, 3)).toDF("a", "b", "c") | ||
| .write.mode("append").partitionBy("b", "c").saveAsTable("test") | ||
| checkAnswer(spark.table("test"), | ||
| Row(1, "p", 1) :: Row(2, null, 2) :: Row(3, null, 3) :: Nil) | ||
| // make sure partition pruning also works. | ||
| checkAnswer(spark.table("test").filter($"b".isNotNull), Row(1, "p", 1)) | ||
|
|
||
| // empty string is an invalid partition value and we treat it as null when read back. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks weird that you read back something different from what you wrote,
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can't remember all the details as this PR is pretty old. This is probably the behavior of Hive so we just followed it. Looking at it now, I agree it's not ideal to treat invalid partition values as null. We'd better fail earlier. Can we leave it as a known bug of v1 table and fix it in v2?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, Hive cannot differentiate null and empty string in this case and we basically followed that for compatibility. |
||
| Seq((4, "", 4)).toDF("a", "b", "c") | ||
| .write.mode("append").partitionBy("b", "c").saveAsTable("test") | ||
| checkAnswer(spark.table("test"), | ||
| Row(1, "p", 1) :: Row(2, null, 2) :: Row(3, null, 3) :: Row(4, null, 4) :: Nil) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
is null->is null or empty.