-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-31707][SQL] Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax #28517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -2195,22 +2195,4 @@ class DDLParserSuite extends AnalysisTest { | |||
parsePlan("COMMENT ON TABLE a.b.c IS 'xYz'"), | |||
CommentOnTable(UnresolvedTable(Seq("a", "b", "c")), "xYz")) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't exist before SPARK-30098, and this test depends on SPARK-30098.
@@ -256,24 +256,6 @@ class DataSourceV2SQLSuite | |||
checkAnswer(spark.internalCreateDataFrame(rdd, table.schema), Seq.empty) | |||
} | |||
|
|||
test("CreateTable: without USING clause") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't exist before SPARK-30098, and this test depends on SPARK-30098.
@@ -681,24 +663,6 @@ class DataSourceV2SQLSuite | |||
} | |||
} | |||
|
|||
test("CreateTableAsSelect: without USING clause") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't exist before SPARK-30098, and this test depends on SPARK-30098.
@@ -76,12 +75,6 @@ class DDLParserSuite extends AnalysisTest with SharedSparkSession { | |||
}.head | |||
} | |||
|
|||
private def withCreateTableStatement(sql: String)(prediction: CreateTableStatement => Unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to check with DSV2 but Hive create table syntax doesn't follow the path. That said, it didn't exist before SPARK-30098.
@@ -40,8 +40,7 @@ import org.apache.spark.sql.test.SharedSparkSession | |||
import org.apache.spark.sql.types.{IntegerType, StructField, StructType} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the changes in this test suite are simply done via reverting back to pre-SPARK-30098.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to remove the line assert(desc.viewDefaultDatabase.isEmpty)
as desc.viewDefaultDatabase
no longer exists. I hope it is expected.
@@ -42,7 +42,6 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter { | |||
private val originalInMemoryPartitionPruning = TestHive.conf.inMemoryPartitionPruning | |||
private val originalCrossJoinEnabled = TestHive.conf.crossJoinEnabled | |||
private val originalSessionLocalTimeZone = TestHive.conf.sessionLocalTimeZone | |||
private val originalCreateHiveTable = TestHive.conf.createHiveTableByDefaultEnabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively no-op.
@@ -25,22 +25,6 @@ import org.apache.spark.sql.internal.{HiveSerDe, SQLConf} | |||
|
|||
class HiveShowCreateTableSuite extends ShowCreateTableSuite with TestHiveSingleton { | |||
|
|||
private var origCreateHiveTableConfig = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively no-op.
@@ -87,8 +87,7 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte | |||
SQLConf.withExistingConf(TestHive.conf)(super.withSQLConf(pairs: _*)(f)) | |||
|
|||
test("Test the default fileformat for Hive-serde tables") { | |||
withSQLConf("hive.default.fileformat" -> "orc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively no-op.
@@ -97,8 +96,7 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte | |||
assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde")) | |||
} | |||
|
|||
withSQLConf("hive.default.fileformat" -> "parquet", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively no-op.
@@ -2706,33 +2706,6 @@ class HiveDDLSuite | |||
} | |||
} | |||
|
|||
test("SPARK-30098: create table without provider should " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't exist before SPARK-30098, and this test depends on SPARK-30098.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two tests are depending on the config, hence I'd rather not restore these tests.
I'd like to see this be applied to branch-3.0, and even master branch, so that we don't leave any bad state in any branches. This would require the PR of "unified create table syntax" (#28026) be rebased, but I don't expect we will merge the PR which contains the changes of +1,661 -1,146 lines, so I don't think it matters to us. Once there's a consensus on the direction of #28026 then we'd probably like to break down the PR into multiple PRs. |
@@ -944,7 +944,7 @@ class AdaptiveQueryExecSuite | |||
withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", | |||
SQLConf.ADAPTIVE_EXECUTION_FORCE_APPLY.key -> "true") { | |||
withTable("t1") { | |||
val plan = sql("CREATE TABLE t1 AS SELECT 1 col").queryExecution.executedPlan | |||
val plan = sql("CREATE TABLE t1 USING parquet AS SELECT 1 col").queryExecution.executedPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails with below error message:
org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);; 'CreateTable `t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists +- Project [1 AS col#545614] +- OneRowRelation
This test initially assumed the create table without USING is following native syntax, and it's no longer valid. Added USING parquet
explicitly.
@@ -1005,7 +1005,7 @@ class AdaptiveQueryExecSuite | |||
} | |||
spark.sparkContext.addSparkListener(listener) | |||
try { | |||
sql("CREATE TABLE t1 AS SELECT 1 col").collect() | |||
sql("CREATE TABLE t1 USING parquet AS SELECT 1 col").collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
Test build #122567 has finished for PR 28517 at commit
|
Build failure on I think |
just a reminder: we need to add back the removed test when we unify the CREATE TABLE syntax. |
Test build #122568 has finished for PR 28517 at commit
|
Uh, also cc. to @dongjoon-hyun as he tried this first (though the approach was different) and I'm borrowing the JIRA issue here. (Would like to reopen the JIRA issue if we agree to merge this.) |
Thank you for pinging me. Unfortunately, I thought the decision was permanent at that time and closed it. So, Apache Spark JIRA policy doesn't allow us reopen SPARK-31136 technically. It's the same for all members (contributor and committer and PMC). Please use a new JIRA issue. |
Ah OK. That was closed and cannot be reopened. I also wondered why there's no reopen button and I thought it requires privileges. Looks like that's not the case. Thanks for letting me know. |
I'll file a new issue when we decide to merge this - as the discussion is on the way and someone might not agree about the resolution. |
can you create a new ticket? I'm fine with this PR. |
Thanks! Filed a new issue SPARK-31707 and updated the title. |
@@ -2195,22 +2195,4 @@ class DDLParserSuite extends AnalysisTest { | |||
parsePlan("COMMENT ON TABLE a.b.c IS 'xYz'"), | |||
CommentOnTable(UnresolvedTable(Seq("a", "b", "c")), "xYz")) | |||
} | |||
|
|||
test("create table - without using") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after a second thought, can we keep the test but ignore it? We still need these tests after the syntax unification is done. We can add a comment to reference the JIRA ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for other removed tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. +1 for @cloud-fan 's suggestion. (ignoring tests instead of removal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's straightforward for removed tests - I'll do that.
What about the tests being reverted as Hive create table is not integrated with DSv2?
(I'm referring sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala
)
I guess they still need to be reverted instead of ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Maybe ignoring tests and add TODO comments to them is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It cannot be simply changed to "ignore" instead of "test" as we're removing the config. I'll comment out instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just remove the conf after revert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but could you please elaborate? Removing config is what this patch proposed, and it's a part of revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think he meant to remove withSQLConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "false")
. We don't need the config anyway, if we unify the syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I see. That's one of valid approaches and good idea. Maybe then we need to remove some tests which depend on the config (two tests for on and off).
@@ -40,7 +40,7 @@ CREATE TABLE [ IF NOT EXISTS ] table_identifier | |||
[ AS select_statement ] | |||
``` | |||
|
|||
Note that, the clauses between the USING clause and the AS SELECT clause can come in | |||
Note that, the clauses between the OPTIONS clause and the AS SELECT clause can come in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should still be USING
since OPTIONS
itself can also come in any order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
between is more likely be inclusive, otherwise previous statement was also wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
between the USING clause and the AS SELECT clause
USING and AS SELECT are both not included, so this is still corrected. We shouldn't change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK my bad. I found USING shouldn't com in any order so previous statement was correct. Sorry about that.
@@ -2195,22 +2195,4 @@ class DDLParserSuite extends AnalysisTest { | |||
parsePlan("COMMENT ON TABLE a.b.c IS 'xYz'"), | |||
CommentOnTable(UnresolvedTable(Seq("a", "b", "c")), "xYz")) | |||
} | |||
|
|||
test("create table - without using") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just remove the conf after revert?
081866e
to
84c172f
Compare
Test build #122649 has finished for PR 28517 at commit
|
retest this, please |
Test build #122658 has finished for PR 28517 at commit
|
The test failure comes from HiveExternalCatalogVersionsSuite - Unable to download Spark 2.3.4. |
retest this, please |
Test build #5040 has started for PR 28517 at commit |
Test build #122676 has finished for PR 28517 at commit
|
Test build #5041 has finished for PR 28517 at commit
|
Test build #5039 has finished for PR 28517 at commit
|
retest this, please |
Test build #122696 has finished for PR 28517 at commit
|
retest this, please |
Test build #122721 has finished for PR 28517 at commit
|
### What changes were proposed in this pull request? It's quite annoying to be blocked by flaky tests in several PRs. This PR disables them. The tests come from 3 PRs I'm recently watching: #28526 #28463 #28517 ### Why are the changes needed? To make PR builder more stable ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #28547 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request? It's quite annoying to be blocked by flaky tests in several PRs. This PR disables them. The tests come from 3 PRs I'm recently watching: #28526 #28463 #28517 ### Why are the changes needed? To make PR builder more stable ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #28547 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 2012d58) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…der for CREATE TABLE syntax ### What changes were proposed in this pull request? This patch effectively reverts SPARK-30098 via below changes: * Removed the config * Removed the changes done in parser rule * Removed the usage of config in tests * Removed tests which depend on the config * Rolled back some tests to before SPARK-30098 which were affected by SPARK-30098 * Reflect the change into docs (migration doc, create table syntax) ### Why are the changes needed? SPARK-30098 brought confusion and frustration on using create table DDL query, and we agreed about the bad effect on the change. Please go through the [discussion thread](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html) to see the details. ### Does this PR introduce _any_ user-facing change? No, compared to Spark 2.4.x. End users tried to experiment with Spark 3.0.0 previews will see the change that the behavior is going back to Spark 2.4.x, but I believe we won't guarantee compatibility in preview releases. ### How was this patch tested? Existing UTs. Closes #28517 from HeartSaVioR/revert-SPARK-30098. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit d2bec5e) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
thanks, merging to master/3.0! |
Thanks all for reviewing and merging! |
… `false` by default ### What changes were proposed in this pull request? This PR aims to switch `spark.sql.legacy.createHiveTableByDefault` to `false` by default in order to move away from this legacy behavior from `Apache Spark 4.0.0` while the legacy functionality will be preserved during Apache Spark 4.x period by setting `spark.sql.legacy.createHiveTableByDefault=true`. ### Why are the changes needed? Historically, this behavior change was merged at `Apache Spark 3.0.0` activity in SPARK-30098 and reverted officially during the `3.0.0 RC` period. - 2019-12-06: #26736 (58be82a) - 2019-12-06: https://lists.apache.org/thread/g90dz1og1zt4rr5h091rn1zqo50y759j - 2020-05-16: #28517 At `Apache Spark 3.1.0`, we had another discussion and defined it as `Legacy` behavior via a new configuration by reusing the JIRA ID, SPARK-30098. - 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204 - 2020-12-03: #30554 Last year, this was proposed again twice and `Apache Spark 4.0.0` is a good time to make a decision for Apache Spark future direction. - SPARK-42603 on 2023-02-27 as an independent idea. - SPARK-46122 on 2023-11-27 as a part of Apache Spark 4.0.0 idea ### Does this PR introduce _any_ user-facing change? Yes, the migration document is updated. ### How was this patch tested? Pass the CIs with the adjusted test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46207 from dongjoon-hyun/SPARK-46122. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
… `false` by default ### What changes were proposed in this pull request? This PR aims to switch `spark.sql.legacy.createHiveTableByDefault` to `false` by default in order to move away from this legacy behavior from `Apache Spark 4.0.0` while the legacy functionality will be preserved during Apache Spark 4.x period by setting `spark.sql.legacy.createHiveTableByDefault=true`. ### Why are the changes needed? Historically, this behavior change was merged at `Apache Spark 3.0.0` activity in SPARK-30098 and reverted officially during the `3.0.0 RC` period. - 2019-12-06: apache#26736 (58be82a) - 2019-12-06: https://lists.apache.org/thread/g90dz1og1zt4rr5h091rn1zqo50y759j - 2020-05-16: apache#28517 At `Apache Spark 3.1.0`, we had another discussion and defined it as `Legacy` behavior via a new configuration by reusing the JIRA ID, SPARK-30098. - 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204 - 2020-12-03: apache#30554 Last year, this was proposed again twice and `Apache Spark 4.0.0` is a good time to make a decision for Apache Spark future direction. - SPARK-42603 on 2023-02-27 as an independent idea. - SPARK-46122 on 2023-11-27 as a part of Apache Spark 4.0.0 idea ### Does this PR introduce _any_ user-facing change? Yes, the migration document is updated. ### How was this patch tested? Pass the CIs with the adjusted test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46207 from dongjoon-hyun/SPARK-46122. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This patch effectively reverts SPARK-30098 via below changes:
Why are the changes needed?
SPARK-30098 brought confusion and frustration on using create table DDL query, and we agreed about the bad effect on the change.
Please go through the discussion thread to see the details.
Does this PR introduce any user-facing change?
No, compared to Spark 2.4.x. End users tried to experiment with Spark 3.0.0 previews will see the change that the behavior is going back to Spark 2.4.x, but I believe we won't guarantee compatibility in preview releases.
How was this patch tested?
Existing UTs.