-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-31136] Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax #27894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #119724 has finished for PR 27894 at commit
|
As expected, it fails. But, unfortunately, it failed at the first failure. |
Test build #119728 has finished for PR 27894 at commit
|
Test build #119729 has finished for PR 27894 at commit
|
A lot of failures remains.
|
I agree that we should evaluate the "cost to break", but looking at unit tests may not be a good idea. They heavily rely on internal assumptions and changing the table format will definitely break a lot of unit tests. Ideally, table format only decides how the table is stored and should be a performance thing, but hive table is a bit different. IMO, the "cost to break" is losing hive compatibility a bit: Now the tables created without USING may not be readable to hive, and some hive specific commands like LOAD TABLE doesn't work for them. On the other hand, the "cost to maintain" is losing Spark's perf benefits: Many users just run I think in this case the "cost to maintain" is more serious and we should accept that change and don't revert it. cc @marmbrus @srowen @maropu @viirya @HyukjinKwon UPDATED to make my opinion more clear. |
I'm not looking at the number of failures. Since the UT had better be independent to the global conf changes, we had better add guards with The following is not a cost at all. The users already know that the vectorization is only enabled in case of native tables, don't they? There exist user groups who have some reasons why they choose a vanilla Hive table. As a side note, some users prefer
Please see the old mail thread. This will also break the query result on |
Hi, All. I marked SPARK-31136 as a correctness issue and added the example. |
Looks coherent to revert it per @marmbrus's policy and reverts made so far, although I personally are not really convinced that we should revert. Reverting this change discourages users to leverage Spark's new features and edge optimizations, and let users more stick to Hive. What's your thought @marmbrus? |
cc @rxin too |
It just became a correctness issue due to the another example of |
Test build #119948 has finished for PR 27894 at commit
|
…or CREATE TABLE syntax
Test build #119967 has finished for PR 27894 at commit
|
Retest this please |
Test build #119971 has finished for PR 27894 at commit
|
…TE TABLE test cases ### What changes were proposed in this pull request? A few `CREATE TABLE` test cases have some assumption on the default value of `LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`. This PR (SPARK-31181) makes the test cases more explicit from test-case side. The configuration change was tested via #27894 during discussing SPARK-31136. This PR has only the test case part from that PR. ### Why are the changes needed? This makes our test case more robust in terms of the default value of `LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`. Even in the case where we switch the conf value, that will be one-liner with no test case changes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes #27946 from dongjoon-hyun/SPARK-EXPLICIT-TEST. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thank you all! I'm going to close this PR because this is discussed explicitly. |
…TE TABLE test cases A few `CREATE TABLE` test cases have some assumption on the default value of `LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`. This PR (SPARK-31181) makes the test cases more explicit from test-case side. The configuration change was tested via #27894 during discussing SPARK-31136. This PR has only the test case part from that PR. This makes our test case more robust in terms of the default value of `LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`. Even in the case where we switch the conf value, that will be one-liner with no test case changes. No. Pass the Jenkins with the existing tests. Closes #27946 from dongjoon-hyun/SPARK-EXPLICIT-TEST. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit f1cc867) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…TE TABLE test cases ### What changes were proposed in this pull request? A few `CREATE TABLE` test cases have some assumption on the default value of `LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`. This PR (SPARK-31181) makes the test cases more explicit from test-case side. The configuration change was tested via apache#27894 during discussing SPARK-31136. This PR has only the test case part from that PR. ### Why are the changes needed? This makes our test case more robust in terms of the default value of `LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`. Even in the case where we switch the conf value, that will be one-liner with no test case changes. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins with the existing tests. Closes apache#27946 from dongjoon-hyun/SPARK-EXPLICIT-TEST. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This is one of the drafts which we are discussion now to comply with new rubrics.
This PR (SPARK-31136) is created to see how many tests are affected.
Why are the changes needed?
This aims to comply the new community policy.
Does this PR introduce any user-facing change?
Yes. This restores the existing behavior to avoid breaking the existing user pipeline.
How was this patch tested?
Pass the Jenkins.