[SPARK-46122][SQL] Set spark.sql.legacy.createHiveTableByDefault to false by default #46207

Closed
wants to merge 1 commit into from

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Apr 24, 2024

What changes were proposed in this pull request?

This PR switches the default value of spark.sql.legacy.createHiveTableByDefault to false, moving away from this legacy behavior starting with Apache Spark 4.0.0. The legacy functionality remains available throughout the Apache Spark 4.x period by setting spark.sql.legacy.createHiveTableByDefault=true.
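
For readers unfamiliar with the flag, the following is a minimal sketch of what it controls. It is a toy model in plain Python, not Spark code: `resolve_provider` and its parameter names are hypothetical, and the CTAS-specific handling in Spark is left out. The assumption it encodes is that a `CREATE TABLE` statement with neither `USING` nor a Hive clause such as `STORED AS` becomes a Hive table when the flag is true, and otherwise uses the provider named by `spark.sql.sources.default` (`parquet` unless overridden).

```python
from typing import Optional


def resolve_provider(
    using: Optional[str],
    has_hive_serde_clause: bool,
    create_hive_table_by_default: bool = False,  # models the new default in 4.0.0
    default_source: str = "parquet",  # models spark.sql.sources.default
) -> str:
    """Toy model (hypothetical helper) of picking a provider for CREATE TABLE."""
    if using is not None:
        # CREATE TABLE t (...) USING orc: an explicit provider always wins.
        return using
    if has_hive_serde_clause:
        # CREATE TABLE t (...) STORED AS ...: Hive syntax yields a Hive table.
        return "hive"
    # CREATE TABLE t (...): only this bare form depends on the flag.
    return "hive" if create_hive_table_by_default else default_source


if __name__ == "__main__":
    print(resolve_provider(None, False))
    print(resolve_provider(None, False, create_hive_table_by_default=True))
```

Only the bare `CREATE TABLE` form changes behavior in this model; statements that already name a provider or use Hive storage syntax resolve the same way under either setting.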

Why are the changes needed?

Historically, this behavior change was merged during Apache Spark 3.0.0 development in SPARK-30098 and officially reverted during the 3.0.0 RC period.

For Apache Spark 3.1.0, we had another discussion and defined it as Legacy behavior via a new configuration, reusing the JIRA ID SPARK-30098.

Last year, this was proposed again twice, and Apache Spark 4.0.0 is a good time to decide on the future direction of Apache Spark.

  • SPARK-42603 on 2023-02-27 as an independent idea.
  • SPARK-46122 on 2023-11-27 as part of the Apache Spark 4.0.0 idea.

Does this PR introduce any user-facing change?

Yes, the migration document is updated.

How was this patch tested?

Pass the CIs with the adjusted test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun marked this pull request as draft April 24, 2024 16:59
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-46122][SQL] Disable spark.sql.legacy.createHiveTableByDefault by default [SPARK-46122][SQL] Disable spark.sql.legacy.createHiveTableByDefault by default Apr 24, 2024
@github-actions github-actions bot added the SQL label Apr 24, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-46122][SQL] Disable spark.sql.legacy.createHiveTableByDefault by default [SPARK-46122][SQL] Set spark.sql.legacy.createHiveTableByDefault to false by default Apr 24, 2024
@github-actions github-actions bot added the DOCS label Apr 24, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review April 25, 2024 01:59
Member

@yaooqinn yaooqinn left a comment

LGTM

@dongjoon-hyun
Member Author

Thank you, @yaooqinn. I'll start a discussion thread for this tonight.

@dongjoon-hyun
Member Author

I started a vote for this PR too.

@dongjoon-hyun
Member Author

Hi, @cloud-fan, @yaooqinn, @ulysses-you. If you don't mind, could you participate in the vote? :)

@dongjoon-hyun
Member Author

Thank you all. Votes passed.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-46122 branch April 30, 2024 08:44
@dongjoon-hyun
Member Author

Merged to master for Apache Spark 4.0.0.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
… `false` by default

### What changes were proposed in this pull request?

This PR switches the default value of `spark.sql.legacy.createHiveTableByDefault` to `false`, moving away from this legacy behavior starting with `Apache Spark 4.0.0`. The legacy functionality remains available throughout the Apache Spark 4.x period by setting `spark.sql.legacy.createHiveTableByDefault=true`.

### Why are the changes needed?

Historically, this behavior change was merged during `Apache Spark 3.0.0` development in SPARK-30098 and officially reverted during the `3.0.0 RC` period.

- 2019-12-06: apache#26736 (58be82a)
- 2019-12-06: https://lists.apache.org/thread/g90dz1og1zt4rr5h091rn1zqo50y759j
- 2020-05-16: apache#28517

For `Apache Spark 3.1.0`, we had another discussion and defined it as `Legacy` behavior via a new configuration, reusing the JIRA ID SPARK-30098.
- 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
- 2020-12-03: apache#30554

Last year, this was proposed again twice, and `Apache Spark 4.0.0` is a good time to decide on the future direction of Apache Spark.
- SPARK-42603 on 2023-02-27 as an independent idea.
- SPARK-46122 on 2023-11-27 as a part of Apache Spark 4.0.0 idea

### Does this PR introduce _any_ user-facing change?

Yes, the migration document is updated.

### How was this patch tested?

Pass the CIs with the adjusted test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#46207 from dongjoon-hyun/SPARK-46122.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>