Skip to content

[SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier #32542

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

imback82
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes to migrate the following ALTER TABLE commands that alter columns to use UnresolvedTable as a child to resolve the table identifier:

  • ALTER TABLE ... ADD COLUMNS
  • ALTER TABLE ... REPLACE COLUMNS
  • ALTER TABLE ... ALTER COLUMN
  • ALTER TABLE ... RENAME COLUMN
  • ALTER TABLE ... DROP COLUMNS

This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in JIRA or proposal doc.

Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: SPARK-29900.

Does this PR introduce any user-facing change?

After this PR, the above ALTER TABLE commands will have a consistent resolution behavior.

How was this patch tested?

Updated existing tests.

@github-actions github-actions bot added the SQL label May 14, 2021
@SparkQA
Copy link

SparkQA commented May 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43055/

@SparkQA
Copy link

SparkQA commented May 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43055/

@SparkQA
Copy link

SparkQA commented May 14, 2021

Test build #138533 has finished for PR 32542 at commit 00a7607.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43063/

@SparkQA
Copy link

SparkQA commented May 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43063/

@imback82 imback82 changed the title [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier May 14, 2021
@imback82
Copy link
Contributor Author

@cloud-fan this is a PR to attempt to migrate the remaining ALTER TABLE commands to the new resolution framework. Previous attempt was reverted due to few concerns (#27327). Please let me know what you think about the new changes. Thanks in advance!

@SparkQA
Copy link

SparkQA commented May 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43068/

@SparkQA
Copy link

SparkQA commented May 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43068/

case add: AddColumn =>
CatalogV2Util.failNullType(add.dataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from ResolveSessionCatalog.scala.

@SparkQA
Copy link

SparkQA commented May 14, 2021

Test build #138542 has finished for PR 32542 at commit 0da0219.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 14, 2021

Test build #138549 has finished for PR 32542 at commit 9e058b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

I think there are 2 options we can pick:

  1. Have a single AlterTable logical plan, which has 2 members: table: LogicalPlan and changes: Seq[TableChange]. We also have a single AlterTableExec physical plan.
  2. Have various AlterTableXXX logical plans, which are similar to the existing AlterTableXXXStatement. Update the code that resolve/check Seq[TableChange] to deal with AlterTableXXX logical plans instead. We still have a single AlterTableExec physical plan, the planner will convert all AlterTableXXX logical plans to it.

@imback82 what do you think? I feel option 1 is a smaller change.

@imback82
Copy link
Contributor Author

  1. Have a single AlterTable logical plan, which has 2 members: table: LogicalPlan and changes: Seq[TableChange]. We also have a single AlterTableExec physical plan.

Do you mean generating AlterTable in AstBuilder? One issue with this approach is for v1 commands (ResolveSessionCatalog) because there could be an ambiguity b/w ALTER TABLE ADD COLUMN and ALTER TABLE REPLACE COLUMN. That's why this PR created different AlterTableXXX with a base AlterTable.

  1. Have various AlterTableXXX logical plans, which are similar to the existing AlterTableXXXStatement. Update the code that resolve/check Seq[TableChange] to deal with AlterTableXXX logical plans instead. We still have a single AlterTableExec physical plan, the planner will convert all AlterTableXXX logical plans to it.

I can try this approach and create a separate PR for comparison. Let me know if I misunderstood 1) option above.

@cloud-fan
Copy link
Contributor

I see, then option 1 is not valid. Can you open a new PR to try option 2?

@imback82
Copy link
Contributor Author

Closing in favor of #32542

@imback82 imback82 closed this Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants