Skip to content

[SPARK-30282][SQL] Migrate SHOW TBLPROPERTIES to new framework #26921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

imback82
Copy link
Contributor

@imback82 imback82 commented Dec 17, 2019

What changes were proposed in this pull request?

Use the new framework to resolve the SHOW TBLPROPERTIES command. This PR along with #27243 should update all the existing V2 commands with UnresolvedV2Relation.

Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: SPARK-2990.

Does this PR introduce any user-facing change?

Yes SHOW TBLPROPERTIES temp_view now fails with AnalysisException will be thrown with a message temp_view is a temp view not table. Previously, it was returning empty row.

How was this patch tested?

Existing tests

@imback82 imback82 changed the title [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first [WIP][SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first Dec 17, 2019
@SparkQA
Copy link

SparkQA commented Dec 17, 2019

Test build #115436 has finished for PR 26921 at commit 5d8422d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 17, 2019

Test build #115437 has finished for PR 26921 at commit db311fd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title [WIP][SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first Dec 17, 2019
@SparkQA
Copy link

SparkQA commented Dec 17, 2019

Test build #115447 has finished for PR 26921 at commit ae524c3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 17, 2019

Test build #115445 has finished for PR 26921 at commit fc2c5a3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 17, 2019

Test build #115470 has finished for PR 26921 at commit 467145f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

cc: @cloud-fan

@cloud-fan
Copy link
Contributor

I have some ideas about how to move this forward, @imback82 can you take a look at #26847 (comment) and share your thoughts?

@imback82 imback82 changed the title [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first [SPARK-30282][SQL] Integrate V2 commands with UnresolvedV2Relation into new resolution framework Jan 9, 2020
Copy link
Contributor Author

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan I integrated two commands that were using UnresolvedV2Relation (ShowTableProperties and DescribeTable). Please let me know if the approach looks fine, then I will finish up AlterTable (and move onto other commands). Thanks!

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116343 has finished for PR 26921 at commit a845488.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2020

Test build #116341 has finished for PR 26921 at commit 517793f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
if (partitionSpec.nonEmpty) {
throw new AnalysisException("DESCRIBE TABLE does not support partition for v2 tables.")
case d @ DescribeTable(SessionCatalogAndResolvedTable(resolved), partitionSpec, isExtended) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems simpler to write

d @ DescribeTable(r: ResolvedTable, partitionSpec, isExtended) if isSessionCatalog(r.catalog) =>

DescribeTableCommand(getTableIdentifier(resolved), partitionSpec, isExtended)
case _ =>
// The v1 `DescribeTableCommand` can describe view as well.
if (isView(resolved.identifier.asMultipartIdentifier)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we reach here, we may already fail with table/view mismatch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I checked with hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/MaterializedView/Column

The DESCRIBE command works for both table and view, in Spark we have a weird parser rule DESC TABLE? which makes people think it's for table only.

We can keep the parser rule, but we should rename DescribeTable to DescribeRelation, and use UnresolvedRelation to allow both table and view.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've open #27187 to refine it.

This is a feature we missed when designing the new framework, so I opened a separated PR to update the framework to support accepting both table and view like DESCRIBE command.

@imback82
Copy link
Contributor Author

imback82 commented Jan 10, 2020

Based on the discussion from #26847 (comment), should I make this PR WIP and try to integrate all commands first to verify the new framework is solid? Afterwards, we can split this PR into multiple ones.

@cloud-fan
Copy link
Contributor

SGTM

gatorsmile pushed a commit that referenced this pull request Jan 16, 2020
### What changes were proposed in this pull request?

Use the new framework to resolve the DESCRIBE TABLE command.

The v1 DESCRIBE TABLE command supports both table and view. Checked with Hive and Presto, they don't have DESCRIBE TABLE syntax but only DESCRIBE, which supports both table and view:
1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/MaterializedView/Column
2. https://prestodb.io/docs/current/sql/describe.html

We should make it clear that DESCRIBE support both table and view, by renaming the command to `DescribeRelation`.

This PR also tunes the framework a little bit to support the case that a command accepts both table and view.

### Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: SPARK-29900.

Note that I make a separate PR here instead of #26921, as I need to update the framework to support a new use case: accept both table and view.

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #27187 from cloud-fan/describe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
@imback82 imback82 changed the title [SPARK-30282][SQL] Integrate V2 commands with UnresolvedV2Relation into new resolution framework [SPARK-30282][SQL] Migrate SHOW TBLPROPERTIES to new framework Jan 16, 2020
@imback82
Copy link
Contributor Author

@cloud-fan, actually, it was a big change to migrate all commands into one WIP PR (also prob. hard to review anyway). I just changed this PR to only migrate SHOW TBLPROPERTIES. Hope this is OK.

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116877 has finished for PR 26921 at commit 4fe3e69.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116876 has finished for PR 26921 at commit 72b3307.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val message = intercept[AnalysisException] {
sql("SHOW TBLPROPERTIES parquet_temp")
}.getMessage
assert(message.contains("parquet_temp is a temp view not table"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change LGTM, but can we add a note in the migration guide?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add it. Thanks!

@cloud-fan
Copy link
Contributor

LGTM if tests pass

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116889 has finished for PR 26921 at commit fb47ea9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116890 has finished for PR 26921 at commit 1ff1dcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116905 has finished for PR 26921 at commit 23b743c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

the last commit is just adding migration guide, and the previous commit passes all tests. I'm merging it to master, thanks!

@@ -130,10 +130,10 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
}

test("show tblproperties for datasource table - errors") {
val message1 = intercept[NoSuchTableException] {
val message = intercept[AnalysisException] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, this sounds like another behavior change from Apache Spark 2.4.4.
cc @gatorsmile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching it! I will update the migration guide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a follow-up PR: #27276

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@gatorsmile gatorsmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE VIEW gatorsmileView1
TBLPROPERTIES(prop1="3")
AS SELECT 1 AS c1, 2 AS c2

SHOW TBLPROPERTIES gatorsmileView1

This does not work after the migration. @imback82 @cloud-fan We need to double check whether the other migration causes the same regression.

val catalog = sparkSession.sessionState.catalog

if (catalog.isTemporaryTable(table)) {
Seq.empty[Row]
Copy link
Member

@gatorsmile gatorsmile Apr 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allow users to specify the table properties when creating temporary views.

CREATE TEMPORARY VIEW v1
TBLPROPERTIES(prop1 ="3")
AS SELECT 1 AS c1, 2 AS c2

Instead of disallowing it, I think we should support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I will create a PR for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon looking at this further, properties specified in CREATE TEMP VIEW is not used in CreateViewCommand, so we can revert the behavior back to the original one where we showed empty properties. What do you think?

I will still create a PR to support a permanent view, which is a regression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me. please remove the item in the migration guide too.

Thanks!

@imback82
Copy link
Contributor Author

This does not work after the migration. @imback82 @cloud-fan We need to double check whether the other migration causes the same regression.

Let me take a look.

@imback82
Copy link
Contributor Author

imback82 commented Apr 25, 2020

UnresolvedTable is used for SHOW TBLPROPERTIES and COMMENT ON TABLE, so only those should be affected. @cloud-fan can correct me if I am wrong here.

Since COMMENT ON TABLE is introduced in Spark 3.0, there should be no regression.

@cloud-fan
Copy link
Contributor

It's unfortunate that many TABLE commands in Spark support view as well. We need to be careful when we migrate commands to v2 later.

@imback82 can you send a PR to make SHOW TBLPROPERTIES support views?

@imback82
Copy link
Contributor Author

I created a PR: #28375

cloud-fan pushed a commit that referenced this pull request Apr 29, 2020
### What changes were proposed in this pull request?

This PR addresses two things:
- `SHOW TBLPROPERTIES` should supports view (a regression introduced by #26921)
- `SHOW TBLPROPERTIES` on a temporary view should return empty result (2.4 behavior instead of throwing `AnalysisException`.

### Why are the changes needed?

It's a bug.

### Does this PR introduce any user-facing change?

Yes, now `SHOW TBLPROPERTIES` works on views:
```
scala> sql("CREATE VIEW view TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1")
scala> sql("SHOW TBLPROPERTIES view").show(truncate=false)
+---------------------------------+-------------+
|key                              |value        |
+---------------------------------+-------------+
|view.catalogAndNamespace.numParts|2            |
|view.query.out.col.0             |c1           |
|view.query.out.numCols           |1            |
|p2                               |v2           |
|view.catalogAndNamespace.part.0  |spark_catalog|
|p1                               |v1           |
|view.catalogAndNamespace.part.1  |default      |
+---------------------------------+-------------+
```
And for a temporary view:
```
scala> sql("CREATE TEMPORARY VIEW tview TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1")
scala> sql("SHOW TBLPROPERTIES tview").show(truncate=false)
+---+-----+
|key|value|
+---+-----+
+---+-----+
```

### How was this patch tested?

Added tests.

Closes #28375 from imback82/show_tblproperties_followup.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Apr 29, 2020
### What changes were proposed in this pull request?

This PR addresses two things:
- `SHOW TBLPROPERTIES` should supports view (a regression introduced by #26921)
- `SHOW TBLPROPERTIES` on a temporary view should return empty result (2.4 behavior instead of throwing `AnalysisException`.

### Why are the changes needed?

It's a bug.

### Does this PR introduce any user-facing change?

Yes, now `SHOW TBLPROPERTIES` works on views:
```
scala> sql("CREATE VIEW view TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1")
scala> sql("SHOW TBLPROPERTIES view").show(truncate=false)
+---------------------------------+-------------+
|key                              |value        |
+---------------------------------+-------------+
|view.catalogAndNamespace.numParts|2            |
|view.query.out.col.0             |c1           |
|view.query.out.numCols           |1            |
|p2                               |v2           |
|view.catalogAndNamespace.part.0  |spark_catalog|
|p1                               |v1           |
|view.catalogAndNamespace.part.1  |default      |
+---------------------------------+-------------+
```
And for a temporary view:
```
scala> sql("CREATE TEMPORARY VIEW tview TBLPROPERTIES('p1'='v1', 'p2'='v2') AS SELECT 1 AS c1")
scala> sql("SHOW TBLPROPERTIES tview").show(truncate=false)
+---+-----+
|key|value|
+---+-----+
+---+-----+
```

### How was this patch tested?

Added tests.

Closes #28375 from imback82/show_tblproperties_followup.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3680303)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants