Skip to content

[SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework #27187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Jan 13, 2020

What changes were proposed in this pull request?

Use the new framework to resolve the DESCRIBE TABLE command.

The v1 DESCRIBE TABLE command supports both table and view. Checked with Hive and Presto, they don't have DESCRIBE TABLE syntax but only DESCRIBE, which supports both table and view:

  1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribeTable/View/MaterializedView/Column
  2. https://prestodb.io/docs/current/sql/describe.html

We should make it clear that DESCRIBE support both table and view, by renaming the command to DescribeRelation.

This PR also tunes the framework a little bit to support the case that a command accepts both table and view.

Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: SPARK-29900.

Note that I make a separate PR here instead of #26921, as I need to update the framework to support a new use case: accept both table and view.

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

* Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to
* [[ResolvedNamespace]] during analysis.
*/
case class UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move these new unresolve plans to one file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: v2Unresolved seems not to fit all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about v2ResolutionPlans?

@cloud-fan
Copy link
Contributor Author

relation: LogicalPlan,
partitionSpec: TablePartitionSpec,
isExtended: Boolean) extends Command {
override def children: Seq[LogicalPlan] = Seq(relation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this def is needed for all commands resolved by the new framework, we may create a trait to reduced redundancy

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116611 has finished for PR 27187 at commit 4e0e8c1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116609 has finished for PR 27187 at commit 0b729b9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116614 has finished for PR 27187 at commit 4e0e8c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116627 has finished for PR 27187 at commit 7dcd69b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116643 has finished for PR 27187 at commit 234b091.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment on lines 909 to 912
CatalogV2Util.loadTable(catalog, ident) match {
// We don't support view in v2 catalog yet. Here we treat v1 view as table and use
// `ResolvedTable` to carry it.
case Some(table) => ResolvedTable(catalog.asTableCatalog, ident, table)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to add a case?

CatalogV2Util.loadTable(catalog, ident) match {
  case Some(v1Table: V1Table) if v1Table.v1Table.tableType == CatalogTableType.VIEW =>
  ...
}

Otherwise I'm not sure why case Some(table) => is for v1 view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. Adding a case is clearer.

Comment on lines 762 to 763
case u @ UnresolvedTableOrView(ident) =>
lookupTempView(ident).getOrElse(u)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it is very similar to UnresolvedRelation?

UnresolvedRelation can also be resolve to view by ResolveTempViews, can be resolved to v2 relation by ResolveTables, can be resolved to v1 relation by ResolveRelations.

Seems we have a few similar ways to do resolving?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some commands don't need to scan the table/view (like DESCRIBE). If we use UnresolvedRelation, then we will end up with DataSourceV2ScanRelation, which creates Scan from Table and is not necessary.

Another problem is view. UnresolvedRelation can be resolved to View, and then get removed at the beginning of optimizer.

I'll think of how to overcome these issues and reuse UnresolvedRelation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did more investigation, and think we can't reuse UnresolvedRelation.

UnresolvedRelation means something we want to read/write, so it needs to resolve the data source, and fail if something goes wrong, e.g. the data source class can't be loaded.

For DESCRIBE commands, it should work as long as we can get the table. For example, we may want to use DESCRIBE to debug a table which can't load data source class.

*/
// TODO: create a generic representation for temp view, v1 view and v2 view, after we add view
// support to v2 catalog. For now we only need the identifier to fallback to v1 command.
case class ResolvedView(identifier: Identifier) extends LeafNode {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add UnresolvedView when we migrate ALTER VIEW command. @imback82

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks!

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116707 has finished for PR 27187 at commit 8ed967c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116708 has started for PR 27187 at commit 7654a20.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116706 has finished for PR 27187 at commit 7654a20.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ResolvedView(identifier: Identifier) extends LeafNode

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116715 has finished for PR 27187 at commit 1919054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ResolvedView(identifier: Identifier) extends LeafNode

Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. (This is a good enhancement to the resolution framework!)

*/
// TODO: create a generic representation for temp view, v1 view and v2 view, after we add view
// support to v2 catalog. For now we only need the identifier to fallback to v1 command.
case class ResolvedView(identifier: Identifier) extends LeafNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks!

case None => u
}
case _ => u
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is almost same as UnresolvedTable case? Maybe we can do the following, or may not be worthwhile:

      case u @ UnresolvedTable(identifier) =>
        lookupTableOrView(u, identifier) {
          u.failAnalysis(s"$ident is a view not table.")
        }

      case u @ UnresolvedTableOrView(identifier) =>
        lookupTableOrView(u, identifier) {
          ResolvedView(identifier)
        }
    }

    private def lookupTableOrView(
        unresolved: LogicalPlan, identifier: Seq[String])(viewHandler: => LogicalPlan): LogicalPlan = {
      expandRelationName(identifier) match {
        case SessionCatalogAndIdentifier(catalog, ident) =>
          CatalogV2Util.loadTable(catalog, ident) match {
            case Some(v1Table: V1Table) if v1Table.v1Table.tableType == CatalogTableType.VIEW =>
              viewHandler
            case Some(table) => ResolvedTable(catalog.asTableCatalog, ident, table)
            case None => unresolved
          }
        case _ => unresolved
      }

case DescribeRelation(ResolvedTable(_, ident, _: V1Table), partitionSpec, isExtended) =>
DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)

// Use v1 command to describe temp view, as v2 catalog doesn't support view yet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

describe (temp) view


// Use v1 command to describe temp view, as v2 catalog doesn't support view yet.
case DescribeRelation(ResolvedView(ident), partitionSpec, isExtended) =>
DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks clean 👍

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Jan 15, 2020

Test build #116747 has finished for PR 27187 at commit 5622322.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

@cloud-fan
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jan 15, 2020

Test build #116763 has finished for PR 27187 at commit 5622322.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 15, 2020

Test build #116784 has finished for PR 27187 at commit 5622322.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master

@gatorsmile
Copy link
Member

cc @imback82

@imback82
Copy link
Contributor

imback82 commented Jan 16, 2020

Thanks. I will start migrating more commands to the new framework this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants