[SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements on Temporary Views #15054


Closed
wants to merge 13 commits

Conversation

gatorsmile
Member

@gatorsmile gatorsmile commented Sep 12, 2016

What changes were proposed in this pull request?

  • When the permanent tables/views do not exist but the temporary view exists, the expected error should be NoSuchTableException for partition-related ALTER TABLE commands. However, it always reports a confusing error message. For example,
Partition spec is invalid. The spec (a, b) must match the partition spec () defined in table '`testview`';
  • When the permanent tables/views do not exist but the temporary view exists, the expected error should be NoSuchTableException for ALTER TABLE ... UNSET TBLPROPERTIES. However, it reports a missing table property. For example,
Attempted to unset non-existent property 'p' in table '`testView`';
  • When ANALYZE TABLE is called on a view or a temporary view, we should issue an error message. However, it reports a strange error:
ANALYZE TABLE is not supported for Project
  • When inserting into a temporary view that is generated from Range, we will get the following error message:
assertion failed: No plan for 'InsertIntoTable Range (0, 10, step=1, splits=Some(1)), false, false
+- Project [1 AS 1#20]
   +- OneRowRelation$

This PR is to fix the above four issues.

How was this patch tested?

Added multiple test cases

@SparkQA

SparkQA commented Sep 12, 2016

Test build #65240 has finished for PR 15054 at commit cc47c3e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile changed the title [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements on Temporary Views [WIP] [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements on Temporary Views Sep 13, 2016
@SparkQA

SparkQA commented Sep 13, 2016

Test build #65299 has finished for PR 15054 at commit 855df61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan @yhuai

@@ -91,6 +91,9 @@ case class AnalyzeTableCommand(tableName: String, noscan: Boolean = true) extend
case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
updateTableStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)

case o if !o.isInstanceOf[LeafNode] =>
Contributor

@cloud-fan cloud-fan Sep 14, 2016

I think a better fix is: when doing lookupRelation at L40, always pass in a table identifier with a database, so that it won't search the temp views.

Member Author

Yeah! I like it!
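The fix suggested above can be sketched with a toy catalog. Everything here (`TableIdentifier`, `lookupRelation`, the temp-view set) is a simplified stand-in for illustration, not Spark's actual implementation: once the identifier is qualified with a database up front, the lookup can never fall back to the temp-view namespace.

```scala
// Toy model of the suggested fix; these are illustrative stand-ins,
// not Spark's real classes.
case class TableIdentifier(table: String, database: Option[String] = None)

object CatalogSketch {
  private val tempViews = Set("testview")
  private val currentDb = "default"

  // An unqualified identifier is resolved against temp views first.
  def lookupRelation(id: TableIdentifier): String = id.database match {
    case None if tempViews.contains(id.table) => s"temp view ${id.table}"
    case _ => s"table ${id.database.getOrElse(currentDb)}.${id.table}"
  }

  // The fix: qualify with the current database before looking up,
  // so the temp-view branch can never match.
  def lookupPermanentOnly(id: TableIdentifier): String =
    lookupRelation(id.copy(database = Some(id.database.getOrElse(currentDb))))
}
```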

@@ -913,7 +913,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
withTempView("show1a", "show2b") {
sql(
"""
|CREATE TEMPORARY TABLE show1a
|CREATE TEMPORARY VIEW show1a
Contributor

I'd like to keep them unchanged, as CREATE TEMPORARY TABLE is still supported (but deprecated).

Member Author

Will roll it back

@gatorsmile
Member Author

Will fix the remaining commands in ddl.scala

@SparkQA

SparkQA commented Sep 14, 2016

Test build #65389 has finished for PR 15054 at commit 6029a95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 14, 2016

Test build #65381 has finished for PR 15054 at commit 60371d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* assume the table/view is in the current database. If the specified table/view is not found
* in the database then a [[NoSuchTableException]] is thrown.
*/
def getNonTempTableMetadata(name: TableIdentifier): CatalogTable = {
Member Author

@gatorsmile gatorsmile Sep 14, 2016

@cloud-fan In your PR #14962, getTableMetadata will not deal with temporary tables. To simplify the potential conflict resolution, I just introduced a new function getNonTempTableMetadata. Thus, you can basically just replace it with the new getTableMetadata. Does that sound OK to you? Sorry for any inconvenience this might cause.

Contributor

I don't think getTableMetadata should handle temp views; maybe we can fix that in this PR?

Member Author

Sure, will do it soon.

}.getMessage
assert(e.contains(s"Operation not allowed: TRUNCATE TABLE on temporary tables: `$viewName`"))
private def assertNoSuchTable(query: String): Unit = {
intercept[NoSuchTableException] {
Member Author

Now, the behaviors are consistent. For the DDL commands that are unable to handle temporary views, we issue the same exception NoSuchTableException.

@SparkQA

SparkQA commented Sep 15, 2016

Test build #65413 has finished for PR 15054 at commit 7c1a8e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -360,6 +360,7 @@ trait CheckAnalysis extends PredicateHelper {

case InsertIntoTable(t, _, _, _, _)
Contributor

cc @yhuai do you remember why we have this check? InsertIntoTable can only be used for table right? When will we hit this branch?

Member Author

@gatorsmile gatorsmile Sep 15, 2016

    spark.sql(
      s"""CREATE TEMPORARY VIEW normal_orc_source
         |USING org.apache.spark.sql.hive.orc
         |OPTIONS (
         |  PATH '${new File(orcTableAsDir.getAbsolutePath).getCanonicalPath}'
         |)
       """.stripMargin)
    sql("INSERT INTO TABLE normal_orc_source SELECT * FROM orc_temp_table WHERE intField > 5")

We allow users to insert rows into temporary views that are created using CREATE TEMPORARY VIEW.

Member Author

If the temporary view is created by createOrReplaceTempView, users are unable to insert rows into the temporary view.

Contributor

We can insert data into a temp view? Is it by design? cc @rxin @marmbrus @yhuai

Member Author

I guess this might be designed for JDBC data sources at the beginning. Just a guess.

Contributor

This probably dates back to when we called these temp tables and when we didn't have persistent data source tables. Given that I think we probably have to support this for compatibility. I'm not sure why this isn't a whitelist though?

Member Author

I guess the reason is that LogicalRelation is not in the sql/catalyst package. Otherwise, it would be something like

          case InsertIntoTable(t, _, _, _, _)
            if !t.isInstanceOf[SimpleCatalogRelation] && !t.isInstanceOf[LogicalRelation]
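The whitelist idea discussed above can be sketched as follows. The plan classes and `failAnalysis` here are minimal stand-ins for Spark's, purely to show the shape of the check: only known-insertable relation types pass, and everything else fails analysis with a clear message instead of a planner assertion.

```scala
// Minimal stand-ins for Spark's logical plan classes.
sealed trait LogicalPlan
case class SimpleCatalogRelation(name: String) extends LogicalPlan
case class LogicalRelation(name: String) extends LogicalPlan
case class Range(start: Long, end: Long) extends LogicalPlan

def failAnalysis(msg: String): Nothing = throw new IllegalArgumentException(msg)

// Whitelist-style check: only known-insertable relations pass; anything
// else (e.g. a temp view over Range) fails analysis up front.
def checkInsertTarget(t: LogicalPlan): Unit = t match {
  case _: SimpleCatalogRelation | _: LogicalRelation => () // insertable
  case other =>
    failAnalysis(s"Inserting into ${other.getClass.getSimpleName} is not supported")
}
```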

@SparkQA

SparkQA commented Sep 15, 2016

Test build #65428 has finished for PR 15054 at commit 333497a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class UnsignedPrefixComparatorNullsLast extends RadixSortSupport
    • public static final class UnsignedPrefixComparatorDescNullsFirst extends RadixSortSupport
    • public static final class SignedPrefixComparatorNullsLast extends RadixSortSupport
    • public static final class SignedPrefixComparatorDescNullsFirst extends RadixSortSupport
    • abstract sealed class NullOrdering
    • case class SortOrder(
    • trait Command extends LeafNode
    • trait RunnableCommand extends logical.Command
    • case class CreateTable(
    • case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan]
    • class SetAccumulator[T] extends AccumulatorV2[T, java.util.Set[T]]

@SparkQA

SparkQA commented Sep 15, 2016

Test build #65426 has finished for PR 15054 at commit a01f6b3.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class NoSuchTempViewException(table: String)
    • case class ShowColumnsCommand(tableName: TableIdentifier) extends RunnableCommand

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Sep 15, 2016

Test build #65444 has finished for PR 15054 at commit 333497a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class UnsignedPrefixComparatorNullsLast extends RadixSortSupport
    • public static final class UnsignedPrefixComparatorDescNullsFirst extends RadixSortSupport
    • public static final class SignedPrefixComparatorNullsLast extends RadixSortSupport
    • public static final class SignedPrefixComparatorDescNullsFirst extends RadixSortSupport
    • abstract sealed class NullOrdering
    • case class SortOrder(
    • trait Command extends LeafNode
    • trait RunnableCommand extends logical.Command
    • case class CreateTable(
    • case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan]
    • class SetAccumulator[T] extends AccumulatorV2[T, java.util.Set[T]]

@@ -65,7 +64,11 @@ case class CreateTableLikeCommand(
s"Source table in CREATE TABLE LIKE does not exist: '$sourceTable'")
}

val sourceTableDesc = catalog.getTableMetadata(sourceTable)
val sourceTableDesc = if (catalog.isTemporaryTable(sourceTable)) {
catalog.getTempViewMetadata(sourceTable.table)
Contributor

this is not atomic; we should have a method that returns the temp view metadata or None.

Member Author

Yeah, you are right. Let me fix it.
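The atomicity concern can be sketched like this (a simplified catalog, not Spark's real SessionCatalog): the check-then-get pair leaves a window in which another thread can drop the view between the two calls, while the Option-returning method does the lookup in a single synchronized step.

```scala
import scala.collection.mutable

class TempViewCatalogSketch {
  private val tempViews = mutable.Map("v1" -> "metadata-of-v1")

  // Non-atomic pair: the view may be dropped between these two calls,
  // turning the second one into a NoSuchElementException.
  def isTemporaryTable(name: String): Boolean = synchronized { tempViews.contains(name) }
  def getTempViewMetadata(name: String): String = synchronized { tempViews(name) }

  def dropTempView(name: String): Unit = synchronized { tempViews -= name }

  // Atomic alternative: a single call that returns the metadata or None.
  def getTempViewMetadataOption(name: String): Option[String] =
    synchronized { tempViews.get(name) }
}
```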

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Sep 18, 2016

Test build #65574 has finished for PR 15054 at commit 333497a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public static final class UnsignedPrefixComparatorNullsLast extends RadixSortSupport
    • public static final class UnsignedPrefixComparatorDescNullsFirst extends RadixSortSupport
    • public static final class SignedPrefixComparatorNullsLast extends RadixSortSupport
    • public static final class SignedPrefixComparatorDescNullsFirst extends RadixSortSupport
    • abstract sealed class NullOrdering
    • case class SortOrder(
    • trait Command extends LeafNode
    • trait RunnableCommand extends logical.Command
    • case class CreateTable(
    • case class AnalyzeCreateTable(sparkSession: SparkSession) extends Rule[LogicalPlan]
    • class SetAccumulator[T] extends AccumulatorV2[T, java.util.Set[T]]

@SparkQA

SparkQA commented Sep 19, 2016

Test build #65587 has finished for PR 15054 at commit f305c4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val table = formatTableName(name.table)
val tid = TableIdentifier(table)
if (isTemporaryTable(name)) {
def getTempViewMetadata(name: String): CatalogTable = {
Contributor

let's move these 2 methods to the right section: handles only temp view

Contributor

where do we need this method?

Member Author

: ) Just removed getTempViewMetadata and moved getTempViewMetadataOption to the position you want

@@ -37,7 +38,9 @@ case class AnalyzeTableCommand(tableName: String, noscan: Boolean = true) extend
override def run(sparkSession: SparkSession): Seq[Row] = {
val sessionState = sparkSession.sessionState
val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
val qualifiedName = TableIdentifier(tableIdent.table, Some(db))
Contributor

let's follow the existing convention and call it tableIdentWithDB

Member Author

Done. Clean all the related naming issues.

@SparkQA

SparkQA commented Sep 19, 2016

Test build #65610 has finished for PR 15054 at commit c662d2c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Retrieve the metadata of an existing temporary view.
* If the temporary view does not exist, return None.
*/
def getTempViewMetadataOption(name: String): Option[CatalogTable] = synchronized {
Contributor

It seems we always use it with the pattern getTempViewMetadataOption.getOrElse(getTableMetadata); maybe we should just rename it to getTempViewOrPermanentTableMetadata?

Member Author

Yeah, we can combine them. Let me do it. Thanks!

* If the specified table/view is not found in the database then a [[NoSuchTableException]] is
* thrown.
*/
def getTempViewOrPermanentTableMetadata(name: TableIdentifier): CatalogTable = synchronized {
Contributor

it should just take a string.
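The combined method discussed above can be sketched with simplified types (these are illustrative stand-ins, not Spark's API): the temp view wins over a permanent table of the same name, and a NoSuchTableException is thrown when neither exists, which folds the getTempViewMetadataOption.getOrElse(getTableMetadata) pattern into one call.

```scala
import scala.collection.mutable

// Stand-in for Spark's NoSuchTableException, just for this sketch.
class NoSuchTableException(table: String)
  extends Exception(s"Table or view '$table' not found")

class SessionCatalogSketch {
  private val tempViews = mutable.Map[String, String]()
  private val permanentTables = mutable.Map[String, String]()

  def createTempView(name: String, meta: String): Unit = synchronized { tempViews(name) = meta }
  def createTable(name: String, meta: String): Unit = synchronized { permanentTables(name) = meta }

  // Temp view shadows a permanent table of the same name;
  // throws if neither exists.
  def getTempViewOrPermanentTableMetadata(name: String): String = synchronized {
    tempViews.get(name)
      .orElse(permanentTables.get(name))
      .getOrElse(throw new NoSuchTableException(name))
  }
}
```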

@@ -444,7 +444,7 @@ class SessionCatalogSuite extends SparkFunSuite {
assert(!catalog.tableExists(TableIdentifier("view1", Some("default"))))
}

test("getTableMetadata on temporary views") {
test("getTableMetadata and getTempViewOrPermanentTableMetadata on temporary views") {
Contributor

looks like it's unnecessary to test getTableMetadata on temporary views, let's just test getTempViewOrPermanentTableMetadata here.

@SparkQA

SparkQA commented Sep 20, 2016

Test build #65637 has finished for PR 15054 at commit 48ce44e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 20, 2016

Test build #65638 has finished for PR 15054 at commit ee3096c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Sep 20, 2016

Test build #65642 has finished for PR 15054 at commit ee3096c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in d5ec5db Sep 20, 2016
@cloud-fan
Contributor

LGTM, merging to master!

@cloud-fan
Contributor

JIRA is down, I'll resolve the ticket later.

@cloud-fan
Contributor

@gatorsmile how hard is it to backport this PR to 2.0?

@gatorsmile
Member Author

It should not be hard. I will try to backport it to 2.0 today. Thanks!

ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 22, 2016
## What changes were proposed in this pull request?

After apache#15054, there is no place in Spark SQL that needs `SessionCatalog.tableExists` to check temp views, so this PR makes `SessionCatalog.tableExists` only check permanent tables/views and removes some hacks.

This PR also improves the `getTempViewOrPermanentTableMetadata` method introduced in apache#15054, to make the code simpler.

## How was this patch tested?

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#15160 from cloud-fan/exists.
asfgit pushed a commit that referenced this pull request Sep 23, 2016
…tements on Temporary Views

### What changes were proposed in this pull request?
This PR is to backport #15054 and #15160 to Spark 2.0.

- When the permanent tables/views do not exist but the temporary view exists, the expected error should be `NoSuchTableException` for partition-related ALTER TABLE commands. However, it always reports a confusing error message. For example,
```
Partition spec is invalid. The spec (a, b) must match the partition spec () defined in table '`testview`';
```
- When the permanent tables/views do not exist but the temporary view exists, the expected error should be `NoSuchTableException` for `ALTER TABLE ... UNSET TBLPROPERTIES`. However, it reports a missing table property. For example,
```
Attempted to unset non-existent property 'p' in table '`testView`';
```
- When `ANALYZE TABLE` is called on a view or a temporary view, we should issue an error message. However, it reports a strange error:
```
ANALYZE TABLE is not supported for Project
```

- When inserting into a temporary view that is generated from `Range`, we will get the following error message:
```
assertion failed: No plan for 'InsertIntoTable Range (0, 10, step=1, splits=Some(1)), false, false
+- Project [1 AS 1#20]
   +- OneRowRelation$
```

This PR is to fix the above four issues.

There is no place in Spark SQL that needs `SessionCatalog.tableExists` to check temp views, so this PR makes `SessionCatalog.tableExists` only check permanent tables/views and removes some hacks.

### How was this patch tested?
Added multiple test cases

Author: gatorsmile <gatorsmile@gmail.com>

Closes #15174 from gatorsmile/PR15054Backport.