[SPARK-17353] [SPARK-16943] [SPARK-16942] [BACKPORT-2.0] [SQL] Fix multiple bugs in CREATE TABLE LIKE command #14946

Closed
wants to merge 4 commits into from

gatorsmile
Member

@gatorsmile gatorsmile commented Sep 3, 2016

What changes were proposed in this pull request?

This PR backports #14531 to 2.0.

The existing CREATE TABLE LIKE command has multiple issues:

  • The generated table is non-empty when the source table is a data source table. A data source table stores the location of its contents in the table property path, and we currently keep that property unchanged, so the new table is created at the same location as the source table.
  • The generated table is EXTERNAL when the source table is an external Hive SerDe table. We explicitly set the table type to MANAGED, but Hive checks the table property EXTERNAL to decide whether a table is EXTERNAL (see https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1407-L1408), so the created table is still EXTERNAL.
  • When the source table is a VIEW, the metadata of the generated table still contains the view text and the view original text. So far this has not broken anything, but it could cause problems in Hive (for example, https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1405-L1406).
  • The table comment is handled incorrectly. To follow what Hive does, the table comment should be cleared, but the column comments should be kept.
  • INDEX tables are not supported, so we should throw an exception in this case.
  • owner should not be retained. ToHiveTable sets it (https://github.com/apache/spark/blob/e679bc3c1cd418ef0025d2ecbc547c9660cac433/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L793) regardless of the value we set in CatalogTable. We set it to an empty string to avoid confusing output in Explain.
  • Add support for temp tables.
  • Like Hive, we should not copy the table properties from the source table to the created table, especially the statistics-related properties, which could be wrong for the created table.
  • unsupportedFeatures should not be copied from the source table, because the created table does not have these unsupported features.
  • When the source table is a view, the target table uses the default format of data source tables: spark.sql.sources.default.
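
To make the EXTERNAL item above more concrete, here is a small sketch in plain Scala of how a metastore-side check on the EXTERNAL property can override the declared table type. `HiveTableTypeSketch` and `effectiveTableType` are hypothetical names, and the exact comparison Hive performs is an assumption; the linked ObjectStore lines are the authoritative logic.

```scala
object HiveTableTypeSketch {
  // Hypothetical helper: returns the table type the metastore would end up
  // storing, given the declared type and the table parameters.
  // Assumption: the EXTERNAL parameter wins over the declared type.
  def effectiveTableType(
      declaredType: String,
      parameters: Map[String, String]): String = {
    val isExternal =
      parameters.get("EXTERNAL").exists(_.equalsIgnoreCase("TRUE"))
    declaredType match {
      // Declared MANAGED, but the property says external.
      case "MANAGED_TABLE" if isExternal => "EXTERNAL_TABLE"
      // Declared EXTERNAL, but the property is missing or not TRUE.
      case "EXTERNAL_TABLE" if !isExternal => "MANAGED_TABLE"
      case other => other
    }
  }
}
```

Under this assumption, setting the table type to MANAGED while copying the source table's properties is not enough: a copied `EXTERNAL -> TRUE` entry flips the type back.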

This PR fixes the above issues.
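
As a rough illustration of the intended behavior, the rules in the list can be sketched in plain Scala. This is not Spark's actual implementation: `TableMeta`, `Column`, `createLike`, and `defaultProvider` are hypothetical simplifications of the catalog metadata, and `"parquet"` only stands in for whatever `spark.sql.sources.default` is set to. Temp table handling is left out.

```scala
// Hypothetical, simplified stand-ins for the catalog metadata classes.
sealed trait TableType
case object Managed extends TableType
case object External extends TableType
case object View extends TableType
case object Index extends TableType

final case class Column(
    name: String,
    dataType: String,
    comment: Option[String] = None)

final case class TableMeta(
    name: String,
    tableType: TableType,
    schema: Seq[Column],
    provider: Option[String] = None,
    location: Option[String] = None,
    properties: Map[String, String] = Map.empty,
    comment: Option[String] = None,
    owner: String = "",
    viewText: Option[String] = None,
    viewOriginalText: Option[String] = None,
    unsupportedFeatures: Seq[String] = Seq.empty)

object CreateTableLikeSketch {
  // Assumption: stand-in for the configured spark.sql.sources.default.
  val defaultProvider = "parquet"

  def createLike(source: TableMeta, targetName: String): TableMeta = {
    // INDEX tables are rejected outright.
    if (source.tableType == Index) {
      throw new UnsupportedOperationException(
        s"CREATE TABLE LIKE is not supported for index table ${source.name}")
    }
    // A view has no storage format of its own, so use the default one.
    val provider =
      if (source.tableType == View) Some(defaultProvider) else source.provider
    // Only the schema (including column comments) and the format carry over.
    // Location, properties, table comment, owner, view texts, and
    // unsupportedFeatures all fall back to their empty defaults.
    TableMeta(
      name = targetName,
      tableType = Managed,
      schema = source.schema,
      provider = provider)
  }
}
```

Building the target from scratch, instead of copying the source and clearing fields one by one, means any field that is not explicitly carried over starts out empty.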

How was this patch tested?

Improved the test coverage by adding more test cases.

@SparkQA

SparkQA commented Sep 3, 2016

Test build #64892 has finished for PR 14946 at commit fc419f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

It looks like all the 2.0 builds failed the same test case.

https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.0-test-sbt-hadoop-2.3/

Let me try to fix it.

@SparkQA

SparkQA commented Sep 5, 2016

Test build #64945 has finished for PR 14946 at commit 66842b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 6, 2016

Test build #64954 has finished for PR 14946 at commit 3d80d69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -263,8 +263,7 @@ class SessionCatalog(
CatalogColumn(
name = c.name,
dataType = c.dataType.catalogString,
nullable = c.nullable,
comment = Option(c.name)
Member Author

@gatorsmile gatorsmile Sep 6, 2016

As on the current master branch, we removed this useless comment attribute, mainly because the schema comparison also checks the comment. This was introduced in PR #14114.
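
A minimal sketch of why a populated comment matters for schema comparison, assuming the comparison relies on case-class equality. `CatalogColumnSketch` and the helpers below are hypothetical and only mirror the four fields visible in the diff above.

```scala
// Hypothetical column class with the same four fields as in the diff.
final case class CatalogColumnSketch(
    name: String,
    dataType: String,
    nullable: Boolean = true,
    comment: Option[String] = None)

object SchemaComparisonSketch {
  // Old behavior: the column name was copied into the comment field.
  def withNameAsComment(
      cols: Seq[CatalogColumnSketch]): Seq[CatalogColumnSketch] =
    cols.map(c => c.copy(comment = Option(c.name)))

  // New behavior: the comment field is left unset.
  def withoutComment(
      cols: Seq[CatalogColumnSketch]): Seq[CatalogColumnSketch] =
    cols.map(c => c.copy(comment = None))
}
```

Because case-class equality compares every field, a schema declared without comments no longer equals its converted form once the name is written into `comment`, even though names, types, and nullability all match.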

@gatorsmile
Member Author

cc @cloud-fan @yhuai This PR is ready for review. Thanks!

@gatorsmile gatorsmile changed the title from "[SPARK-17353] [SPARK-16943] [SPARK-16942] [SPARK-16959] [BACKPORT-2.0] [SQL] Fix multiple bugs in CREATE TABLE LIKE command" to "[SPARK-17353] [SPARK-16943] [SPARK-16942] [BACKPORT-2.0] [SQL] Fix multiple bugs in CREATE TABLE LIKE command" Sep 6, 2016
asfgit pushed a commit that referenced this pull request Sep 6, 2016
…le bugs in CREATE TABLE LIKE command

Author: gatorsmile <gatorsmile@gmail.com>

Closes #14946 from gatorsmile/createTableLike20.
@cloud-fan
Contributor

LGTM, merging to 2.0!

@gatorsmile
Member Author

Thanks!

@gatorsmile gatorsmile closed this Sep 6, 2016