[SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append #16552

Closed
windpiger wants to merge 20 commits into apache:master from windpiger:saveAsTableWithHiveAppend

Conversation

windpiger
Contributor

@windpiger windpiger commented Jan 11, 2017

What changes were proposed in this pull request?

After SPARK-19107, we can now treat Hive as a data source and create Hive tables with DataFrameWriter and Catalog. However, the support is not complete; there are still some cases we do not support.

This PR implements:
DataFrameWriter.saveAsTable works with the Hive format in append mode
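For illustration, a minimal usage sketch of the behavior this enables (the table name is made up, and a SparkSession built with .enableHiveSupport() is assumed):

```scala
// Sketch only: assumes `spark` has Hive support enabled; the table name is hypothetical.
val df = spark.range(10).toDF("id")

// First call creates the Hive-serde table.
df.write.format("hive").saveAsTable("hive_append_tbl")

// With this PR, appending to the existing Hive-serde table also works
// instead of failing analysis.
df.write.format("hive").mode("append").saveAsTable("hive_append_tbl")
```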

How was this patch tested?

unit test added

@SparkQA

SparkQA commented Jan 11, 2017

Test build #71220 has finished for PR 16552 at commit 25b39fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 12, 2017

Test build #71239 has finished for PR 16552 at commit b463ac7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger windpiger changed the title [WIP][SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append Jan 16, 2017
@windpiger
Contributor Author

retest this please

@windpiger windpiger changed the title [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append [WIP][SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append Jan 16, 2017
@SparkQA

SparkQA commented Jan 16, 2017

Test build #71443 has finished for PR 16552 at commit 29e1ee2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 16, 2017

Test build #71444 has finished for PR 16552 at commit 429a0ab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger windpiger changed the title [WIP][SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append Jan 18, 2017
@SparkQA

SparkQA commented Jan 18, 2017

Test build #71599 has finished for PR 16552 at commit 21c5e3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

The overall idea is to use InsertIntoTable to implement appending to a Hive table, but this approach is too hacky. We should follow the way we handle data source tables, e.g. DataFrameWriter.saveAsTable just builds a CreateTable plan, the rule AnalyzeCreateTable does some checking and normalization, and another rule turns CreateTable into CreateDataSourceTableAsSelectCommand.
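A rough sketch of that flow, assuming a Hive-specific analyzer rule does the final rewrite (constructor arguments are simplified for illustration and are not the exact Spark signatures):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.command.DDLUtils
import org.apache.spark.sql.execution.datasources.CreateTable
import org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand

// Sketch only: match the generic CreateTable plan built by
// DataFrameWriter.saveAsTable (after AnalyzeCreateTable has normalized it)
// and rewrite Hive-serde CTAS into the dedicated command, mirroring how
// data source tables become CreateDataSourceTableAsSelectCommand.
object HiveCtasSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) =>
      CreateHiveTableAsSelectCommand(tableDesc, query, mode)
  }
}
```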

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71639 has started for PR 16552 at commit 2bf67c7.

@windpiger windpiger changed the title [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append [WIP][SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append Jan 19, 2017
@SparkQA

SparkQA commented Jan 19, 2017

Test build #71654 has finished for PR 16552 at commit 0b9dc3a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class QualifiedTableName(database: String, name: String)
  • class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan]

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71652 has finished for PR 16552 at commit 6b8f625.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 19, 2017

Test build #71653 has finished for PR 16552 at commit 1145e52.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@windpiger windpiger changed the title [WIP][SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hive append Jan 19, 2017
@SparkQA

SparkQA commented Jan 19, 2017

Test build #71659 has finished for PR 16552 at commit 2f542ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Check if the specified data source match the data source of the existing table.
val existingProvider = DataSource.lookupDataSource(existingTable.provider.get)
Contributor

We have HiveFileFormat, and we can make it implement DataSourceRegister so that DataSource.lookupDataSource("hive") can work.
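A minimal sketch of the registration half of this suggestion (the real HiveFileFormat also extends FileFormat, takes a FileSinkDesc constructor argument, and must be listed in the META-INF/services registry for the ServiceLoader-based lookup to find it):

```scala
import org.apache.spark.sql.sources.DataSourceRegister

// Sketch only: registering the short name "hive" is what lets
// DataSource.lookupDataSource("hive") resolve to this class.
class HiveFileFormat extends DataSourceRegister {
  override def shortName(): String = "hive"
}
```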

@@ -69,7 +68,7 @@ case class CreateHiveTableAsSelectCommand(
withFormat
}

sparkSession.sessionState.catalog.createTable(withSchema, ignoreIfExists = false)
sparkSession.sessionState.catalog.createTable(withSchema, ignoreIfExists = true)
Contributor

Looks like we don't need to build withSchema anymore; the schema will be set in AnalyzeCreateTable.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 23, 2017

Test build #71886 has finished for PR 16552 at commit 6c09477.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HiveFileFormat(fileSinkDesc: FileSinkDesc)

@gatorsmile
Member

You need to fetch the upstream and merge it into your local branch. Some changes were merged to upstream/master; although they did not introduce conflicts, they caused the compilation errors in your PR.

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71888 has finished for PR 16552 at commit 98ec55a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HiveFileFormat(fileSinkConf: FileSinkDesc)

|USING hive
""".stripMargin)
val tempView = spark.sessionState.catalog.getTempView(tableName)
assert(tempView.isDefined, "create a temp view using hive should succeed")
Contributor

Hmmm, that's not expected. Let's add a check in CreateTempViewUsing and throw an exception for the hive provider, e.g. if (DDLUtils.isHiveTable(t)) throw ...
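A hypothetical ScalaTest-style sketch of what the check should enforce (the view name and message fragment are illustrative, not the exact wording):

```scala
import org.apache.spark.sql.AnalysisException

// Sketch only: once CreateTempViewUsing rejects the hive provider,
// CREATE TEMPORARY VIEW ... USING hive should fail analysis.
val e = intercept[AnalysisException] {
  spark.sql("CREATE TEMPORARY VIEW tmp_v USING hive")
}
assert(e.getMessage.contains("Hive data source"))
```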

"not supported yet. Please use the insertInto() API as an alternative.")
}

// Check if the specified data source match the data source of the existing table.
Contributor

why remove this line?

@@ -65,6 +65,10 @@ case class CreateTempViewUsing(
}

def run(sparkSession: SparkSession): Seq[Row] = {
if (provider.toLowerCase == DDLUtils.HIVE_PROVIDER) {
throw new AnalysisException("Currently Hive data source can not be created as a view")
Contributor

Hive data source can only be used with tables, you cannot use it with CREATE TEMP VIEW USING

@@ -1461,6 +1461,25 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
})
}

test("run sql directly on files - hive") {
withTable("t") {
Contributor

you don't need to create a table

withTempPath { path =>
  spark.range(100).toDF.write.parquet(path.getAbsolutePath)
  ...
  sql(s"select id from hive.`${path.getAbsolutePath}`")
}

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71918 has started for PR 16552 at commit 7bf5b50.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71920 has started for PR 16552 at commit 7bf5b50.

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71910 has finished for PR 16552 at commit f34ab6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@@ -65,6 +65,11 @@ case class CreateTempViewUsing(
}

def run(sparkSession: SparkSession): Seq[Row] = {
if (provider.toLowerCase == DDLUtils.HIVE_PROVIDER) {
throw new AnalysisException("Hive data source can not be used with tables," +
Contributor

can only be used

Contributor

and please add a space after ,

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71923 has finished for PR 16552 at commit 7bf5b50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71925 has finished for PR 16552 at commit 59db8e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 3c86fdd Jan 24, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

After [SPARK-19107](https://issues.apache.org/jira/browse/SPARK-19107), we can now treat Hive as a data source and create Hive tables with DataFrameWriter and Catalog. However, the support is not complete; there are still some cases we do not support.

This PR implements:
DataFrameWriter.saveAsTable works with the Hive format in append mode

## How was this patch tested?
unit test added

Author: windpiger <songjun@outlook.com>

Closes apache#16552 from windpiger/saveAsTableWithHiveAppend.
ghost pushed a commit to dbtsai/spark that referenced this pull request Jan 29, 2017
## What changes were proposed in this pull request?

After apache#16552, `CreateHiveTableAsSelectCommand` becomes very similar to `CreateDataSourceTableAsSelectCommand`, and we can further simplify it by only creating the table in the table-not-exist branch.

This PR also adds hive provider checking in the DataStream reader/writer, which was missed in apache#16552

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16693 from cloud-fan/minor.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017