[SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand #16693

cloud-fan · 2017-01-24T13:44:15Z

What changes were proposed in this pull request?

After #16552, CreateHiveTableAsSelectCommand became very similar to CreateDataSourceTableAsSelectCommand, and we can simplify it further by creating the table only in the table-not-exist branch.

This PR also adds Hive provider checking to the DataStream reader/writer, which was missed in #16552.
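
The "create only in the table-not-exist branch" flow can be sketched roughly as below. This is a simplified standalone model, not the Spark implementation: `Catalog`, `CtasSketch.runCtas`, and the three `SaveMode` values are hypothetical stand-ins, and the `ErrorIfExists` branch is an assumption about how an existing table is rejected.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; none of these are Spark classes.
sealed trait SaveMode
case object ErrorIfExists extends SaveMode
case object Ignore extends SaveMode
case object Append extends SaveMode

// A tiny in-memory catalog that counts how often a table is created.
final class Catalog {
  private val tables = mutable.Map.empty[String, Vector[String]]
  var createCalls = 0

  def tableExists(name: String): Boolean = tables.contains(name)

  def createTable(name: String): Unit = {
    createCalls += 1
    tables(name) = Vector.empty
  }

  def insert(name: String, rows: Seq[String]): Unit =
    tables(name) = tables(name) ++ rows

  def rows(name: String): Vector[String] = tables(name)
}

object CtasSketch {
  // Create-table-as-select: the table is created only in the not-exists branch.
  def runCtas(catalog: Catalog, name: String, mode: SaveMode, query: Seq[String]): Unit = {
    if (catalog.tableExists(name)) {
      mode match {
        case ErrorIfExists =>
          throw new IllegalStateException(s"Table $name already exists.")
        case Ignore =>
          () // Table exists and the mode is Ignore: just return.
        case Append =>
          catalog.insert(name, query) // Reuse the existing table; no create call.
      }
    } else {
      catalog.createTable(name)
      catalog.insert(name, query)
    }
  }
}
```

In this model, running the command twice against the same table name calls `createTable` once, which is the property the simplification is after.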

How was this patch tested?

N/A

cloud-fan · 2017-01-24T13:44:43Z

cc @gatorsmile @windpiger

SparkQA · 2017-01-24T16:10:06Z

Test build #71933 has finished for PR 16693 at commit db00cf9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-25T03:55:46Z

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala

@@ -221,6 +222,11 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
   * @since 2.0.0
   */
  def start(): StreamingQuery = {
+    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
+        "read files of Hive data source directly.")


This is not reading but writing the results to Hive tables, right?

gatorsmile · 2017-01-25T03:56:18Z

sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala

@@ -116,6 +117,11 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
   * @since 2.0.0
   */
  def load(): DataFrame = {
+    if (source.toLowerCase == DDLUtils.HIVE_PROVIDER) {
+      throw new AnalysisException("Hive data source can only be used with tables, you can not " +
+        "write files of Hive data source directly.")


This is to read the streaming data from Hive tables, right? I think we need to fix the error message.
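
The mix-up flagged in the two comments above (the writer path says "read", the reader path says "write") could be avoided by routing both checks through one helper. The sketch below is a standalone illustration, not the code that was merged: `HiveProviderCheck`, `ProviderException`, `checkRead`, and `checkWrite` are hypothetical names, and the provider value `"hive"` is assumed to match `DDLUtils.HIVE_PROVIDER`.

```scala
import java.util.Locale

object HiveProviderCheck {
  // Assumed to match the constant compared against in the diff.
  val HiveProvider = "hive"

  // Hypothetical exception type standing in for Spark's AnalysisException.
  final class ProviderException(message: String) extends Exception(message)

  // Shared check: only the verb in the message differs between the two paths.
  private def check(source: String, action: String): Unit = {
    if (source.toLowerCase(Locale.ROOT) == HiveProvider) {
      throw new ProviderException(
        "Hive data source can only be used with tables, you can not " +
          s"$action files of Hive data source directly.")
    }
  }

  // Reader path (load): the message says "read".
  def checkRead(source: String): Unit = check(source, "read")

  // Writer path (start): the message says "write".
  def checkWrite(source: String): Unit = check(source, "write")
}
```

Keeping the message in one place means the reader and writer cannot drift apart again when the wording changes.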

gatorsmile · 2017-01-25T04:41:34Z

...hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala

@@ -89,12 +55,30 @@ case class CreateHiveTableAsSelectCommand(
        // Since the table already exists and the save mode is Ignore, we will just return.
        return Seq.empty
      }
-      sparkSession.sessionState.executePlan(InsertIntoTable(
-        metastoreRelation, Map(), query, overwrite = false, ifNotExists = false)).toRdd


Uh... previously, we tried to create the table even if the table already existed. A good change!

gatorsmile · 2017-01-25T04:53:28Z

...hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala

-      val withSchema = if (withFormat.schema.isEmpty) {
-        tableDesc.copy(schema = query.schema)
-      } else {
-        withFormat


To the other reviewers: this is not needed, because the schema is always empty when we need to create a table. See the assert here.

gatorsmile · 2017-01-25T04:54:27Z

...hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala

-            tableDesc.storage.outputFormat
-              .orElse(Some(classOf[HiveIgnoreKeyTextOutputFormat[Text, Text]].getName)),
-          serde = tableDesc.storage.serde.orElse(Some(classOf[LazySimpleSerDe].getName)),
-          compressed = tableDesc.storage.compressed)


Actually, after the code refactoring, this is always ensured in the rule DetermineHiveSerde.
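
For readers unfamiliar with the removed lines, the default-filling they performed looks roughly like the sketch below. It is a standalone illustration and not the `DetermineHiveSerde` rule itself: `StorageFormat` and `SerdeDefaults` are hypothetical, and the fully qualified class names are filled in on the assumption that the simple names in the diff refer to the usual Hive classes.

```scala
// Hypothetical, simplified stand-in for a table's storage description.
final case class StorageFormat(
    outputFormat: Option[String],
    serde: Option[String])

object SerdeDefaults {
  // Simple class names come from the removed lines; package prefixes are assumed.
  val DefaultOutputFormat = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  val DefaultSerde = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

  // Fill in only the fields left unset; explicit user choices are kept.
  def withDefaults(storage: StorageFormat): StorageFormat =
    storage.copy(
      outputFormat = storage.outputFormat.orElse(Some(DefaultOutputFormat)),
      serde = storage.serde.orElse(Some(DefaultSerde)))
}
```

Once an earlier step guarantees these fields are populated, repeating the `orElse` defaults in the command is redundant, which is why the lines could be deleted.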

gatorsmile · 2017-01-25T04:55:01Z

LGTM except for two minor comments on the error messages.

cloud-fan · 2017-01-26T11:26:21Z

@gatorsmile thanks for adding comments about why the cleanup is safe!

SparkQA · 2017-01-26T13:51:58Z

Test build #72022 has finished for PR 16693 at commit f4a9342.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-26T18:12:03Z

retest this please

gatorsmile · 2017-01-28T16:27:03Z

retest this please

gatorsmile · 2017-01-28T16:27:23Z

ok to test

SparkQA · 2017-01-28T18:51:38Z

Test build #72109 has finished for PR 16693 at commit f4a9342.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-29T04:38:51Z

Thanks! Merging to master.

## What changes were proposed in this pull request?

After apache#16552, `CreateHiveTableAsSelectCommand` becomes very similar to `CreateDataSourceTableAsSelectCommand`, and we can further simplify it by only creating table in the table-not-exist branch.

This PR also adds hive provider checking in DataStream reader/writer, which is missed in apache#16552

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#16693 from cloud-fan/minor.

simplify CreateHiveTableAsSelectCommand

db00cf9

gatorsmile reviewed Jan 25, 2017

fix message

f4a9342

asfgit closed this in f7c07db Jan 29, 2017
