[SPARK-22934] [SQL] Make optional clauses order insensitive for CREATE TABLE SQL statement #20133

gatorsmile · 2018-01-01T07:06:42Z

What changes were proposed in this pull request?

Currently, our CREATE TABLE syntax requires the EXACT order of clauses, which is pretty hard to remember. Thus, this PR makes the optional clauses order insensitive for the CREATE TABLE SQL statement.

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
    [(col_name1 col_type1 [COMMENT col_comment1], ...)]
    USING datasource
    [OPTIONS (key1=val1, key2=val2, ...)]
    [PARTITIONED BY (col_name1, col_name2, ...)]
    [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
    [LOCATION path]
    [COMMENT table_comment]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]
    [AS select_statement]

The proposal is to make the following clauses order insensitive.

    [OPTIONS (key1=val1, key2=val2, ...)]
    [PARTITIONED BY (col_name1, col_name2, ...)]
    [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
    [LOCATION path]
    [COMMENT table_comment]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]

The same idea is also applicable to Create Hive Table.

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
    [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
    [COMMENT table_comment]
    [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION path]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]
    [AS select_statement]

The proposal is to make the following clauses order insensitive.

    [COMMENT table_comment]
    [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION path]
    [TBLPROPERTIES (key1=val1, key2=val2, ...)]
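As a rough illustration of what "order insensitive" means here, the following is a minimal sketch in plain Scala with no Spark dependency. The `Clause` type, the `collectClauses` helper, and the clause names used as strings are hypothetical stand-ins; the actual change lives in the grammar and in `SparkSqlAstBuilder`. The sketch only shows the intended behavior: any order of optional clauses yields the same result, and a clause that appears twice is rejected.

```scala
// Hypothetical model: one parsed optional clause is a (name, body) pair.
final case class Clause(name: String, body: String)

object ClauseOrderSketch {
  // Collect optional clauses regardless of their order.
  // A clause that appears more than once is rejected.
  def collectClauses(clauses: Seq[Clause]): Map[String, String] = {
    val grouped = clauses.groupBy(_.name)
    grouped.foreach { case (name, occurrences) =>
      if (occurrences.size > 1) {
        throw new IllegalArgumentException(s"Found duplicate clauses: $name")
      }
    }
    grouped.map { case (name, occurrences) => name -> occurrences.head.body }
  }

  // Clauses in the order the old syntax required.
  val original: Seq[Clause] = Seq(
    Clause("PARTITIONED BY", "(ds string)"),
    Clause("STORED AS", "RCFILE"),
    Clause("LOCATION", "'/user/external/page_view'"))

  // The same clauses in a different order.
  val reordered: Seq[Clause] = original.reverse
}
```

Under this model, `collectClauses(original)` and `collectClauses(reordered)` return equal maps, while passing two `LOCATION` clauses throws.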

How was this patch tested?

Added test cases

SparkQA · 2018-01-01T08:57:00Z

Test build #85576 has finished for PR 20133 at commit 8ae8f18.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-01-01T14:16:44Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala

@@ -39,6 +41,17 @@ object ParserUtils {
    throw new ParseException(s"Operation not allowed: $message", ctx)
  }

+  def duplicateClausesNotAllowed(message: String, ctx: ParserRuleContext): Nothing = {
+    throw new ParseException(s"Found duplicate clauses: $message", ctx)


Can't we merge these two functions into a single one that checks for duplication?
e.g.,

def checkDuplicateClauses[T](
    nodes: util.List[T], clauseName: String, ctx: ParserRuleContext): Unit = {
  if (nodes.size() > 1) {
    throw new ParseException(s"Found duplicate clauses: $clauseName", ctx)
  }
}

Sounds good to me!
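For readers who want to try the suggested helper outside Spark, here is a self-contained sketch. It assumes a stand-in exception class, `DuplicateClauseException`, in place of Spark's `ParseException`, and drops the `ctx` parameter because no ANTLR context is available in isolation; both simplifications are mine, not part of the PR.

```scala
import java.util.{List => JList}

// Stand-in for Spark's ParseException so the sketch compiles without Spark or ANTLR.
final class DuplicateClauseException(message: String) extends RuntimeException(message)

object DuplicateCheckSketch {
  // Fail when a clause that may appear at most once was matched more than once.
  def checkDuplicateClauses[T](nodes: JList[T], clauseName: String): Unit = {
    if (nodes.size() > 1) {
      throw new DuplicateClauseException(s"Found duplicate clauses: $clauseName")
    }
  }
}
```

Calling `DuplicateCheckSketch.checkDuplicateClauses(java.util.Arrays.asList("LOCATION '/a'"), "LOCATION")` returns normally, while a list with two entries throws.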

SparkQA · 2018-01-01T15:29:25Z

Test build #85578 has finished for PR 20133 at commit 0894f5e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2018-01-01T22:00:11Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

@@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
        .map(visitIdentifierList(_).toArray)
        .getOrElse(Array.empty[String])
    val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty)
-    val bucketSpec = Option(ctx.bucketSpec()).map(visitBucketSpec)
+    val bucketSpec = if (ctx.bucketSpec().size > 1) {
+      duplicateClausesNotAllowed("CLUSTERED BY", ctx)


Can you split the validation logic and the extraction logic? In this case I'd move the check to line 411 and do the extract on line 420.
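A minimal sketch of the split being asked for, assuming a hypothetical `CreateTableCtxSketch` in place of the generated parser context and plain strings in place of the parsed clause nodes. All checks run first; extraction then only reads values that are already known to be valid.

```scala
// Hypothetical stand-in for the generated parser context:
// each optional clause is exposed as a list of matches.
final case class CreateTableCtxSketch(bucketSpec: Seq[String], locationSpec: Seq[String])

object ValidateThenExtractSketch {
  private def checkDuplicateClauses(nodes: Seq[Any], clauseName: String): Unit = {
    if (nodes.size > 1) {
      throw new IllegalArgumentException(s"Found duplicate clauses: $clauseName")
    }
  }

  def build(ctx: CreateTableCtxSketch): (Option[String], Option[String]) = {
    // Step 1: validation only.
    checkDuplicateClauses(ctx.bucketSpec, "CLUSTERED BY")
    checkDuplicateClauses(ctx.locationSpec, "LOCATION")

    // Step 2: extraction only; each list now holds at most one element.
    val bucketSpec = ctx.bucketSpec.headOption
    val location = ctx.locationSpec.headOption
    (bucketSpec, location)
  }
}
```

Keeping the two steps apart means an `if`/`else` around each extraction is no longer needed.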

viirya · 2018-01-01T07:39:47Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

   *   [[AS] select_statement];
+   *
+   *   create_table_clauses (order insensitive):
+   *     [PARTITIONED BY (col_name, col_name, ...)]


Isn't [OPTIONS table_property_list] one of create_table_clauses?

dongjoon-hyun · 2018-01-02T01:49:42Z

sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala

             |LOCATION "${dir.toURI}"
+             |PARTITIONED BY(a, b)


Is it a relevant change? Since the PR is about ORDER-INSENSITIVENESS, can we keep the original code instead of making an irrelevant change like this?

This is an end-to-end test for ORDER-INSENSITIVENESS. I do not want to introduce a new one for it.

Oh, I see. +1.

dongjoon-hyun · 2018-01-02T01:51:11Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

+          |CLUSTERED BY(id)
+          |SORTED BY(id, name) INTO 1024 BUCKETS
+          |PARTITIONED BY (ds string)
+        """.stripMargin)


Can we keep the original HiveDDLSuite.scala file, too?

The same here.

dongjoon-hyun · 2018-01-02T01:52:00Z

sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala

        |COMMENT 'This is the staging page view table'
        |STORED AS RCFILE
        |LOCATION '/user/external/page_view'
        |TBLPROPERTIES ('p1'='v1', 'p2'='v2')
-        |AS SELECT * FROM src""".stripMargin
+        |AS SELECT * FROM src
+       """.stripMargin


nit. extra space before """.

SparkQA · 2018-01-02T06:10:18Z

Test build #85584 has finished for PR 20133 at commit 68170bb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-02T06:19:09Z

Test build #85585 has finished for PR 20133 at commit 9818ab5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-01-03T01:40:49Z

cc @cloud-fan @hvanhovell

cloud-fan · 2018-01-03T06:07:39Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

    val dataCols = Option(ctx.columns).map(visitColTypeList).getOrElse(Nil)
    val partitionCols = Option(ctx.partitionColumns).map(visitColTypeList).getOrElse(Nil)
-    val properties = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty)
+    val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty)


What's the meaning of ctx.tableProps now? The union of all TABLE PROPERTY lists?

The last one, if we have multiple clauses. However, we block this in the checks above.
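To illustrate why the duplicate check matters, here is a small model of the "last one wins" behavior described in this reply. `labeledValue` and `propertiesOf` are hypothetical names, and the model assumes a label that is overwritten on every match, as the reply states; it is not the generated parser code.

```scala
object LastLabelWinsSketch {
  // Hypothetical model of a labeled element matched inside a repeated group:
  // each match overwrites the label, so only the last match stays visible.
  def labeledValue[A](matches: Seq[A]): Option[A] = matches.lastOption

  // With the duplicate check in place, "last" and "only" are the same thing,
  // so no earlier property list can be silently dropped.
  def propertiesOf(matches: Seq[Map[String, String]]): Map[String, String] = {
    require(matches.size <= 1, "Found duplicate clauses: TBLPROPERTIES")
    labeledValue(matches).getOrElse(Map.empty)
  }
}
```

Without the `require`, a statement with two property lists would quietly keep only the second one.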

cloud-fan · 2018-01-03T06:08:07Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

+      createFileFormatCtx: Seq[CreateFileFormatContext],
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx.size == 1 && createFileFormatCtx.size == 1) {
+      validateRowFormatFileFormat(rowFormatCtx.head, createFileFormatCtx.head, parentCtx)


shall we just combine this method and the old validateRowFormatFileFormat?

Will do it in a follow-up PR

cloud-fan · 2018-01-03T06:08:30Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

@@ -1180,7 +1202,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
            ctx)
        }

-        val hasStorageProperties = (ctx.createFileFormat != null) || (ctx.rowFormat != null)
+        val hasStorageProperties = (ctx.createFileFormat.size != 0) || (ctx.rowFormat.size != 0)


shall we use > 0 to be consistent with other places?

cloud-fan · 2018-01-03T06:11:06Z

LGTM

gatorsmile · 2018-01-03T14:07:54Z

Thanks! Merged to master and 2.3.

… TABLE SQL statement

Author: gatorsmile <gatorsmile@gmail.com>

Closes #20133 from gatorsmile/createDataSourceTableDDL.

(cherry picked from commit 1a87a16)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

fix

8ae8f18

fix.

0894f5e

maropu reviewed Jan 1, 2018

hvanhovell reviewed Jan 1, 2018

viirya reviewed Jan 1, 2018

dongjoon-hyun reviewed Jan 2, 2018

gatorsmile added 2 commits January 2, 2018 10:53

fix.

68170bb

nit

9818ab5

cloud-fan reviewed Jan 3, 2018

asfgit closed this in 1a87a16 Jan 3, 2018
