-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-22934] [SQL] Make optional clauses order insensitive for CREATE TABLE SQL statement #20133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #85576 has finished for PR 20133 at commit
|
@@ -39,6 +41,17 @@ object ParserUtils { | |||
throw new ParseException(s"Operation not allowed: $message", ctx) | |||
} | |||
|
|||
def duplicateClausesNotAllowed(message: String, ctx: ParserRuleContext): Nothing = { | |||
throw new ParseException(s"Found duplicate clauses: $message", ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot merge these two functions to check the duplication?
e.g.,
def checkDuplicateClauses[T](nodes: util.List[T], clauseName: String, ctx: ParserRuleContext): Unit = {
if (nodes.size() > 1) {
throw new ParseException(s"Found duplicate clauses: $clauseName", ctx)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me!
Test build #85578 has finished for PR 20133 at commit
|
@@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { | |||
.map(visitIdentifierList(_).toArray) | |||
.getOrElse(Array.empty[String]) | |||
val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty) | |||
val bucketSpec = Option(ctx.bucketSpec()).map(visitBucketSpec) | |||
val bucketSpec = if (ctx.bucketSpec().size > 1) { | |||
duplicateClausesNotAllowed("CLUSTERED BY", ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you split the validation logic and the extraction logic? In this case I'd move the check to line 411 and do the extract on line 420.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
* [[AS] select_statement]; | ||
* | ||
* create_table_clauses (order insensitive): | ||
* [PARTITIONED BY (col_name, col_name, ...)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't [OPTIONS table_property_list]
one of create_table_clauses
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot it.
|LOCATION "${dir.toURI}" | ||
|PARTITIONED BY(a, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a relevant change? Since the PR is about ORDER-INSENSITIVENESS, can we keep the original code instead of making an irrelevant change like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an end-to-end test for ORDER-INSENSITIVENESS
. I do not want to introduce a new one for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. +1.
|CLUSTERED BY(id) | ||
|SORTED BY(id, name) INTO 1024 BUCKETS | ||
|PARTITIONED BY (ds string) | ||
""".stripMargin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the original HiveDDLSuite.scala
file, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here.
|COMMENT 'This is the staging page view table' | ||
|STORED AS RCFILE | ||
|LOCATION '/user/external/page_view' | ||
|TBLPROPERTIES ('p1'='v1', 'p2'='v2') | ||
|AS SELECT * FROM src""".stripMargin | ||
|AS SELECT * FROM src | ||
""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. extra space before """
.
Test build #85584 has finished for PR 20133 at commit
|
Test build #85585 has finished for PR 20133 at commit
|
val dataCols = Option(ctx.columns).map(visitColTypeList).getOrElse(Nil) | ||
val partitionCols = Option(ctx.partitionColumns).map(visitColTypeList).getOrElse(Nil) | ||
val properties = Option(ctx.tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty) | ||
val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the meaning of ctx.tableProps
now? the union of all TABLE PROPERTY list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last one, if we have multiple clauses. However, we blocks this in the above checks.
createFileFormatCtx: Seq[CreateFileFormatContext], | ||
parentCtx: ParserRuleContext): Unit = { | ||
if (rowFormatCtx.size == 1 && createFileFormatCtx.size == 1) { | ||
validateRowFormatFileFormat(rowFormatCtx.head, createFileFormatCtx.head, parentCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we just combine this method and the old validateRowFormatFileFormat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do it in a follow-up PR
@@ -1180,7 +1202,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { | |||
ctx) | |||
} | |||
|
|||
val hasStorageProperties = (ctx.createFileFormat != null) || (ctx.rowFormat != null) | |||
val hasStorageProperties = (ctx.createFileFormat.size != 0) || (ctx.rowFormat.size != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we use > 0
to be consistent with other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
LGTM |
Thanks! Merged to master and 2.3. |
… TABLE SQL statement ## What changes were proposed in this pull request? Currently, our CREATE TABLE syntax require the EXACT order of clauses. It is pretty hard to remember the exact order. Thus, this PR is to make optional clauses order insensitive for `CREATE TABLE` SQL statement. ``` CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name [(col_name1 col_type1 [COMMENT col_comment1], ...)] USING datasource [OPTIONS (key1=val1, key2=val2, ...)] [PARTITIONED BY (col_name1, col_name2, ...)] [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS] [LOCATION path] [COMMENT table_comment] [TBLPROPERTIES (key1=val1, key2=val2, ...)] [AS select_statement] ``` The proposal is to make the following clauses order insensitive. ``` [OPTIONS (key1=val1, key2=val2, ...)] [PARTITIONED BY (col_name1, col_name2, ...)] [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS] [LOCATION path] [COMMENT table_comment] [TBLPROPERTIES (key1=val1, key2=val2, ...)] ``` The same idea is also applicable to Create Hive Table. ``` CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name [(col_name1[:] col_type1 [COMMENT col_comment1], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION path] [TBLPROPERTIES (key1=val1, key2=val2, ...)] [AS select_statement] ``` The proposal is to make the following clauses order insensitive. ``` [COMMENT table_comment] [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION path] [TBLPROPERTIES (key1=val1, key2=val2, ...)] ``` ## How was this patch tested? Added test cases Author: gatorsmile <gatorsmile@gmail.com> Closes #20133 from gatorsmile/createDataSourceTableDDL. (cherry picked from commit 1a87a16) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
Currently, our CREATE TABLE syntax require the EXACT order of clauses. It is pretty hard to remember the exact order. Thus, this PR is to make optional clauses order insensitive for
CREATE TABLE
SQL statement.The proposal is to make the following clauses order insensitive.
The same idea is also applicable to Create Hive Table.
The proposal is to make the following clauses order insensitive.
How was this patch tested?
Added test cases