[SPARK-26976][SQL] Forbid reserved keywords as identifiers when ANSI mode is on #23880

maropu · 2019-02-24T04:01:31Z

What changes were proposed in this pull request?

This PR adds code to forbid reserved keywords as identifiers when ANSI mode is on.
This is a follow-up of SPARK-26215 (#23259).

How was this patch tested?

Added tests in TableIdentifierParserSuite.

maropu · 2019-02-24T04:11:53Z

The Spark SQL parser uses identifier in many parsing rules, e.g., table name, column name, view name, function name, and so on. Which of these rules should be checked is still an open question, so this PR currently checks only whether tableIdentifier contains a reserved keyword.

SparkQA · 2019-02-24T08:05:01Z

Test build #102718 has finished for PR 23880 at commit c957252.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-02-25T00:43:32Z

retest this please

SparkQA · 2019-02-25T03:54:38Z

Test build #102727 has finished for PR 23880 at commit c957252.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-02-25T04:20:51Z

retest this please

cloud-fan · 2019-02-25T06:19:07Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

+    val keyword = ctx.getText
+    if (ctx.ansiReserved() != null) {
+      throw new ParseException(
+        s"'$keyword' is reserved and you cannot use this keyword as an identifier.", ctx)


AFAIK reserved keywords can't be used as identifiers of any kind, such as table names/aliases, column names/aliases, etc. Shall we follow that?

Maybe a simpler change is to remove {ansi}? ansiReservedKeywords from identifier in the antlr file.

If we just remove {ansi}? ansiReservedKeywords from identifier, every rule that uses identifier will reject reserved keywords. Is this expected?
For example, we already have current_timestamp/current_date as built-in functions in FunctionRegistry, but these keywords are reserved when ansi=true.

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Line 529 in 2d2fb34

functionTable

But this is the ansi way to get the current timestamp, isn't it?

SELECT CURRENT_TIMESTAMP is valid.
SELECT CURRENT_TIMESTAMP() is invalid.

Oh, I see. But if we just remove the rule, both cases throw a parse error, since both the column name rule and the function name rule use identifier:

scala> sql("SET spark.sql.parser.ansi.enabled=false")
scala> sql("SELECT CURRENT_TIMESTAMP").show
+--------------------+
| current_timestamp()|
+--------------------+
|2019-02-25 16:26:...|
+--------------------+
scala> sql("SELECT CURRENT_TIMESTAMP()").show
+--------------------+
| current_timestamp()|
+--------------------+
|2019-02-25 16:26:...|
+--------------------+
scala> sql("SET spark.sql.parser.ansi.enabled=true")
scala> sql("SELECT CURRENT_TIMESTAMP").show
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CURRENT_TIMESTAMP'(line 1, pos 7)
== SQL ==
SELECT CURRENT_TIMESTAMP
-------^^^
scala> sql("SELECT CURRENT_TIMESTAMP()").show
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CURRENT_TIMESTAMP'(line 1, pos 7)
== SQL ==
SELECT CURRENT_TIMESTAMP()
-------^^^

To be honest, I'm not 100% sure what "ANSI reserved" means exactly (IIUC none of the docs I checked define it clearly...).
For example, both PostgreSQL and MySQL reserve current_timestamp, but their behaviour differs:

postgres=# SELECT CURRENT_TIMESTAMP;
              now
-------------------------------
 2019-02-25 16:44:26.320698+09
(1 row)
postgres=# SELECT CURRENT_TIMESTAMP();
ERROR: syntax error at or near ")"
LINE 1: SELECT CURRENT_TIMESTAMP();
                                 ^
mysql> SELECT CURRENT_TIMESTAMP;
+---------------------+
| CURRENT_TIMESTAMP   |
+---------------------+
| 2019-02-25 16:45:02 |
+---------------------+
mysql> SELECT CURRENT_TIMESTAMP();
+---------------------+
| CURRENT_TIMESTAMP() |
+---------------------+
| 2019-02-25 16:45:04 |
+---------------------+

ah good point!

But if we don't forbid CURRENT_TIMESTAMP as a column name, SELECT CURRENT_TIMESTAMP FROM t is ambiguous.

It looks to me that we should create an entry for CURRENT_TIMESTAMP directly in the antlr file, and return the corresponding function for it in AstBuilder.scala
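
The suggestion above can be illustrated with a minimal Scala sketch. It does not use Spark's parser or AstBuilder: the names Expr, ColumnRef, FunctionCall, datetimeKeywords, and parsePrimary are hypothetical, and the sketch only models the idea that a bare CURRENT_TIMESTAMP token gets its own entry and resolves to the function rather than to a column.

```scala
import java.util.Locale

object CurrentDatetimeSketch {
  // Toy expression tree; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class ColumnRef(name: String) extends Expr
  final case class FunctionCall(name: String) extends Expr

  // Assumption: the keywords that would get a dedicated grammar entry.
  val datetimeKeywords: Set[String] = Set("CURRENT_DATE", "CURRENT_TIMESTAMP")

  // A bare token that matches a dedicated keyword becomes a zero-argument
  // function call; every other token is treated as a column reference.
  def parsePrimary(token: String): Expr = {
    val upper = token.toUpperCase(Locale.ROOT)
    if (datetimeKeywords.contains(upper)) FunctionCall(upper.toLowerCase(Locale.ROOT))
    else ColumnRef(token)
  }
}
```

In this model, SELECT CURRENT_TIMESTAMP FROM t is no longer ambiguous: parsePrimary("CURRENT_TIMESTAMP") yields FunctionCall("current_timestamp"), while parsePrimary("c1") yields ColumnRef("c1").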

Thanks for the suggestion! I'll try to fix it that way.

@cloud-fan How about the latest fix? (I'll resolve the conflicts later)

SparkQA · 2019-02-25T08:05:02Z

Test build #102735 has finished for PR 23880 at commit c957252.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2019-02-25T08:15:07Z

retest this please

SparkQA · 2019-02-25T12:31:27Z

Test build #102741 has finished for PR 23880 at commit c957252.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-03-08T12:29:08Z

Test build #103211 has finished for PR 23880 at commit c9ea48f.

This patch fails Java style tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2019-03-08T14:14:18Z

Test build #103212 has finished for PR 23880 at commit 38dd79b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-03-08T18:57:22Z

Test build #103215 has finished for PR 23880 at commit e0c8e61.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-03-09T04:00:22Z

Test build #103235 has finished for PR 23880 at commit fdcd5fe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-03-09T23:59:03Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

@@ -733,9 +738,13 @@ qualifiedName
    : identifier ('.' identifier)*
    ;

+columnIdentifier
+    : identifier
+    | ansiReservedFunctionName


I'll remove this after #24039 is merged.

cloud-fan · 2019-03-11T16:01:32Z

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

@@ -736,7 +741,6 @@ qualifiedName

 identifier
    : strictIdentifier
-    | {ansi}? ansiReserved


This is the only code change I expect. What's the rationale of the other changes? I think select current_date() should not be supported in ansi mode.
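
The effect of dropping that alternative can be modelled with a small Scala sketch. This is a toy model rather than the ANTLR grammar: ansiReservedSample is a hypothetical two-keyword sample (the two keywords this thread mentions as reserved when ansi=true), isAcceptedAsIdentifier is an illustrative name, and non-ANSI mode is simplified to accept every name.

```scala
import java.util.Locale

object IdentifierRuleSketch {
  // Hypothetical sample; the real reserved-keyword list lives in SqlBase.g4.
  val ansiReservedSample: Set[String] = Set("CURRENT_DATE", "CURRENT_TIMESTAMP")

  // Models `identifier` once the `{ansi}? ansiReserved` alternative is gone:
  // in ANSI mode a reserved keyword no longer matches the rule.
  // Simplification: non-ANSI mode accepts any name here.
  def isAcceptedAsIdentifier(name: String, ansi: Boolean): Boolean =
    !(ansi && ansiReservedSample.contains(name.toUpperCase(Locale.ROOT)))
}
```

Since every rule that refers to identifier shares this check, the restriction applies to all identifier kinds at once, which is what makes the one-line grammar change sufficient in this model.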

or, we can find some more examples to advocate it. AFAIK presto supports select current_date() although it's not SQL standard. Do you know of any other systems that support it?

Yeah, it seems current_timestamp() is not covered by the ANSI standard, so I'll drop it in this PR.

However, some databases support current_timestamp(), and the behaviour is implementation-specific.
For example, PostgreSQL and Oracle support current_timestamp(precision), as follows:

postgres=# select CURRENT_TIMESTAMP;
              now
-------------------------------
 2019-03-12 20:22:52.065108+09
(1 row)
postgres=# select CURRENT_TIMESTAMP(1);
       timestamptz
--------------------------
 2019-03-12 20:22:56.2+09
(1 row)
postgres=# select CURRENT_TIMESTAMP();
ERROR: syntax error at or near ")" at character 26
STATEMENT: select CURRENT_TIMESTAMP();
ERROR: syntax error at or near ")"
LINE 1: select CURRENT_TIMESTAMP();
                                 ^

So, it might be worth supporting this function for better portability even when ANSI mode is enabled (this is future work, though...).

As for the ANSI standard, we also need to support the datetime functions below:

postgres=# select CURRENT_TIME;
       timetz
--------------------
 20:33:08.179954+09
(1 row)
postgres=# select LOCALTIME;
      time
-----------------
 20:33:54.281054
(1 row)
postgres=# select LOCALTIMESTAMP;
         timestamp
---------------------------
 2019-03-12 20:33:57.85737
(1 row)

I'll file a jira later.

SparkQA · 2019-03-11T17:35:54Z

Test build #103325 has finished for PR 23880 at commit d3367ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-03-12T12:48:49Z

can you update the PR title? it's not only for table identifiers but all identifiers

maropu · 2019-03-12T13:43:09Z

ok, done

SparkQA · 2019-03-12T22:19:10Z

Test build #103380 has finished for PR 23880 at commit 441a12a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-03-13T02:21:32Z

Thanks! Merged to master.

…mode is on

## What changes were proposed in this pull request?

This pr added code to forbid reserved keywords as identifiers when ANSI mode is on.
This is a follow-up of SPARK-26215(#23259).

## How was this patch tested?

Added tests in `TableIdentifierParserSuite`.

Closes #23880 from maropu/SPARK-26976.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

cloud-fan reviewed Feb 25, 2019

maropu force-pushed the SPARK-26976 branch 2 times, most recently from 2232eac to 38dd79b Compare March 8, 2019 12:12

maropu force-pushed the SPARK-26976 branch 2 times, most recently from 1bf7625 to e0c8e61 Compare March 8, 2019 14:47

maropu force-pushed the SPARK-26976 branch from e0c8e61 to fdcd5fe Compare March 8, 2019 23:31

maropu commented Mar 9, 2019

maropu added 2 commits March 11, 2019 22:00

Fix

6242e0d

Fix

d3367ad

maropu force-pushed the SPARK-26976 branch from fdcd5fe to d3367ad Compare March 11, 2019 13:07

cloud-fan reviewed Mar 11, 2019

Fix

441a12a

cloud-fan approved these changes Mar 12, 2019

maropu changed the title ~~[SPARK-26976][SQL] Forbid reserved keywords as table identifiers when ANSI mode is on~~ [SPARK-26976][SQL] Forbid reserved keywords as identifiers when ANSI mode is on Mar 12, 2019

maropu closed this Mar 13, 2019
