Skip to content

[SPARK-26976][SQL] Forbid reserved keywords as identifiers when ANSI mode is on #23880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Feb 24, 2019

What changes were proposed in this pull request?

This pr added code to forbid reserved keywords as identifiers when ANSI mode is on.
This is a follow-up of SPARK-26215(#23259).

How was this patch tested?

Added tests in TableIdentifierParserSuite.

@maropu
Copy link
Member Author

maropu commented Feb 24, 2019

The Spark SQL parser uses identifier in a lot of parsing rules, e.g., table name, column name, view name, function name, brabrabra. Since this is an open question about which rule we should check or not, this pr currently checks if tableIdentifier has a reserved keyword only.

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #102718 has finished for PR 23880 at commit c957252.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Feb 25, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Feb 25, 2019

Test build #102727 has finished for PR 23880 at commit c957252.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Feb 25, 2019

retest this please

val keyword = ctx.getText
if (ctx.ansiReserved() != null) {
throw new ParseException(
s"'$keyword' is reserved and you cannot use this keyword as an identifier.", ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK reserved keywords can't be used as identifiers for all kinds, like table name/alias, column name/alias, etc. Shall we follow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a simpler change is, in the antlr file, remove (ansi)? ansiReservedKeywords from identifiers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just remove the (ansi)? ansiReservedKeywords from identifier, all the rule using identifier can't accept reserved keywords. Is this expected?
For example, we already current_timestamp/current_date as bulit-in functions in FunctionRegistry, but these keywords are reserved when ansi=true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is the ansi way to get the current timestamp, isn't it?

SELECT CURRENT_TIMESTAMP is valid.
SELECT CURRENT_TIMESTAMP() is invalid.

Copy link
Member Author

@maropu maropu Feb 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. But, if we just remove the rule, the two cases throw a parse error since both colunm name rule and function name rule use identifier;

scala> sql("SET spark.sql.parser.ansi.enabled=false")
scala> sql("SELECT CURRENT_TIMESTAMP").show
+--------------------+                                                          
| current_timestamp()|
+--------------------+
|2019-02-25 16:26:...|
+--------------------+

scala> sql("SELECT CURRENT_TIMESTAMP()").show
+--------------------+                                                          
| current_timestamp()|
+--------------------+
|2019-02-25 16:26:...|
+--------------------+

scala> sql("SET spark.sql.parser.ansi.enabled=true")
scala> sql("SELECT CURRENT_TIMESTAMP").show
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CURRENT_TIMESTAMP'(line 1, pos 7)

== SQL ==
SELECT CURRENT_TIMESTAMP
-------^^^

scala> sql("SELECT CURRENT_TIMESTAMP()").show
rg.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CURRENT_TIMESTAMP'(line 1, pos 7)

== SQL ==
SELECT CURRENT_TIMESTAMP()
-------^^^

To be honest, I'm not 100% sure about what the ANSI reserved exactly means (IIUC all the docs I checked doesn't define what it means clearly...).
For example, both postgresql and mysql reserve current_timestamp, but the behviour is different;

postgres=# SELECT CURRENT_TIMESTAMP;
              now              
-------------------------------
 2019-02-25 16:44:26.320698+09
(1 row)

postgres=# SELECT CURRENT_TIMESTAMP();
ERROR:  syntax error at or near ")"
LINE 1: SELECT CURRENT_TIMESTAMP();
                                 ^
mysql> SELECT CURRENT_TIMESTAMP;
+---------------------+
| CURRENT_TIMESTAMP   |
+---------------------+
| 2019-02-25 16:45:02 |
+---------------------+

mysql> SELECT CURRENT_TIMESTAMP();
+---------------------+
| CURRENT_TIMESTAMP() |
+---------------------+
| 2019-02-25 16:45:04 |
+---------------------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good point!

But if don't forbid CURRENT_TIMESTAMP as column name, SELECT CURRENT_TIMESTAMP FROM t is ambiguous.

It looks to me that we should create an entry for CURRENT_TIMESTAMP directly in the antlr file, and return the corresponding function for it in AstBuilder.scala

Copy link
Member Author

@maropu maropu Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I'll try to fix in that way.

Copy link
Member Author

@maropu maropu Mar 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan How about the latest fix? (I'll resolve the conflicts later)

@SparkQA
Copy link

SparkQA commented Feb 25, 2019

Test build #102735 has finished for PR 23880 at commit c957252.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 25, 2019

Test build #102741 has finished for PR 23880 at commit c957252.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the SPARK-26976 branch 2 times, most recently from 2232eac to 38dd79b Compare March 8, 2019 12:12
@SparkQA
Copy link

SparkQA commented Mar 8, 2019

Test build #103211 has finished for PR 23880 at commit c9ea48f.

  • This patch fails Java style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 8, 2019

Test build #103212 has finished for PR 23880 at commit 38dd79b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu force-pushed the SPARK-26976 branch 2 times, most recently from 1bf7625 to e0c8e61 Compare March 8, 2019 14:47
@SparkQA
Copy link

SparkQA commented Mar 8, 2019

Test build #103215 has finished for PR 23880 at commit e0c8e61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 9, 2019

Test build #103235 has finished for PR 23880 at commit fdcd5fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -733,9 +738,13 @@ qualifiedName
: identifier ('.' identifier)*
;

columnIdentifier
: identifier
| ansiReservedFunctionName
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this after #24039 merged.

@@ -736,7 +741,6 @@ qualifiedName

identifier
: strictIdentifier
| {ansi}? ansiReserved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only code change I expect. What's the rationale of the other changes? I think select current_date() should not be supported in ansi mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or, we can find some more examples to advocate it. AFAIK presto supports select current_date() although it's not SQL standard. Do you know of any other systems that support it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it seems current_timestamp() is not a topic of the ANSI standard. So, I'll drop it in this pr.

But, some databases support current_timestamp() and this is implementation-specific.
For example, postgresql/oracle support current_timestamp(precision) as follows;

postgres=# select CURRENT_TIMESTAMP;
              now              
-------------------------------
 2019-03-12 20:22:52.065108+09
(1 row)

postgres=# select CURRENT_TIMESTAMP(1);
       timestamptz        
--------------------------
 2019-03-12 20:22:56.2+09
(1 row)

postgres=# select CURRENT_TIMESTAMP();
ERROR:  syntax error at or near ")" at character 26
STATEMENT:  select CURRENT_TIMESTAMP();
ERROR:  syntax error at or near ")"
LINE 1: select CURRENT_TIMESTAMP();
                                 ^

So, it might be worth supporting this function for better portability even when ansi mode enabled (this is future work though...).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the ANSI starndard, we need to support these functions below for datetime, too;

postgres=# select CURRENT_TIME;
       timetz       
--------------------
 20:33:08.179954+09
(1 row)

postgres=# select LOCALTIME;
      time       
-----------------
 20:33:54.281054
(1 row)

postgres=# select LOCALTIMESTAMP;
         timestamp         
---------------------------
 2019-03-12 20:33:57.85737
(1 row)

I'll file a jira later.

@SparkQA
Copy link

SparkQA commented Mar 11, 2019

Test build #103325 has finished for PR 23880 at commit d3367ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

can you update the PR title? it's not only for table identifiers but all identifiers

@maropu maropu changed the title [SPARK-26976][SQL] Forbid reserved keywords as table identifiers when ANSI mode is on [SPARK-26976][SQL] Forbid reserved keywords as identifiers when ANSI mode is on Mar 12, 2019
@maropu
Copy link
Member Author

maropu commented Mar 12, 2019

ok, done

@SparkQA
Copy link

SparkQA commented Mar 12, 2019

Test build #103380 has finished for PR 23880 at commit 441a12a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Mar 13, 2019

Thanks! Merged to master.

@maropu maropu closed this Mar 13, 2019
maropu added a commit that referenced this pull request Mar 13, 2019
…mode is on

## What changes were proposed in this pull request?
This pr added code to forbid reserved keywords as identifiers when ANSI mode is on.
This is a follow-up of SPARK-26215(#23259).

## How was this patch tested?
Added tests in `TableIdentifierParserSuite`.

Closes #23880 from maropu/SPARK-26976.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants