-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-26976][SQL] Forbid reserved keywords as identifiers when ANSI mode is on #23880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The Spark SQL parser uses |
Test build #102718 has finished for PR 23880 at commit
|
retest this please |
Test build #102727 has finished for PR 23880 at commit
|
retest this please |
val keyword = ctx.getText | ||
if (ctx.ansiReserved() != null) { | ||
throw new ParseException( | ||
s"'$keyword' is reserved and you cannot use this keyword as an identifier.", ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK reserved keywords can't be used as identifiers for all kinds, like table name/alias, column name/alias, etc. Shall we follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a simpler change is, in the antlr file, remove (ansi)? ansiReservedKeywords
from identifiers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we just remove the (ansi)? ansiReservedKeywords
from identifier
, all the rule using identifier
can't accept reserved keywords. Is this expected?
For example, we already current_timestamp
/current_date
as bulit-in functions in FunctionRegistry
, but these keywords are reserved when ansi=true.
spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Line 529 in 2d2fb34
functionTable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is the ansi way to get the current timestamp, isn't it?
SELECT CURRENT_TIMESTAMP
is valid.
SELECT CURRENT_TIMESTAMP()
is invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. But, if we just remove the rule, the two cases throw a parse error since both colunm name rule and function name rule use identifier
;
scala> sql("SET spark.sql.parser.ansi.enabled=false")
scala> sql("SELECT CURRENT_TIMESTAMP").show
+--------------------+
| current_timestamp()|
+--------------------+
|2019-02-25 16:26:...|
+--------------------+
scala> sql("SELECT CURRENT_TIMESTAMP()").show
+--------------------+
| current_timestamp()|
+--------------------+
|2019-02-25 16:26:...|
+--------------------+
scala> sql("SET spark.sql.parser.ansi.enabled=true")
scala> sql("SELECT CURRENT_TIMESTAMP").show
org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CURRENT_TIMESTAMP'(line 1, pos 7)
== SQL ==
SELECT CURRENT_TIMESTAMP
-------^^^
scala> sql("SELECT CURRENT_TIMESTAMP()").show
rg.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CURRENT_TIMESTAMP'(line 1, pos 7)
== SQL ==
SELECT CURRENT_TIMESTAMP()
-------^^^
To be honest, I'm not 100% sure about what the ANSI reserved exactly means (IIUC all the docs I checked doesn't define what it means clearly...).
For example, both postgresql and mysql reserve current_timestamp
, but the behviour is different;
postgres=# SELECT CURRENT_TIMESTAMP;
now
-------------------------------
2019-02-25 16:44:26.320698+09
(1 row)
postgres=# SELECT CURRENT_TIMESTAMP();
ERROR: syntax error at or near ")"
LINE 1: SELECT CURRENT_TIMESTAMP();
^
mysql> SELECT CURRENT_TIMESTAMP;
+---------------------+
| CURRENT_TIMESTAMP |
+---------------------+
| 2019-02-25 16:45:02 |
+---------------------+
mysql> SELECT CURRENT_TIMESTAMP();
+---------------------+
| CURRENT_TIMESTAMP() |
+---------------------+
| 2019-02-25 16:45:04 |
+---------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good point!
But if don't forbid CURRENT_TIMESTAMP as column name, SELECT CURRENT_TIMESTAMP FROM t
is ambiguous.
It looks to me that we should create an entry for CURRENT_TIMESTAMP
directly in the antlr file, and return the corresponding function for it in AstBuilder.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I'll try to fix in that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan How about the latest fix? (I'll resolve the conflicts later)
Test build #102735 has finished for PR 23880 at commit
|
retest this please |
Test build #102741 has finished for PR 23880 at commit
|
2232eac
to
38dd79b
Compare
Test build #103211 has finished for PR 23880 at commit
|
Test build #103212 has finished for PR 23880 at commit
|
1bf7625
to
e0c8e61
Compare
Test build #103215 has finished for PR 23880 at commit
|
Test build #103235 has finished for PR 23880 at commit
|
@@ -733,9 +738,13 @@ qualifiedName | |||
: identifier ('.' identifier)* | |||
; | |||
|
|||
columnIdentifier | |||
: identifier | |||
| ansiReservedFunctionName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove this after #24039 merged.
@@ -736,7 +741,6 @@ qualifiedName | |||
|
|||
identifier | |||
: strictIdentifier | |||
| {ansi}? ansiReserved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only code change I expect. What's the rationale of the other changes? I think select current_date()
should not be supported in ansi mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, we can find some more examples to advocate it. AFAIK presto supports select current_date()
although it's not SQL standard. Do you know of any other systems that support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it seems current_timestamp()
is not a topic of the ANSI standard. So, I'll drop it in this pr.
But, some databases support current_timestamp()
and this is implementation-specific.
For example, postgresql/oracle support current_timestamp(precision)
as follows;
postgres=# select CURRENT_TIMESTAMP;
now
-------------------------------
2019-03-12 20:22:52.065108+09
(1 row)
postgres=# select CURRENT_TIMESTAMP(1);
timestamptz
--------------------------
2019-03-12 20:22:56.2+09
(1 row)
postgres=# select CURRENT_TIMESTAMP();
ERROR: syntax error at or near ")" at character 26
STATEMENT: select CURRENT_TIMESTAMP();
ERROR: syntax error at or near ")"
LINE 1: select CURRENT_TIMESTAMP();
^
So, it might be worth supporting this function for better portability even when ansi mode enabled (this is future work though...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the ANSI starndard, we need to support these functions below for datetime, too;
postgres=# select CURRENT_TIME;
timetz
--------------------
20:33:08.179954+09
(1 row)
postgres=# select LOCALTIME;
time
-----------------
20:33:54.281054
(1 row)
postgres=# select LOCALTIMESTAMP;
timestamp
---------------------------
2019-03-12 20:33:57.85737
(1 row)
I'll file a jira later.
Test build #103325 has finished for PR 23880 at commit
|
can you update the PR title? it's not only for table identifiers but all identifiers |
ok, done |
Test build #103380 has finished for PR 23880 at commit
|
Thanks! Merged to master. |
…mode is on ## What changes were proposed in this pull request? This pr added code to forbid reserved keywords as identifiers when ANSI mode is on. This is a follow-up of SPARK-26215(#23259). ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23880 from maropu/SPARK-26976. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
This pr added code to forbid reserved keywords as identifiers when ANSI mode is on.
This is a follow-up of SPARK-26215(#23259).
How was this patch tested?
Added tests in
TableIdentifierParserSuite
.