-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-31102][SQL] Spark-sql fails to parse when contains comment. #27920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ok to test |
Test build #119825 has finished for PR 27920 at commit
|
"""SELECT concat('test', 'comment') -- someone's comment here \\ | ||
| comment continues here with single ' quote \\ | ||
| extra ' \\ | ||
| ;""".stripMargin -> "testcomment" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you did you remove the existing tests instead of adding new tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @maropu !
The SQL parser does not recognize line-continuity per se.
scala> sql(s"""SELECT concat('test', 'comment') -- someone's comment here \\\ncomment continues here with single ' quote \\\nextra ' \\""")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'continues' expecting {<EOF>, ',', 'CLUSTER', 'DISTRIBUTE', 'EXCEPT', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LATERAL', 'LIMIT', 'ORDER', 'MINUS', 'SORT', 'UNION', 'WHERE', 'WINDOW', '-'}(line 2, pos 8)
== SQL ==
SELECT concat('test', 'comment') -- someone's comment here \
comment continues here with single ' quote \
--------^^^
extra ' \
It works just fine for inline comments included backslash:
scala> sql(s"""SELECT concat('test', 'comment') -- someone's comment here \\\n,2""") show
+---------------------+---+
|concat(test, comment)| 2|
+---------------------+---+
| testcomment| 2|
+---------------------+---+
But does not work outside the inline comment(the backslash):
sql(s"""SELECT concat('test', 'comment') -- someone's comment here \n,2\\\n""")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '\' expecting <EOF>(line 2, pos 2)
== SQL ==
SELECT concat('test', 'comment') -- someone's comment here
,2\
--^^^
Previously worked fine because of this very bug, the insideComment flag ignored everything until the end of the string. But the spark SQL parser does not recognize the backslashes. Line-continuity can be added to the CLI. But I think that feature should be added directly to the SQL parser to avoid confusion.
Let me know your thoughts 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can, the fix in SqlBase.g4
(SIMPLE_COMENT
) looks fine to me and I think the queries above should work in Spark SQL: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L1811 Could you try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu I have added the fix. Let me know what you think :)
cc: @wangyum |
@javierivanov Any update? |
@maropu I am extremly sorry, I will commit soon :) |
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
Outdated
Show resolved
Hide resolved
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
Outdated
Show resolved
Hide resolved
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Show resolved
Hide resolved
Test build #121162 has finished for PR 27920 at commit
|
retest this please |
Test build #121181 has finished for PR 27920 at commit
|
Test build #121211 has finished for PR 27920 at commit
|
retest this please |
Could you check this? @wangyum |
Test build #121243 has finished for PR 27920 at commit
|
retest this please |
Test build #121260 has finished for PR 27920 at commit
|
@@ -1814,7 +1814,7 @@ fragment LETTER | |||
; | |||
|
|||
SIMPLE_COMMENT | |||
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) | |||
: '--' ('\\\n' | ~[\r\n])* '\r'? '\n'? -> channel(HIDDEN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, one more comment; could you add tests in sql-tests/inputs/comments.sql
, too?
retest this please |
cc: @cloud-fan |
runCliWithin(1.minute)( | ||
"""SELECT concat('test', 'comment') -- someone's comment here \\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so double-slash doesn't work any more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a previous mistake since using Scala multi-line strings it auto escape chars.
val plan = table("a").select(star()) | ||
assertEqual("-- single comment\nSELECT * FROM a", plan) | ||
} | ||
|
||
test("single comment case two") { | ||
val plan = table("a").select(star()) | ||
assertEqual("-- single comment\\\nwith line continuity\nSELECT * FROM a", plan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how to interpret \\\n
? An escaped slash and a new-line symbol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats correct. Inline strings need to be escaped.
Test build #122383 has finished for PR 27920 at commit
|
thanks, merging to master! |
it conflicts with 3.0, @javierivanov can you open a new PR for 3.0? Thanks! |
@javierivanov kindly ping: #27920 (comment) |
This PR introduces a change to false for the insideComment flag on a newline. Fixing the issue introduced by SPARK-30049. Previously on SPARK-30049 a comment containing an unclosed quote produced the following issue: ``` spark-sql> SELECT 1 -- someone's comment here > ; Error in query: extraneous input ';' expecting <EOF>(line 2, pos 0) == SQL == SELECT 1 -- someone's comment here ; ^^^ ``` This was caused because there was no flag for comment sections inside the splitSemiColon method to ignore quotes. SPARK-30049 added that flag and fixed the issue, but introduced the follwoing problem: ``` spark-sql> select > 1, > -- two > 2; Error in query: mismatched input '<EOF>' expecting {'(', 'ADD', 'AFTER', 'ALL', 'ALTER', ...}(line 3, pos 2) == SQL == select 1, --^^^ ``` This issue is generated by a missing turn-off for the insideComment flag with a newline. No - For previous tests using line-continuity(`\`) it was added a line-continuity rule in the SqlBase.g4 file to add the functionality to the SQL context. - A new test for inline comments was added. Closes apache#27920 from javierivanov/SPARK-31102. Authored-by: Javier Fuentes <j.fuentes.m@icloud.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This PR introduces a change to false for the insideComment flag on a newline. Fixing the issue introduced by SPARK-30049. Backport to 3.0 from #27920 Previously on SPARK-30049 a comment containing an unclosed quote produced the following issue: ``` spark-sql> SELECT 1 -- someone's comment here > ; Error in query: extraneous input ';' expecting <EOF>(line 2, pos 0) == SQL == SELECT 1 -- someone's comment here ; ^^^ ``` This was caused because there was no flag for comment sections inside the splitSemiColon method to ignore quotes. SPARK-30049 added that flag and fixed the issue, but introduced the follwoing problem: ``` spark-sql> select > 1, > -- two > 2; Error in query: mismatched input '<EOF>' expecting {'(', 'ADD', 'AFTER', 'ALL', 'ALTER', ...}(line 3, pos 2) == SQL == select 1, --^^^ ``` This issue is generated by a missing turn-off for the insideComment flag with a newline. No - For previous tests using line-continuity(`\`) it was added a line-continuity rule in the SqlBase.g4 file to add the functionality to the SQL context. - A new test for inline comments was added. Closes #27920 from javierivanov/SPARK-31102. Authored-by: Javier Fuentes <j.fuentes.micloud.com> Signed-off-by: Wenchen Fan <wenchendatabricks.com> ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Closes #28565 from javierivanov/SPARK-3.0-31102. Authored-by: Javier Fuentes <j.fuentes.m@icloud.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
This PR introduces a change to false for the insideComment flag on a newline. Fixing the issue introduced by SPARK-30049. Previously on SPARK-30049 a comment containing an unclosed quote produced the following issue: ``` spark-sql> SELECT 1 -- someone's comment here > ; Error in query: extraneous input ';' expecting <EOF>(line 2, pos 0) == SQL == SELECT 1 -- someone's comment here ; ^^^ ``` This was caused because there was no flag for comment sections inside the splitSemiColon method to ignore quotes. SPARK-30049 added that flag and fixed the issue, but introduced the follwoing problem: ``` spark-sql> select > 1, > -- two > 2; Error in query: mismatched input '<EOF>' expecting {'(', 'ADD', 'AFTER', 'ALL', 'ALTER', ...}(line 3, pos 2) == SQL == select 1, --^^^ ``` This issue is generated by a missing turn-off for the insideComment flag with a newline. No - For previous tests using line-continuity(`\`) it was added a line-continuity rule in the SqlBase.g4 file to add the functionality to the SQL context. - A new test for inline comments was added. Closes apache#27920 from javierivanov/SPARK-31102. Authored-by: Javier Fuentes <j.fuentes.m@icloud.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR introduces a change to false for the insideComment flag on a newline. Fixing the issue introduced by SPARK-30049.
Why are the changes needed?
Previously on SPARK-30049 a comment containing an unclosed quote produced the following issue:
This was caused because there was no flag for comment sections inside the splitSemiColon method to ignore quotes. SPARK-30049 added that flag and fixed the issue, but introduced the follwoing problem:
This issue is generated by a missing turn-off for the insideComment flag with a newline.
Does this PR introduce any user-facing change?
No
How was this patch tested?
\
) it was added a line-continuity rule in the SqlBase.g4 file to add the functionality to the SQL context.