[SPARK-31102][SQL] Spark-sql fails to parse when contains comment. #27920

Closed
wants to merge 5 commits into from

@@ -1814,7 +1814,7 @@ fragment LETTER
;

SIMPLE_COMMENT
- : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
+ : '--' ('\\\n' | ~[\r\n])* '\r'? '\n'? -> channel(HIDDEN)
Member

Ur, one more comment; could you add tests in sql-tests/inputs/comments.sql, too?

;
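
For readers less familiar with ANTLR, the effect of the changed SIMPLE_COMMENT rule can be approximated with regular expressions. This sketch is not part of the PR: SimpleCommentSketch, oldRule, newRule, and leadingComment are hypothetical names, and a regex only imitates this single lexer rule, not the rest of the grammar (for example, it knows nothing about string literals).

```scala
import scala.util.matching.Regex

object SimpleCommentSketch {
  // Regex analogue of the old rule: '--' ~[\r\n]* '\r'? '\n'?
  val oldRule: Regex = """--[^\r\n]*\r?\n?""".r

  // Regex analogue of the new rule: '--' ('\\\n' | ~[\r\n])* '\r'? '\n'?
  // The first alternative consumes a backslash immediately followed by a
  // newline, so the comment carries on into the next line.
  val newRule: Regex = """--(?:\\\n|[^\r\n])*\r?\n?""".r

  // Returns the comment matched at the very start of `sql`, if there is one.
  def leadingComment(rule: Regex, sql: String): Option[String] =
    rule.findPrefixOf(sql)
}
```

With the input from the "single comment case two" test, the old pattern stops at the first newline, while the new pattern also swallows the line "with line continuity", leaving only the SELECT statement for the parser.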

BRACKETED_COMMENT

@@ -55,11 +55,16 @@ class PlanParserSuite extends AnalysisTest {
With(plan, ctes)
}

- test("single comment") {
+ test("single comment case one") {
val plan = table("a").select(star())
assertEqual("-- single comment\nSELECT * FROM a", plan)
}

+ test("single comment case two") {
+ val plan = table("a").select(star())
+ assertEqual("-- single comment\\\nwith line continuity\nSELECT * FROM a", plan)
Contributor

How should \\\n be interpreted? As an escaped backslash followed by a newline character?

Contributor Author

That's correct. In an inline string the backslash needs to be escaped.

+ }
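
To make the escaping question in this thread concrete: in a regular (single-line) Scala string literal, \\ denotes one backslash and \n one newline, so \\\n is two characters. A small sketch, with EscapeSketch and its members as hypothetical names:

```scala
object EscapeSketch {
  // An escaped backslash followed by a newline: two characters in total.
  val continuation: String = "\\\n"

  // The input string used by the "single comment case two" test.
  val query: String = "-- single comment\\\nwith line continuity\nSELECT * FROM a"

  // Split on newlines only to make the physical lines of the query visible.
  val physicalLines: List[String] = query.split("\n").toList
}
```

The first physical line ends in a backslash, which is what the new lexer rule treats as a continuation of the comment.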

test("bracketed comment case one") {
val plan = table("a").select(star())
assertEqual(

@@ -513,7 +513,6 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging {
var insideComment = false
var escape = false
var beginIndex = 0
- var endIndex = line.length
val ret = new JArrayList[String]

for (index <- 0 until line.length) {
@@ -539,8 +538,6 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging {
} else if (hasNext && line.charAt(index + 1) == '-') {
// ignore quotes and ;
insideComment = true
- // ignore eol
- endIndex = index
}
} else if (line.charAt(index) == ';') {
if (insideSingleQuote || insideDoubleQuote || insideComment) {
@@ -550,8 +547,11 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging {
ret.add(line.substring(beginIndex, index))
beginIndex = index + 1
}
- } else {
- // nothing to do
+ } else if (line.charAt(index) == '\n') {
+ // with a new line the inline comment should end.
+ if (!escape) {
+ insideComment = false
+ }
}
// set the escape
if (escape) {
@@ -560,7 +560,7 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging {
escape = true
}
}
- ret.add(line.substring(beginIndex, endIndex))
+ ret.add(line.substring(beginIndex))
ret
}
}
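
Because the hunks above show only fragments of the splitting loop, here is a self-contained sketch of how the loop might look after this change. It is a simplified reconstruction, not the actual SparkSQLCLIDriver code: the quote-handling branches are not visible in the diff and are assumed here, a Scala List replaces JArrayList, and the names SplitSketch and splitSemiColon are used only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitSketch {
  // Splits `line` on ';' while ignoring semicolons inside quotes or "--" comments.
  def splitSemiColon(line: String): List[String] = {
    var insideSingleQuote = false
    var insideDoubleQuote = false
    var insideComment = false
    var escape = false
    var beginIndex = 0
    val ret = ArrayBuffer.empty[String]

    for (index <- 0 until line.length) {
      val c = line.charAt(index)
      if (c == '\'' && !insideComment) {
        // Assumed quote handling: an unescaped quote toggles the flag.
        if (!escape) insideSingleQuote = !insideSingleQuote
      } else if (c == '"' && !insideComment) {
        if (!escape) insideDoubleQuote = !insideDoubleQuote
      } else if (c == '-') {
        val hasNext = index + 1 < line.length
        if (insideSingleQuote || insideDoubleQuote || insideComment) {
          // A '-' inside quotes or inside a comment never starts a comment.
        } else if (hasNext && line.charAt(index + 1) == '-') {
          // Start of an inline comment: quotes and ';' are ignored from here on.
          insideComment = true
        }
      } else if (c == ';') {
        if (!(insideSingleQuote || insideDoubleQuote || insideComment)) {
          // Split here; the ';' itself belongs to neither statement.
          ret += line.substring(beginIndex, index)
          beginIndex = index + 1
        }
      } else if (c == '\n') {
        // An unescaped newline ends the inline comment.
        if (!escape) insideComment = false
      }
      // Track whether the next character is escaped by a backslash.
      if (escape) escape = false
      else if (c == '\\') escape = true
    }
    // The remainder after the last split is kept whole, comments included.
    ret += line.substring(beginIndex)
    ret.toList
  }
}
```

In this sketch a comment no longer truncates the statement: text after the comment's closing newline is kept, and a ';' is ignored only while the comment is still open (including across a backslash-newline continuation).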

@@ -460,18 +460,20 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with BeforeAndAfterE
)
}

test("SPARK-30049 Should not complain for quotes in commented with multi-lines") {
test("SPARK-31102 spark-sql fails to parse when contains comment") {
runCliWithin(1.minute)(
"""SELECT concat('test', 'comment') -- someone's comment here \\
| comment continues here with single ' quote \\
| extra ' \\
|;""".stripMargin -> "testcomment"
"""SELECT concat('test', 'comment'),
| -- someone's comment here
| 2;""".stripMargin -> "testcomment"
)
}

test("SPARK-30049 Should not complain for quotes in commented with multi-lines") {
runCliWithin(1.minute)(
"""SELECT concat('test', 'comment') -- someone's comment here \\
Contributor

So the double backslash doesn't work any more?

Contributor Author

That was a mistake in the earlier test: Scala multi-line strings take characters literally, so the backslash does not need to be escaped there.

| comment continues here with single ' quote \\
| extra ' \\
| ;""".stripMargin -> "testcomment"
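
The point about Scala multi-line strings in this thread can be checked with a small sketch. It assumes the source file uses LF line endings; MultiLineEscapes and its members are hypothetical names.

```scala
object MultiLineEscapes {
  // In a triple-quoted literal a backslash is taken literally,
  // so two typed backslashes stay two characters.
  val doubled: String =
    """-- comment \\
      |;""".stripMargin

  // A single backslash before the line break is therefore enough.
  val single: String =
    """-- comment \
      |;""".stripMargin

  // The same text as `single`, written as a regular literal with escapes.
  val regular: String = "-- comment \\\n;"
}
```

Here single and regular hold the same characters, while doubled carries one extra backslash, which is why the test input was switched from a double to a single backslash.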
Member

Why did you remove the existing tests instead of adding new tests?

Contributor Author

Hey @maropu !
The SQL parser does not recognize line-continuity per se.

scala> sql(s"""SELECT concat('test', 'comment') -- someone's comment here \\\ncomment continues here with single ' quote \\\nextra ' \\""")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'continues' expecting {<EOF>, ',', 'CLUSTER', 'DISTRIBUTE', 'EXCEPT', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LATERAL', 'LIMIT', 'ORDER', 'MINUS', 'SORT', 'UNION', 'WHERE', 'WINDOW', '-'}(line 2, pos 8)

== SQL ==
SELECT concat('test', 'comment') -- someone's comment here \
comment continues here with single ' quote \
--------^^^
extra ' \

It works fine when the backslash is inside an inline comment:

scala> sql(s"""SELECT concat('test', 'comment') -- someone's comment here \\\n,2""") show
+---------------------+---+
|concat(test, comment)|  2|
+---------------------+---+
|          testcomment|  2|
+---------------------+---+

But it does not work when the backslash is outside the inline comment:

 sql(s"""SELECT concat('test', 'comment') -- someone's comment here \n,2\\\n""")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input '\' expecting <EOF>(line 2, pos 2)

== SQL ==
SELECT concat('test', 'comment') -- someone's comment here
,2\
--^^^

The old test passed only because of this very bug: once set, the insideComment flag caused everything up to the end of the string to be ignored. The Spark SQL parser itself does not recognize backslashes as line continuation. Line continuity could be added to the CLI, but I think the feature should be added directly to the SQL parser to avoid confusion.

Let me know your thoughts 👍
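
The remark about the earlier behaviour can be illustrated with a stripped-down sketch of the pre-fix comment handling, reconstructed from the lines removed in the SparkSQLCLIDriver diff. Quote handling is deliberately omitted, and LegacySplitSketch and lastStatement are hypothetical names, so this only approximates the old code.

```scala
object LegacySplitSketch {
  // Pre-fix behaviour, simplified: once "--" is seen, insideComment is never
  // reset, and the final statement is cut off where the comment began.
  def lastStatement(line: String): String = {
    var insideComment = false
    var beginIndex = 0
    var endIndex = line.length

    for (index <- 0 until line.length) {
      val c = line.charAt(index)
      if (c == '-' && !insideComment &&
          index + 1 < line.length && line.charAt(index + 1) == '-') {
        insideComment = true
        // Everything from the start of the comment onwards is dropped.
        endIndex = index
      } else if (c == ';' && !insideComment) {
        beginIndex = index + 1
      }
    }
    line.substring(beginIndex, endIndex)
  }
}
```

Under this logic the multi-line comment test passed because the comment and everything after it were discarded, and for the same reason a query with SQL after the comment line lost its tail.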

Member

@maropu Mar 17, 2020

If we can, the fix in SqlBase.g4 (SIMPLE_COMMENT) looks fine to me, and I think the queries above should work in Spark SQL: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4#L1811 Could you try?

Contributor Author

@maropu I have added the fix. Let me know what you think :)

"""SELECT concat('test', 'comment') -- someone's comment here \
| comment continues here with single ' quote \
| extra ' \
|;""".stripMargin -> "testcomment"
)
}
}