Skip to content

[SPARK-27209][SQL] Split parsing of SELECT and INSERT into two top-level rules in the grammar file. #24150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,8 @@ singleTableSchema

statement
: query #statementDefault
| insertStatement #insertStatementDefault
| multiSelectStatement #multiSelectStatementDefault
| USE db=identifier #use
| CREATE database (IF NOT EXISTS)? identifier
(COMMENT comment=STRING)? locationSpec?
Expand Down Expand Up @@ -358,9 +360,14 @@ resource
: identifier STRING
;

insertStatement
: (ctes)? insertInto queryTerm queryOrganization #singleInsertQuery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move (ctes)? to the line 80 like | (ctes)? insertStatement ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu then i may have to implement/override another visit method to apply the cte to an insert statement, okay ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the duplicate patten?

queryToDesc
    : queryTerm queryOrganization
    ;

Copy link
Contributor Author

@dilipbiswal dilipbiswal Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu Thats right. After this PR is in, i will have a follow-up to allow CTEs for DESCRIBE QUERY (currently it is a limitation) and that would remove this pattern all together (i hope :-) ).

| (ctes)? fromClause multiInsertQueryBody+ #multiInsertQuery
Copy link
Contributor

@cloud-fan cloud-fan Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we change the multiInsertQueryBody and make the insertInto non-optional?

Copy link
Contributor Author

@dilipbiswal dilipbiswal Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan did you mean make insertInto non-optional in multiInsertQueryBody ? If so, i did try it and we need it to be optional to parse statement referenced in test
Also , we we can have a mixture of insert and selects in the multiInsertQueryBody. Thats why we have it under a "multi insert" rule where insert is optional. Not sure, but i think multiStatementBody may be a better name for the rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan If you don't mind, could you please paste the block or line here ? When i click on that link, it shows me pretty much all the changes so i am not sure which one i should focus on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dilipbiswal . He means line 365. And, that's the same question with the below comment, https://github.com/apache/spark/pull/24150/files#r267283726.

| fromClause querySpecification queryOrganization                           #queryWithFrom

Line 365 is required for FROM ... SELECT ... statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun Thanks for helping out. When i clicked.. it took me to AstBuilder ... so i was confused.

@cloud-fan So we need the rule to parse statement i mentioned earlier.

from (from test select transform(....) using .. as ...) T select ...

It is parsed sort of recursively in the queryNoWith rule .. calling queryNoWith from relationPrimary:aliasedQuery rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fromClause multiInsertQueryBody+ -> fromClause (insertInto? querySpecification queryOrganization)+ -> fromClause (querySpecification queryOrganization)+

and in queryNoWith, we have fromClause querySpecification queryOrganization. Seems we should change it to fromClause (querySpecification queryOrganization)+

Copy link
Contributor Author

@dilipbiswal dilipbiswal Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Currently after this change, fromClause and multiple querySpecification queryOrganization will be handled by 2nd alternative of the insertStatement. I know that this may look confusing. In this pass, i wanted to handle single queries in one rule which is query. This would help us catch quite a few errors i.e when a user could plugin an insert in any part of the query. The rest is currently handled by insertStatement which includes multi insert, multi select etc. Please let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan instead of name insertStatement, should we change to insertOrMultiQueries to clear the confusion ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think why we keep insertInto as optional in multiInsertQueryBody after this change, if we want to split select and insert to individual rule.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot write it like this?


insertStatement
    : ...
    | (ctes)? fromClause insertInto someName (insertInto? someName)      #multiInsertQuery
    ;

someName
    : querySpecification queryOrganization
    ;

;

queryNoWith
: insertInto? queryTerm queryOrganization #singleInsertQuery
| fromClause multiInsertQueryBody+ #multiInsertQuery
: queryTerm queryOrganization #noWithQuery
Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, queryNoWith has these two patterns. And both don't have With. Can we revise this name #noWithQuery more narrowly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun would you have any suggestion ? I was thinking selectQuery but this matches select, table query, inline table etc. So a little confused :-)

| fromClause selectStatement #queryWithFrom
;

queryOrganization
Expand All @@ -373,9 +380,15 @@ queryOrganization
;

multiInsertQueryBody
: insertInto?
querySpecification
queryOrganization
: insertInto selectStatement
;

multiSelectStatement
: (ctes)? fromClause selectStatement+ #multiSelect
;

selectStatement
: querySpecification queryOrganization
;

queryTerm
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -112,21 +112,36 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
val query = plan(ctx.queryNoWith)

// Apply CTEs
query.optional(ctx.ctes) {
val ctes = ctx.ctes.namedQuery.asScala.map { nCtx =>
val namedQuery = visitNamedQuery(nCtx)
(namedQuery.alias, namedQuery)
}
// Check for duplicate names.
checkDuplicateKeys(ctes, ctx)
With(query, ctes)
query.optionalMap(ctx.ctes)(withCTE)
}

private def withCTE(ctx: CtesContext, plan: LogicalPlan): LogicalPlan = {
val ctes = ctx.namedQuery.asScala.map { nCtx =>
val namedQuery = visitNamedQuery(nCtx)
(namedQuery.alias, namedQuery)
}
// Check for duplicate names.
checkDuplicateKeys(ctes, ctx)
With(plan, ctes)
}

override def visitQueryToDesc(ctx: QueryToDescContext): LogicalPlan = withOrigin(ctx) {
plan(ctx.queryTerm).optionalMap(ctx.queryOrganization)(withQueryResultClauses)
}

override def visitQueryWithFrom(ctx: QueryWithFromContext): LogicalPlan = withOrigin(ctx) {
val from = visitFromClause(ctx.fromClause)
validate(ctx.selectStatement.querySpecification.fromClause == null,
"Individual select statement can not have FROM cause as its already specified in the" +
" outer query block", ctx)
withQuerySpecification(ctx.selectStatement.querySpecification, from).
optionalMap(ctx.selectStatement.queryOrganization)(withQueryResultClauses)
}

override def visitNoWithQuery(ctx: NoWithQueryContext): LogicalPlan = withOrigin(ctx) {
plan(ctx.queryTerm).optionalMap(ctx.queryOrganization)(withQueryResultClauses)
}

/**
* Create a named logical plan.
*
Expand Down Expand Up @@ -156,36 +171,60 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
val from = visitFromClause(ctx.fromClause)

// Build the insert clauses.
val inserts = ctx.multiInsertQueryBody.asScala.map {
val inserts = ctx.multiInsertQueryBody().asScala.map {
body =>
validate(body.querySpecification.fromClause == null,
validate(body.selectStatement.querySpecification.fromClause == null,
"Multi-Insert queries cannot have a FROM clause in their individual SELECT statements",
body)

withInsertInto(body.insertInto,
withQuerySpecification(body.selectStatement.querySpecification, from).
// Add organization statements.
optionalMap(body.selectStatement.queryOrganization)(withQueryResultClauses))
}

// If there are multiple INSERTS just UNION them together into one query.
val insertPlan = inserts match {
case Seq(query) => query
case queries => Union(queries)
}
// Apply CTEs
insertPlan.optionalMap(ctx.ctes)(withCTE)
}

override def visitMultiSelect(ctx: MultiSelectContext): LogicalPlan = withOrigin(ctx) {
val from = visitFromClause(ctx.fromClause)

// Build the insert clauses.
val selects = ctx.selectStatement.asScala.map {
body =>
validate(body.querySpecification.fromClause == null,
"Multi-select queries cannot have a FROM clause in their individual SELECT statements",
body)

withQuerySpecification(body.querySpecification, from).
// Add organization statements.
optionalMap(body.queryOrganization)(withQueryResultClauses).
// Add insert.
optionalMap(body.insertInto())(withInsertInto)
optionalMap(body.queryOrganization)(withQueryResultClauses)
}

// If there are multiple INSERTS just UNION them together into one query.
inserts match {
val selectUnionPlan = selects match {
case Seq(query) => query
case queries => Union(queries)
}
// Apply CTEs
selectUnionPlan.optionalMap(ctx.ctes)(withCTE)
}

/**
* Create a logical plan for a regular (single-insert) query.
*/
override def visitSingleInsertQuery(
ctx: SingleInsertQueryContext): LogicalPlan = withOrigin(ctx) {
plan(ctx.queryTerm).
// Add organization statements.
optionalMap(ctx.queryOrganization)(withQueryResultClauses).
// Add insert.
optionalMap(ctx.insertInto())(withInsertInto)
val insertPlan = withInsertInto(ctx.insertInto(),
plan(ctx.queryTerm).optionalMap(ctx.queryOrganization)(withQueryResultClauses))
// Apply CTEs
insertPlan.optionalMap(ctx.ctes)(withCTE)
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,11 @@ class PlanParserSuite extends AnalysisTest {
table("a").select(star()).union(table("a").where('s < 10).select(star())))
intercept(
"from a select * select * from x where a.s < 10",
"Multi-Insert queries cannot have a FROM clause in their individual SELECT statements")
"Multi-select queries cannot have a FROM clause in their individual SELECT statements")
intercept(
"from a select * from b",
"Individual select statement can not have FROM cause as its already specified in " +
"the outer query block")
assertEqual(
"from a insert into tbl1 select * insert into tbl2 select * where s < 10",
table("a").select(star()).insertInto("tbl1").union(
Expand Down Expand Up @@ -754,4 +758,47 @@ class PlanParserSuite extends AnalysisTest {
assertEqual(query2, Distinct(a.union(b)).except(c.intersect(d, isAll = true), isAll = true))
}
}

test("create/alter view as insert into table") {
val m1 = intercept[ParseException] {
parsePlan("CREATE VIEW testView AS INSERT INTO jt VALUES(1, 1)")
}.getMessage
assert(m1.contains("mismatched input 'INSERT' expecting"))
// Multi insert query
val m2 = intercept[ParseException] {
parsePlan(
"""
|CREATE VIEW testView AS FROM jt
|INSERT INTO tbl1 SELECT * WHERE jt.id < 5
|INSERT INTO tbl2 SELECT * WHERE jt.id > 4
""".stripMargin)
}.getMessage
assert(m2.contains("mismatched input 'INSERT' expecting"))
val m3 = intercept[ParseException] {
parsePlan("ALTER VIEW testView AS INSERT INTO jt VALUES(1, 1)")
}.getMessage
assert(m3.contains("mismatched input 'INSERT' expecting"))
// Multi insert query
val m4 = intercept[ParseException] {
parsePlan(
"""
|ALTER VIEW testView AS FROM jt
|INSERT INTO tbl1 SELECT * WHERE jt.id < 5
|INSERT INTO tbl2 SELECT * WHERE jt.id > 4
""".stripMargin
)
}.getMessage
assert(m4.contains("mismatched input 'INSERT' expecting"))
}

test("Invalid insert constructs in the query") {
val m1 = intercept[ParseException] {
parsePlan("SELECT * FROM (INSERT INTO BAR VALUES (2))")
}.getMessage
assert(m1.contains("mismatched input 'FROM' expecting"))
val m2 = intercept[ParseException] {
parsePlan("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))")
}.getMessage
assert(m2.contains("mismatched input 'FROM' expecting"))
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -1257,15 +1257,6 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
if (ctx.identifierList != null) {
operationNotAllowed("CREATE VIEW ... PARTITIONED ON", ctx)
} else {
// CREATE VIEW ... AS INSERT INTO is not allowed.
ctx.query.queryNoWith match {
case s: SingleInsertQueryContext if s.insertInto != null =>
operationNotAllowed("CREATE VIEW ... AS INSERT INTO", ctx)
case _: MultiInsertQueryContext =>
operationNotAllowed("CREATE VIEW ... AS FROM ... [INSERT INTO ...]+", ctx)
case _ => // OK
}

val userSpecifiedColumns = Option(ctx.identifierCommentList).toSeq.flatMap { icl =>
icl.identifierComment.asScala.map { ic =>
ic.identifier.getText -> Option(ic.STRING).map(string)
Expand Down Expand Up @@ -1302,14 +1293,6 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
* }}}
*/
override def visitAlterViewQuery(ctx: AlterViewQueryContext): LogicalPlan = withOrigin(ctx) {
// ALTER VIEW ... AS INSERT INTO is not allowed.
ctx.query.queryNoWith match {
case s: SingleInsertQueryContext if s.insertInto != null =>
operationNotAllowed("ALTER VIEW ... AS INSERT INTO", ctx)
case _: MultiInsertQueryContext =>
operationNotAllowed("ALTER VIEW ... AS FROM ... [INSERT INTO ...]+", ctx)
case _ => // OK
}
AlterViewAsCommand(
name = visitTableIdentifier(ctx.tableIdentifier),
originalText = source(ctx.query),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -215,17 +215,6 @@ class SparkSqlParserSuite extends AnalysisTest {
"no viable alternative at input")
}

test("create view as insert into table") {
// Single insert query
intercept("CREATE VIEW testView AS INSERT INTO jt VALUES(1, 1)",
"Operation not allowed: CREATE VIEW ... AS INSERT INTO")

// Multi insert query
intercept("CREATE VIEW testView AS FROM jt INSERT INTO tbl1 SELECT * WHERE jt.id < 5 " +
"INSERT INTO tbl2 SELECT * WHERE jt.id > 4",
"Operation not allowed: CREATE VIEW ... AS FROM ... [INSERT INTO ...]+")
}

test("SPARK-17328 Fix NPE with EXPLAIN DESCRIBE TABLE") {
assertEqual("describe t",
DescribeTableCommand(TableIdentifier("t"), Map.empty, isExtended = false))
Expand Down Expand Up @@ -369,17 +358,6 @@ class SparkSqlParserSuite extends AnalysisTest {
Project(UnresolvedAlias(concat) :: Nil, UnresolvedRelation(TableIdentifier("t"))))
}

test("SPARK-25046 Fix Alter View ... As Insert Into Table") {
// Single insert query
intercept("ALTER VIEW testView AS INSERT INTO jt VALUES(1, 1)",
"Operation not allowed: ALTER VIEW ... AS INSERT INTO")

// Multi insert query
intercept("ALTER VIEW testView AS FROM jt INSERT INTO tbl1 SELECT * WHERE jt.id < 5 " +
"INSERT INTO tbl2 SELECT * WHERE jt.id > 4",
"Operation not allowed: ALTER VIEW ... AS FROM ... [INSERT INTO ...]+")
}

test("database and schema tokens are interchangeable") {
assertEqual("CREATE DATABASE foo", parser.parsePlan("CREATE SCHEMA foo"))
assertEqual("DROP DATABASE foo", parser.parsePlan("DROP SCHEMA foo"))
Expand Down