
Commit 71133e1

AngersZhuuuu authored and cloud-fan committed
[SPARK-35070][SQL] TRANSFORM not support alias in inputs
### What changes were proposed in this pull request?
Normal function parameters should not support aliases, and Hive does not support them either.

![image](https://user-images.githubusercontent.com/46485123/114645556-4a7ff400-9d0c-11eb-91eb-bc679ea0039a.png)

In this PR we forbid the use of aliases in `TRANSFORM`'s inputs.

### Why are the changes needed?
Fix bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT.

Closes apache#32165 from AngersZhuuuu/SPARK-35070.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 767ea86 commit 71133e1

5 files changed: +125 -66 lines changed

docs/sql-migration-guide.md

Lines changed: 2 additions & 0 deletions
@@ -77,6 +77,8 @@ license: |

   - In Spark 3.2, `CREATE TABLE .. LIKE ..` command can not use reserved properties. You need their specific clauses to specify them, for example, `CREATE TABLE test1 LIKE test LOCATION 'some path'`. You can set `spark.sql.legacy.notReserveProperties` to `true` to ignore the `ParseException`, in this case, these properties will be silently removed, for example: `TBLPROPERTIES('owner'='yao')` will have no effect. In Spark version 3.1 and below, the reserved properties can be used in `CREATE TABLE .. LIKE ..` command but have no side effects, for example, `TBLPROPERTIES('location'='/tmp')` does not change the location of the table but only create a headless property just like `'a'='b'`.

+  - In Spark 3.2, `TRANSFORM` operator can't support alias in inputs. In Spark 3.1 and earlier, we can write script transform like `SELECT TRANSFORM(a AS c1, b AS c2) USING 'cat' FROM TBL`.
+
 ## Upgrading from Spark SQL 3.0 to 3.1

   - In Spark 3.1, statistical aggregation function includes `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, `corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` applied on a single element set. In Spark version 3.0 and earlier, it will return `Double.NaN` in such case. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
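
The migration note above means that queries carrying aliases inside `TRANSFORM(...)` need to be found and rewritten before upgrading. The following is a rough sketch of how such inputs could be flagged; it is not part of this commit or of Spark. `TransformAliasAudit` and its methods are hypothetical names, the check only understands the explicit `expr AS name` form (not the implicit `expr name` form), and it does not handle parentheses or commas inside string literals.

```scala
object TransformAliasAudit {
  // Splits a TRANSFORM input list on commas that are not nested in parentheses,
  // so the commas inside function calls stay with their expression.
  def splitTopLevel(inputs: String): Seq[String] = {
    val parts = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0
    inputs.foreach {
      case '(' =>
        depth += 1
        current.append('(')
      case ')' =>
        depth -= 1
        current.append(')')
      case ',' if depth == 0 =>
        parts += current.toString.trim
        current.clear()
      case c =>
        current.append(c)
    }
    parts += current.toString.trim
    parts.toSeq
  }

  // Keeps only the text outside parentheses, so the AS inside CAST(x AS STRING)
  // is not mistaken for an alias.
  def topLevelText(item: String): String = {
    var depth = 0
    val sb = new StringBuilder
    item.foreach {
      case '(' => depth += 1
      case ')' => depth -= 1
      case c if depth == 0 => sb.append(c)
      case _ => // character nested in parentheses: ignored
    }
    sb.toString
  }

  // Assumption: an explicit alias is a trailing "AS identifier" at depth 0.
  private val ExplicitAlias = """(?i)\s+AS\s+[A-Za-z_][A-Za-z0-9_]*\s*$""".r

  def hasExplicitAlias(item: String): Boolean =
    ExplicitAlias.findFirstIn(topLevelText(item)).isDefined

  // Returns the inputs that carry an explicit alias and would need rewriting.
  def aliasedInputs(inputs: String): Seq[String] =
    splitTopLevel(inputs).filter(hasExplicitAlias)
}
```

Applied to the input list of one of the old test queries, `TransformAliasAudit.aliasedInputs("b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING)")` flags the first two inputs and leaves the `CAST` alone. As the updated tests in this commit show, the rewrite itself is to drop the alias and, where a later clause such as `HAVING` referred to it, to repeat the expression there.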

sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -524,9 +524,9 @@ querySpecification
524524
;
525525

526526
transformClause
527-
: (SELECT kind=TRANSFORM '(' setQuantifier? namedExpressionSeq ')'
528-
| kind=MAP setQuantifier? namedExpressionSeq
529-
| kind=REDUCE setQuantifier? namedExpressionSeq)
527+
: (SELECT kind=TRANSFORM '(' setQuantifier? expressionSeq ')'
528+
| kind=MAP setQuantifier? expressionSeq
529+
| kind=REDUCE setQuantifier? expressionSeq)
530530
inRowFormat=rowFormat?
531531
(RECORDWRITER recordWriter=STRING)?
532532
USING script=STRING
@@ -774,6 +774,10 @@ expression
774774
: booleanExpression
775775
;
776776

777+
expressionSeq
778+
: expression (',' expression)*
779+
;
780+
777781
booleanExpression
778782
: NOT booleanExpression #logicalNot
779783
| EXISTS '(' query ')' #exists
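
To make the effect of swapping `namedExpressionSeq` for `expressionSeq` concrete, here is a toy recognizer over pre-split tokens. It is only a sketch under strong simplifications: an `expression` is reduced to a single identifier token, a named expression is assumed to be an expression optionally followed by `AS? identifier`, and `ExpressionSeqSketch` and its methods are hypothetical names, not generated parser code.

```scala
object ExpressionSeqSketch {
  // Simplification: an "expression" is a single identifier token that is not AS.
  private def isIdent(t: String): Boolean =
    t.matches("[A-Za-z_][A-Za-z0-9_]*") && !t.equalsIgnoreCase("AS")

  // expressionSeq : expression (',' expression)*
  def matchesExpressionSeq(tokens: List[String]): Boolean = tokens match {
    case e :: Nil => isIdent(e)
    case e :: "," :: rest => isIdent(e) && matchesExpressionSeq(rest)
    case _ => false
  }

  // namedExpressionSeq : namedExpression (',' namedExpression)*
  // namedExpression    : expression (AS? identifier)?   (simplified assumption)
  def matchesNamedExpressionSeq(tokens: List[String]): Boolean = {
    // Consumes one namedExpression and returns the remaining tokens if it matched.
    def named(ts: List[String]): Option[List[String]] = ts match {
      case e :: kw :: alias :: rest
          if isIdent(e) && kw.equalsIgnoreCase("AS") && isIdent(alias) => Some(rest)
      case e :: alias :: rest if isIdent(e) && isIdent(alias) => Some(rest)
      case e :: rest if isIdent(e) => Some(rest)
      case _ => None
    }
    named(tokens) match {
      case Some(Nil) => true
      case Some("," :: rest) => matchesNamedExpressionSeq(rest)
      case _ => false
    }
  }
}
```

In this model both `b AS b_1, a` and `b b_1` are accepted by the named rule but rejected by the plain one, which mirrors why the new negative tests in this commit fail at parse time rather than during analysis.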

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -627,6 +627,12 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
627627
.map(typedVisit[Expression])
628628
}
629629

630+
override def visitExpressionSeq(ctx: ExpressionSeqContext): Seq[Expression] = {
631+
Option(ctx).toSeq
632+
.flatMap(_.expression.asScala)
633+
.map(typedVisit[Expression])
634+
}
635+
630636
/**
631637
* Create a logical plan using a having clause.
632638
*/
@@ -680,8 +686,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
680686

681687
val plan = visitCommonSelectQueryClausePlan(
682688
relation,
689+
visitExpressionSeq(transformClause.expressionSeq),
683690
lateralView,
684-
transformClause.namedExpressionSeq,
685691
whereClause,
686692
aggregationClause,
687693
havingClause,
@@ -726,8 +732,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
726732

727733
val plan = visitCommonSelectQueryClausePlan(
728734
relation,
735+
visitNamedExpressionSeq(selectClause.namedExpressionSeq),
729736
lateralView,
730-
selectClause.namedExpressionSeq,
731737
whereClause,
732738
aggregationClause,
733739
havingClause,
@@ -740,8 +746,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
740746

741747
def visitCommonSelectQueryClausePlan(
742748
relation: LogicalPlan,
749+
expressions: Seq[Expression],
743750
lateralView: java.util.List[LateralViewContext],
744-
namedExpressionSeq: NamedExpressionSeqContext,
745751
whereClause: WhereClauseContext,
746752
aggregationClause: AggregationClauseContext,
747753
havingClause: HavingClauseContext,
@@ -753,8 +759,6 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
753759
// Add where.
754760
val withFilter = withLateralView.optionalMap(whereClause)(withWhereClause)
755761

756-
val expressions = visitNamedExpressionSeq(namedExpressionSeq)
757-
758762
// Add aggregation or a project.
759763
val namedExpressions = expressions.map {
760764
case e: NamedExpression => e
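
The new `visitExpressionSeq` above wraps the context in `Option(ctx).toSeq`, so a missing (null) context yields an empty sequence instead of a `NullPointerException`. A self-contained sketch of that idiom follows. The `ExpressionContext` and `ExpressionSeqContext` classes here are hypothetical stand-ins for the ANTLR-generated ones, `typedVisit` is reduced to returning the expression text, and the import assumes Scala 2.13 or later.

```scala
import scala.jdk.CollectionConverters._

object VisitExpressionSeqSketch {
  // Hypothetical stand-ins for the ANTLR-generated context classes.
  final class ExpressionContext(val text: String)
  final class ExpressionSeqContext(exprs: java.util.List[ExpressionContext]) {
    def expression: java.util.List[ExpressionContext] = exprs
  }

  // Stand-in for typedVisit[Expression]: an "expression" is just its source text.
  def typedVisit(ctx: ExpressionContext): String = ctx.text

  // Same shape as the added method: Option(null) is None, so a null context
  // flows through flatMap/map as an empty Seq.
  def visitExpressionSeq(ctx: ExpressionSeqContext): Seq[String] =
    Option(ctx).toSeq
      .flatMap(_.expression.asScala)
      .map(typedVisit)
}
```

Because the visitor now returns a plain `Seq[Expression]`, `visitCommonSelectQueryClausePlan` can take the already-visited expressions as a parameter, and each caller decides whether to build them with `visitExpressionSeq` (for `TRANSFORM`) or `visitNamedExpressionSeq` (for a regular `SELECT`), as the hunks above show.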

sql/core/src/test/resources/sql-tests/inputs/transform.sql

Lines changed: 30 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -206,7 +206,7 @@ FROM script_trans
206206
LIMIT 1;
207207

208208
SELECT TRANSFORM(
209-
b AS d5, a,
209+
b, a,
210210
CASE
211211
WHEN c > 100 THEN 1
212212
WHEN c < 100 THEN 2
@@ -225,45 +225,45 @@ SELECT TRANSFORM(*)
225225
FROM script_trans
226226
WHERE a <= 4;
227227

228-
SELECT TRANSFORM(b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING))
228+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING))
229229
USING 'cat' AS (a, b, c)
230230
FROM script_trans
231231
WHERE a <= 4
232232
GROUP BY b;
233233

234-
SELECT TRANSFORM(b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING))
234+
SELECT TRANSFORM(b, MAX(a) FILTER (WHERE a > 3), CAST(SUM(c) AS STRING))
235235
USING 'cat' AS (a,b,c)
236236
FROM script_trans
237237
WHERE a <= 4
238238
GROUP BY b;
239239

240-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(sum(c) AS STRING))
240+
SELECT TRANSFORM(b, MAX(a), CAST(sum(c) AS STRING))
241241
USING 'cat' AS (a, b, c)
242242
FROM script_trans
243243
WHERE a <= 2
244244
GROUP BY b;
245245

246-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING))
246+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING))
247247
USING 'cat' AS (a, b, c)
248248
FROM script_trans
249249
WHERE a <= 4
250250
GROUP BY b
251-
HAVING max_a > 0;
251+
HAVING MAX(a) > 0;
252252

253-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING))
253+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING))
254254
USING 'cat' AS (a, b, c)
255255
FROM script_trans
256256
WHERE a <= 4
257257
GROUP BY b
258-
HAVING max(a) > 1;
258+
HAVING MAX(a) > 1;
259259

260-
SELECT TRANSFORM(b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING))
260+
SELECT TRANSFORM(b, MAX(a) OVER w, CAST(SUM(c) OVER w AS STRING))
261261
USING 'cat' AS (a, b, c)
262262
FROM script_trans
263263
WHERE a <= 4
264264
WINDOW w AS (PARTITION BY b ORDER BY a);
265265

266-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING), myCol, myCol2)
266+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING), myCol, myCol2)
267267
USING 'cat' AS (a, b, c, d, e)
268268
FROM script_trans
269269
LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol
@@ -280,7 +280,7 @@ FROM(
280280
SELECT a + 1;
281281

282282
FROM(
283-
SELECT TRANSFORM(a, SUM(b) b)
283+
SELECT TRANSFORM(a, SUM(b))
284284
USING 'cat' AS (`a` INT, b STRING)
285285
FROM script_trans
286286
GROUP BY a
@@ -308,14 +308,6 @@ HAVING true;
308308

309309
SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=false;
310310

311-
SET spark.sql.parser.quotedRegexColumnNames=true;
312-
313-
SELECT TRANSFORM(`(a|b)?+.+`)
314-
USING 'cat' AS (c)
315-
FROM script_trans;
316-
317-
SET spark.sql.parser.quotedRegexColumnNames=false;
318-
319311
-- SPARK-34634: self join using CTE contains transform
320312
WITH temp AS (
321313
SELECT TRANSFORM(a) USING 'cat' AS (b string) FROM t
@@ -331,3 +323,22 @@ SELECT TRANSFORM(ALL b, a, c)
331323
USING 'cat' AS (a, b, c)
332324
FROM script_trans
333325
WHERE a <= 4;
326+
327+
-- SPARK-35070: TRANSFORM not support alias in inputs
328+
SELECT TRANSFORM(b AS b_1, MAX(a), CAST(sum(c) AS STRING))
329+
USING 'cat' AS (a, b, c)
330+
FROM script_trans
331+
WHERE a <= 2
332+
GROUP BY b;
333+
334+
SELECT TRANSFORM(b b_1, MAX(a), CAST(sum(c) AS STRING))
335+
USING 'cat' AS (a, b, c)
336+
FROM script_trans
337+
WHERE a <= 2
338+
GROUP BY b;
339+
340+
SELECT TRANSFORM(b, MAX(a) AS max_a, CAST(sum(c) AS STRING))
341+
USING 'cat' AS (a, b, c)
342+
FROM script_trans
343+
WHERE a <= 2
344+
GROUP BY b;

sql/core/src/test/resources/sql-tests/results/transform.sql.out

Lines changed: 77 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -376,7 +376,7 @@ struct<a:int,b:int>
376376

377377
-- !query
378378
SELECT TRANSFORM(
379-
b AS d5, a,
379+
b, a,
380380
CASE
381381
WHEN c > 100 THEN 1
382382
WHEN c < 100 THEN 2
@@ -416,7 +416,7 @@ struct<a:string,b:string,c:string>
416416

417417

418418
-- !query
419-
SELECT TRANSFORM(b AS d, MAX(a) as max_a, CAST(SUM(c) AS STRING))
419+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING))
420420
USING 'cat' AS (a, b, c)
421421
FROM script_trans
422422
WHERE a <= 4
@@ -429,7 +429,7 @@ struct<a:string,b:string,c:string>
429429

430430

431431
-- !query
432-
SELECT TRANSFORM(b AS d, MAX(a) FILTER (WHERE a > 3) AS max_a, CAST(SUM(c) AS STRING))
432+
SELECT TRANSFORM(b, MAX(a) FILTER (WHERE a > 3), CAST(SUM(c) AS STRING))
433433
USING 'cat' AS (a,b,c)
434434
FROM script_trans
435435
WHERE a <= 4
@@ -442,7 +442,7 @@ struct<a:string,b:string,c:string>
442442

443443

444444
-- !query
445-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(sum(c) AS STRING))
445+
SELECT TRANSFORM(b, MAX(a), CAST(sum(c) AS STRING))
446446
USING 'cat' AS (a, b, c)
447447
FROM script_trans
448448
WHERE a <= 2
@@ -454,12 +454,12 @@ struct<a:string,b:string,c:string>
454454

455455

456456
-- !query
457-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING))
457+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING))
458458
USING 'cat' AS (a, b, c)
459459
FROM script_trans
460460
WHERE a <= 4
461461
GROUP BY b
462-
HAVING max_a > 0
462+
HAVING MAX(a) > 0
463463
-- !query schema
464464
struct<a:string,b:string,c:string>
465465
-- !query output
@@ -468,20 +468,20 @@ struct<a:string,b:string,c:string>
468468

469469

470470
-- !query
471-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING))
471+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING))
472472
USING 'cat' AS (a, b, c)
473473
FROM script_trans
474474
WHERE a <= 4
475475
GROUP BY b
476-
HAVING max(a) > 1
476+
HAVING MAX(a) > 1
477477
-- !query schema
478478
struct<a:string,b:string,c:string>
479479
-- !query output
480480
5 4 6
481481

482482

483483
-- !query
484-
SELECT TRANSFORM(b, MAX(a) OVER w as max_a, CAST(SUM(c) OVER w AS STRING))
484+
SELECT TRANSFORM(b, MAX(a) OVER w, CAST(SUM(c) OVER w AS STRING))
485485
USING 'cat' AS (a, b, c)
486486
FROM script_trans
487487
WHERE a <= 4
@@ -494,7 +494,7 @@ struct<a:string,b:string,c:string>
494494

495495

496496
-- !query
497-
SELECT TRANSFORM(b, MAX(a) as max_a, CAST(SUM(c) AS STRING), myCol, myCol2)
497+
SELECT TRANSFORM(b, MAX(a), CAST(SUM(c) AS STRING), myCol, myCol2)
498498
USING 'cat' AS (a, b, c, d, e)
499499
FROM script_trans
500500
LATERAL VIEW explode(array(array(1,2,3))) myTable AS myCol
@@ -527,7 +527,7 @@ struct<(a + 1):int>
527527

528528
-- !query
529529
FROM(
530-
SELECT TRANSFORM(a, SUM(b) b)
530+
SELECT TRANSFORM(a, SUM(b))
531531
USING 'cat' AS (`a` INT, b STRING)
532532
FROM script_trans
533533
GROUP BY a
@@ -600,34 +600,6 @@ struct<key:string,value:string>
600600
spark.sql.legacy.parser.havingWithoutGroupByAsWhere false
601601

602602

603-
-- !query
604-
SET spark.sql.parser.quotedRegexColumnNames=true
605-
-- !query schema
606-
struct<key:string,value:string>
607-
-- !query output
608-
spark.sql.parser.quotedRegexColumnNames true
609-
610-
611-
-- !query
612-
SELECT TRANSFORM(`(a|b)?+.+`)
613-
USING 'cat' AS (c)
614-
FROM script_trans
615-
-- !query schema
616-
struct<c:string>
617-
-- !query output
618-
3
619-
6
620-
9
621-
622-
623-
-- !query
624-
SET spark.sql.parser.quotedRegexColumnNames=false
625-
-- !query schema
626-
struct<key:string,value:string>
627-
-- !query output
628-
spark.sql.parser.quotedRegexColumnNames false
629-
630-
631603
-- !query
632604
WITH temp AS (
633605
SELECT TRANSFORM(a) USING 'cat' AS (b string) FROM t
@@ -679,3 +651,69 @@ SELECT TRANSFORM(ALL b, a, c)
679651
USING 'cat' AS (a, b, c)
680652
FROM script_trans
681653
WHERE a <= 4
654+
655+
656+
-- !query
657+
SELECT TRANSFORM(b AS b_1, MAX(a), CAST(sum(c) AS STRING))
658+
USING 'cat' AS (a, b, c)
659+
FROM script_trans
660+
WHERE a <= 2
661+
GROUP BY b
662+
-- !query schema
663+
struct<>
664+
-- !query output
665+
org.apache.spark.sql.catalyst.parser.ParseException
666+
667+
no viable alternative at input 'SELECT TRANSFORM(b AS'(line 1, pos 19)
668+
669+
== SQL ==
670+
SELECT TRANSFORM(b AS b_1, MAX(a), CAST(sum(c) AS STRING))
671+
-------------------^^^
672+
USING 'cat' AS (a, b, c)
673+
FROM script_trans
674+
WHERE a <= 2
675+
GROUP BY b
676+
677+
678+
-- !query
679+
SELECT TRANSFORM(b b_1, MAX(a), CAST(sum(c) AS STRING))
680+
USING 'cat' AS (a, b, c)
681+
FROM script_trans
682+
WHERE a <= 2
683+
GROUP BY b
684+
-- !query schema
685+
struct<>
686+
-- !query output
687+
org.apache.spark.sql.catalyst.parser.ParseException
688+
689+
no viable alternative at input 'SELECT TRANSFORM(b b_1'(line 1, pos 19)
690+
691+
== SQL ==
692+
SELECT TRANSFORM(b b_1, MAX(a), CAST(sum(c) AS STRING))
693+
-------------------^^^
694+
USING 'cat' AS (a, b, c)
695+
FROM script_trans
696+
WHERE a <= 2
697+
GROUP BY b
698+
699+
700+
-- !query
701+
SELECT TRANSFORM(b, MAX(a) AS max_a, CAST(sum(c) AS STRING))
702+
USING 'cat' AS (a, b, c)
703+
FROM script_trans
704+
WHERE a <= 2
705+
GROUP BY b
706+
-- !query schema
707+
struct<>
708+
-- !query output
709+
org.apache.spark.sql.catalyst.parser.ParseException
710+
711+
no viable alternative at input 'SELECT TRANSFORM(b, MAX(a) AS'(line 1, pos 27)
712+
713+
== SQL ==
714+
SELECT TRANSFORM(b, MAX(a) AS max_a, CAST(sum(c) AS STRING))
715+
---------------------------^^^
716+
USING 'cat' AS (a, b, c)
717+
FROM script_trans
718+
WHERE a <= 2
719+
GROUP BY b
