Skip to content

[SPARK-4296][SQL] Trims aliases when resolving and checking aggregate expressions #3910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -209,10 +209,10 @@ class Analyzer(catalog: Catalog,

aggregateExprs.find { e =>
!isValidAggregateExpression(e.transform {
// Should trim aliases around `GetField`s. These aliases are introduced while
// resolving struct field accesses, because `GetField` is not a `NamedExpression`.
// (Should we just turn `GetField` into a `NamedExpression`?)
case Alias(g: GetField, _) => g
// Should trim aliases. These aliases can only be introduced while resolving unnamed
// expressions like `GetField` and UDF calls, because GROUP BY clause doesn't allow
// aliasing.
case a: Alias => a.child
})
}.foreach { e =>
throw new TreeNodeException(plan, s"Expression not in GROUP BY: $e")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -153,11 +153,11 @@ object PartialAggregation {
partialEvaluations(new TreeNodeRef(e)).finalEvaluation

case e: Expression =>
// Should trim aliases around `GetField`s. These aliases are introduced while
// resolving struct field accesses, because `GetField` is not a `NamedExpression`.
// (Should we just turn `GetField` into a `NamedExpression`?)
namedGroupingExpressions
.get(e.transform { case Alias(g: GetField, _) => g })
// Should trim aliases. These aliases can only be introduced while resolving unnamed
// expressions like `GetField` and UDF calls, because GROUP BY clause doesn't allow
// aliasing.
.get(e.transform { case a: Alias => a.child })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this line can remove needed aliases. For example, in subq2.q

SELECT a.k, a.c
FROM (SELECT b.key as k, count(1) as c FROM src b GROUP BY b.key) a
WHERE a.k >= 90;

We have the following plans.

[info]   == Parsed Logical Plan ==
[info]   'Project ['a.k,'a.c]
[info]    'Filter ('a.k >= 90)
[info]     'Subquery a
[info]      'Aggregate ['b.key], ['b.key AS k#84130,COUNT(1) AS c#84131L]
[info]       'UnresolvedRelation [src], Some(b)
[info]   
[info]   == Analyzed Logical Plan ==
[info]   Project [k#84130,c#84131L]
[info]    Filter (k#84130 >= 90)
[info]     Aggregate [key#84132], [key#84132 AS k#84130,COUNT(1) AS c#84131L]
[info]      MetastoreRelation default, src, Some(b)
[info]   
[info]   == Optimized Logical Plan ==
[info]   Filter (k#84130 >= 90)
[info]    Aggregate [key#84132], [key#84132 AS k#84130,COUNT(1) AS c#84131L]
[info]     Project [key#84132]
[info]      MetastoreRelation default, src, Some(b)
[info]   
[info]   == Physical Plan ==
[info]   !Filter (k#84130 >= 90)
[info]    Aggregate false, [key#84132], [key#84132,Coalesce(SUM(PartialCount#84135L),0) AS c#84131L]
[info]     Exchange (HashPartitioning [key#84132], 2)
[info]      Aggregate true, [key#84132], [key#84132,COUNT(1) AS PartialCount#84135L]
[info]       HiveTableScan [key#84132], (MetastoreRelation default, src, Some(b)), None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this issue puzzled me for a loooong time! I realized this solution is wrong. But I didn't figure out the right one until you pointed out that #3987 (accidentally?) fixes this. Thanks!

.map(_.toAttribute)
.getOrElse(e)
}).asInstanceOf[Seq[NamedExpression]]
Expand Down
11 changes: 11 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -980,6 +980,17 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll {
dropTempTable("data")
}

test("SPARK-4296 Grouping field with UDF as sub expression") {
registerFunction("triple", (_: Int) * 3)
jsonRDD(sparkContext.makeRDD("""{"a": 1}""" :: Nil)).registerTempTable("data")
checkAnswer(sql("SELECT triple(a) FROM data GROUP BY triple(a)"), 3)
dropTempTable("data")

jsonRDD(sparkContext.makeRDD("""{"a": 1}""" :: Nil)).registerTempTable("data")
checkAnswer(sql("SELECT triple(a) + 1 FROM data GROUP BY triple(a) + 1"), 4)
dropTempTable("data")
}

test("SPARK-4432 Fix attribute reference resolution error when using ORDER BY") {
checkAnswer(
sql("SELECT a + b FROM testData2 ORDER BY a"),
Expand Down