[SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING #44352
Changes from all commits
6aa6fc7
eaafeed
f423022
108d742
111ec3c
6c85cf2
@@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x]
 SELECT udf(b), udf(c) FROM test_having
 GROUP BY b, c HAVING udf(b) = 3 ORDER BY udf(b), udf(c)
 -- !query analysis
-Project [udf(b)#x, udf(c)#x]
do you know why the plan is changed?

I think the previous resolution matches item 4 of
With this patch, it should match item 3

oh so the plan is actually more efficient now?

i think so, the plan shows it eliminates some unnecessary column propagation across operators

it is an analyzed plan, the optimized plan should be same with pr ?

both Analyzed plan and Optimized plan are changed :) before:
after:
-+- Sort [cast(udf(cast(b#x as string)) as int) ASC NULLS FIRST, cast(udf(cast(c#x as string)) as string) ASC NULLS FIRST], true
-   +- Filter (udf(b)#x = 3)
-      +- Aggregate [b#x, c#x], [cast(udf(cast(b#x as string)) as int) AS udf(b)#x, cast(udf(cast(c#x as string)) as string) AS udf(c)#x, b#x, c#x]
-         +- SubqueryAlias spark_catalog.default.test_having
-            +- Relation spark_catalog.default.test_having[a#x,b#x,c#x,d#x] parquet
+Sort [udf(b)#x ASC NULLS FIRST, udf(c)#x ASC NULLS FIRST], true
++- Filter (udf(b)#x = 3)
+   +- Aggregate [b#x, c#x], [cast(udf(cast(b#x as string)) as int) AS udf(b)#x, cast(udf(cast(c#x as string)) as string) AS udf(c)#x]
+      +- SubqueryAlias spark_catalog.default.test_having
+         +- Relation spark_catalog.default.test_having[a#x,b#x,c#x,d#x] parquet


 -- !query
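The "unnecessary column propagation" mentioned in the thread can be seen by comparing the Aggregate output lists in the two plans. The snippet below is only a toy model in plain Python, not Spark's analyzer: the column lists are copied from the diff with the `#x` expression IDs dropped, and the helper names `extra_columns` and `needs_trailing_project` are hypothetical, introduced just for this illustration.

```python
# Output column lists copied from the Aggregate nodes in the diff
# (expression IDs such as #x dropped for readability).
BEFORE_AGGREGATE = ["udf(b)", "udf(c)", "b", "c"]
AFTER_AGGREGATE = ["udf(b)", "udf(c)"]

# Columns the query actually selects.
SELECT_LIST = ["udf(b)", "udf(c)"]


def extra_columns(operator_output, select_list):
    """Return columns an operator emits that the SELECT list never asks for."""
    wanted = set(select_list)
    return [col for col in operator_output if col not in wanted]


def needs_trailing_project(operator_output, select_list):
    """In this toy model, a trimming Project is needed only if extras flow upward."""
    return bool(extra_columns(operator_output, select_list))


if __name__ == "__main__":
    plans = (("before", BEFORE_AGGREGATE), ("after", AFTER_AGGREGATE))
    for label, output in plans:
        print(label,
              extra_columns(output, SELECT_LIST),
              needs_trailing_project(output, SELECT_LIST))
```

In this model the old plan carries `b` and `c` up through Filter and Sort, so a top-level Project has to drop them, while the new plan sorts on the aliased `udf(b)` and `udf(c)` outputs directly and no extra Project appears.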
Shall we obtain the example?