[SPARK-48006][SQL] add SortOrder for window function which has no orde… #46243
What changes were proposed in this pull request?
I am migrating Hive SQL jobs to Spark SQL.
In Hive SQL, a ranking window function without ORDER BY is accepted:
hive> explain select *,row_number() over (partition by day) rn from testdb.zeropart_db;
OK
Explain
In Spark SQL, the equivalent query fails during analysis:
spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db;
plan
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table
Time taken: 0.172 seconds, Fetched 1 row(s)
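To make migration easier, this PR adds new parameters so that Spark SQL can reproduce Hive SQL's behavior: when a window function that requires ordering, such as row_number(), is used without an ORDER BY clause, the PARTITION BY expressions are used as the sort order.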
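The rule can be modeled roughly as follows. This is a minimal Python sketch, not Spark's Catalyst code: `WindowSpec`, `SortOrder`, `fill_order_from_partition`, the `ORDER_REQUIRED` set and the `enabled` flag (standing in for the new parameter, whose name is not given here) are all hypothetical. It fills an empty ORDER BY with the partition expressions, ascending with nulls first, matching the `age#37 ASC NULLS FIRST` ordering in the plan produced after this PR.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-ins for Catalyst's window classes.
@dataclass(frozen=True)
class SortOrder:
    expr: str
    ascending: bool = True
    nulls_first: bool = True  # Spark's default null ordering for ASC


@dataclass
class WindowSpec:
    partition_by: List[str]
    order_by: List[SortOrder] = field(default_factory=list)


# Illustrative (hypothetical) subset of window functions that need an ordered window.
ORDER_REQUIRED = {"row_number", "rank", "dense_rank"}


def fill_order_from_partition(func: str, spec: WindowSpec, enabled: bool) -> WindowSpec:
    """Return spec unchanged, or a copy whose ORDER BY is the PARTITION BY keys.

    `enabled` is a hypothetical stand-in for the new parameter; when it is off,
    the pre-PR behavior of rejecting the query is kept.
    """
    if spec.order_by or func not in ORDER_REQUIRED:
        return spec  # already ordered, or this function does not need ordering
    if not enabled or not spec.partition_by:
        raise ValueError(f"Window function {func}() requires window to be ordered")
    return WindowSpec(
        partition_by=list(spec.partition_by),
        order_by=[SortOrder(e) for e in spec.partition_by],
    )


spec = fill_order_from_partition("row_number", WindowSpec(["age"]), enabled=True)
print(spec.order_by)  # [SortOrder(expr='age', ascending=True, nulls_first=True)]
```

Keeping the old error path behind a flag mirrors the PR's choice to make the Hive-compatible behavior opt-in through new parameters rather than changing the default.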
Why are the changes needed?
To improve compatibility when migrating Hive SQL workloads to Spark SQL.
Does this PR introduce any user-facing change?
Yes. Before this PR:
spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db;
plan
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table
Time taken: 0.172 seconds, Fetched 1 row(s)
After this PR:
spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db;
plan
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Window [row_number() windowspecdefinition(age#37, age#37 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#30], [age#37], [age#37 ASC NULLS FIRST]
   +- Sort [age#37 ASC NULLS FIRST, age#37 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(age#37, 1000), ENSURE_REQUIREMENTS, [id=#53]
         +- Scan hive testdb.zeropart_db [age#37, sex#38, name#39, day#40], HiveTableRelation [`bigdata_qa`.`zeropart_db`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#37, sex#38, name#39], Partition Cols: [day#40]]
Time taken: 0.154 seconds, Fetched 1 row(s)
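In the plan above, the window is ordered by its own partition key (`age#37 ASC NULLS FIRST`), so every row within a partition ties on the sort key. The Python sketch below is a hypothetical in-memory model, not Spark code (`row_number_over_partition` is an illustrative name). It computes `row_number()` the same way and shows that, with all rows tied, the numbers assigned within a partition depend only on the order in which rows arrive. In Spark that order after the `Exchange` is not guaranteed, so `rn` values can differ between runs.

```python
from itertools import groupby
from typing import Any, Dict, List


def row_number_over_partition(rows: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Model row_number() OVER (PARTITION BY key ORDER BY key ASC NULLS FIRST)."""

    def sort_key(r: Dict[str, Any]):
        v = r[key]
        # ASC NULLS FIRST: (False, 0) for NULL sorts before (True, v) for any value.
        return (v is not None, v if v is not None else 0)

    # sorted() is stable, so rows that tie on the key keep their input order.
    ordered = sorted(rows, key=sort_key)
    out: List[Dict[str, Any]] = []
    for _, group in groupby(ordered, key=sort_key):
        # Number the rows 1..n inside each partition.
        for n, r in enumerate(group, start=1):
            out.append({**r, "rn": n})
    return out


rows = [{"name": "a", "age": 30}, {"name": "b", "age": 30}, {"name": "c", "age": None}]
print({r["name"]: r["rn"] for r in row_number_over_partition(rows, "age")})
# {'c': 1, 'a': 1, 'b': 2}
print({r["name"]: r["rn"] for r in row_number_over_partition(rows[::-1], "age")})
# {'c': 1, 'b': 1, 'a': 2}
```

Feeding the same rows in reverse order swaps the numbers of `a` and `b`, which is what "ordering by the partition key" implies: the query now passes analysis, but the numbering within a partition is arbitrary.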
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?