
[SPARK-48006][SQL]add SortOrder for window function which has no orde… #46243


Closed
wants to merge 2 commits into from


guixiaowen
Contributor

What changes were proposed in this pull request?

I am migrating workloads from Hive SQL to Spark SQL.

In Hive SQL, a window function without an ORDER BY clause is accepted:

hive> explain select *,row_number() over (partition by day) rn from testdb.zeropart_db;

OK
Explain

In Spark SQL, the same kind of query fails analysis:

spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db;

plan

== Physical Plan ==

org.apache.spark.sql.AnalysisException: Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table

Time taken: 0.172 seconds, Fetched 1 row(s)

For better compatibility during migration, new parameters are added so that Spark SQL can match the behavior of Hive SQL.

Why are the changes needed?

For better compatibility when migrating from Hive SQL to Spark SQL.

Does this PR introduce any user-facing change?

Before this PR:

spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db;

plan

== Physical Plan ==

org.apache.spark.sql.AnalysisException: Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table

Time taken: 0.172 seconds, Fetched 1 row(s)

After this PR:

spark-sql> explain select *,row_number() over (partition by age ) rn from testdb.zeropart_db;
plan
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Window [row_number() windowspecdefinition(age#37, age#37 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS rn#30], [age#37], [age#37 ASC NULLS FIRST]
   +- Sort [age#37 ASC NULLS FIRST, age#37 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(age#37, 1000), ENSURE_REQUIREMENTS, [id=#53]
         +- Scan hive testdb.zeropart_db [age#37, sex#38, name#39, day#40], HiveTableRelation [bigdata_qa.zeropart_db, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#37, sex#38, name#39], Partition Cols: [day#40]]

Time taken: 0.154 seconds, Fetched 1 row(s)
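
In the plan above, the missing ordering appears to be filled in with the partition column (age ASC NULLS FIRST). As a rough illustration of that rule, and of the explicit query a user could write on a Spark build without this change, here is a minimal Python sketch. The helper name over_clause is hypothetical and is not part of Spark or of this patch; it only builds SQL text.

```python
from typing import Sequence


def over_clause(partition_by: Sequence[str], order_by: Sequence[str] = ()) -> str:
    """Build an OVER (...) clause, defaulting the ordering to the partition columns.

    Hypothetical helper for illustration only: it mirrors the plan above, where
    the missing ORDER BY is filled with the partition column, ASC NULLS FIRST.
    """
    if not partition_by and not order_by:
        raise ValueError("need at least one partition or order column")
    # Fall back to the partition columns only when no ordering was given.
    effective_order = list(order_by) or [
        f"{col} ASC NULLS FIRST" for col in partition_by
    ]
    parts = []
    if partition_by:
        parts.append("PARTITION BY " + ", ".join(partition_by))
    parts.append("ORDER BY " + ", ".join(effective_order))
    return "OVER (" + " ".join(parts) + ")"


# Explicit form of the query from the PR description.
query = (
    "SELECT *, row_number() " + over_clause(["age"]) + " rn "
    "FROM testdb.zeropart_db"
)
print(query)
```

Because every row in a partition has the same value for the partition column, this ordering does not break ties, so the row numbers assigned within a partition are not guaranteed to be stable across runs.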

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label Apr 26, 2024
@guixiaowen
Contributor Author

@dongjoon-hyun hi, Can you help me review this PR?

@guixiaowen
Contributor Author

@yaooqinn hi, Can you help me review this PR?

@guixiaowen
Contributor Author

@dongjoon-hyun hi, Can you help me review this PR?


We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Aug 21, 2024
@github-actions github-actions bot closed this Aug 22, 2024