[SPARK-17895] Improve doc for rangeBetween and rowsBetween #15727
 * {{{
 * import org.apache.spark.sql.expressions.Window
 * val df = Seq((1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")).toDF("id", "category")
 * df.withColumn("sum", sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1)).show
Can you add () to show, i.e. show()?
@david-weiluo-ren this already has a conflict. Can you update it?
@@ -91,7 +91,7 @@ object Window {
  * {{{
  * import org.apache.spark.sql.expressions.Window
  * val df = Seq((1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")).toDF("id", "category")
- * df.withColumn("sum", sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1)).show
+ * df.withColumn("sum", sum('id) over Window.partitionBy('category).orderBy('id).rowsBetween(0,1)).show()
btw this is over 100 chars long and will fail the style checker
Also, you should probably use .over(...) rather than infix notation here.
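For illustration, the two review suggestions above (stay under 100 characters per line, and call `.over(...)` instead of using infix notation) could be applied roughly as follows. This is only a sketch, not the code that was merged: the names `RowsBetweenExample`, `withFrameSum`, and `byCategoryOrderedById` are hypothetical, and it assumes Spark SQL is on the classpath and that a local session is acceptable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical wrapper object, used only to make the snippet self-contained.
object RowsBetweenExample {

  // Builds the DataFrame from the diff and adds a "sum" column
  // computed over a row-based frame.
  def withFrameSum(spark: SparkSession): DataFrame = {
    import spark.implicits._ // needed for toDF on a local Seq

    val df = Seq((1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b"))
      .toDF("id", "category")

    // Naming the window spec keeps every line well under 100 characters.
    val byCategoryOrderedById = Window
      .partitionBy(col("category"))
      .orderBy(col("id"))
      .rowsBetween(0, 1) // the current row plus the one row after it

    // Method-call form instead of the infix `sum('id) over ...`.
    df.withColumn("sum", sum(col("id")).over(byCategoryOrderedById))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rowsBetween-example")
      .getOrCreate()
    try withFrameSum(spark).show()
    finally spark.stop()
  }
}
```

Extracting the window specification into a `val` is one way to satisfy the line-length check; breaking the chained call across lines inside `withColumn` would work as well.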
Test build #3395 has finished for PR 15727 at commit
## What changes were proposed in this pull request?

Copied description for row and range based frame boundary from https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala#L56

Added examples to show different behavior of rangeBetween and rowsBetween when involving duplicate values.

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

Author: buzhihuojie <ren.weiluo@gmail.com>

Closes #15727 from david-weiluo-ren/improveDocForRangeAndRowsBetween.

(cherry picked from commit 742e0fe)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Thanks - merging in master/branch-2.1.
## What changes were proposed in this pull request?

Copied the description of row-based and range-based frame boundaries from https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala#L56

Added examples to show the different behavior of rangeBetween and rowsBetween when duplicate values are involved.
Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.
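To make the duplicate-value difference concrete, here is a small plain-Scala model of the two frame types, applied to the `id` values of each `category` partition from the example DataFrame. It is a sketch of the semantics only, not Spark's implementation: `FrameSemantics`, `rowsBetweenSum`, and `rangeBetweenSum` are hypothetical names, the input is assumed to be one partition already sorted ascending, and an empty frame yields 0 here, whereas Spark's `sum` would return null.

```scala
// Hypothetical model of window-frame semantics; does not use Spark.
object FrameSemantics {

  // Row-based frame: start and end are offsets in row POSITION
  // relative to the current row, so duplicates are separate rows.
  def rowsBetweenSum(sorted: Vector[Long], start: Int, end: Int): Vector[Long] =
    sorted.indices.toVector.map { i =>
      val lo = math.max(0, i + start)
      val hi = math.min(sorted.length - 1, i + end)
      (lo to hi).map(sorted).sum // empty range sums to 0
    }

  // Range-based frame: start and end are offsets in the ordering VALUE,
  // so every row whose value lies in [v + start, v + end] is included,
  // which pulls in all duplicates of the current value.
  def rangeBetweenSum(sorted: Vector[Long], start: Long, end: Long): Vector[Long] =
    sorted.map { v =>
      sorted.filter(x => x >= v + start && x <= v + end).sum
    }

  def main(args: Array[String]): Unit = {
    val a = Vector(1L, 1L, 2L) // ids in category "a" (contains a duplicate)
    val b = Vector(1L, 2L, 3L) // ids in category "b" (no duplicates)

    println(rowsBetweenSum(a, 0, 1))  // Vector(2, 3, 2)
    println(rangeBetweenSum(a, 0, 1)) // Vector(4, 4, 2)
    println(rowsBetweenSum(b, 0, 1))  // Vector(3, 5, 3)
    println(rangeBetweenSum(b, 0, 1)) // Vector(3, 5, 3)
  }
}
```

Under this model the two frame types agree on category "b", where the values are distinct and consecutive, and differ on category "a", where the duplicate `1` makes the range-based frame wider than the row-based one.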