[SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section #28220

Closed
wants to merge 9 commits into from

Contributor

@huaxingao huaxingao commented Apr 15, 2020

What changes were proposed in this pull request?

Document Window Function in SQL syntax

Why are the changes needed?

Make SQL Reference complete

Does this PR introduce any user-facing change?

Yes

[Five screenshots, "Screen Shot 2020-04-16 at 9 13 34 PM" through "Screen Shot 2020-04-16 at 9 15 25 PM"]

How was this patch tested?

Manually built and checked

SparkQA commented Apr 15, 2020

Test build #121296 has finished for PR 28220 at commit 572f7d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Contributor Author

cc @maropu

Member

maropu commented Apr 15, 2020

also cc: @viirya

**This page is under construction**
### Description

Similarly to aggregate functions, window functions operate on a group of rows. However, unlike aggregate functions, window functions perform aggregation without reducing, calculating a return value for each row in the group. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative, or accessing the value of rows given the relative position of the current row. Spark SQL supports three types of window functions:

Member

How about "Similarly to aggregate functions, window functions operate on a group of rows." -> "A window function operates on a group of rows and this is comparable to aggregate functions."?

Member

"window functions perform aggregation without reducing, calculating a return value for each row in the group." is not clear. Does this mean that window functions do not compute a single aggregated value and can instead generate multiple aggregated values for each group?

SparkQA commented Apr 15, 2020

Test build #121331 has finished for PR 28220 at commit 5fadfae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

**This page is under construction**
### Description

Similarly to aggregate functions, window functions operate on a group of rows. However, unlike aggregate functions, window functions perform aggregation without reducing, calculating an aggregated value for each row in the specified window. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative, or accessing the value of rows given the relative position of the current row. Spark SQL supports three types of window functions:

Member

"without reducing"? Sounds confusing. How about "without reducing the number of rows"?

And "but calculating an aggregated value for each row in the specified window."

SparkQA commented Apr 15, 2020

Test build #121333 has finished for PR 28220 at commit 52922f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

### Description

Window functions operate on a group of rows, referred to as a window, and calculate an aggregated value for each row based on the specified window. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative, or accessing the value of rows given the relative position of the current row. Spark SQL supports three types of window functions:

Contributor Author

Does this look better now? @maropu @viirya

Contributor Author

Also cc @srowen
Please feel free to rephrase. Thanks!

Member

computing a cumulative -> computing a cumulative sum (or anything similar: average, statistic)

Member

Thanks, it looks better. How about putting the last statement on a new line?

...the current row.

Spark SQL supports three types of window functions:

  * Ranking Functions
  * Analytic Functions
  * Aggregate Functions

Member

Do we need this list here? The Syntax section has the same list.

SparkQA commented Apr 15, 2020

Test build #121337 has finished for PR 28220 at commit 3b0c0a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Specifies a comma separated list of key and value pairs for partitions.<br><br>
<b>Syntax:</b><br>
<code>
{ PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ , ... ] )

Member

nit: I found double spaces in this line.

MAX | MIN | COUNT | SUM | AVG | ...
</code>
<br>
Please refer <a href="api/sql/">here</a> for a complete list of Spark Aggregate Functions.

Member

nit: here -> the Built-in Function document?

Member

nit: Spark Aggregate Functions. -> Spark aggregate functions.?

Contributor Author

I will put sql-ref-functions-builtin.html as the link for Built-in Function document. It's broken now but will work after your PR is in.

Member

Er, could you revert the link? I'm not yet sure whether my PR is targeted at 3.0.

SparkQA commented Apr 16, 2020

Test build #121342 has finished for PR 28220 at commit ea8ee10.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 16, 2020

Test build #121344 has finished for PR 28220 at commit 116f403.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<dt><code><em>window_function</em></code></dt>
<dd>
<ul>
<li> Ranking Functions </li>

Member

nit: <li> Ranking Functions </li> -> <li>Ranking Functions</li>

MAX | MIN | COUNT | SUM | AVG | ...
</code>
<br>
Please refer to the <a href="sql-ref-functions-builtin.html">Built-in Function</a> document for a complete list of Spark aggregate functions.

Member

nit: Built-in Function -> Built-in Functions by referring to the title in the doc: https://spark.apache.org/docs/latest/api/sql/index.html

Specifies an ordering of the rows.<br><br>
<b>Syntax:</b><br>
<code>
{ ORDER | SORT } BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] }

Member

Could you move the ORDER BY and PARTITION BY clauses into the Syntax section, as the PostgreSQL doc does?

[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

https://www.postgresql.org/docs/current/sql-select.html

**This page is under construction**
### Description

Window functions operate on a group of rows, referred to as a window, and calculate an aggregated value for each row based on the specified window. Window functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row.
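
The three use cases named in this description (a moving average, a cumulative statistic, and access to a row at a relative position) can be sketched with a small runnable example. This is only an illustration under stated assumptions: it uses Python's bundled `sqlite3` module (SQLite 3.25 or later) as a stand-in engine rather than Spark, and the `employees` rows are hypothetical. The `OVER (PARTITION BY ... ORDER BY ...)` form mirrors the queries in this PR's examples, but Spark's output formatting and edge-case behavior may differ.

```python
import sqlite3


def window_demo():
    """Run one query showing three kinds of per-row window results."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data, loosely modeled on the PR's employees examples.
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Ann", "Eng", 100),
            ("Bob", "Eng", 200),
            ("Cat", "Eng", 300),
            ("Dan", "Ops", 50),
            ("Eve", "Ops", 70),
        ],
    )
    rows = conn.execute(
        """
        SELECT name, salary,
               -- moving average over the current row and the one before it
               AVG(salary) OVER (PARTITION BY dept ORDER BY salary
                                 ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_avg,
               -- cumulative sum from the start of the partition
               SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
               -- value of the previous row within the partition
               LAG(salary) OVER (PARTITION BY dept ORDER BY salary) AS prev_salary
        FROM employees
        ORDER BY dept, salary
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in window_demo():
        print(row)
```

Every input row yields exactly one output row, which is the point the reviewers were trying to capture with "without reducing the number of rows".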

Member

"... calculate a return value for each row based on a group of rows"

SparkQA commented Apr 16, 2020

Test build #121352 has finished for PR 28220 at commit 6af2eba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

UNBOUNDED { PRECEDING | FOLLOWING }
| CURRENT ROW
| boolean_expression { PRECEDING | FOLLOWING }
</code> <br><br>

Member

I think we need to describe what these clauses (RANGE, ROWS, BETWEEN, ...) are.
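
As one possible illustration of how ROWS and RANGE differ, the sketch below counts the rows in a frame of `UNBOUNDED PRECEDING AND CURRENT ROW` under both modes. It is an assumption-laden example, not the wording proposed for the doc: it runs on Python's bundled `sqlite3` (SQLite 3.25 or later) instead of Spark, and the `employees` data is hypothetical. With ROWS the frame ends at the current physical row. With RANGE it also includes every peer row that shares the current row's ORDER BY value.

```python
import sqlite3


def frame_counts():
    """Compare ROWS and RANGE frames when the ORDER BY key has ties."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: two employees share the same age.
    conn.execute("CREATE TABLE employees (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ann", 20), ("Bob", 20), ("Cat", 30)],
    )
    rows = conn.execute(
        """
        SELECT age,
               COUNT(*) OVER (ORDER BY age
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_cnt,
               COUNT(*) OVER (ORDER BY age
                              RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_cnt
        FROM employees
        ORDER BY age, rows_cnt
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in frame_counts():
        print(row)
```

For the two rows with age 20, `rows_cnt` is 1 and then 2, while `range_cnt` is 2 for both because the tied rows are peers of each other.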

SparkQA commented Apr 16, 2020

Test build #121371 has finished for PR 28220 at commit 3fb73f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

### Examples

{% highlight sql %}

Member

nit: remove this blank.

Member

maropu left a comment

Member

maropu commented Apr 17, 2020

Could you update the screenshot in the description, too?

SparkQA commented Apr 17, 2020

Test build #121393 has finished for PR 28220 at commit 747cfef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao
Contributor Author

@maropu I have addressed the last two comments and updated the screenshots in description. Thanks for reviewing!

@huaxingao
Contributor Author

cc @srowen for final sign off.

+-----+-----------+------+-----+

SELECT name, salary,
LAG(salary) OVER (PARTITION BY dept ORDER BY salary) as lag,

Member

as -> AS
but definitely don't change it just for that. Looks fine. I'll merge shortly

+-----+-----------+------+----------+

SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cume_dist FROM employees;

Member

nit: as -> AS here, too.

Contributor Author

I will fix this

SparkQA commented Apr 18, 2020

Test build #121433 has finished for PR 28220 at commit 6a3d475.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 142f436 Apr 18, 2020
maropu pushed a commit that referenced this pull request Apr 18, 2020
### What changes were proposed in this pull request?
Document Window Function in SQL syntax

### Why are the changes needed?
Make SQL Reference complete

### Does this PR introduce any user-facing change?
Yes

<img width="1050" alt="Screen Shot 2020-04-16 at 9 13 34 PM" src="https://user-images.githubusercontent.com/13592258/79531509-7bf5af00-8027-11ea-8291-a91b2e97a1b5.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 14 12 PM" src="https://user-images.githubusercontent.com/13592258/79531514-7e580900-8027-11ea-8761-4c5a888c476f.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 14 45 PM" src="https://user-images.githubusercontent.com/13592258/79531518-82842680-8027-11ea-876f-6375aa5b5ead.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 15 10 PM" src="https://user-images.githubusercontent.com/13592258/79531521-844dea00-8027-11ea-8948-712f054d42ee.png">

<img width="1050" alt="Screen Shot 2020-04-16 at 9 15 25 PM" src="https://user-images.githubusercontent.com/13592258/79531528-8748da80-8027-11ea-9dae-a465286982ac.png">

### How was this patch tested?
Manually build and check

Closes #28220 from huaxingao/sql-win-fun.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit 142f436)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
Member

maropu commented Apr 18, 2020

Thanks! Merged to master/3.0.

@huaxingao
Contributor Author

Thanks, all!

@huaxingao huaxingao deleted the sql-win-fun branch April 18, 2020 00:36