Skip to content

[SPARK-31369][SQL][DOCS] Documentation for JSON Functions #28170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

iRakson
Copy link
Contributor

@iRakson iRakson commented Apr 9, 2020

What changes were proposed in this pull request?

Documentation for JSON functions

Why are the changes needed?

To complete the SQL references.

Does this PR introduce any user-facing change?

Yes. Now users can access JSON functions doc.

How was this patch tested?

Manually

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@iRakson
Copy link
Contributor Author

iRakson commented Apr 9, 2020

@maropu @huaxingao
Kindly Review this PR

@HyukjinKwon
Copy link
Member

Hm, I think we don't need this - they will be automatically documented by ExpressionDescription you used (see https://spark.apache.org/docs/latest/api/sql/index.html)

@iRakson
Copy link
Contributor Author

iRakson commented Apr 9, 2020

Hm, I think we don't need this - they will be automatically documented by ExpressionDescription you used (see https://spark.apache.org/docs/latest/api/sql/index.html)

Actually @maropu asked to add this. If it is not required, i will close it.

https://issues.apache.org/jira/browse/SPARK-31009?focusedCommentId=17054251&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17054251

@maropu
Copy link
Member

maropu commented Apr 10, 2020

I asked the same question in #28099 (comment).
(I'm still wondering if these document pages are really necessary though) Probably, the categorized document pages only for popular built-in function groups (e.g., aggregate, array, and date/time functions) might be more useful for users than a full flatten list of the functions in api/sql/. For example the PostgreSQL documents have the categorized pages for the builtin functions:

Probably, I think this is an issue of maintainance cost vs document quality.

cc: @huaxingao @kevinyu98 @gatorsmile

@huaxingao
Copy link
Contributor

When I did #28157, I actually changed the built-in function page to the following

Screen Shot 2020-04-09 at 11 58 36 PM

I think aggregation function and window function are worthwhile to document separately. We need to document Window function for sure because in query section, we have
[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
So we need an introduction page to explain what window function is, how to use it, and what window functions Spark supports. Aggregation function is very important, and it can be used as window function as well, so it would be worth having a page to document this too.

@gatorsmile WDYT?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Apr 10, 2020

I actually discussed this with @gatorsmile and @cloud-fan offline. I think we should basically reuse ExpressionDescription and generate the function documentation there .

We can add some more tags or fields into ExpressionDescription, and generate the documentation just like we do for SQL configuration (see also #27459, #24259). I am sure we can generate similar pages by grouping, updating, etc. in the Python script (e.g., gen-sql-api-docs.py and gen-sql-config-docs.py).

Also, we can remove the documentation when it's wrapped ExpressionDescription. For instance, I think we can remove all Scaladoc in the case like this (and merge the doc into ExpressionDescription):

/**
* Convert a num from one base to another
*
* @param numExpr the number to be converted
* @param fromBaseExpr from which base
* @param toBaseExpr to which base
*/
@ExpressionDescription(
usage = "_FUNC_(num, from_base, to_base) - Convert `num` from `from_base` to `to_base`.",
examples = """
Examples:
> SELECT _FUNC_('100', 2, 10);
4
> SELECT _FUNC_(-10, 16, -10);
-16
""")

Also, the examples in ExpressionDescription are tested in the PR builders as of #25942 so we won't have to update the examples.

@iRakson
Copy link
Contributor Author

iRakson commented Apr 10, 2020

@HyukjinKwon @maropu
Should i close this one then ?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Apr 10, 2020

I think somebody can do the base job to add a field into ExpressionDescription and infra to generate the doc like the links I pointed out.

Then we could add the documentations accordingly in this PR, #28157 and #28120.

@maropu, @huaxingao, @kevinyu98, I am very sorry I happened to check the ongoing on efforts in function documentation late, and raise another way late like this. Could we maybe try a way of reusing https://spark.apache.org/docs/latest/api/sql/index.html?

@maropu
Copy link
Member

maropu commented Apr 10, 2020

nvm, @HyukjinKwon. The proposed approach looks the best to me.
Btw, is it ok that the target of the ExpressionDescription improvement is 3.1? I think we don't have much time until RC2 started and there are the other left SQL document writing tasks with higher priorities for 3.0.

@maropu
Copy link
Member

maropu commented Apr 10, 2020

@iRakson Yea, closing this looks fine to me. Anyway, thanks for work.

@HyukjinKwon
Copy link
Member

Yes, I am good with any target version.

@iRakson iRakson closed this Apr 10, 2020
HyukjinKwon pushed a commit that referenced this pull request Apr 14, 2020
### What changes were proposed in this pull request?

This PR intends to drop the built-in function pages from SQL references. We've already had a complete list of built-in functions in the API documents.

See related discussions for more details:
#28170 (comment)

### Why are the changes needed?

For better SQL documents.

### Does this PR introduce any user-facing change?

![functions](https://user-images.githubusercontent.com/692303/79109009-793e5400-7db2-11ea-8cb7-4c3cf31ccb77.png)

### How was this patch tested?

Manually checked.

Closes #28203 from maropu/DropBuiltinFunctionDocs.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Apr 14, 2020
### What changes were proposed in this pull request?

This PR intends to drop the built-in function pages from SQL references. We've already had a complete list of built-in functions in the API documents.

See related discussions for more details:
#28170 (comment)

### Why are the changes needed?

For better SQL documents.

### Does this PR introduce any user-facing change?

![functions](https://user-images.githubusercontent.com/692303/79109009-793e5400-7db2-11ea-8cb7-4c3cf31ccb77.png)

### How was this patch tested?

Manually checked.

Closes #28203 from maropu/DropBuiltinFunctionDocs.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 853c6c9)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

This PR intends to drop the built-in function pages from SQL references. We've already had a complete list of built-in functions in the API documents.

See related discussions for more details:
apache#28170 (comment)

### Why are the changes needed?

For better SQL documents.

### Does this PR introduce any user-facing change?

![functions](https://user-images.githubusercontent.com/692303/79109009-793e5400-7db2-11ea-8cb7-4c3cf31ccb77.png)

### How was this patch tested?

Manually checked.

Closes apache#28203 from maropu/DropBuiltinFunctionDocs.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants