[SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page #32546
Conversation
Jenkins, retest this please
Looks pretty good otherwise. Don't forget to update the PR description as well. cc @dongjoon-hyun FYI
Thank you for pinging me, @HyukjinKwon.
<td>write</td>
</tr>
</table>
Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic File Source Options</a>.
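For illustration only (this is not part of the PR's diff): a minimal Scala sketch of how both the ORC-specific options documented in the table above and the generic file source options linked here are passed through the same `.option()` API. The paths and option values are made up.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a plain local session; everything below is illustrative.
val spark = SparkSession.builder().appName("orc-options-sketch").getOrCreate()

val df = spark.read
  .option("mergeSchema", "true")       // ORC-specific option from the table above
  .option("pathGlobFilter", "*.orc")   // a generic file source option from the linked page
  .orc("/tmp/orc/input")
```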
Although I know that this is inherited, https://spark.apache.org/docs/latest/ looks fragile to me because it is going to be a broken link when we cut branch-3.2 on July 1st. In branch-3.2, it should point to the 3.2 documentation only. Shall we use a relative link instead of /latest/?
As this PR itself shows, we don't know what refactoring will happen in the future.
Thanks, @dongjoon-hyun.
I took a look at that, but it seems tricky to create a link for each release in Scaladoc.
I created a JIRA to track it separately: SPARK-35481.
I will take a separate look, if that's fine with you too!
----------------
Extra options
For the extra options, refer to
`Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option>`_  # noqa
Ditto. Can we have a more robust link here?
* </ul>
* ORC-specific option(s) for reading ORC files can be found in
* <a href=
* "https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option">
Ditto.
Thank you, @itholic and @HyukjinKwon. The refactoring idea looks good to me. I only commented on a technical issue about the link usage. I'll leave this to @HyukjinKwon.
Okay, looks fine with https://github.com/apache/spark/pull/32546/files#r636625094. Please update the PR description.
Thanks, @HyukjinKwon.
Merged to master.
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
<tr>
  <td><code>mergeSchema</code></td>
  <td>None</td>
@itholic it has the same issue. The default value isn't `None` but `false`.
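To make the corrected default concrete, a short sketch (hypothetical path; assumes a `spark` session as in the sketch further up) of overriding the `false` default at read time:

```scala
// mergeSchema defaults to false; setting it to true asks Spark to merge the
// schemas collected from all ORC part-files under the given path.
val merged = spark.read
  .option("mergeSchema", "true")
  .orc("/tmp/orc/evolving-table")
```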
</tr>
<tr>
  <td><code>compression</code></td>
  <td>None</td>
In this case, when the default value doesn't exist, you can follow https://spark.apache.org/docs/latest/configuration.html#runtime-sql-configuration and use `(none)`.
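To illustrate the `(none)` convention (hypothetical paths and codec; assumes a `spark` session as above): when the `compression` option is omitted, the codec falls back to the session configuration `spark.sql.orc.compression.codec`, and setting the option overrides it for a single write:

```scala
// Rewrite existing ORC data with an explicit codec; omit the option to fall
// back to spark.sql.orc.compression.codec.
spark.read.orc("/tmp/orc/input")
  .write
  .option("compression", "zlib")
  .orc("/tmp/orc/output")
```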
* `DataFrameReader`
* `DataFrameWriter`
* `DataStreamReader`
* `DataStreamWriter`
also mention:
* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
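A minimal sketch of that `OPTIONS` clause (table name and location are hypothetical), issued through `spark.sql` to keep the example in Scala like the others:

```scala
// The same ORC options can be passed in SQL via the OPTIONS clause of
// CREATE TABLE ... USING; keys and values are given as string literals.
spark.sql("""
  CREATE TABLE orc_users
  USING ORC
  OPTIONS (compression 'zlib')
  LOCATION '/tmp/orc/users'
""")
```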
What changes were proposed in this pull request?
This PR proposes to move the ORC data source options from the Python, Scala and Java API documentation into a single page.
Why are the changes needed?
So far, the documentation for the ORC data source options has been separated across the API documentation pages for each language. However, this makes managing the many options inconvenient, so it is more efficient to manage all of the options on a single page and provide a link to that page from each language's API documentation.
Does this PR introduce any user-facing change?
Yes, after this change the documentation pages look as shown below:
"ORC Files" page

Python

Scala

Java

How was this patch tested?
Manually built the docs and confirmed the pages.