Skip to content

[SPARK-30505][DOCS] Deprecate Avro option ignoreExtension in sql-data-sources-avro.md #27194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 13, 2020

What changes were proposed in this pull request?

Updated docs/sql-data-sources-avro.md, and added a few sentences about already deprecated in code Avro option ignoreExtension.

Screen Shot 2020-01-15 at 10 24 14

Closes #27174

Why are the changes needed?

To make users doc consistent to the code where ignoreExtension has been already deprecated, see

logWarning(s"Option ${AvroOptions.ignoreExtensionKey} is deprecated. Please use the " +
"general data source option pathGlobFilter for filtering file names.")

Does this PR introduce any user-facing change?

No

How was this patch tested?

by building docs

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 13, 2020

This follows up #27174 (comment)

@@ -230,7 +230,7 @@ Data source options of Avro can be set via:
<tr>
<td><code>ignoreExtension</code></td>
<td>true</td>
<td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.</td>
<td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.<br> The option has been already deprecated, and it will be removed in the future releases. Please use the general data source option <code>pathGlobFilter</code> for filtering file names.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work.
Could you also create a doc entry for the option pathGlobFilter and link to it here in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it here

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116659 has finished for PR 27194 at commit 5016556.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2020

Test build #116660 has finished for PR 27194 at commit 91a1dee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member

Hi @MaxGekk

Sorry, I meant a new document for the option pathGlobFilter, which is not for Avro only.

I have come up with a good idea: https://issues.apache.org/jira/browse/SPARK-30506
With the new documentation, we can link the option pathGlobFilter in the Avro data source doc.

Are you interested in writing the new doc? It should be very helpful for users.

@HyukjinKwon
Copy link
Member

Yeah, discussed with @gengliangwang offline. I think that's a better option.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 14, 2020

I described the global option because it has been already described in each load method - json, text, parquet ... in DataFrameReader, for example

* <li>`pathGlobFilter`: an optional glob pattern to only include files with paths matching

Are you interested in writing the new doc?

No, I am not.

If you think, description of the option pathGlobFilter is useless in sql-data-sources-avro.md. Let me know, I will revert it from the PR but I would leave it as in this PR.

@gengliangwang
Copy link
Member

@MaxGekk I see. I didn't know that the option pathGlobFilter is documented in DataFrameReader.scala.

I think reverting the description of the option pathGlobFilter in sql-data-sources-avro.md makes sense.

Still, we need to document the options in SPARK-30506. I will find someone else or do it myself.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116687 has finished for PR 27194 at commit bfbaf2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR description. The following looks incorrect. It's ORC related PR which is already closed.

Closes #27179

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 14, 2020

@dongjoon-hyun I have fixed the PR number related to Avro

@@ -230,7 +230,7 @@ Data source options of Avro can be set via:
<tr>
<td><code>ignoreExtension</code></td>
<td>true</td>
<td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.</td>
<td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.<br> The option has been already deprecated, and it will be removed in the future releases. Please use the general data source option <code>pathGlobFilter</code> for filtering file names.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

been already deprecated -> been deprecated.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please provide the new screenshot of the updated page at PR description.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 15, 2020

I have attached the screenshot.

@SparkQA
Copy link

SparkQA commented Jan 15, 2020

Test build #116757 has finished for PR 27194 at commit 7561342.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I am just going to merge this - for other file-based source options, we could do it separately.

@dongjoon-hyun
Copy link
Member

+1, late LGTM.

@MaxGekk MaxGekk deleted the avro-doc-deprecation-ignoreExtension branch June 5, 2020 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants