[SPARK-17213][SQL] Disable Parquet filter push-down for string and binary columns due to PARQUET-686 #16106
Closed
Test build #69497 has finished for PR 16106 at commit
Test build #69511 has finished for PR 16106 at commit
retest this please
The last build failure doesn't seem to be relevant.
Test build #69523 has finished for PR 16106 at commit
Merging in master/branch-2.1. Thanks.
asfgit pushed a commit that referenced this pull request on Dec 2, 2016
…nary columns due to PARQUET-686

This PR targets to both master and branch-2.1.

## What changes were proposed in this pull request?

Due to PARQUET-686, Parquet doesn't do string comparison correctly while doing filter push-down for string columns. This PR disables filter push-down for both string and binary columns to work around this issue. Binary columns are also affected because some Parquet data models (like Hive) may store string columns as a plain Parquet `binary` instead of a `binary (UTF8)`.

## How was this patch tested?

New test case added in `ParquetFilterSuite`.

Author: Cheng Lian <lian@databricks.com>

Closes #16106 from liancheng/spark-17213-bad-string-ppd.

(cherry picked from commit ca63916)
Signed-off-by: Reynold Xin <rxin@databricks.com>
robert3005 pushed a commit to palantir/spark that referenced this pull request on Dec 2, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request on Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request on Jan 27, 2017
asfgit pushed a commit that referenced this pull request on Feb 6, 2017
…y and string

## What changes were proposed in this pull request?

This PR proposes to enable the tests for Parquet filter pushdown with binary and string. This was disabled in #16106 due to Parquet's issue but it is now revived in #16791 after upgrading Parquet to 1.8.2.

## How was this patch tested?

Manually tested `ParquetFilterSuite` via IDE.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16817 from HyukjinKwon/SPARK-17213.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request on Feb 15, 2017
This PR targets both master and branch-2.1.
What changes were proposed in this pull request?
Due to PARQUET-686, Parquet doesn't do string comparison correctly while doing filter push-down for string columns. This PR disables filter push-down for both string and binary columns to work around this issue. Binary columns are also affected because some Parquet data models (like Hive) may store string columns as a plain Parquet `binary` instead of a `binary (UTF8)`.
How was this patch tested?
New test case added in `ParquetFilterSuite`.
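To illustrate why an incorrect string comparison matters for push-down, here is a minimal Python sketch. It assumes, as we understand PARQUET-686, that the affected Parquet versions ordered binary values by signed bytes, while Spark orders strings by unsigned bytes. The helper names (`unsigned_key`, `signed_key`, `can_drop_for_greater_than`) are hypothetical and only model min/max-based row-group skipping; they are not Parquet or Spark APIs.

```python
# Simplified model of min/max-based row-group pruning. This is an
# illustration under stated assumptions, not Parquet's or Spark's code.

def unsigned_key(data):
    """Order bytes as values 0..255 (assumed to match Spark's string order)."""
    return list(data)


def signed_key(data):
    """Order bytes as values -128..127, like a signed Java byte."""
    return [b - 256 if b > 127 else b for b in data]


def can_drop_for_greater_than(values, literal, key):
    """Return True if a row group may be skipped for `col > literal`.

    The group is skipped when its maximum, taken under the `key`
    ordering, is not greater than the literal.
    """
    group_max = max(values, key=key)
    return key(group_max) <= key(literal)


# One row group holding "a" and "é"; "é" encodes to bytes 0xC3 0xA9.
row_group = ["a".encode("utf-8"), "é".encode("utf-8")]
literal = "z".encode("utf-8")

dropped_unsigned = can_drop_for_greater_than(row_group, literal, unsigned_key)
dropped_signed = can_drop_for_greater_than(row_group, literal, signed_key)

print(dropped_unsigned)  # False: "é" > "z" in unsigned order, group is kept
print(dropped_signed)    # True: signed max is "a", so the group is skipped
```

Under the signed ordering the row group is skipped even though it contains a row ("é") that satisfies `col > "z"` in unsigned order, so the query would silently lose that row. Disabling push-down for string and binary columns avoids relying on such statistics at all.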