[SPARK-17213][SQL] Disable Parquet filter push-down for string and binary columns due to PARQUET-686 #16106

Closed
wants to merge 4 commits into from

Conversation

liancheng
Contributor

@liancheng liancheng commented Dec 1, 2016

This PR targets both master and branch-2.1.

What changes were proposed in this pull request?

Due to PARQUET-686, Parquet compares strings incorrectly when evaluating pushed-down filters on string columns. This PR works around the issue by disabling filter push-down for both string and binary columns. Binary columns are affected as well because some Parquet data models (such as Hive) may store string columns as a plain Parquet `binary` rather than a `binary (UTF8)`.
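
To illustrate why a wrong string comparison makes push-down unsafe, here is a toy model in Python. It is only a sketch and is not Parquet or Spark code. It assumes, as I understand PARQUET-686, that the problem is a mismatch between signed-byte ordering and the unsigned byte ordering expected for UTF-8 strings. `RowGroupStats`, `build_stats`, `may_contain`, and `can_push_down` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


def signed_key(value: bytes) -> List[int]:
    """Sort key that treats each byte as signed, so 0x80..0xFF sort below 0x00."""
    return [b - 256 if b >= 0x80 else b for b in value]


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the min/max statistics of one row group."""
    min_value: bytes
    max_value: bytes


def build_stats(values: List[str], signed: bool) -> RowGroupStats:
    """Compute min/max over UTF-8 bytes with signed or unsigned byte ordering."""
    encoded = [v.encode("utf-8") for v in values]
    key = signed_key if signed else None  # None: Python's native unsigned ordering
    return RowGroupStats(min(encoded, key=key), max(encoded, key=key))


def may_contain(stats: RowGroupStats, target: str) -> bool:
    """Pruning check for `col = target`; False means the row group is skipped."""
    t = target.encode("utf-8")
    return stats.min_value <= t <= stats.max_value


# Assumption for this sketch: these are the types the workaround excludes.
PUSHDOWN_DISABLED_TYPES = {"string", "binary"}


def can_push_down(column_type: str) -> bool:
    """Toy version of the workaround: never push filters on string/binary columns."""
    return column_type not in PUSHDOWN_DISABLED_TYPES


values = ["a", "b", "é"]  # "é" encodes to b"\xc3\xa9", i.e. bytes >= 0x80

unsigned_stats = build_stats(values, signed=False)
signed_stats = build_stats(values, signed=True)

# With consistent (unsigned) ordering, the row group is kept for col = "b".
print(may_contain(unsigned_stats, "b"))
# With signed-order stats, min is b"\xc3\xa9" and the row group is wrongly skipped.
print(may_contain(signed_stats, "b"))
print(can_push_down("string"), can_push_down("int"))
```

In this model the row group really does contain `"b"`, yet the signed-order statistics cause it to be pruned, which would silently drop rows from the query result. Skipping push-down for the affected types trades some scan performance for correctness.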

How was this patch tested?

New test case added in ParquetFilterSuite.

@SparkQA

SparkQA commented Dec 1, 2016

Test build #69497 has finished for PR 16106 at commit ce71cca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69511 has finished for PR 16106 at commit 7b12415.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Contributor Author

retest this please

@liancheng
Contributor Author

The last build failure doesn't seem to be relevant.

@SparkQA

SparkQA commented Dec 2, 2016

Test build #69523 has finished for PR 16106 at commit 7b12415.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Dec 2, 2016

Merging in master/branch-2.1. Thanks.

asfgit pushed a commit that referenced this pull request Dec 2, 2016
…nary columns due to PARQUET-686

This PR targets both master and branch-2.1.

## What changes were proposed in this pull request?

Due to PARQUET-686, Parquet doesn't do string comparison correctly while doing filter push-down for string columns. This PR disables filter push-down for both string and binary columns to work around this issue. Binary columns are also affected because some Parquet data models (like Hive) may store string columns as a plain Parquet `binary` instead of a `binary (UTF8)`.

## How was this patch tested?

New test case added in `ParquetFilterSuite`.

Author: Cheng Lian <lian@databricks.com>

Closes #16106 from liancheng/spark-17213-bad-string-ppd.

(cherry picked from commit ca63916)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in ca63916 Dec 2, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
@liancheng liancheng deleted the spark-17213-bad-string-ppd branch December 2, 2016 21:33
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
asfgit pushed a commit that referenced this pull request Feb 6, 2017
…y and string

## What changes were proposed in this pull request?

This PR re-enables the tests for Parquet filter push-down on binary and string columns.

Push-down for these columns was disabled in #16106 because of the Parquet issue PARQUET-686, and was re-enabled in #16791 after upgrading Parquet to 1.8.2.

## How was this patch tested?

Manually tested `ParquetFilterSuite` via IDE.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16817 from HyukjinKwon/SPARK-17213.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017