Enable Parquet Row and Page Filtering by default (WIP) #3828
Closed
alamb wants to merge 2 commits into apache:main
Contributor, Author:
A small update here: when I ran the TPC-H benchmarks against the default parquet files created by the benchmark, I did not see any improvement. There was also some sort of error in the page index code, which I need to track down.
Contributor, Author:
Specifically, I made the parquet files like this:
And then ran:
FYI @Ted-Jiang -- I haven't had a chance to file this as a ticket or look more carefully into it.
Member:
Thanks for testing this. I will try to figure it out tomorrow.
Draft until ConfigOptions #3822 is merged
Which issue does this PR close?
Closes #3463
closes #4085
re #3462
Rationale for this change
This PR turns on parquet scan predicate pushdown (see #3462) by default -- I am putting it up early as part of the testing process (so we can work through any issues it may uncover)
This feature promises to be one of the most significant performance improvements for DataFusion reading from parquet in a while. All the hard work was done by @Ted-Jiang, @thinkharderdev, and @tustvold.
What changes are included in this PR?
Enable pushing filters into the scan directly
Note: this feature can be disabled by setting the datafusion.execution.parquet.pushdown_filters configuration setting to false.
Are there any user-facing changes?
Hopefully faster performance
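As a rough illustration of why pushing a filter into the scan can be faster, here is a toy sketch. It is not DataFusion's or the parquet crate's implementation: `Page`, `scan_then_filter`, and `scan_with_pushdown` are hypothetical names, and the per-page `max` statistic is a simplified stand-in for the page index. The sketch assumes a single predicate of the form `value > threshold` and compares how many values each approach has to decode.

```rust
/// Toy stand-in for a Parquet data page: its values plus a max statistic.
/// Hypothetical type for illustration only; not a DataFusion or parquet API.
struct Page {
    max: i64,
    values: Vec<i64>,
}

impl Page {
    fn new(values: Vec<i64>) -> Self {
        let max = *values.iter().max().expect("page must not be empty");
        Page { max, values }
    }
}

/// Without pushdown: decode every value, then apply the filter afterwards.
/// Returns (matching values, number of values decoded).
fn scan_then_filter(pages: &[Page], threshold: i64) -> (Vec<i64>, usize) {
    let mut decoded = 0;
    let mut all = Vec::new();
    for page in pages {
        decoded += page.values.len();
        all.extend_from_slice(&page.values);
    }
    let matches: Vec<i64> = all.into_iter().filter(|v| *v > threshold).collect();
    (matches, decoded)
}

/// With the predicate `value > threshold` pushed into the scan:
/// pages whose max cannot satisfy the predicate are skipped without decoding
/// (page filtering), and the remaining pages are filtered row by row
/// as they are decoded (row filtering).
fn scan_with_pushdown(pages: &[Page], threshold: i64) -> (Vec<i64>, usize) {
    let mut decoded = 0;
    let mut matches = Vec::new();
    for page in pages {
        if page.max <= threshold {
            continue; // no value in this page can match
        }
        decoded += page.values.len();
        matches.extend(page.values.iter().copied().filter(|v| *v > threshold));
    }
    (matches, decoded)
}

fn main() {
    let pages = vec![
        Page::new(vec![1, 2, 3]),
        Page::new(vec![4, 5, 6]),
        Page::new(vec![7, 8, 9]),
    ];
    let (plain, plain_decoded) = scan_then_filter(&pages, 6);
    let (pushed, pushed_decoded) = scan_with_pushdown(&pages, 6);
    // Both strategies must return the same rows; only the work differs.
    assert_eq!(plain, pushed);
    println!(
        "decoded without pushdown: {}, with pushdown: {}",
        plain_decoded, pushed_decoded
    );
}
```

The benefit in this sketch depends entirely on how many pages the statistics can rule out. When the data is not sorted or clustered on the filtered column, few pages are skipped and little is gained.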