-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
minor: Split parquet reader up into smaller modules #4099
Conversation
builder = builder.with_row_selection(final_selection.into()); | ||
} | ||
} | ||
// page index pruning: if all data on individual pages can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core change of this PR is to refactor build_page_filter
into a function and move its code into the page_filter
module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
use super::metrics::ParquetFileMetrics; | ||
|
||
/// Create a RowSelection that may rule out ranges of rows based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I polished up some of the existing doc comments to try and explain what this module is doing
@@ -2193,65 +1765,6 @@ mod tests { | |||
ParquetFileMetrics::new(0, "file.parquet", &metrics) | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to page_filter
module
This needs #4101 to pass Ci so marking as draft until then |
0d1abb7
to
e920171
Compare
builder = builder.with_row_selection(final_selection.into()); | ||
} | ||
} | ||
// page index pruning: if all data on individual pages can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Looks more readable! Thanks @alamb 👍 |
Benchmark runs are scheduled for baseline = fc03001 and contender = f61b43a. f61b43a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
re #3462
Rationale for this change
There is a lot more parquet reading code now after @Ted-Jiang 's great work to add filter pushdown in #3780 and #3967
Let's move it into smaller modules so it is easier to work with
What changes are included in this PR?
physical_plan/file_format/row_filter.rs
tophysical_plan/file_format/parquet/row_filter.rs
to make it clear this is part of the parquet format implementationdatafusion/core/src/physical_plan/file_format/parquet/page_filter.rs
datafusion/core/src/physical_plan/file_format/parquet/metrics.rs
Are there any user-facing changes?
No
There should be no functional change (none is intended)