Add metadata_size_hint for optimistic fetching of parquet metadata #2946

Merged
merged 6 commits into from
Jul 21, 2022
Update datafusion/core/src/datasource/file_format/parquet.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
thinkharderdev and alamb committed Jul 20, 2022
commit 97249e17dc8a73ef0ac51fc50cbd4bcdfd25aa8d
2 changes: 2 additions & 0 deletions datafusion/core/src/datasource/file_format/parquet.rs
@@ -327,6 +327,8 @@ pub(crate) async fn fetch_parquet_metadata(
)));
}

// If a size hint is provided, read more than the minimum size
// to try and avoid a second fetch.
let footer_start = if let Some(size_hint) = size_hint {
meta.size - size_hint

What happens if size_hint is larger than the file size?

Maybe we could use saturating_sub here instead: https://doc.rust-lang.org/std/primitive.usize.html#method.saturating_sub
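
To illustrate the concern, here is a minimal sketch of what the suggestion might look like. It assumes the file size and the hint are both `usize`; the helper name `footer_start` and the constant `FOOTER_SIZE` are hypothetical and not taken from the PR. With plain subtraction, a hint larger than the file would underflow (a panic in debug builds, wraparound in release builds), whereas `saturating_sub` clamps the result to 0 so the read simply starts at the beginning of the file.

```rust
/// Hypothetical helper, not part of the PR: returns the byte offset at
/// which to start reading when fetching the Parquet footer.
fn footer_start(file_size: usize, size_hint: Option<usize>) -> usize {
    // A Parquet file ends with a fixed 8-byte trailer: a 4-byte metadata
    // length followed by the 4-byte magic "PAR1".
    const FOOTER_SIZE: usize = 8;

    match size_hint {
        // saturating_sub clamps at 0 instead of underflowing when the
        // hint exceeds the file size.
        Some(hint) => file_size.saturating_sub(hint),
        // Without a hint, read only the fixed-size trailer. The sketch
        // saturates here too, although the real function is assumed to
        // have rejected too-small files before this point.
        None => file_size.saturating_sub(FOOTER_SIZE),
    }
}
```

Under these assumptions, a 1000-byte file with a 64-byte hint starts reading at offset 936, and a 100-byte file with a 4096-byte hint starts at offset 0 rather than panicking.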

} else {