[parquet] Avoid read parquet index when there is no filter pushdown. #6317

Merged
merged 4 commits into from
May 11, 2023

Ted-Jiang
Member

Which issue does this PR close?

Closes #6315

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Ted-Jiang Ted-Jiang requested a review from alamb May 10, 2023 08:02
@github-actions github-actions bot added the core Core DataFusion crate label May 10, 2023
Contributor

@alamb alamb left a comment

Thanks @Ted-Jiang -- this is looking good. I think we should remove the unwrap to avoid panics, but otherwise this looks good.
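The `unwrap` call under review is not shown in this conversation, so the following is only a generic sketch of the suggested pattern: turn the missing-value case into an error and propagate it with `?` instead of panicking. `SketchError`, `row_count_checked`, and `total_rows` are hypothetical names made up for illustration and are not part of DataFusion.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not DataFusion's error type.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    MissingMetadata(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::MissingMetadata(what) => write!(f, "missing metadata: {}", what),
        }
    }
}

impl std::error::Error for SketchError {}

/// Panicking version: `unwrap` aborts the caller if the value is absent.
pub fn row_count_panicking(meta: Option<usize>) -> usize {
    meta.unwrap()
}

/// Non-panicking version: map `None` to an error and let the caller decide.
pub fn row_count_checked(meta: Option<usize>) -> Result<usize, SketchError> {
    meta.ok_or_else(|| SketchError::MissingMetadata("row count".to_string()))
}

/// Callers propagate the error with `?` instead of unwrapping.
pub fn total_rows(parts: &[Option<usize>]) -> Result<usize, SketchError> {
    let mut total = 0;
    for part in parts {
        total += row_count_checked(*part)?;
    }
    Ok(total)
}
```

With this shape, a missing value surfaces as an `Err` that the query path can report, rather than a panic inside the scan.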

datafusion/core/src/physical_plan/file_format/parquet.rs (outdated, resolved)
@@ -222,6 +222,11 @@ impl PagePruningPredicate {
file_metrics.page_index_rows_filtered.add(total_skip);
Ok(Some(final_selection))
}

/// Returns the number of filters in the [`PagePruningPredicate`]
pub fn filter_number(&self) -> usize {
Contributor

Given the use of this code, I think using is_empty() rather than len() would be clearer

Member Author

I think we will need the number of filters in the future 😆
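To make the trade-off in this thread concrete, here is a minimal sketch that offers both accessors side by side. `PagePredicateSketch` and `should_load_page_index` are hypothetical stand-ins; the `Vec<String>` field and the method bodies are assumptions, since the diff above shows only the signature of `filter_number`.

```rust
/// Hypothetical stand-in for a page-level pruning predicate. The real
/// `PagePruningPredicate` holds DataFusion predicates, not strings.
pub struct PagePredicateSketch {
    predicates: Vec<String>,
}

impl PagePredicateSketch {
    pub fn new(predicates: Vec<String>) -> Self {
        Self { predicates }
    }

    /// Returns the number of filters (body assumed for this sketch).
    pub fn filter_number(&self) -> usize {
        self.predicates.len()
    }

    /// Hypothetical convenience matching the reviewer's suggestion.
    pub fn is_empty(&self) -> bool {
        self.predicates.is_empty()
    }
}

/// Decide whether loading the page index is worthwhile: only when the
/// feature is enabled and at least one filter was pushed down.
pub fn should_load_page_index(
    enable_page_index: bool,
    predicate: Option<&PagePredicateSketch>,
) -> bool {
    enable_page_index && predicate.map_or(false, |p| p.filter_number() > 0)
}
```

Keeping the count-returning method, as the author prefers, still lets a caller write the emptiness check as `filter_number() > 0`; an `is_empty()` helper would only make that call site read more directly.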

.unwrap();

// Without filter will not read pageIndex.
assert!(bytes_scanned_with_filter > bytes_scanned_without_filter);
Contributor

nice test
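The idea behind that assertion can be illustrated with a toy model, assuming the page index costs extra bytes that are read only when a filter is pushed down. `ToyFile` and `bytes_scanned` are hypothetical, the byte sizes are made up, and the model deliberately ignores any data pages a real filter might prune.

```rust
/// Toy model of a Parquet file; all sizes are invented for illustration.
pub struct ToyFile {
    pub data_bytes: usize,
    pub page_index_bytes: usize,
}

/// Bytes a scan reads in this toy model. The page index is fetched only
/// when at least one filter was pushed down to the scan.
pub fn bytes_scanned(file: &ToyFile, pushed_down_filters: usize) -> usize {
    let index_cost = if pushed_down_filters > 0 {
        file.page_index_bytes
    } else {
        0
    };
    file.data_bytes + index_cost
}
```

Under these assumptions, a scan with a filter reads strictly more bytes than one without, which is the relationship the quoted test checks through the scan's bytes-scanned metric.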

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels May 11, 2023
@github-actions github-actions bot removed logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) physical-expr Physical Expressions labels May 11, 2023
@Ted-Jiang Ted-Jiang requested a review from alamb May 11, 2023 05:51
Contributor

@alamb alamb left a comment

Looks great -- thank you @Ted-Jiang

@alamb alamb merged commit a07d6eb into apache:main May 11, 2023