Parquet: Fix column pruning for deeply nested fields #12634

Open: wants to merge 2 commits into base: main



@sriharshaj sriharshaj commented Mar 25, 2025

This PR addresses a performance issue related to column pruning in nested structures when reading Iceberg tables using Parquet files.

Issue

In the current implementation of PruneColumns#struct, the entire parent column is included in the read schema even when only a subset of its nested fields is required. This occurs because:

The originalField is added to filteredFields even when a filtered field has been passed in for it. As a result, the entire parent column is read whenever any leaf node within the nested structure is selected in a query.

Proposed solution

I updated the code to use the passed filtered field when it is present, and to fall back to originalField only when the passed filtered field is null.
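The difference between the two behaviors can be illustrated with a minimal, self-contained sketch. This is not the actual Iceberg code: the `PruneSketch` class, its `Field` type, the `before`/`after` methods, and the name-based `selected` set are hypothetical stand-ins, and only the names `originalField`, `field`, and `filteredFields` are borrowed from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the struct-pruning step (not Iceberg's real classes).
final class PruneSketch {
  /** Stand-in for a Parquet group type: a name plus the names of its children. */
  static final class Field {
    final String name;
    final List<String> children;

    Field(String name, String... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  /** Behavior described in the issue: originalField is added even if a pruned field exists. */
  static List<Field> before(List<Field> originals, List<Field> pruned, Set<String> selected) {
    List<Field> filteredFields = new ArrayList<>();
    for (int i = 0; i < originals.size(); i++) {
      Field originalField = originals.get(i);
      Field field = pruned.get(i); // null when nothing below this field was selected
      if (selected.contains(originalField.name) || field != null) {
        filteredFields.add(originalField); // keeps every child of the parent
      }
    }
    return filteredFields;
  }

  /** Proposed behavior: prefer the pruned field, fall back to originalField when it is null. */
  static List<Field> after(List<Field> originals, List<Field> pruned, Set<String> selected) {
    List<Field> filteredFields = new ArrayList<>();
    for (int i = 0; i < originals.size(); i++) {
      Field originalField = originals.get(i);
      Field field = pruned.get(i);
      if (field != null) {
        filteredFields.add(field); // keeps only the selected children
      } else if (selected.contains(originalField.name)) {
        filteredFields.add(originalField); // parent selected as a whole
      }
    }
    return filteredFields;
  }
}
```

In this sketch, a parent `location` with children `lat`, `lon`, and `altitude`, pruned down to just `lat`, still carries all three children through `before`, while `after` carries only `lat`.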

@sriharshaj sriharshaj marked this pull request as ready for review March 25, 2025 02:27
@sriharshaj (Author)

I am not sure whether using originalField even when the filtered field is present is intentional behavior or a bug.

@sriharshaj (Author) commented Mar 25, 2025

The following issues may be related to this:
#12428
#11617

@sriharshaj sriharshaj force-pushed the fix-parquet-column-pruning branch from 213bf7b to 69dfcd9 Compare March 26, 2025 16:13
@amogh-jahagirdar amogh-jahagirdar self-requested a review March 26, 2025 17:21