Python: Compute parquet stats #7831
Merged
Commits
54 commits
e16d7d4 Add function to compute parquet file metadata (maxdebayser)
96adb31 Addition of docstring and extra parameter to avoid reading the file (maxdebayser)
ce8a5df Refactor the statistics computation entirely to use pyarrow metadata (maxdebayser)
1be86e7 Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats (maxdebayser)
e6c3f94 Appease pre-commit hooks (maxdebayser)
e4e0b2b Fix temporary path (maxdebayser)
a5f4ef9 Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats (maxdebayser)
ac23783 Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats (maxdebayser)
ed27875 Make the metrics mode configurable as documented here: https://iceber… (maxdebayser)
de46bef Initialize binary serializers only once (maxdebayser)
5ae5b2e Log arrow not implemented exception (maxdebayser)
33218eb Fix None comparison expression (maxdebayser)
4975e99 Add map column to test data (maxdebayser)
98c93ca Moving pyarrow specific code to io.pyarrow (maxdebayser)
a480539 type annotation (maxdebayser)
a0f44d5 Refactor the stats collection using the pyarrow visitor (maxdebayser)
3e738fe Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats (maxdebayser)
1d5cbbf Clean redundant code and add warning message to the log (maxdebayser)
f2f001e Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats (maxdebayser)
dc34698 Address some of the review comments (maxdebayser)
e233f54 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
8dda3fa Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
820938a Add tests to check of the number of columns found by the statistics (maxdebayser)
9e114c8 We don't want to truncate numeric data types (maxdebayser)
e7a6fb8 Verify match of Iceberg types with Parquet physical types (maxdebayser)
8ad7f3f Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
c965a3e Fix truncation of upper bounds (maxdebayser)
1ba46d6 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
44dbb0c Transform asserts to ValueErrors (maxdebayser)
74a3d6a Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
cdc6eb8 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
5b4c2f2 Add review suggestions (maxdebayser)
ec5fcaa Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
4ee5036 Address simple code style review comments (maxdebayser)
45abc6d Fix potential null write (maxdebayser)
7ee1ef0 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
5898f3f Apply function name refactoring (maxdebayser)
e7edf0b Move pyarrow statistics tests to a new file (maxdebayser)
6f7bd98 Disable stats computation for nested types (maxdebayser)
05579ff Modularize the fill_parquet_file_metadata function (maxdebayser)
aae1118 Allow metrics modes to have extra whitespace but not other trailing (maxdebayser)
11b5d3a Move upper bound truncation logic to another file (maxdebayser)
4332a95 Be defensive with regards to missing row group statistics (maxdebayser)
09c5955 Add tests for structs (maxdebayser)
c131b58 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
5e01924 Remove special treatment of UUIDType (maxdebayser)
7f768eb Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
be70fd5 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
0be438e Rely on parquet column path rather than column order (maxdebayser)
8226a01 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
867ea80 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
ebb604a Change mood to imperative to appease linter (maxdebayser)
640f885 Merge branch 'master' of https://github.com/apache/iceberg into compu… (maxdebayser)
acf6d4f Factor out the logic to obtain the current table schema (maxdebayser)