Merged
54 commits
e16d7d4
Add function to compute parquet file metadata
maxdebayser Jun 13, 2023
96adb31
Addition of docstring and extra parameter to avoid reading the file
maxdebayser Jun 13, 2023
ce8a5df
Refactor the statistics computation entirely to use pyarrow metadata
maxdebayser Jun 26, 2023
1be86e7
Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats
maxdebayser Jun 26, 2023
e6c3f94
Appease pre-commit hooks
maxdebayser Jun 26, 2023
e4e0b2b
Fix temporary path
maxdebayser Jun 26, 2023
a5f4ef9
Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats
maxdebayser Jul 6, 2023
ac23783
Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats
maxdebayser Jul 10, 2023
ed27875
Make the metrics mode configurable as documented here: https://iceber…
maxdebayser Jul 10, 2023
de46bef
Initialize binary serializers only once
maxdebayser Jul 10, 2023
5ae5b2e
Log arrow not implemented exception
maxdebayser Jul 10, 2023
33218eb
Fix None comparison expression
maxdebayser Jul 10, 2023
4975e99
Add map column to test data
maxdebayser Jul 10, 2023
98c93ca
Moving pyarrow specific code to io.pyarrow
maxdebayser Jul 10, 2023
a480539
type annotation
maxdebayser Jul 10, 2023
a0f44d5
Refactor the stats collection using the pyarrow visitor
maxdebayser Jul 12, 2023
3e738fe
Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats
maxdebayser Jul 12, 2023
1d5cbbf
Clean redundant code and add warning message to the log
maxdebayser Jul 13, 2023
f2f001e
Merge remote-tracking branch 'iceberg/master' into compute_parquet_stats
maxdebayser Jul 13, 2023
dc34698
Address some of the review comments
maxdebayser Jul 18, 2023
e233f54
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Jul 18, 2023
8dda3fa
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Jul 28, 2023
820938a
Add tests to check of the number of columns found by the statistics
maxdebayser Jul 28, 2023
9e114c8
We don't want to truncate numeric data types
maxdebayser Jul 28, 2023
e7a6fb8
Verify match of Iceberg types with Parquet physical types
maxdebayser Jul 30, 2023
8ad7f3f
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Jul 30, 2023
c965a3e
Fix truncation of upper bounds
maxdebayser Jul 31, 2023
1ba46d6
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Jul 31, 2023
44dbb0c
Transform asserts to ValueErrors
maxdebayser Aug 1, 2023
74a3d6a
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 1, 2023
cdc6eb8
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 4, 2023
5b4c2f2
Add review suggestions
maxdebayser Aug 4, 2023
ec5fcaa
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 7, 2023
4ee5036
Address simple code style review comments
maxdebayser Aug 7, 2023
45abc6d
Fix potential null write
maxdebayser Aug 7, 2023
7ee1ef0
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 9, 2023
5898f3f
Apply function name refactoring
maxdebayser Aug 9, 2023
e7edf0b
Move pyarrow statistics tests to a new file
maxdebayser Aug 9, 2023
6f7bd98
Disable stats computation for nested types
maxdebayser Aug 9, 2023
05579ff
Modularize the fill_parquet_file_metadata function
maxdebayser Aug 9, 2023
aae1118
Allow metrics modes to have extra whitespace but not other trailing
maxdebayser Aug 9, 2023
11b5d3a
Move upper bound truncation logic to another file
maxdebayser Aug 9, 2023
4332a95
Be defensive with regards to missing row group statistics
maxdebayser Aug 9, 2023
09c5955
Add tests for structs
maxdebayser Aug 9, 2023
c131b58
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 10, 2023
5e01924
Remove special treatment of UUIDType
maxdebayser Aug 10, 2023
7f768eb
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 18, 2023
be70fd5
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 21, 2023
0be438e
Rely on parquet column path rather than column order
maxdebayser Aug 21, 2023
8226a01
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Aug 25, 2023
867ea80
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Sep 12, 2023
ebb604a
Change mood to imperative to appease linter
maxdebayser Sep 12, 2023
640f885
Merge branch 'master' of https://github.com/apache/iceberg into compu…
maxdebayser Sep 15, 2023
acf6d4f
Factor out the logic to obtain the current table schema
maxdebayser Sep 16, 2023
3 changes: 3 additions & 0 deletions python/pyiceberg/avro/__init__.py
@@ -16,5 +16,8 @@
# under the License.
import struct

STRUCT_BOOL = struct.Struct("?")
STRUCT_FLOAT = struct.Struct("<f") # little-endian float
STRUCT_DOUBLE = struct.Struct("<d") # little-endian double
STRUCT_INT32 = struct.Struct("<i")
STRUCT_INT64 = struct.Struct("<q")
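
As a rough illustration of what these module-level constants do, the sketch below repeats the definitions from the diff and round-trips a few values through them. A `struct.Struct` object compiles its format string once, so reusing one instance avoids re-parsing the format on every call, which is presumably the point of the "Initialize binary serializers only once" commit. The `roundtrip` helper and the sample values are hypothetical and are not part of the PR.

```python
import struct

# Definitions copied from the diff above.
STRUCT_BOOL = struct.Struct("?")
STRUCT_FLOAT = struct.Struct("<f")  # little-endian float
STRUCT_DOUBLE = struct.Struct("<d")  # little-endian double
STRUCT_INT32 = struct.Struct("<i")
STRUCT_INT64 = struct.Struct("<q")


def roundtrip(packer: struct.Struct, value):
    """Hypothetical helper: pack a value to bytes, then unpack it again."""
    raw = packer.pack(value)
    (decoded,) = packer.unpack(raw)  # unpack always returns a tuple
    return raw, decoded


raw_int32, decoded_int32 = roundtrip(STRUCT_INT32, 1)
raw_int64, decoded_int64 = roundtrip(STRUCT_INT64, -2)
raw_bool, decoded_bool = roundtrip(STRUCT_BOOL, True)
raw_double, decoded_double = roundtrip(STRUCT_DOUBLE, 1.5)

# Little-endian layout puts the least significant byte first.
print(raw_int32)        # b'\x01\x00\x00\x00'
print(len(raw_int64))   # 8
print(decoded_double)   # 1.5
```

The `<` prefix fixes the byte order and uses standard sizes, so the encoded bytes do not depend on the platform the code runs on.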