TODOs for the Iceberg Connector
- Update the README to reflect the current status, and convert it to proper connector documentation before announcing the connector as ready for use (Iceberg connector documentation #4537, Remove Iceberg README.md #5887)
- Lower case all field names read from Iceberg metadata files.
- Fix table listing to skip non-Iceberg tables. This will need a new metastore method to list tables filtered on a property name, similar to how view listing works in `ThriftHiveMetastore`.
- Predicate pushdown is currently broken, which means delete is also broken. The code from the original `getTableLayouts()` implementation needs to be updated for `applyFilter()`.
- Delete is broken and should be fixed. Note that unlike Hive Connector, Iceberg Connector should support row-by-row deletion.
- All of the `HdfsContext` calls that use `/tmp` need to be fixed.
- `HiveConfig` needs to be removed. We might need to split out separate config classes in the Hive connector for the components that are reused in Iceberg.
- We should try to remove `HiveColumnHandle`. This will require replacing or abstracting `HivePageSource`, which is currently used to handle schema evolution and prefilled column values (identity partitions).
- Writing of decimals and timestamps is broken, since their representation in Parquet seems to be different for Iceberg and Hive. Reads are probably also broken, but this isn't tested yet since writes don't work. We will need a native Parquet writer to fix this.
- UUID type is not implemented and will be dropped from the Iceberg specification.
- Support Iceberg's UUID in the Iceberg connector #6663
- Implement time type.
- Partition table
- History table
- Snapshots table
- Manifests table
- Files table
- Return table statistics so CBO can leverage them.
- Add implementation and tests for table comments.
- Add implementation and tests for column comments.
- Needs complete tests for all data types and all partitioning transforms.
- Needs integration tests (probably as product tests) for interoperability with Spark in both directions (write Spark -> read Presto, write Presto -> read Spark).
- Needs correctness tests for partition pruning. Also validate that the pushdown is happening by checking the query plans? (Iceberg: test partition pruning #2660)
- Add tests for `CREATE TABLE LIKE`.
- Add test for creating `NOT NULL` columns.
- Add tests for non-Iceberg tables (listing tables in a schema, listing columns in a schema, describing a table, selecting from a table): Add Iceberg tests for Hive tables in the same metastore #5459
- Add product tests: Add Iceberg product tests with HDFS and metastore impersonation #2304
- Add a procedure to migrate Hive tables to Iceberg #13196
- Determine and support appropriate schema evolution semantics for Iceberg table with legacy Hive files #9843
- Add procedure for rollback table to snapshot.
- ORC support (Iceberg Connector ORC support #2042)
- Add support for AVRO in Iceberg #12125
- Allow querying Iceberg table by its location, without registering it in metastore #2298
- `NOT NULL` enforcement
- `location` or `external_location` table property (Iceberg integration: allow to specify LOCATION property on CREATE TABLE #2501)
- Use metastore locking around read-modify-write operations for transaction commit (Data loss due to lack of commit orchestration in Iceberg #9583)
- Iceberg commit retries #9582
- Add Iceberg tests for truncate on numeric types #5456
- Add Iceberg tests for partition transforms on structured types #5458
- Add Iceberg tests for Hive tables in the same metastore #5459
- Dereference Pushdown for Iceberg Connector #5179
- Flaky test TestIcebergCreateTable.testCreateTable #4864
- Add support for partition evolution in Iceberg. #7580
- Trino cannot read an Iceberg table that has dropped a partition field #8284
- Test bucketing consistency and stability, like Hive's `TestHiveBucketing`.
- Support predicate pushdown and metadata deletion for non-partition columns (Support predicate pushdown and deletion for non-identity partition columns in Iceberg #7905)
- Run Iceberg product tests with all tested Hive distributions #7898
- Improve test coverage around partitioned tables and $partition system table (Improve test coverage for partitioned tables and partition system table in Iceberg #7972)
- Add support for Trino views in Iceberg connector #8540
- Add support for Iceberg void transform #8623
- Fix reading of specific Iceberg snapshots #8663
- Accessing non-existent Iceberg system table should result in "table not found" #8690
- Properly reject Iceberg tables in Hive connector #8693
- Support use-preferred-write-partitioning for the Iceberg connector #8682
- Improve performance of Iceberg decimal bucket transform #8724
- Unexpected results when reading Iceberg Parquet table after nested field schema evolved #8750
- Evaluate Apache Iceberg's support for predicate on structural types #8759
- Add $file hidden column in Iceberg connector (Add $path hidden column in Iceberg connector #8769)
- Populate split_offsets in Iceberg data files (Populate split_offsets in Iceberg metadata #9018)
- Iceberg partition pruning does not work for predicates not expressible by tuple domain #9309
- Support dynamic filtering in Iceberg connector #4115
- Support Glue metastore in Iceberg connector #9363
- Excessive metastore invocations when querying Iceberg table #8675
- Reject Hive configuration properties that have no meaning for Iceberg #9607
- IcebergSplitSource: Support large `IN` predicates #9743
- Incorrect query results for Iceberg table partitioned on varbinary / binary #9755
- Revamp Iceberg statistics reporting (Iceberg table statistics are non-deterministic #9716)
- SHOW STATS fails with NPE when Iceberg file has no columns with stats #9714
- SHOW STATS fails if Iceberg metadata has no statistics for a file #9707
- Incorrect values returned when Iceberg table partitioned by timestamp with time zone #9704
- Query failure when reading from `$partitions` when Iceberg table partitioned on timestamp with time zone #9703
- Garbage return value from Iceberg `$partitions` for varbinary non-partition column #9756
- `IcebergSplitSource` throws away `CombinedScanTask` combinations #8486
- Iceberg Connector ORC writer writes incorrect file size #9810
- Improve DeterminePreferredWritePartitioning for projection-based partitioning like in Iceberg #9852
- Feature Request: support for Iceberg Dynamodb catalogues #9953
- Feature request: Support for jdbc catalog in Iceberg Connector #9968
- Support tinyint and smallint when reading Iceberg ORC files #8919
- Use ZSTD by default in Iceberg #10058
- Support Metrics mode when creating/writing Iceberg tables #9791
- Add support to redirect table reads from Hive to Iceberg #10173
- Add support to redirect table reads from Iceberg to Hive #10245
- support of iceberg v2 format #10758
- Respect hive.target-max-file-size in Iceberg #10786
- Skip reading Parquet pages using Column Indexes for Iceberg #11000
- Support migrating Iceberg v1 tables to v2 #12138
- Implement UPDATE for the Iceberg connector #12026
- Ability to OPTIMIZE a single time-based partition in Iceberg #12362
- Support Iceberg time travel #10258
- Iceberg snapshot queries use the latest schema of the table #12743
- Filter Iceberg splits based on $path column predicates #12785
- Iceberg table name on s3 #5632
- Skip Glue archive in Iceberg table commits #13413