Description
Currently, IcebergFilesCommitter validates the entire snapshot history every time it commits a new snapshot in commitDeltaTxn. This means the same snapshots are verified over and over, and each validation spends a lot of time reading manifest files. This is why IcebergFilesCommitter needed to open multiple Avro metadata files and took several minutes in #2900 (comment). (The more detailed reason is that Flink calls notifyCheckpointComplete(ckptId) immediately after snapshotState(ckptId), and the committer traverses the whole snapshot history to verify that the data files referenced by pos-delete files still exist. This blocks the committer thread and makes snapshotState(ckptId+1) time out if HDFS responds slowly or the table has too many manifest files to traverse.)
I think IcebergFilesCommitter doesn't need to validate the entire snapshot history on every commit; it only needs to validate the snapshots between the last committed snapshot id and the current snapshot id. For the first commit of an IcebergFilesCommitter, we still need to traverse the whole snapshot history to ensure that the referenced data files still exist.
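
To make the proposal concrete, here is a minimal sketch of how the range of snapshots to validate could be computed. It does not use the Iceberg API: `SnapshotRange`, `Snap`, `linearHistory`, and `snapshotsToValidate` are hypothetical names, and the snapshot history is modelled as a simple map of parent pointers. The real committer would need to pass the equivalent starting point to the commit validation instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a table's snapshot chain; not the Iceberg API.
final class SnapshotRange {

    /** Minimal stand-in for a snapshot: its id and its parent's id (null for the root). */
    static final class Snap {
        final long id;
        final Long parentId;

        Snap(long id, Long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /** Builds a linear history in which each id is the child of the previous one. */
    static Map<Long, Snap> linearHistory(long... ids) {
        Map<Long, Snap> history = new HashMap<>();
        Long parent = null;
        for (long id : ids) {
            history.put(id, new Snap(id, parent));
            parent = id;
        }
        return history;
    }

    /**
     * Returns the ids of the snapshots a commit would need to validate, newest first.
     * Walks parent pointers from currentId and stops at lastCommittedId (exclusive).
     * When lastCommittedId is null (first commit), the whole ancestry is returned.
     */
    static List<Long> snapshotsToValidate(
            Map<Long, Snap> history, Long currentId, Long lastCommittedId) {
        List<Long> toValidate = new ArrayList<>();
        Long cursor = currentId;
        while (cursor != null && !cursor.equals(lastCommittedId)) {
            Snap snap = history.get(cursor);
            if (snap == null) {
                break; // ancestor is no longer in the history; nothing further to walk
            }
            toValidate.add(snap.id);
            cursor = snap.parentId;
        }
        return toValidate;
    }

    public static void main(String[] args) {
        Map<Long, Snap> history = linearHistory(1L, 2L, 3L, 4L);
        // Later commits: only snapshots newer than the last committed one.
        System.out.println(snapshotsToValidate(history, 4L, 2L));
        // First commit: no last committed snapshot, so the full history is walked.
        System.out.println(snapshotsToValidate(history, 4L, null));
    }
}
```

With the history 1 <- 2 <- 3 <- 4, a committer whose last committed snapshot is 2 would validate only `[4, 3]`, while a first commit (no last committed snapshot) would validate `[4, 3, 2, 1]`. The cost of each commit then scales with the number of snapshots added since the previous commit rather than with the length of the full history.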