When joining two hash-partitioned Iceberg tables on their hash-partitioned columns, we should either (1) pass the hash bucket that each file belongs to (if any) as a hint to the join compute engine (e.g. Daft), so that it can automatically prune files whose records are known not to satisfy the join predicate, or (2) prune these files ourselves before handing them off to the compute engine.
The 2nd approach is easier to extend to more compute engines, since it doesn't require the underlying engine to support hint-based pruning, and may thus be preferred in the short term. The 1st approach offers a clearer decoupling of responsibilities between DeltaCAT and the compute engine, which should aid long-term maintainability.
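To make approach (2) concrete, here is a rough sketch of pre-join pruning. It is only an illustration under stated assumptions: `DataFile` and `prune_for_inner_join` are hypothetical names (not DeltaCAT, Daft, or Iceberg APIs), the join is an inner equi-join on the bucketed columns, and both tables use the same hash function and bucket count, so equal join keys always land in the same bucket number.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file plus its partition metadata."""
    path: str
    bucket: Optional[int]  # None when the file's hash bucket is unknown


def prune_for_inner_join(
    left: List[DataFile], right: List[DataFile]
) -> Tuple[List[DataFile], List[DataFile]]:
    """Drop files that cannot contribute rows to an inner equi-join.

    Assumes both sides were bucketed with the same hash function and
    bucket count on the join columns.
    """
    left_buckets: Set[Optional[int]] = {f.bucket for f in left}
    right_buckets: Set[Optional[int]] = {f.bucket for f in right}

    def keep(f: DataFile, other_buckets: Set[Optional[int]]) -> bool:
        # A file with no known bucket may hold any key, so it must stay.
        if f.bucket is None:
            return True
        # If the other side has files with unknown buckets, any of them
        # could hold matching keys, so nothing on this side can be pruned.
        if None in other_buckets:
            return True
        # Otherwise keep the file only if the other side has the same bucket.
        return f.bucket in other_buckets

    return (
        [f for f in left if keep(f, right_buckets)],
        [f for f in right if keep(f, left_buckets)],
    )
```

For example, if the left table has files in buckets 0 and 1 and the right table has files in buckets 1 and 2, only the bucket-1 files on each side survive. A hint-based variant of approach (1) would instead pass the per-file `bucket` values through to the engine and let it apply the same check.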
There's also an opportunity to share common code with the general cross-catalog support for hash-bucketed compaction in #150, since both compute problems depend in part on efficiently detecting which files may contain records with one or more equal field values.