At e6data, we are big admirers of Apache Iceberg. We're seeing a steep increase in its adoption, with our customers running heavy Iceberg workloads on e6data's query engine.
While scouring the internet for resources on optimizing Iceberg, we thought: why not curate them for the rest of the community?
Here's a curated collection of links, guides, and insights to help you discover the best practices for optimizing your Iceberg tables.

- Optimization Strategies for Iceberg Tables by Cloudera
  An overview of strategies to optimize Iceberg table performance, including partitioning, file format selection, and maintenance.
- Best Practices for Optimizing Apache Iceberg Workloads by AWS
  General best practices for improving Iceberg workloads in AWS environments.
- How Amazon Ads Uses Iceberg Optimizations to Accelerate Their Spark Workload on Amazon S3
  A case study on how Amazon Ads leverages Iceberg optimizations to enhance Spark workloads on Amazon S3.
- Improve Operational Efficiencies of Apache Iceberg Tables Built on Amazon S3 Data Lakes
  Insights into enhancing the operational efficiency of Iceberg tables within Amazon S3 data lakes.

- Partitioning and Indexing in Apache Iceberg by IOMETE
  A guide to leveraging partitioning and indexing for improved query performance in Iceberg tables.
- Iceberg 101: A Guide to Iceberg Partitioning by Upsolver
  Learn the basics and advanced techniques for partitioning Iceberg tables.
- Improving Performance with Iceberg Sorted Tables by Starburst
  Techniques for improving query performance using sorted tables in Iceberg.
- How Z-Ordering in Apache Iceberg Helps Improve Performance by Dremio
  A video series explaining how Z-ordering optimizes the data layout of Iceberg tables.
- Z-Order Sorting During Compaction by IOMETE
  A detailed blog post on the role of Z-order sorting during compaction.

- Compaction in Apache Iceberg: Fine-Tuning Your Iceberg Table’s Data Files by Dremio
  Insights into compacting small files into larger ones for better performance and storage efficiency.
- Maintaining Tables by Using Compaction by AWS
  Best practices for maintaining Iceberg tables through compaction.

- Iceberg 101: Ten Tips to Optimize Performance by Upsolver
  Ten actionable tips to enhance the performance of Iceberg tables.
- Amazon EMR 7.1 Runtime for Apache Spark and Iceberg Can Run Spark Workloads 2.7 Times Faster Than Apache Spark 3.5.1 and Iceberg 1.5.2
  Details on how Amazon EMR's runtime optimizations significantly speed up Spark workloads using Iceberg.

- Optimizing Read Performance by AWS
  Guidelines for optimizing read performance when working with Iceberg tables.
- Accelerate Query Performance with Apache Iceberg Statistics on the AWS Glue Data Catalog
  How leveraging Iceberg statistics within the AWS Glue Data Catalog can speed up queries.
- Optimizing Write Performance by AWS
  Strategies for keeping write operations efficient in Iceberg workloads.
- Optimizing Storage by AWS
  Tips for reducing storage costs and improving efficiency when using Iceberg tables.
- Manage and Optimize Iceberg Tables for Efficient Data Storage and Querying by AWS
  A guide to managing Iceberg tables for both storage and query efficiency.
- Apache Iceberg Official Documentation (v1.6.0)
  Detailed technical documentation covering Iceberg table performance optimization.

Feel free to contribute to this resource list by suggesting additional articles, tools, or best practices for Apache Iceberg.