Conversation

@matthew-B-123 (Collaborator)

Core changes for Iceberg + MinIO integration:

  1. Docker Compose: Add MinIO service and Iceberg configuration

    • MinIO for S3-compatible object storage
    • Iceberg runtime packages for Spark
    • S3 endpoint and credential configuration
  2. Spark Submit: Add Iceberg packages and S3/MinIO configs

    • iceberg-spark-runtime-3.5_2.12:1.10.0
    • hadoop-aws:3.3.4 for S3 filesystem support
    • Hadoop catalog configuration for Iceberg tables
    • MinIO S3 endpoint configuration
  3. TableUtils: Fix partition detection for Iceberg tables

    • Disable Hive partition checking (returns empty list)
    • Treat non-partitioned tables as having all data available
    • Enables Chronon to process Iceberg tables without partition metadata
  4. Build: Enable Spark 3.5 compilation

    • Set use_spark_3_5 flag for Spark 3.5 compatibility

This enables end-to-end data flow: S3/MinIO → Iceberg → Chronon → Iceberg
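The Docker Compose and Spark Submit changes above boil down to a handful of Spark session settings. A minimal sketch of that configuration, with the catalog name, warehouse path, endpoint, and credentials as illustrative placeholders rather than the PR's actual values:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the Iceberg + MinIO wiring described above. All names and
// values below are placeholders, not this PR's actual configuration.
val spark = SparkSession.builder()
  .appName("chronon-iceberg-minio")
  // Iceberg Hadoop catalog (matches the "Hadoop catalog configuration" item)
  .config("spark.sql.extensions",
          "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.local.type", "hadoop")
  .config("spark.sql.catalog.local.warehouse", "s3a://warehouse/")
  // MinIO as the S3 endpoint, via hadoop-aws's s3a filesystem
  .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
  .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
  .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()
```

In the PR these same keys are passed as `--conf` flags to `spark-submit`, alongside the `iceberg-spark-runtime-3.5_2.12:1.10.0` and `hadoop-aws:3.3.4` packages.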

Summary

Why / Goal

Test Plan

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested

Checklist

  • Documentation updated

Reviewers

@matthew-B-123 marked this pull request as ready for review on October 16, 2025 at 22:07.
@matthew-B-123 force-pushed the iceberg-core-integration branch from 4699885 to 1562189 on October 16, 2025 at 22:09.
.sql(s"SHOW PARTITIONS $tableName")
.collect()
.map(row => parseHivePartition(row.getString(0)))
// NUCLEAR OPTION: Disable all partition checking
Collaborator:

We can't remove the partition parsing outright; instead, we should replace parseHivePartition with something that parses Iceberg partitions.
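A minimal sketch of the kind of replacement this comment asks for, assuming the partition listing surfaces Hive-style `key=value` path strings (the function name and input format here are assumptions, not code from this PR):

```scala
// Hypothetical replacement for parseHivePartition, as the comment suggests.
// Assumes partition strings arrive as "ds=2025-10-16/hr=22"-style paths;
// Iceberg's actual partition metadata may need a different access path
// (e.g. the table's `.partitions` metadata table).
def parseIcebergPartition(partition: String): Map[String, String] =
  partition
    .split("/")
    .filter(_.nonEmpty)
    .map { kv =>
      // Split only on the first '=' so values may themselves contain '='.
      val Array(key, value) = kv.split("=", 2)
      key -> value
    }
    .toMap
```

The `split("=", 2)` keeps multi-`=` values intact, which plain `split("=")` would silently truncate.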

}
.map(partitionSpec.shift(_, inputToOutputShift))

// NUCLEAR FIX: If no partitions found (Iceberg/non-partitioned tables),
Collaborator:

Don't do that
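The objection is to treating an empty partition listing as "all data available" across the board. A hedged sketch of the distinction the real code would need to make (the helper name and `isPartitioned` input are hypothetical; in practice the flag would come from catalog metadata):

```scala
// Hypothetical illustration, not code from this PR: an empty partition
// listing only means "all data available" when the table is genuinely
// unpartitioned. On a partitioned table, an empty listing means no data.
def allDataAvailable(partitions: Seq[String], isPartitioned: Boolean): Boolean =
  if (isPartitioned) partitions.nonEmpty
  else true
```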

