Apache Iceberg version
1.10.1 (latest release)
Query engine
Kafka Connect (Iceberg Kafka Connect sink; generic Record writes)
Please describe the bug 🐞
Parquet variant shredding (write.parquet.shred-variants and related write properties) did not take effect for the generic Record Parquet path used by Kafka Connect (and similar tools). The generic ParquetFormatModel registration did not supply a variant shredding analyzer and row copier, so the writer could not buffer rows and infer typed_value Parquet columns. Separately, a Record-based analyzer that relied on resolveColumnIndex with a Void engine schema never obtained valid column indices, so VARIANT columns were not analyzed and shredding stayed inactive.
Expected: With shredding enabled on the table or through the connector's write properties, VARIANT data written through the Connect sink should produce Parquet files with expanded typed_value paths, consistent with other engines.
Actual: Writes behaved as if shredding were disabled for this code path: only a few physical columns were written and no typed subtrees were materialized.
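To make the row-copier point concrete, here is a minimal, self-contained Java sketch. It does not use Iceberg's API: MutableRow, BufferingWriter, run and the toy int64 inference are hypothetical stand-ins. It also assumes the sink reuses one mutable row object for every record, a common streaming pattern but an assumption here. Once a writer buffers rows to infer typed_value columns, buffering references without copying leaves every buffered entry aliasing the last record, so inference sees the wrong data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class CopierDemo {
  // Hypothetical mutable row reused by the engine for every incoming record.
  // Illustration only; this is NOT Iceberg's Record interface.
  static final class MutableRow {
    Object variantValue;

    MutableRow(Object variantValue) {
      this.variantValue = variantValue;
    }

    MutableRow copy() {
      return new MutableRow(variantValue);
    }
  }

  // Hypothetical shredding-aware writer: buffers rows so it can inspect the
  // variant values before choosing a typed_value type (flushing is omitted).
  static final class BufferingWriter {
    private final List<MutableRow> buffer = new ArrayList<>();
    private final UnaryOperator<MutableRow> copier;

    BufferingWriter(UnaryOperator<MutableRow> copier) {
      this.copier = copier;
    }

    void write(MutableRow row) {
      // The caller may mutate `row` after write() returns, so store the copier's result.
      buffer.add(copier.apply(row));
    }

    List<Object> bufferedValues() {
      List<Object> values = new ArrayList<>();
      for (MutableRow r : buffer) {
        values.add(r.variantValue);
      }
      return values;
    }

    // Toy inference: shred as int64 only if every buffered value is a Long.
    String inferTypedValue() {
      if (buffer.isEmpty()) {
        return "none";
      }
      for (MutableRow r : buffer) {
        if (!(r.variantValue instanceof Long)) {
          return "none";
        }
      }
      return "int64";
    }
  }

  // Feeds three values through a single reused row, as a streaming sink might.
  static BufferingWriter run(UnaryOperator<MutableRow> copier) {
    BufferingWriter writer = new BufferingWriter(copier);
    MutableRow reused = new MutableRow(null);
    for (Object v : new Object[] {"a", "b", 3L}) {
      reused.variantValue = v;
      writer.write(reused);
    }
    return writer;
  }

  public static void main(String[] args) {
    BufferingWriter aliased = run(UnaryOperator.identity()); // no copy: buffer aliases one row
    BufferingWriter copied = run(MutableRow::copy); // copy each row before buffering
    System.out.println(aliased.bufferedValues() + " -> " + aliased.inferTypedValue());
    System.out.println(copied.bufferedValues() + " -> " + copied.inferTypedValue());
  }
}
```

Running main prints [3, 3, 3] -> int64 for the aliased buffer, a wrong inference since two of the three inputs were strings, and [a, b, 3] -> none for the copied buffer.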
Willingness to contribute
Pull request
A proposed fix is in PR #16370, "Kafka Connect: Enable Parquet variant shredding for generic Record writes". It wires RecordVariantShreddingAnalyzer and Record::copy into GenericFormatModels and analyzes VARIANT columns in Iceberg Schema#columns() order.
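A minimal sketch of the column-index point, again in plain Java rather than Iceberg's API: Column, resolveViaEngineSchema and resolveViaColumnOrder are hypothetical, and the missing (Void) engine schema is modelled as an empty name-to-position map. It assumes, per the PR description, that a generic Record's top-level positions follow Schema#columns() order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VariantColumnSketch {
  // Hypothetical, simplified top-level column: a stand-in for a schema field,
  // reduced to a name and a type label.
  record Column(String name, String type) {}

  // Failure mode described in the issue: positions are looked up through an
  // engine schema. With no engine schema (modelled as an empty map), every
  // lookup misses, so no VARIANT column is ever selected for analysis.
  static List<Integer> resolveViaEngineSchema(List<Column> columns, Map<String, Integer> engineSchema) {
    List<Integer> positions = new ArrayList<>();
    for (Column c : columns) {
      if ("variant".equals(c.type())) {
        int idx = engineSchema.getOrDefault(c.name(), -1);
        if (idx >= 0) {
          positions.add(idx);
        }
      }
    }
    return positions;
  }

  // Approach described in the PR: walk the schema's columns in declared order
  // and use each column's index as its position in the record.
  static List<Integer> resolveViaColumnOrder(List<Column> columns) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < columns.size(); i++) {
      if ("variant".equals(columns.get(i).type())) {
        positions.add(i);
      }
    }
    return positions;
  }

  public static void main(String[] args) {
    List<Column> columns = List.of(
        new Column("id", "long"),
        new Column("payload", "variant"),
        new Column("ts", "timestamp"),
        new Column("attrs", "variant"));
    System.out.println(resolveViaEngineSchema(columns, Map.of())); // []
    System.out.println(resolveViaColumnOrder(columns)); // [1, 3]
  }
}
```

With an empty engine-schema map nothing resolves, so shredding never activates; walking the columns in order finds the VARIANT fields at positions 1 and 3.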