Kafka Connect: Enable Parquet variant shredding for generic Record writes#16370 #16387

@soumilshah199500

Apache Iceberg version

1.10.1 (latest release)

Query engine

Kafka Connect (Iceberg Kafka Connect sink; generic Record writes)

Please describe the bug 🐞

Parquet variant shredding (write.parquet.shred-variants and related write properties) did not take effect for the generic Record Parquet path used by Kafka Connect (and similar tools). The generic ParquetFormatModel registration did not supply a variant shredding analyzer and row copier, so the writer could not buffer rows and infer typed_value Parquet columns. Separately, a Record-based analyzer that relied on resolveColumnIndex with a Void engine schema never obtained valid column indices, so VARIANT columns were not analyzed and shredding stayed inactive.

Expected: With shredding enabled on the table / via connector write props, VARIANT data written through the Connect sink should produce Parquet with expanded typed_value paths (consistent with other engines).
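For reference, enabling shredding through the connector might look like the fragment below. This is a sketch under assumptions: the table name is hypothetical, and it assumes the sink's iceberg.tables.write-props.* passthrough prefix is how write properties reach the writer. Only write.parquet.shred-variants is taken from this report.

```properties
# Hypothetical target table, for illustration only
iceberg.tables=db.events
# Assumed passthrough of an Iceberg write property to the sink's writers
iceberg.tables.write-props.write.parquet.shred-variants=true
```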

Actual: Writes behaved as if shredding were off for this code path (few physical columns; no typed_value subtree materialized).

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Pull request

A proposed fix is in PR #16370, "Kafka Connect: Enable Parquet variant shredding for generic Record writes". It wires RecordVariantShreddingAnalyzer + Record::copy in GenericFormatModels, and analyzes VARIANT columns using Iceberg Schema#columns() order.
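The difference between the two lookup strategies can be modelled with a small standalone sketch. It is not Iceberg code: Field, Kind, and every method name below are hypothetical stand-ins, and a null map stands in for the missing (Void) engine schema. It only illustrates why a lookup that depends on an engine schema finds no VARIANT columns, while walking the columns in declaration order does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are Iceberg classes.
final class VariantColumnScan {
  enum Kind { LONG, STRING, VARIANT }

  static final class Field {
    final String name;
    final Kind kind;

    Field(String name, Kind kind) {
      this.name = name;
      this.kind = kind;
    }
  }

  // Name-based lookup against an engine schema. A null map models the case
  // where no engine schema exists, so no index can be resolved.
  static int resolveViaEngineSchema(Map<String, Integer> engineSchema, String name) {
    if (engineSchema == null) {
      return -1;
    }
    Integer index = engineSchema.get(name);
    return index == null ? -1 : index;
  }

  // Collects VARIANT positions through the engine-schema lookup; columns whose
  // index cannot be resolved are skipped, so they are never analyzed.
  static List<Integer> variantPositionsViaEngineSchema(
      List<Field> columns, Map<String, Integer> engineSchema) {
    List<Integer> positions = new ArrayList<>();
    for (Field field : columns) {
      int index = resolveViaEngineSchema(engineSchema, field.name);
      if (field.kind == Kind.VARIANT && index >= 0) {
        positions.add(index);
      }
    }
    return positions;
  }

  // Collects VARIANT positions directly from the declaration order of the columns.
  static List<Integer> variantPositionsByColumnOrder(List<Field> columns) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < columns.size(); i++) {
      if (columns.get(i).kind == Kind.VARIANT) {
        positions.add(i);
      }
    }
    return positions;
  }

  // Made-up table layout with two VARIANT columns at positions 1 and 3.
  static List<Field> sampleColumns() {
    return Arrays.asList(
        new Field("id", Kind.LONG),
        new Field("payload", Kind.VARIANT),
        new Field("name", Kind.STRING),
        new Field("attrs", Kind.VARIANT));
  }

  public static void main(String[] args) {
    List<Field> columns = sampleColumns();
    System.out.println(variantPositionsViaEngineSchema(columns, null)); // prints []
    System.out.println(variantPositionsByColumnOrder(columns)); // prints [1, 3]
  }
}
```

In this model the first call returns an empty list, which corresponds to shredding staying inactive, while the second finds both VARIANT columns without needing any engine schema.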

Labels

bug