Iceberg does not respect the avro properties from TBLPROPERTIES while writing Manifest files #5892

Closed as not planned
@sumeetgajjar

Description

Apache Iceberg version

0.14.1 (latest release)

Query engine

No response

Please describe the bug 🐞

Hi,

Iceberg does not respect the Avro properties set in TBLPROPERTIES, i.e. write.avro.compression-codec and write.avro.compression-level, when writing manifest and manifest list files.

This is because the table properties are not forwarded to Avro WriteBuilder:

return Avro.write(file)
.schema(manifestSchema)
.named("manifest_entry")
.meta("schema", SchemaParser.toJson(spec.schema()))
.meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
.meta("partition-spec-id", String.valueOf(spec.specId()))
.meta("format-version", "1")
.overwrite()
.build();

As a result, the Context is built from an empty config and falls back to TableProperties#AVRO_COMPRESSION_DEFAULT, i.e. gzip:

static Context dataContext(Map<String, String> config) {
String codecAsString = config.getOrDefault(AVRO_COMPRESSION, AVRO_COMPRESSION_DEFAULT);
String compressionLevel =
config.getOrDefault(AVRO_COMPRESSION_LEVEL, AVRO_COMPRESSION_LEVEL_DEFAULT);
CodecFactory codec = toCodec(codecAsString, compressionLevel);
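
The fallback can be illustrated with a minimal, self-contained sketch. This is not Iceberg code: the class `CompressionFallbackSketch` and the method `resolveCodec` are hypothetical names, and only the property key and the `gzip` default are taken from this report. It shows that a lookup against a config map that never received the table properties silently yields the default.

```java
import java.util.HashMap;
import java.util.Map;

class CompressionFallbackSketch {
  // Key and default as named in this issue; everything else here is hypothetical.
  static final String AVRO_COMPRESSION = "write.avro.compression-codec";
  static final String AVRO_COMPRESSION_DEFAULT = "gzip";

  // Same lookup pattern as dataContext: a missing key falls back to the default.
  static String resolveCodec(Map<String, String> config) {
    return config.getOrDefault(AVRO_COMPRESSION, AVRO_COMPRESSION_DEFAULT);
  }

  public static void main(String[] args) {
    Map<String, String> tableProps = new HashMap<>();
    tableProps.put(AVRO_COMPRESSION, "zstd");

    // Manifest path as described above: the builder's config never sees tableProps.
    System.out.println(resolveCodec(new HashMap<>())); // prints "gzip"

    // What the lookup would return if the table properties were forwarded.
    System.out.println(resolveCodec(tableProps)); // prints "zstd"
  }
}
```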

Steps to reproduce

scala> sql(" CREATE TABLE tpcds_1_tb_iceberg.manifest_compression (a INT) USING iceberg TBLPROPERTIES ('write.avro.compression-codec'='zstd')")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.range(10).toDF("a").coalesce(1).writeTo("tpcds_1_tb_iceberg.manifest_compression").append()

scala>
bash-5.1$ avro-tools getmeta iceberg_warehouse/tpcds_1_tb_iceberg/manifest_compression/metadata/snap-3374754284586474934-1-ac1d7acb-bbe0-484c-b4b2-4e4891a100a3.avro | grep -i avro.codec
22/09/29 16:59:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
avro.codec      deflate
bash-5.1$

Even though the table sets the compression codec to zstd, the manifest list file (snap-*.avro) is still written with the deflate codec, which is what the gzip default produces.
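
For anyone who wants to run the same check without avro-tools, here is a rough sketch that reads the `avro.codec` entry straight from the file header using only the JDK. It assumes the standard Avro object container layout (magic bytes `Obj` plus `0x01`, followed by a metadata map of string keys to byte values). The class `AvroHeaderSketch` and its methods are hypothetical names, not part of Avro or Iceberg.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class AvroHeaderSketch {
  // Reads one Avro long: variable-length, zigzag-encoded.
  static long readLong(InputStream in) throws IOException {
    long raw = 0;
    int shift = 0;
    while (true) {
      int b = in.read();
      if (b < 0) throw new EOFException("truncated varint");
      raw |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
      shift += 7;
    }
    return (raw >>> 1) ^ -(raw & 1);
  }

  // Reads a length-prefixed byte sequence (used for both keys and values).
  static byte[] readBytes(DataInputStream in) throws IOException {
    byte[] buf = new byte[(int) readLong(in)];
    in.readFully(buf);
    return buf;
  }

  // Parses the magic bytes and the metadata map at the start of a container file.
  static Map<String, String> readHeaderMeta(InputStream stream) throws IOException {
    DataInputStream in = new DataInputStream(stream);
    byte[] magic = new byte[4];
    in.readFully(magic);
    if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
      throw new IOException("not an Avro container file");
    }
    Map<String, String> meta = new LinkedHashMap<>();
    // The map is written as blocks; a block count of 0 ends the map.
    for (long count = readLong(in); count != 0; count = readLong(in)) {
      if (count < 0) {
        count = -count;
        readLong(in); // a negative count is followed by the block size in bytes
      }
      for (long i = 0; i < count; i++) {
        String key = new String(readBytes(in), StandardCharsets.UTF_8);
        meta.put(key, new String(readBytes(in), StandardCharsets.UTF_8));
      }
    }
    return meta;
  }

  // Returns the codec name; a missing avro.codec entry means the "null" codec.
  static String codecOf(byte[] fileBytes) {
    try {
      return readHeaderMeta(new ByteArrayInputStream(fileBytes))
          .getOrDefault("avro.codec", "null");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    // args[0]: path to a manifest or manifest list .avro file
    System.out.println("avro.codec\t" + codecOf(Files.readAllBytes(Paths.get(args[0]))));
  }
}
```

On Java 11 or later this can be run as `java AvroHeaderSketch.java <path-to-avro-file>`. It loads the whole file into memory, which should be acceptable for manifest lists but is a simplification.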
