Closed as not planned
Description
Apache Iceberg version
0.14.1 (latest release)
Query engine
No response
Please describe the bug 🐞
Hi,
Iceberg does not respect the Avro table properties write.avro.compression-codec and write.avro.compression-level from TBLPROPERTIES when writing manifest and manifest list files.
This is because the table properties are not forwarded to the Avro WriteBuilder:
iceberg/core/src/main/java/org/apache/iceberg/ManifestWriter.java
Lines 293 to 301 in 731e5f0
Thus the Context defaults to TableProperties#AVRO_COMPRESSION_DEFAULT, i.e. gzip:
iceberg/core/src/main/java/org/apache/iceberg/avro/Avro.java
Lines 207 to 211 in 731e5f0
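To illustrate the effect, here is a minimal sketch (not the actual Iceberg source) of the fallback: because ManifestWriter never hands the table properties to the builder, the codec lookup only ever sees the default. The constant names are the real ones from TableProperties; the surrounding class is hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import org.apache.iceberg.TableProperties;

public class ManifestCodecSketch {
  public static void main(String[] args) {
    // ManifestWriter builds the Avro writer without passing the table's
    // properties, so from the builder's point of view the config map is empty.
    Map<String, String> config = Collections.emptyMap();

    // Mirrors the fallback in the WriteBuilder's Context: with no
    // write.avro.compression-codec entry, the default wins.
    String codec = config.getOrDefault(
        TableProperties.AVRO_COMPRESSION,            // "write.avro.compression-codec"
        TableProperties.AVRO_COMPRESSION_DEFAULT);   // "gzip"

    System.out.println(codec);  // prints "gzip" regardless of TBLPROPERTIES
  }
}
```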
Steps to reproduce
scala> sql(" CREATE TABLE tpcds_1_tb_iceberg.manifest_compression (a INT) USING iceberg TBLPROPERTIES ('write.avro.compression-codec'='zstd')")
res0: org.apache.spark.sql.DataFrame = []
scala> spark.range(10).toDF("a").coalesce(1).writeTo("tpcds_1_tb_iceberg.manifest_compression").append()
scala>
bash-5.1$ avro-tools getmeta iceberg_warehouse/tpcds_1_tb_iceberg/manifest_compression/metadata/snap-3374754284586474934-1-ac1d7acb-bbe0-484c-b4b2-4e4891a100a3.avro | grep -i avro.codec
22/09/29 16:59:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
avro.codec deflate
bash-5.1$
Even though we set the compression codec to zstd, the underlying Avro file is written with the default instead (Iceberg's gzip setting, which maps to Avro's deflate codec, hence "avro.codec deflate" above).
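For reference, the same check can be done programmatically with the Avro Java API instead of avro-tools. This is a minimal sketch; pass the path to a manifest or manifest-list .avro file from the table's metadata directory.

```java
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class PrintAvroCodec {
  public static void main(String[] args) throws Exception {
    // args[0]: path to a manifest or manifest-list .avro file
    try (DataFileReader<GenericRecord> reader =
        new DataFileReader<>(new File(args[0]), new GenericDatumReader<>())) {
      // "avro.codec" is the standard Avro container-file metadata key
      System.out.println(reader.getMetaString("avro.codec"));  // e.g. "deflate"
    }
  }
}
```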