Skip to content

Spark aggreation by partition could use metadata files #11394

Open
@lrpt

Description

@lrpt

Hello everybody,
I have a apache iceberg table in aws glue, this table is partitioned by string year-month.
When I do a spark.sql("select count(1),partition_field from table group by partition_field"). Spark goes through every file and perform the count. Cant spark engine use just the metadata files as each underlying data file just contains data from one partition.
(I dont have any delete file)

Thanks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requestedspark

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions