Spark iceberg runtime - predicate pushdown in parquet reader #12428

Open
@nateagr

Description

Hello!

After migrating some of our parquet tables (in Hive) to Iceberg (still parquet), I've noticed that reading the new Iceberg tables with Spark is much slower (at least 4x) than reading the original parquet tables. I've been trying to understand this slowdown, and it seems that Iceberg doesn't push predicates down to the parquet reader. I've written a unit test that reads one of our new Iceberg tables with Spark, and I always see the NoOp row group filter in the parquet reader. However, when reading one of our original parquet tables, I see a row group filter that actually filters row groups based on statistics, dictionaries, etc.
Is my understanding correct? If so, since I've read many times that Iceberg supports predicate pushdown, at what point is it applied? After the parquet files have been read?
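To make the distinction concrete, here is a minimal Python sketch (not Iceberg's or Spark's actual code) of what statistics-based row-group filtering does, compared with a NoOp filter that reads everything. The `RowGroup` class, its min/max values, and the greater-than predicate are made up for illustration; real parquet readers use per-column min/max statistics stored in the file footer in the same way.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    # Per-column min/max recorded in the parquet footer (illustrative values)
    min_value: int
    max_value: int

def prune_row_groups(groups, predicate_gt):
    """Keep only row groups that could contain rows with value > predicate_gt.

    A group whose max is <= the predicate bound cannot match and is skipped,
    so the reader never touches its pages on disk.
    """
    return [g for g in groups if g.max_value > predicate_gt]

groups = [RowGroup(0, 9), RowGroup(10, 19), RowGroup(20, 29)]

# Statistics-based filtering for "value > 10" skips the first group:
print(len(prune_row_groups(groups, 10)))  # -> 2

# A NoOp filter ignores the predicate and reads every group:
print(len(groups))  # -> 3
```

When the NoOp filter is used, the predicate is still applied to the rows after they are decoded, so query results stay correct, but the I/O and decoding savings from skipping whole row groups are lost.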

@RussellSpitzer, I'm pinging you since you've answered several questions about predicate pushdown in Iceberg.

Metadata
Labels: question (Further information is requested)
