Description
Hello!
After migrating some of our parquet tables (in Hive) to Iceberg (still parquet), I've noticed that reading the new Iceberg tables with Spark is much slower (by a factor of at least 4) than reading the original parquet tables. While trying to understand the slowdown, it seems that Iceberg doesn't push the predicates down to the parquet reader. I've written a unit test that reads one of our new Iceberg tables with Spark, and I always see the NoOp row group filter in the parquet reader. However, when reading one of our original parquet tables, I see a row group filter that actually filters row groups based on statistics, dictionaries ...
Is my understanding correct? If so, I've read many times that Iceberg supports predicate pushdown, so where does it happen? After the parquet files have already been read?
@RussellSpitzer, I'm pinging you because I've seen that you've answered several questions about predicate pushdown in Iceberg.