Tracking for Iceberg cloud enablement (#22450)

Roadmap for enabling distributed Iceberg scans in Polars Cloud.

### MVP

* [x] Support native dispatch to `scan_parquet()` with transparent fallback
  * https://github.com/pola-rs/polars/pull/22405
  * Users can force the fallback (PyIceberg) scan by passing `reader_override='pyiceberg'` to `scan_iceberg()`
* [x] Support schema-evolved datasets via type-casting in the multi-scan post-apply pipeline
  * [x] Expose the `cast_options` parameter on `scan_parquet()`
    * https://github.com/pola-rs/polars/pull/22617
  * [x] Expose the `extra_columns` parameter on `scan_parquet()`
    * https://github.com/pola-rs/polars/pull/22699
* [x] Native support for deletion files in the multi-scan post-apply pipeline
  * (1) https://github.com/pola-rs/polars/pull/23045
  * (2) https://github.com/pola-rs/polars/pull/23059
  * (3) https://github.com/pola-rs/polars/pull/23091
* [x] Set the appropriate `cast_options`, `extra_columns`, etc. parameters when calling the native `scan_parquet()`
  * https://github.com/pola-rs/polars/pull/23416
  * [x] (Related) Enable use of `ScanCastOptions` in Delta scans by default (**ES**) https://github.com/pola-rs/polars/pull/23398
* [x] Parquet row-group skipping with type-casting
  * https://github.com/pola-rs/polars/pull/23356
* [x] Column-mapping support https://github.com/pola-rs/polars/issues/23428
  * https://github.com/pola-rs/polars/pull/23532
  * https://github.com/pola-rs/polars/pull/23671
  * https://github.com/pola-rs/polars/pull/23713
* [x] Parquet row-group skipping with column-mapping
  * https://github.com/pola-rs/polars/pull/23792
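
The schema-evolution and column-mapping items above both come down to projecting each file onto the current table schema. The following is a minimal pure-Python sketch of that idea, not the Polars implementation: it assumes columns are matched by Iceberg field ID, models only widening integer casts, fills columns added after the file was written with nulls, and drops columns the table no longer has (similar in spirit to `extra_columns='ignore'`). The function `align_to_table_schema` and the dict-based schema representation are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical widths for the integer types this sketch knows how to widen.
INT_WIDTH = {"int32": 32, "int64": 64}


def align_to_table_schema(
    file_columns: Dict[int, Tuple[str, str, List[Any]]],
    table_schema: Dict[int, Tuple[str, str]],
    num_rows: int,
) -> Dict[str, List[Any]]:
    """Project one file's columns onto the current table schema.

    file_columns: field_id -> (physical name, physical type, values)
    table_schema: field_id -> (current name, current type)
    """
    out: Dict[str, List[Any]] = {}
    for field_id, (name, dtype) in table_schema.items():
        if field_id not in file_columns:
            # Column was added after this file was written: fill with nulls.
            out[name] = [None] * num_rows
            continue
        _physical_name, file_dtype, values = file_columns[field_id]
        is_int_widening = (
            file_dtype in INT_WIDTH
            and dtype in INT_WIDTH
            and INT_WIDTH[file_dtype] < INT_WIDTH[dtype]
        )
        if file_dtype != dtype and not is_int_widening:
            raise TypeError(f"field {field_id}: cannot cast {file_dtype} to {dtype}")
        # Matching on field ID means a rename in the table schema is picked up
        # even though the file still stores the old physical name.
        out[name] = list(values)
    # File columns whose field ID is not in the table schema are not emitted,
    # similar in spirit to extra_columns="ignore".
    return out


file_cols = {
    1: ("id", "int32", [1, 2]),
    2: ("old_name", "string", ["a", "b"]),
    3: ("dropped", "int64", [7, 8]),
}
table = {1: ("id", "int64"), 2: ("name", "string"), 4: ("score", "float64")}
print(align_to_table_schema(file_cols, table, num_rows=2))
# {'id': [1, 2], 'name': ['a', 'b'], 'score': [None, None]}
```

Narrowing casts (for example `int64` to `int32`) raise instead of silently truncating, which mirrors the general rule that schema evolution only ever widens types.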
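
For the deletion-file item, the core operation for Iceberg positional delete files is dropping rows by their position within a data file. This sketch assumes the delete records have already been read as `(file_path, pos)` pairs and covers positional deletes only (equality deletes are out of scope); `build_delete_index` and `apply_position_deletes` are hypothetical helpers, not Polars APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple, TypeVar

Row = TypeVar("Row")


def build_delete_index(delete_records: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Group (file_path, pos) records from position delete files by data file."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in delete_records:
        index[file_path].add(pos)
    return dict(index)


def apply_position_deletes(
    file_path: str,
    rows: List[Row],
    delete_index: Dict[str, Set[int]],
    row_offset: int = 0,
) -> List[Row]:
    """Drop rows whose absolute position within file_path is marked deleted.

    row_offset is the file position of rows[0], so the same function works
    when a file is processed in chunks (for example one row group at a time).
    """
    deleted = delete_index.get(file_path, set())
    return [row for i, row in enumerate(rows) if row_offset + i not in deleted]


index = build_delete_index([("a.parquet", 1), ("a.parquet", 3), ("b.parquet", 0)])
print(apply_position_deletes("a.parquet", ["r0", "r1", "r2", "r3"], index))
# ['r0', 'r2']
```

Keeping positions absolute (via `row_offset`) is what lets deletes be applied after row-group skipping or slicing, since a skipped row group shifts the local index but not the file position.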
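
Row-group skipping with type-casting and column mapping has to stay conservative: statistics are stored in the file's physical type, are looked up by field ID rather than by a possibly renamed column name, and must be cast to the table type before being compared with the predicate. When statistics or a safe cast are missing, the row group has to be read. A hypothetical sketch for a single `column <op> literal` predicate (`SAFE_CASTS` and `can_skip_row_group` are illustrative names, and the field-ID keying is an assumption about how the mapping is represented):

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical table of order-preserving casts, so cast min/max remain valid bounds.
SAFE_CASTS: Dict[Tuple[str, str], Callable[[Any], Any]] = {
    ("int32", "int64"): int,
    ("float32", "float64"): float,
}


def can_skip_row_group(
    stats_by_field_id: Dict[int, Tuple[Optional[Any], Optional[Any]]],
    field_id: int,
    file_dtype: str,
    table_dtype: str,
    op: str,
    literal: Any,
) -> bool:
    """Return True only if no row in the row group can satisfy `column <op> literal`."""
    stats = stats_by_field_id.get(field_id)
    if stats is None or stats[0] is None or stats[1] is None:
        return False  # no usable statistics: the row group must be read
    lo, hi = stats
    if file_dtype != table_dtype:
        cast = SAFE_CASTS.get((file_dtype, table_dtype))
        if cast is None:
            return False  # unknown cast: stay conservative
        lo, hi = cast(lo), cast(hi)
    if op == ">":
        return hi <= literal
    if op == "<":
        return lo >= literal
    if op == "==":
        return literal < lo or literal > hi
    return False  # unsupported operator: never skip


# Stats keyed by field ID keep working after the column is renamed.
stats = {1: (10, 20)}
print(can_skip_row_group(stats, 1, "int32", "int64", ">", 20))  # True
print(can_skip_row_group(stats, 1, "int32", "int64", ">", 19))  # False
```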

Once the items above are complete, it should be safe to switch `scan_iceberg()` to the native Parquet scanner by default.
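
The dispatch rule described in the MVP section (use the native Parquet scanner, fall back transparently when needed, and let `reader_override='pyiceberg'` force the fallback) can be sketched as follows. The feature labels, `NATIVE_FEATURES` and `choose_reader` are hypothetical placeholders, and accepting `'native'` as an override value is an assumption of this sketch; the real decision logic in `scan_iceberg()` may check entirely different conditions.

```python
from typing import Iterable, Optional

# Hypothetical labels for features the native path handles, loosely named after
# the MVP items; not the actual checks performed by scan_iceberg().
NATIVE_FEATURES = frozenset({"schema-evolution", "deletion-files", "column-mapping"})


def choose_reader(required_features: Iterable[str], reader_override: Optional[str] = None) -> str:
    """Pick "native" or "pyiceberg" for a scan."""
    if reader_override is not None:
        # Accepted override values are assumed for this sketch.
        if reader_override not in ("native", "pyiceberg"):
            raise ValueError(f"unknown reader_override: {reader_override!r}")
        return reader_override  # an explicit override always wins
    unsupported = set(required_features) - NATIVE_FEATURES
    # Transparent fallback: anything the native path cannot handle goes to PyIceberg.
    return "pyiceberg" if unsupported else "native"


print(choose_reader({"deletion-files"}))  # native
print(choose_reader({"deletion-files"}, reader_override="pyiceberg"))  # pyiceberg
print(choose_reader({"hypothetical-unsupported-feature"}))  # pyiceberg
```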

### Further work

* [ ] Filtering on Iceberg statistics
* [ ] Filtering on Iceberg partition fields
* [ ] Fast-count (physical and deleted row counts are available in the Iceberg Python objects)
* [x] Parquet pre-filtering with type-casting
  * https://github.com/pola-rs/polars/pull/23792
* [ ] Parquet pre-filtering with deletion files
* [x] Parquet pre-filtering with column mapping
  * https://github.com/pola-rs/polars/pull/23792
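
For the fast-count item, a row count can in principle be derived from per-file `record_count` metadata alone. This sketch uses a stand-in `FileEntry` dataclass rather than PyIceberg's objects, and assumes each position-delete record removes a distinct live row of a data file in the scan; equality deletes force a real scan, signalled here by returning `None`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for a manifest entry; not PyIceberg's class."""

    content: str  # "data", "position_deletes" or "equality_deletes"
    record_count: int


def fast_count(entries: Iterable[FileEntry]) -> Optional[int]:
    """Live row count from metadata alone, or None when a real scan is needed.

    Assumes every position-delete record removes a distinct live row of a
    data file in `entries` (no duplicate or dangling delete records).
    """
    physical = 0
    deleted = 0
    for entry in entries:
        if entry.content == "data":
            physical += entry.record_count
        elif entry.content == "position_deletes":
            deleted += entry.record_count
        else:
            # An equality delete can match any number of rows, so its
            # record_count says nothing about how many rows it removes.
            return None
    return physical - deleted


print(fast_count([FileEntry("data", 100), FileEntry("data", 50), FileEntry("position_deletes", 7)]))
# 143
```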
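
Filtering on Iceberg statistics and partition fields would prune whole data files before any Parquet I/O. A minimal sketch, assuming an identity-partitioned field and per-column upper bounds keyed by field ID; `DataFileMeta` and `may_contain_matches` are hypothetical, and transformed partitions (for example `bucket` or `day`) would need the predicate rewritten through the transform first.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class DataFileMeta:
    """Hypothetical subset of a data file's manifest metadata."""

    path: str
    partition: Dict[str, Any]  # identity-partition field name -> value
    upper_bounds: Dict[int, Any]  # field ID -> upper bound of that column


def may_contain_matches(
    f: DataFileMeta, partition_field: str, partition_value: Any, field_id: int, literal: Any
) -> bool:
    """False only if metadata proves no row satisfies
    `partition_field == partition_value AND column(field_id) > literal`."""
    if partition_field in f.partition and f.partition[partition_field] != partition_value:
        return False  # the whole file lies in a different partition
    upper = f.upper_bounds.get(field_id)
    if upper is not None and upper <= literal:
        return False  # every value in the column is <= literal
    return True  # cannot rule the file out; it must be scanned


files = [
    DataFileMeta("a.parquet", {"region": "eu"}, {1: 10}),
    DataFileMeta("b.parquet", {"region": "us"}, {1: 100}),
    DataFileMeta("c.parquet", {"region": "eu"}, {1: 100}),
    DataFileMeta("d.parquet", {"region": "eu"}, {}),
]
print([f.path for f in files if may_contain_matches(f, "region", "eu", 1, 20)])
# ['c.parquet', 'd.parquet']
```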

(**ES**): Related to enterprise support work


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Tracking for Iceberg cloud enablement #22450

MVP

Further work

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Tracking for Iceberg cloud enablement #22450

Description

MVP

Further work

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions