Enable native BatchScanExec for Iceberg COW tables #1676

@ShreyeshArangath

Is your feature request related to a problem? Please describe.
As detailed in #1472, Auron currently doesn’t support native execution for DSv2 (DataSource V2) reads on Iceberg tables. BatchScanExec plans for Iceberg tables are executed by Spark alone, so Iceberg reads, especially on copy-on-write (COW) tables, cannot benefit from native acceleration.

Describe the solution you'd like
Introduce a NativeIcebergBatchScanExec that:

  • Hooks into the existing convert provider infrastructure to convert BatchScanExec plans backed by SparkBatchQueryScan (Iceberg) into a native scan.
  • Converts Iceberg InputPartition / FileScanTask into FilePartition + PartitionedFile compatible with the existing native file scan protobufs (Parquet/ORC).
  • Supports COW Iceberg tables for Parquet/ORC in the initial version, guarded by a feature flag (spark.auron.enable.iceberg.scan).
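
To make the second bullet concrete, here is a minimal sketch of the task-to-partition mapping. It is not Auron code: `ScanTask`, `FileSlice`, and `FileGroup` are hypothetical stand-ins for Iceberg's FileScanTask, Spark's PartitionedFile, and Spark's FilePartition, so that the example compiles without Spark or Iceberg on the classpath. It assumes each Iceberg input partition carries a list of tasks and that each task is a byte range of one data file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these are real Iceberg, Spark, or Auron classes.
final class IcebergScanSketch {

    /** Stand-in for one Iceberg FileScanTask: a byte range of a single data file. */
    static final class ScanTask {
        final String path;
        final long start;
        final long length;

        ScanTask(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Stand-in for Spark's PartitionedFile (partition values omitted for brevity). */
    static final class FileSlice {
        final String path;
        final long start;
        final long length;

        FileSlice(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Stand-in for Spark's FilePartition: an index plus the file slices it reads. */
    static final class FileGroup {
        final int index;
        final List<FileSlice> files;

        FileGroup(int index, List<FileSlice> files) {
            this.index = index;
            this.files = files;
        }
    }

    /** Maps each input partition (a list of tasks) to one file group, preserving order. */
    static List<FileGroup> toFileGroups(List<List<ScanTask>> inputPartitions) {
        List<FileGroup> groups = new ArrayList<>();
        for (int i = 0; i < inputPartitions.size(); i++) {
            List<FileSlice> files = new ArrayList<>();
            for (ScanTask task : inputPartitions.get(i)) {
                // Copy the byte range as-is; the native reader decides how to read it.
                files.add(new FileSlice(task.path, task.start, task.length));
            }
            groups.add(new FileGroup(i, files));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<FileGroup> groups = toFileGroups(Arrays.asList(
            Arrays.asList(new ScanTask("a.parquet", 0L, 100L)),
            Arrays.asList(new ScanTask("b.parquet", 0L, 10L))));
        System.out.println(groups.size() + " file groups");
    }
}
```

A real converter would also need to carry partition values and the file format through to the native scan protobufs; both are left out here because the issue does not specify how they should be mapped.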

Describe alternatives you've considered
N/A

Additional context

  • Initial scope: COW tables only, Parquet/ORC, basic predicate pushdown where already supported by the existing native pruning expression converters.
  • MOR/delete-file handling, time travel, and metadata table support can be handled in follow-up issues once the base NativeIcebergBatchScanExec is in place.
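
The scope above could be enforced with an eligibility check along the following lines. This is a sketch under assumptions, not the proposed implementation: `TaskInfo` and `canConvert` are hypothetical names, the flag is assumed to default to off, and a task with any attached delete files is treated as out of scope (in Iceberg, delete files attached to a scan task are what a COW-only reader would have to skip). When the check fails, the plan would stay on Spark's own BatchScanExec.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of Auron, Spark, or Iceberg.
final class IcebergScanEligibility {

    // Flag name taken from the proposal; the default value is an assumption.
    static final String FLAG = "spark.auron.enable.iceberg.scan";

    /** Stand-in for the per-task facts the check needs. */
    static final class TaskInfo {
        final String format;       // e.g. "parquet", "orc", "avro"
        final int deleteFileCount; // number of delete files attached to the task

        TaskInfo(String format, int deleteFileCount) {
            this.format = format;
            this.deleteFileCount = deleteFileCount;
        }
    }

    /** Returns true only if the flag is on and every task is a plain Parquet/ORC read. */
    static boolean canConvert(Map<String, String> conf, List<TaskInfo> tasks) {
        // Assumption: the feature stays off unless explicitly set to "true".
        if (!Boolean.parseBoolean(conf.getOrDefault(FLAG, "false"))) {
            return false;
        }
        for (TaskInfo task : tasks) {
            String format = task.format.toLowerCase(Locale.ROOT);
            boolean supportedFormat = format.equals("parquet") || format.equals("orc");
            // Any delete file means merge-on-read semantics, which is out of initial scope.
            if (!supportedFormat || task.deleteFileCount > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FLAG, "true");
        List<TaskInfo> tasks = Arrays.asList(new TaskInfo("parquet", 0), new TaskInfo("orc", 0));
        System.out.println("convert to native scan: " + canConvert(conf, tasks));
    }
}
```

Checking per task rather than per table keeps the fallback conservative: a single unsupported file sends the whole scan back to Spark instead of mixing native and non-native reads.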
