Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled #4028

Closed
@tustvold

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, even when Parquet predicate pushdown is enabled and the predicate can be fully pushed down, the physical plan still contains a FilterExec when using ListingTable:

| physical_plan | ProjectionExec: expr=[service@0 as service, host@1 as host, pod@2 as pod, container@3 as container, image@4 as image, time@5 as time, client_addr@6 as client_addr, request_duration_ns@7 as request_duration_ns, request_user_agent@8 as request_user_agent, request_method@9 as request_method, request_host@10 as request_host, request_bytes@11 as request_bytes, response_bytes@12 as response_bytes, response_status@13 as response_status]                            |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                                                                                                                                                                                                                                                                                |
|               |     FilterExec: container@3 = backend_container_0 OR pod@2 = aqcathnxqsphdhgjtgvxsfyiwbmhlmg                                                                                                                                                                                                                                                                                                                                                                                 |
|               |       RepartitionExec: partitioning=RoundRobinBatch(8)                                                                                                                                                                                                                                                                                                                                                                                                                       |
|               |         ParquetExec: limit=None, partitions=[home/raphael/Downloads/data.parquet], predicate=container_min@0 <= backend_container_0 AND backend_container_0 <= container_max@1 OR pod_min@2 <= aqcathnxqsphdhgjtgvxsfyiwbmhlmg AND aqcathnxqsphdhgjtgvxsfyiwbmhlmg <= pod_max@3, projection=[service, host, pod, container, image, time, client_addr, request_duration_ns, request_user_agent, request_method, request_host, request_bytes, response_bytes, response_status] |

Describe the solution you'd like

ListingTable::supports_filter_pushdown should return TableProviderFilterPushDown::Exact when all of the following hold:

  • Parquet predicate pushdown is enabled
  • The FileFormat is Parquet
  • The predicate is fully pushed down by ParquetExec (not all predicates are supported)
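
The three conditions above could be combined roughly as in the sketch below. This is only an illustration, not DataFusion's implementation: `FilterPushDown`, `Format`, `ScanSettings`, and the free function `supports_filter_pushdown` are hypothetical stand-ins defined locally so the example compiles without the crate, and the `predicate_fully_pushed_down` flag stands in for whatever check decides that ParquetExec can evaluate the whole predicate. Falling back to `Inexact` is an assumption chosen to match the plan shown above, where the filter is still re-applied.

```rust
/// Local stand-in for DataFusion's `TableProviderFilterPushDown`
/// (hypothetical copy so the sketch is self-contained).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FilterPushDown {
    /// The source cannot use the filter at all.
    Unsupported,
    /// The source may use the filter, but the planner must still re-apply it.
    Inexact,
    /// The source guarantees the filter is fully applied; no re-filtering needed.
    Exact,
}

/// Hypothetical file format tag for the listing table.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Parquet,
    Other,
}

/// Hypothetical settings relevant to the decision.
struct ScanSettings {
    format: Format,
    parquet_pushdown_enabled: bool,
}

/// Returns `Exact` only when all three conditions from the issue hold;
/// otherwise falls back to `Inexact` (an assumption made for this sketch).
fn supports_filter_pushdown(
    settings: &ScanSettings,
    predicate_fully_pushed_down: bool,
) -> FilterPushDown {
    let exact = settings.format == Format::Parquet
        && settings.parquet_pushdown_enabled
        && predicate_fully_pushed_down;

    if exact {
        FilterPushDown::Exact
    } else {
        FilterPushDown::Inexact
    }
}
```

With a result of `Exact`, the planner would be free to omit the FilterExec above the scan; with `Inexact`, it keeps it, which is what the plan above shows.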

Labels: enhancement (New feature or request), performance (Make DataFusion faster)