Add support for SessionState in supports_filters_pushdown for a Custom Data Source #11193

@cisaacson

Is your feature request related to a problem or challenge?

We need to be able to get the TaskContext.task_id anywhere a Custom Data Source is invoked. Currently, state: &SessionState is available in TableProvider.scan and task_ctx: Arc<TaskContext> is available in ExecutionPlan.execute, but neither is available in supports_filters_pushdown. This prevents per-query customization or tracking of external state in that method. For example, if there are 3 filters for a custom table and 10 are possible, we need to be able to choose the best one at runtime.
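To make the use case concrete, here is a minimal sketch of what a state-aware supports_filters_pushdown could look like. Everything in it is a simplified stand-in, not the real DataFusion API: Expr is reduced to a string, SessionState carries only a session_id that is assumed to serve as the per-query key, and CustomTable with its selectivity map is hypothetical. Only the three pushdown outcomes are modeled on DataFusion's TableProviderFilterPushDown.

```rust
use std::collections::HashMap;

/// Stand-in for DataFusion's `Expr`; a filter is just its text here.
#[derive(Debug, Clone, PartialEq)]
struct Expr(String);

/// Modeled on the pushdown outcomes a provider reports per filter.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TableProviderFilterPushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Stand-in for the per-query state requested in this issue (assumed shape).
struct SessionState {
    session_id: String,
}

/// Hypothetical provider that tracks external, per-query filter statistics.
struct CustomTable {
    // session_id -> (filter text -> estimated selectivity, lower is better)
    selectivity: HashMap<String, HashMap<String, f64>>,
}

impl CustomTable {
    /// Proposed shape: the existing arguments plus `state`.
    fn supports_filters_pushdown(
        &self,
        state: &SessionState,
        filters: &[&Expr],
    ) -> Vec<TableProviderFilterPushDown> {
        // Look up the statistics recorded for this particular query.
        let per_query = self.selectivity.get(&state.session_id);
        // Index of the most selective filter known for this query, if any.
        let best = per_query.and_then(|stats| {
            filters
                .iter()
                .enumerate()
                .filter_map(|(i, f)| stats.get(&f.0).map(|s| (i, *s)))
                .min_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(i, _)| i)
        });
        // Push down only the chosen filter; leave the rest to the engine.
        (0..filters.len())
            .map(|i| {
                if Some(i) == best {
                    TableProviderFilterPushDown::Exact
                } else {
                    TableProviderFilterPushDown::Unsupported
                }
            })
            .collect()
    }
}

/// Builds a table with made-up statistics for one query.
fn demo_table() -> CustomTable {
    let mut stats = HashMap::new();
    stats.insert("a = 1".to_string(), 0.5);
    stats.insert("b > 10".to_string(), 0.01);
    let mut selectivity = HashMap::new();
    selectivity.insert("query-1".to_string(), stats);
    CustomTable { selectivity }
}

fn main() {
    let table = demo_table();
    let (a, b) = (Expr("a = 1".into()), Expr("b > 10".into()));
    let state = SessionState { session_id: "query-1".into() };
    // prints [Unsupported, Exact]
    println!("{:?}", table.supports_filters_pushdown(&state, &[&a, &b]));
}
```

Without the state argument, the provider has no key with which to look up the per-query statistics, so the same answer would have to be returned for every query.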

Further, to keep things consistent, the task_id should always be available, either by passing the TaskContext or from SessionState.

Trying to implement this proved infeasible because supports_filters_pushdown is in 2 interfaces in 2 separate crates: TableProvider (in core) and TableSource (in expr). It is not possible to add state: &SessionState to the TableSource implementation because expr cannot access the core crate; with the current layout, doing so would create a cyclic dependency. This separation was intentional, to make LogicalPlan separable, which makes sense, but it prevents this type of enhancement.

Describe the solution you'd like

Add &SessionState or minimally TaskContext in every pertinent method for per-query specific processing in a custom data source.

A possible way to solve this is to make a new datafusion-traits crate and to move SessionState and other common items to datafusion-common, so that these components can be used by both core and expr. This will make some components available in expr that are not strictly necessary, but I think that is a good trade-off. This work could be combined with other efforts to break core into more sub-crates, which would make DataFusion much more flexible overall.
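The dependency direction can be sketched in a single file, using modules as stand-ins for crates. This is a trait-based variant of the idea, and not necessarily the exact split proposed here: the module names (shared, expr_crate, core_crate), the Session trait, and MySource are all hypothetical, and the filter types are simplified to strings and booleans.

```rust
/// Stand-in for a shared crate that both `expr` and `core` could depend on.
mod shared {
    /// Minimal view of session state needed at planning time (assumed).
    pub trait Session {
        fn session_id(&self) -> &str;
    }
}

/// Stand-in for the `expr` crate: depends on `shared` only, never on core.
mod expr_crate {
    use super::shared::Session;

    pub trait TableSource {
        /// Per filter, reports whether it can be pushed down (simplified).
        fn supports_filters_pushdown(
            &self,
            session: &dyn Session,
            filters: &[&str],
        ) -> Vec<bool>;
    }
}

/// Stand-in for the `core` crate: owns the concrete `SessionState`.
mod core_crate {
    use super::shared::Session;

    pub struct SessionState {
        pub session_id: String,
    }

    impl Session for SessionState {
        fn session_id(&self) -> &str {
            &self.session_id
        }
    }
}

/// Hypothetical custom source living outside both crates.
struct MySource;

impl expr_crate::TableSource for MySource {
    fn supports_filters_pushdown(
        &self,
        session: &dyn shared::Session,
        filters: &[&str],
    ) -> Vec<bool> {
        // Made-up rule: only sessions prefixed "fast-" get pushdown.
        let allow = session.session_id().starts_with("fast-");
        filters.iter().map(|_| allow).collect()
    }
}

fn main() {
    let state = core_crate::SessionState { session_id: "fast-42".into() };
    let pushed = expr_crate::TableSource::supports_filters_pushdown(
        &MySource,
        &state,
        &["a = 1", "b > 10"],
    );
    // prints [true, true]
    println!("{:?}", pushed);
}
```

Because expr_crate only names the trait from shared, it never needs to import core_crate, so the cycle described above does not arise in this arrangement.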

Describe alternatives you've considered

No response

Additional context

Restructuring crates in a project of this size will be a lot of work, but I believe the benefit will be there. There are other issues that would also benefit. I would recommend a separate restructure ticket that can be reviewed before any implementation is attempted. In addition, this would need to be implemented by multiple contributors; it will inevitably cause a lot of temporary breakage, and retesting will also be required.

Metadata

Labels: enhancement (New feature or request)