
Improved support for "User Defined Catalogs" #5291

@alamb

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I think it is currently a bit confusing how to use DataFusion with a custom catalog.

Background

DataFusion is primarily a query engine, rather than a complete database system that also must handle persistence, catalog management, ingest, data lifecycle management, and other things.

Systems like Ballista or GreptimeDB are examples of complete systems that use DataFusion for query but have their own catalog implementations.

However, in order to function, the query engine needs to read information from the catalog, and DataFusion provides a rich set of APIs for this, such as the following:

The query engine also knows how to plan catalog manipulations, which often need planner support (e.g. for type checking or coercion).

Making things even more confusing, DataFusion does have a basic, ephemeral, in-memory catalog implementation, MemoryCatalogList (https://docs.rs/datafusion/18.0.0/datafusion/catalog/catalog/struct.MemoryCatalogList.html), and the methods on SessionContext know how to modify that in-memory catalog.
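To make the distinction concrete, here is a minimal std-only sketch of an ephemeral in-memory catalog: a catalog maps schema names to schemas, and each schema maps table names to table definitions. `MemoryCatalog` and `TableDef` are hypothetical stand-ins, not DataFusion's `MemoryCatalogList` API. The point is only that one object serves both the read-only lookups the planner needs and the mutations that DDL needs.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical table definition: just a name and its column names.
#[derive(Clone, Debug, PartialEq)]
pub struct TableDef {
    pub name: String,
    pub columns: Vec<String>,
}

/// Hypothetical ephemeral catalog: schema name -> (table name -> definition).
/// Nothing is persisted; all state lives in memory and is lost on drop.
#[derive(Default)]
pub struct MemoryCatalog {
    schemas: RwLock<HashMap<String, HashMap<String, TableDef>>>,
}

impl MemoryCatalog {
    /// Read-only access: what the query engine needs while planning.
    pub fn table(&self, schema: &str, table: &str) -> Option<TableDef> {
        let schemas = self.schemas.read().unwrap();
        schemas.get(schema)?.get(table).cloned()
    }

    /// Mutation: what DDL such as CREATE TABLE needs.
    /// Creates the schema on first use and returns any table it replaced.
    pub fn register_table(&self, schema: &str, def: TableDef) -> Option<TableDef> {
        let mut schemas = self.schemas.write().unwrap();
        schemas
            .entry(schema.to_string())
            .or_default()
            .insert(def.name.clone(), def)
    }
}
```

An external catalog typically wants only the first method, while the built-in in-memory catalog needs both.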

Challenges

It is not very clear how the built-in catalog support relates to an external catalog, or how to plug an external catalog in. See, for example, PR #5277.

Also, as projects like #5130 get under way, it becomes even more important to distinguish between catalog manipulation and simple read-only catalog access.

Another example is that SessionContext::sql modifies the in-memory catalog by default:

https://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.sql

Note: This api implements DDL such as CREATE TABLE and CREATE VIEW with in memory default implementations.

If this is not desirable, consider using SessionState::create_logical_plan() which does not mutate the state based on such statements.
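The difference described in that note can be modeled as two steps: planning a statement, which only reads catalog state, and running it, which for DDL also mutates the catalog. The sketch below is a hypothetical, std-only toy (a two-statement "parser" and a set of table names); it is not how DataFusion parses or plans SQL. `ToySession::plan` loosely plays the role of SessionState::create_logical_plan, and `ToySession::run` the role of SessionContext::sql.

```rust
use std::collections::HashSet;

/// Hypothetical, drastically simplified logical plan.
#[derive(Debug, PartialEq)]
pub enum Plan {
    CreateTable { name: String },
    Query { table: String },
}

/// Hypothetical session holding an in-memory set of table names.
#[derive(Default)]
pub struct ToySession {
    pub tables: HashSet<String>,
}

impl ToySession {
    /// Planning only: recognizes `CREATE TABLE <name>` or `SELECT * FROM <name>`.
    /// Reads `self.tables` but never modifies it.
    pub fn plan(&self, sql: &str) -> Result<Plan, String> {
        let words: Vec<&str> = sql.split_whitespace().collect();
        match words.as_slice() {
            [c, t, name]
                if c.eq_ignore_ascii_case("create") && t.eq_ignore_ascii_case("table") =>
            {
                Ok(Plan::CreateTable { name: name.to_string() })
            }
            [s, "*", f, name]
                if s.eq_ignore_ascii_case("select") && f.eq_ignore_ascii_case("from") =>
            {
                // Read-only catalog access: the table must already exist.
                if self.tables.contains(*name) {
                    Ok(Plan::Query { table: name.to_string() })
                } else {
                    Err(format!("table '{name}' not found"))
                }
            }
            _ => Err(format!("unsupported statement: {sql}")),
        }
    }

    /// Plan, then apply DDL side effects to the in-memory catalog.
    pub fn run(&mut self, sql: &str) -> Result<Plan, String> {
        let plan = self.plan(sql)?;
        if let Plan::CreateTable { name } = &plan {
            self.tables.insert(name.clone());
        }
        Ok(plan)
    }
}
```

A system with its own catalog would call only the planning step and decide for itself what to do with a `CreateTable` plan.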

Describe the solution you'd like

I would like a clearer interface (or perhaps just documentation) that states which catalog manipulations are allowed and which are not, along with an example others could follow to implement an external catalog. The interface should make it clear what the catalog supports and what it does not (e.g. does it allow creating new tables or views?).
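One possible shape for such an interface, sketched with hypothetical std-only types rather than DataFusion's real CatalogProvider and SchemaProvider traits: read-only lookups are required methods, mutation is an opt-in method whose default implementation returns a "not supported" error, and a capability query lets the planner reject CREATE TABLE with a clear message up front. `Catalog`, `Capabilities`, `ReadOnlyCatalog`, and `plan_create_table` are all illustrative names.

```rust
use std::collections::HashMap;

/// Hypothetical capability flags an external catalog could advertise.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Capabilities {
    pub create_table: bool,
    pub create_view: bool,
}

/// Hypothetical catalog interface: reads are required, writes are opt-in.
pub trait Catalog {
    /// Read-only access the query engine always needs.
    fn table_names(&self) -> Vec<String>;
    fn table_columns(&self, name: &str) -> Option<Vec<String>>;

    /// What this catalog allows; by default, nothing beyond reads.
    fn capabilities(&self) -> Capabilities {
        Capabilities::default()
    }

    /// Mutation hook for CREATE TABLE; rejected unless overridden.
    fn create_table(&mut self, name: &str, _columns: Vec<String>) -> Result<(), String> {
        Err(format!("catalog does not support CREATE TABLE (table '{name}')"))
    }
}

/// A read-only external catalog, e.g. backed by another system's metadata.
pub struct ReadOnlyCatalog {
    tables: HashMap<String, Vec<String>>,
}

impl ReadOnlyCatalog {
    pub fn new(tables: HashMap<String, Vec<String>>) -> Self {
        Self { tables }
    }
}

impl Catalog for ReadOnlyCatalog {
    fn table_names(&self) -> Vec<String> {
        let mut names: Vec<String> = self.tables.keys().cloned().collect();
        names.sort();
        names
    }

    fn table_columns(&self, name: &str) -> Option<Vec<String>> {
        self.tables.get(name).cloned()
    }
}

/// Hypothetical planner-side check: fail early with a clear message
/// instead of silently writing to some other (e.g. in-memory) catalog.
pub fn plan_create_table(
    cat: &mut dyn Catalog,
    name: &str,
    columns: Vec<String>,
) -> Result<(), String> {
    if !cat.capabilities().create_table {
        return Err(format!("CREATE TABLE '{name}' is not supported by this catalog"));
    }
    cat.create_table(name, columns)
}
```

The design choice here is that the default is the most restrictive behavior, so an implementer who only provides reads gets a catalog that clearly refuses DDL rather than one that appears to accept it.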

To do this, I suggest:

This project might also help

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
N/A

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
