API to register behavior for Extension Types #18223

@alamb

Description

Is your feature request related to a problem or challenge?

This is part of implementing LogicalTypes / Extension Types in DataFusion, as described by @findepi.

Extension types are defined using the metadata on an arrow Field (not the DataType) and stored physically as one of the existing arrow types. This has the nice benefit that systems that don't support an extension type can still process (pass through) its values as the underlying arrow type, while systems that do support it can add additional semantics.
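To make the "metadata on the Field, not the DataType" point concrete, here is a minimal std-only sketch. `Field` and `DataType` below are simplified stand-ins written for this example, not the real arrow structs, and `example.variant` is a made-up extension name. The only thing taken from Arrow itself is the `ARROW:extension:name` metadata key.

```rust
use std::collections::HashMap;

/// Metadata key Arrow uses to carry the extension type name on a field.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Simplified stand-in for arrow's `DataType` (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Binary,
    Utf8,
}

/// Simplified stand-in for arrow's `Field` (assumption for this sketch).
#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            metadata: HashMap::new(),
        }
    }

    /// Tag the field with an extension name; the physical type is unchanged.
    fn with_extension(mut self, ext_name: &str) -> Self {
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ext_name.to_string());
        self
    }

    /// Extension name, if any. Code that only looks at `data_type` never sees it.
    fn extension_name(&self) -> Option<&str> {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
    }
}
```

A tagged field and an untagged field have the same `data_type`, so any code path that only receives the `DataType` cannot tell them apart.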

As people continue using DataFusion to implement more sophisticated extension types such as geometry and geography (@paleolimbot) and Variant (@friendlymatthew), they are finding it important to customize certain operations that are currently hardcoded in DataFusion based on physical type.

Some examples of operations where special semantics are sometimes needed for extension types:

  1. Printing / displaying values (e.g. printing Variant values in a JSON-like manner rather than as their raw bytes)
  2. Casting values to/from other types
  3. Comparing values (e.g. it is not correct to compare two Variant values byte by byte)

There are a few challenges now:

  1. Extension type information is carried on Field (rather than DataType), and the Field is not yet available everywhere (though @paleolimbot and others are working on this)
  2. Even once we have Field available everywhere, the call sites for many cast/print and binary operations call directly into the arrow kernels, which have no way to customize behavior for extension types.

Describe the solution you'd like

I think we need some sort of DataFusion API for users of extension types to specify and customize their behavior.

Describe alternatives you've considered

One possibility is to add a DFExtensionType trait that extends the existing ExtensionType trait, similar to DFSchema.

Maybe something like

/// DataFusion Extension Type support
pub trait DFExtensionType: ExtensionType {
  /// Cast a column of this extension type to the target
  fn cast(&self, input: ColumnarValue, output_type: &Field) -> Result<ColumnarValue>;
  // .. other functions ...
}

We would also need some way to register these types dynamically with the SessionContext as well as pass along the registry into the places they are needed.

let ctx = SessionContext::new();
ctx.register_extension(Arc::new(DFVariantExtension));
...

I am not sure this is the right API; we would probably need to try it out.
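To illustrate the registration and lookup idea, here is a rough std-only sketch of a registry keyed by extension name. It is not DataFusion's API: `ExtensionBehavior`, `ExtensionRegistry`, `TextExtension`, and the `example.text` name are all hypothetical, and `Field` is a stand-in that models only the metadata map. The sketch covers only the display case: look up the extension name on the field, dispatch to the registered behavior if there is one, and otherwise fall back to the default handling of the physical type.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Metadata key Arrow uses to carry the extension type name on a field.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Stand-in for arrow's `Field`; only the metadata is modeled (assumption).
struct Field {
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(extension_name: Option<&str>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(name) = extension_name {
            metadata.insert(EXTENSION_NAME_KEY.to_string(), name.to_string());
        }
        Field { metadata }
    }
}

/// Hypothetical per-extension behavior, loosely playing the role of the
/// proposed `DFExtensionType` (display only, to keep the sketch short).
trait ExtensionBehavior: Send + Sync {
    /// Extension name this behavior applies to.
    fn name(&self) -> &str;
    /// Render one raw physical value for display.
    fn display(&self, raw: &[u8]) -> String;
}

/// Toy extension: bytes that should be shown as text, not as a byte list.
struct TextExtension;

impl ExtensionBehavior for TextExtension {
    fn name(&self) -> &str {
        "example.text"
    }
    fn display(&self, raw: &[u8]) -> String {
        String::from_utf8_lossy(raw).into_owned()
    }
}

/// Hypothetical registry, standing in for state a session context could own.
#[derive(Default)]
struct ExtensionRegistry {
    types: HashMap<String, Arc<dyn ExtensionBehavior>>,
}

impl ExtensionRegistry {
    fn register(&mut self, ext: Arc<dyn ExtensionBehavior>) {
        self.types.insert(ext.name().to_string(), ext);
    }

    /// Use the registered behavior if the field names a known extension,
    /// otherwise fall back to the physical-type default (debug-print bytes).
    fn display(&self, field: &Field, raw: &[u8]) -> String {
        let ext = field
            .metadata
            .get(EXTENSION_NAME_KEY)
            .and_then(|name| self.types.get(name));
        match ext {
            Some(ext) => ext.display(raw),
            None => format!("{:?}", raw),
        }
    }
}
```

The fallback branch matters here: fields with no extension, or with an extension nobody registered, keep today's physical-type behavior, which matches the pass-through property described above.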

Additional context

No response
