Description
Is your feature request related to a problem or challenge?
This is part of implementing LogicalTypes / Extension Types in DataFusion, as described by @findepi
ExtensionTypes are defined using the metadata on an arrow Field (not the DataType) and stored physically as one of the existing arrow types. This system has the nice benefit that systems that don't support an extension type can still process (pass through) its values as the underlying arrow type, while systems that do support it can add additional semantics.
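To make the storage model concrete, here is a minimal, std-only sketch. `Field` and `DataType` below are simplified stand-ins, not the real arrow-rs types, and `with_extension_name` / `extension_name` are hypothetical helpers. The `ARROW:extension:name` key is the one the Arrow specification uses for the extension name; `example.variant` is a made-up name used only for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an arrow physical type (not the real arrow-rs enum).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Binary,
    Int64,
}

/// Simplified stand-in for an arrow `Field`: name, physical type, and metadata.
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub metadata: HashMap<String, String>,
}

/// Metadata key that the Arrow specification uses for the extension type name.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

impl Field {
    pub fn new(name: &str, data_type: DataType) -> Self {
        Self {
            name: name.to_string(),
            data_type,
            metadata: HashMap::new(),
        }
    }

    /// Hypothetical helper: tag this field as an extension type.
    /// The physical `data_type` is left unchanged.
    pub fn with_extension_name(mut self, ext_name: &str) -> Self {
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ext_name.to_string());
        self
    }

    /// Hypothetical helper: the extension name, if any.
    /// A system that ignores the metadata sees only `data_type`.
    pub fn extension_name(&self) -> Option<&str> {
        self.metadata.get(EXTENSION_NAME_KEY).map(|s| s.as_str())
    }
}
```

In this model a plain `Binary` field and an extension-tagged `Binary` field have the same physical type, so code that only looks at `data_type` passes both through identically, and only the metadata distinguishes them.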
As people continue using DataFusion to implement more sophisticated extension types such as geometry and geography (@paleolimbot) and Variant (@friendlymatthew), they are finding it is important to customize certain operations that are currently hardcoded in DataFusion based on physical type.
Some examples of operations where special semantics are sometimes needed for extension types:
- printing / displaying values (e.g. printing Variant values in a JSON like manner rather than their raw bytes)
- casting values to/from other types
- comparing values (e.g. it is not correct to compare two variant values byte by byte)
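As a rough illustration of the comparison point, the sketch below uses a toy encoding (explicitly NOT the real Variant layout) in which the same integer can be stored with two different byte sequences. The function names `decode_toy`, `physical_eq`, and `logical_eq` are hypothetical and exist only for this example.

```rust
/// Decode a value in a toy encoding (NOT the real Variant layout):
/// tag 0 = one-byte signed integer,
/// tag 1 = eight-byte little-endian signed integer.
pub fn decode_toy(bytes: &[u8]) -> Option<i64> {
    match bytes {
        [0, b] => Some(*b as i8 as i64),
        [1, a, b, c, d, e, f, g, h] => {
            Some(i64::from_le_bytes([*a, *b, *c, *d, *e, *f, *g, *h]))
        }
        _ => None,
    }
}

/// What a kernel that only knows the physical (binary) type would do.
pub fn physical_eq(a: &[u8], b: &[u8]) -> bool {
    a == b
}

/// What an extension-aware comparison would do: decode first, then compare.
/// Returns `None` if either input is not a valid toy value.
pub fn logical_eq(a: &[u8], b: &[u8]) -> Option<bool> {
    Some(decode_toy(a)? == decode_toy(b)?)
}
```

With this encoding, `[0, 5]` and `[1, 5, 0, 0, 0, 0, 0, 0, 0]` both decode to 5, so `physical_eq` reports them as different while `logical_eq` reports them as equal.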
There are a few challenges now:
- Extension type information is carried on Field (rather than DataType), and the Field is not yet available everywhere (though @paleolimbot and others are working on this)
- Even once we have Field available everywhere, the callsites for many cast/print and binary operations call directly into the arrow kernels, which have no way to customize behavior for extension types.
Describe the solution you'd like
I think we need some sort of DataFusion API for users of extension types to specify and customize their behavior.
Describe alternatives you've considered
One possibility is to add a DFExtensionType trait that extends the existing ExtensionType trait, similar to DFSchema
Maybe something like
/// DataFusion Extension Type support
pub trait DFExtensionType: ExtensionType {
    /// Cast a column of this extension type to the target
    fn cast(&self, input: ColumnarValue, output_type: &Field) -> Result<ColumnarValue>;
    // .. other functions ...
}
We would also need some way to register these types dynamically with the SessionContext, as well as pass along the registry into the places they are needed.
let ctx = SessionContext::new();
ctx.register_extension(Arc::new(DFVariantExtension));
...
I am not quite sure if this is the right API; we would probably need to try it out.
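To explore how the registration idea might fit together, here is a small, std-only sketch of a registry that dispatches on the extension name and falls back to the physical-type behavior when no extension is registered. Everything here is hypothetical: `ExtensionBehavior` stands in for the proposed `DFExtensionType`, `ExtensionRegistry` stands in for whatever the SessionContext would hold, and `Utf8Note` / `example.note` are made-up names. It covers only display, not cast or comparison.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical hook trait, standing in for the proposed `DFExtensionType`.
pub trait ExtensionBehavior: Send + Sync {
    /// Extension name as it would appear in the field metadata.
    fn name(&self) -> &str;
    /// Render one physically stored value for display.
    fn display(&self, raw: &[u8]) -> String;
}

/// Hypothetical registry; a real version would presumably live on the session.
#[derive(Default)]
pub struct ExtensionRegistry {
    types: HashMap<String, Arc<dyn ExtensionBehavior>>,
}

impl ExtensionRegistry {
    /// Register a behavior under its own extension name.
    pub fn register(&mut self, ext: Arc<dyn ExtensionBehavior>) {
        self.types.insert(ext.name().to_string(), ext);
    }

    /// Use the registered behavior when the field names a known extension,
    /// otherwise fall back to the physical-type default (here: hex bytes).
    pub fn display(&self, extension_name: Option<&str>, raw: &[u8]) -> String {
        match extension_name.and_then(|n| self.types.get(n)) {
            Some(ext) => ext.display(raw),
            None => raw.iter().map(|b| format!("{:02x}", b)).collect(),
        }
    }
}

/// Made-up extension that stores UTF-8 text in a binary column.
pub struct Utf8Note;

impl ExtensionBehavior for Utf8Note {
    fn name(&self) -> &str {
        "example.note"
    }

    fn display(&self, raw: &[u8]) -> String {
        String::from_utf8_lossy(raw).into_owned()
    }
}
```

The fallback branch is the design point worth testing in a prototype: call sites that currently go straight to the arrow kernels would instead consult the registry first, and unregistered or unknown extension names would keep today's physical-type behavior.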
Additional context
No response