-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make DfSchema wrap SchemaRef #4680
Comments
I wrote the original The key design goal is that DataFusion should always use a Going the other way, cc @Dandandan @jackwener @mingmwang @andygrove in case you have additional thoughts |
This mostly works but runs into issues with the handling of schemas in datafusion-expr. In particular they have a fairly hard-coded assumption that they can pass around DFField, concatenate them and munge them back into a DFSchema. It would be possible to workaround this, but it would be a fairly non-trivial amount of code-churn. |
Attaching link to DFSchema struct for convenience https://github.com/apache/arrow-datafusion/blob/main/datafusion/common/src/dfschema.rs#L108 |
Ive been doing some work on planning performance and in order to lay a good foundation for the next steps I think this is necessary. So my plan is to start working on this shortly. |
Thank you @matthewmturner 🙏 BTW I started hacking on some version of what this might look like in #7944 but I haven't had a chance to finish it |
@alamb yes i saw, thanks. havent decided yet whether i will pick up on that PR or create a new one. |
Feel free to do anyting you want with the code / PR |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently DataFusion has two ways to represent a schema,
Schema
andDFSchema
. The former is the representation used by arrow, and in most components.DFSchema
appears to "enhance" an arrow schema with the notion of a qualifier.I'm not entirely sure of the history of this split, but to the uninitiated the split is confusing and frustrating. It also results in a non-trivial amount of schema munging logic to convert to/from the relevant representations
Describe the solution you'd like
I would like to change
DfSchema
to beWe could then make
DfSchema
automatically deref toSchemaRef
, or at the very least implementAsRef<SchemaRef>
, avoiding a lot of code that ends up looking likeComponents wishing to combine the information can easily zip the two together, we could even assist this by adding
Describe alternatives you've considered
We could not do this
Additional context
Add any other context or screenshots about the feature request here.
The text was updated successfully, but these errors were encountered: