-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
For reasons I am not sure of (likely historical), certain APIs of Schema
/RecordBatch
return owned instances of SchemaRef
.
So for example
let schema: SchemaRef = record_batch.schema()
There are at least two challenges here:
- This often requires an extra copy when not needed (though it is an
Arc::clone
which is relatively inexpensive compared to the work done by most Arrow apis) - It makes it hard for borrow propagation to work correctly.
An example of challenges with borrow propagation is illustrated on #5342, which observes, we can't change Schema::try_merge
to take (&Schema
) even when it doesn't actually need owned ownership of the Schema
without creating a temporary Vec<SchemaRef>
or something similiar
- pub fn try_merge(schemas: impl IntoIterator<Item = Self>) -> Result<Self, ArrowError> {
+ pub fn try_merge<'a>(
+ schemas: impl IntoIterator<Item = &'a Schema>,
+ ) -> Result<Schema, ArrowError> {
Describe the solution you'd like
I recommend we add new functions that return a reference to the SchemaRef
So new functions like
/// Returns a reference to the [`Schema`] of the record batch.
pub fn schema_ref(&self) -> &SchemaRef {
&self.schema
}
Describe alternatives you've considered
One alternative is to change the existing APIs as proposed in #5448
/// Returns a reference to the [`Schema`] of the record batch.
pub fn schema(&self) -> &SchemaRef {
&self.schema
}
However I think this requires a substantial amount of downstream code change (as explained on #5448 (review))
Additional context
This came up recently on #5342
@tustvold notes we have made similar changes in the past, e.g. #2035 and #313