Is your feature request related to a problem or challenge? Please describe what you are trying to do.
MutableArrayData is created with one or more ArrayData and can be used to copy rows from the source arrays to a destination array. It does this by constructing, for each of the arrays, boxed functions that can then be used to copy a range of values from the source array's null mask and data respectively. It then also constructs
type ExtendNulls = Box<dyn Fn(&mut _MutableArrayData, usize)>;
which can be used to append null values to the in-progress array.
Users don't call these boxed functions directly, but instead call MutableArrayData::extend or MutableArrayData::extend_nulls which in turn call the appropriate functions.
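To make the dispatch pattern concrete, here is a minimal sketch that uses plain `i32` slices instead of ArrayData and ignores null masks entirely. The names `Extend`, `MutableData` and `demo_extend` are hypothetical stand-ins, not the actual arrow types or signatures.

```rust
// Hypothetical stand-in for the per-source closure: (destination, start, len).
// The real arrow types differ; this only illustrates the shape of the design.
type Extend<'a> = Box<dyn Fn(&mut Vec<i32>, usize, usize) + 'a>;

/// Toy analogue of MutableArrayData over plain i32 slices (no null mask).
struct MutableData<'a> {
    extend_fns: Vec<Extend<'a>>,
    out: Vec<i32>,
}

impl<'a> MutableData<'a> {
    /// Builds one boxed copy function per source slice.
    fn new(sources: &[&'a [i32]]) -> Self {
        let extend_fns = sources
            .iter()
            .map(|&src| -> Extend<'a> {
                Box::new(move |out: &mut Vec<i32>, start: usize, len: usize| {
                    out.extend_from_slice(&src[start..start + len]);
                })
            })
            .collect();
        Self { extend_fns, out: Vec::new() }
    }

    /// Copies rows `start..end` from source `index`.
    /// Every call pays for one dynamic dispatch, however small the range.
    fn extend(&mut self, index: usize, start: usize, end: usize) {
        (self.extend_fns[index])(&mut self.out, start, end - start);
    }
}

/// Usage: copy rows 1..3 of the first source, then rows 0..2 of the second.
fn demo_extend() -> Vec<i32> {
    let a = [1, 2, 3, 4];
    let b = [10, 20, 30];
    let mut data = MutableData::new(&[&a[..], &b[..]]);
    data.extend(0, 1, 3);
    data.extend(1, 0, 2);
    data.out
}
```

In this sketch the per-call overhead is fixed, so it is negligible for large ranges and dominant when each range covers only a row or two.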
This works really well for kernels such as concat, which call MutableArrayData with large ranges. However, it performs poorly in kernels such as take and filter, where the contiguous ranges may be very small.
Edit: The take kernel in fact has custom implementations for each array type, likely because using MutableArrayData would be painfully slow. Perhaps with this change we could unify the implementations 🤔
Describe the solution you'd like
Modify the signatures of these functions to a slice of ranges, and add
This will not only amortise the cost of the extend functions, but will also allow implementations to perform more efficient gather operations where possible.
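As a rough illustration of the idea, the sketch below shows what a batched, range-slice variant could look like over plain `i32` data. The function `extend_ranges` and its signature are assumptions made for this example, not the API proposed for arrow.

```rust
use std::ops::Range;

/// Hypothetical batched variant: a single call copies many ranges
/// from one source, so any per-call cost is paid once per batch.
fn extend_ranges(out: &mut Vec<i32>, src: &[i32], ranges: &[Range<usize>]) {
    // Reserve once for the whole batch instead of growing per range.
    let total: usize = ranges.iter().map(|r| r.len()).sum();
    out.reserve(total);
    for r in ranges {
        out.extend_from_slice(&src[r.clone()]);
    }
}

/// Usage: gather three small ranges with one call.
fn demo_extend_ranges() -> Vec<i32> {
    let src = [10, 11, 12, 13, 14, 15];
    let mut out = Vec::new();
    extend_ranges(&mut out, &src, &[0..1, 2..4, 5..6]);
    out
}
```

Because the callee sees all ranges up front, it can size the destination once and is free to choose a different copy strategy when the ranges are tiny.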
Additional context
The Filter returned by build_filter, which is used when filtering a record batch with more than one column, already computes a Vec of ranges, so this would be effectively free.
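For readers unfamiliar with the range representation, the following sketch shows one way a boolean selection mask can be collapsed into contiguous ranges. It is an assumption-laden illustration: `mask_to_ranges` is a hypothetical helper and does not reflect how build_filter is actually implemented.

```rust
use std::ops::Range;

/// Collapses a selection mask into contiguous ranges of selected rows.
fn mask_to_ranges(mask: &[bool]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &keep) in mask.iter().enumerate() {
        match (keep, start) {
            // A run of selected rows begins here.
            (true, None) => start = Some(i),
            // The current run ended at the previous row.
            (false, Some(s)) => {
                ranges.push(s..i);
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        ranges.push(s..mask.len());
    }
    ranges
}
```

A sparse mask yields many short ranges, which is exactly the case where a call per range is expensive and a single call over the whole slice helps.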
Having thought about this a bit more, filter would likely be better off with specialized impls, as it can then elide range checks, etc. I'm going to take a stab at that and see what I can come up with.
I'll leave this ticket open as it may still aid SortPreservingMerge.