You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The new sort preserving merge operator, introduced in #379 likely has room for performance improvement.
For bigger number of partitions, storing the cursors in a BinaryHeap, sorted by their current item, would be beneficial.
A rust implementation of that approach can be seen in this blog post and the first comment under it. I have implemented the same approach in java before. I agree with @alamb though to make it work first, and then optimize later.
Closing this ticket as I believe it is not tracking anything anymore.
SortPreservingMerge is now implemented as an n-way tournament tree making use of an order-preserving row encoding for multi-column sorts, and specialized cursors for single column sorts. I'm not aware of any major low-hanging fruit to make it run faster
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The new sort preserving merge operator, introduced in #379 likely has room for performance improvement.
Describe the solution you'd like
Here is a suggestion from @jhorstmann https://github.com/apache/arrow-datafusion/pull/379/files#r637948151 as a separate ticket so it doesn't get lost:
For bigger number of partitions, storing the cursors in a BinaryHeap, sorted by their current item, would be beneficial.
A rust implementation of that approach can be seen in this blog post and the first comment under it. I have implemented the same approach in java before. I agree with @alamb though to make it work first, and then optimize later.
The text was updated successfully, but these errors were encountered: