-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate the N-way merging code and SortPreservingMergeStream
(which has quite good tests of what is often quite tricky code, and it will be performance critical)
#1572
Comments
Thanks @alamb for bringing it up. I propose using heap-sort for N-way merge, but consolidate all the codes we have now in The main reason to use For the current sorting methods in SPMS, I think it's to compare all starting records from all streams at once for each time we emit the least. The comparison might not be an issue when N is relatively small. But would deteriorate fast as N grows. @tustvold, what's your opinion on the merging algorithm to choose? |
Also cc @houqp for more minds. |
SGTM 👍. This was in fact mentioned on the original PR that added the operator here and led to the filing of #416. I would describe the current |
Great! @yjshen added a heap based implementation https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/sorts/in_mem_sort.rs#L60 -- standardizing on that one sounds like a great idea |
I think this was fixed in #1596 |
No description provided.
The text was updated successfully, but these errors were encountered: