-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move sorting algorithms from base to stdlib (First try) #45584
Conversation
IMO, it would be fine if (like |
This PR puts The point is to allow sorting to depend on stdlibs, the cost is that Core.Compiler.Sort.sort! will have fewer algorithms available (just InsertionSort). I believe that cost may even be e benefit because of how the compiler uses sorting. |
The one thing that I would be worried about here is that even if the compiler typically only has to sort small things, we have worked really hard in the past year to remove accidental quadratic algorithms from the compiler that result in bad scaling for large amounts of generated code, and we don't want to add a new one. |
I think this is a good idea! The name is quite long though, how about keeping the
Would using mergesort instead of insertionsort for longer vectors as a fallback be an alternative? It's slower on average, but doesn't have the worst case quadratic behavior. |
@oscardssmith Yep, that makes sense. In my profiling, I never compiled an expression consisting of the sum of 1000 variables, or really anything quite unusual. @Seelengrab that sounds reasonable. This PR would leave MergeSort in base. Then, #45222, which stabilizes QuickSort, could move MergeSort out of base (and deprecate it?) and move QuickSort back in. Core.Compiler.Sort.sort!(alg=QuickSort) would have to use non-randomized pivot selection, but I think it is safe to assume non-malicious input to Core.Compiler.Sort.sort! (if you are compiling malicious code you have bigger problems than quadratic runtime!). I don't want to merge this PR with #45222 because both are already pretty big. |
Agreed. I had hoped we could find a shorter name. I had assumed that |
I think Keno sometimes reaches such quite unusual cases, from function & method generation from AD. |
Thanks for the feedback, y'all! I'm going to plan to move forward with this after several other sorting-related PRs are finished to avoid merge conflicts. Hold pending other PRs (see #44876 (reply in thread) for roadmap) |
Loving all your work on the sorting stuff! |
Superseded by #46679 |
I believe this makes sense for two reasons:
Core.Compiler.Sort.sort!
) is heavily skewed toward sorting very short vectors. InsertionSort is well suited to those sorting tasks.I did some profiling by inserting
into
sort.jl
before building from source and then doing a bit of plotting.The results I got indicate that the vast majority of sorting happens for vectors of length less than 30, well before any of our existing advanced sorting algorithm start to outperform InsertionSort.
Here are some figures I and a script that helped me reach that conclusion:
In the top row are some data on runtie if InsertionSort and standard sorting. InsertionSort can be a tiny bit faster for short inputs, and a decent bit slower for medium inputs.
In the bottom row are some data on which lengths of vectors are sorted, the conclusion is that the vectors are almost all very short. Even when weighted by runtime (which all the figures are) the vast majority of runtime is spent sorting very small vectors. This makes me think it is okay to not have advanced sorting in base.
Script
This is a work in progress (e.g. docs and test are copied directly from Random), I'm not sure exactly how to carry out the migration and want to hear feedback before proceeding any further.
Likely fixes #45522