Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move sorting from base to stdlib #46679

Closed
wants to merge 28 commits into from
Closed

Conversation

LilithHafner
Copy link
Member

@LilithHafner LilithHafner commented Sep 8, 2022

This PR eliminates all three current uses of sort in the compiler by merging PRs for the first two and inlining the third use.

Then it eliminates all but one use of sort! in base and replaces that final use with a use of the new special purpose Core.Compiler.sort! which is CombSort (thanks, @nlw0), a small, efficient, unstable, n log n sorting algorithm. And finally, it moves base/sort.jl to stdlibs/Sort/src/Sort.jl and the sorting code from base/range.jl into stdlibs/Sort/src/ranges.jl.

  • Make Sort a stdlib (big diff)

Finally, it takes advantage of the Sort not participating in bootstrapping and removes some hacks

  • remove no longer needed using statements
  • switch from Floats to IEEEFloat
  • replace metaprogramming with unsigned

This will allow us to avoid many of the gymnastics in #45222 needed to accommodate bootstrapping. It might also reduce the sysimage/compilecache size a wee bit, but I haven't checked.

Supersedes #45584
Closes #46586
Closes #46587
Closes #46507
Is needed by #45222

@LilithHafner LilithHafner added excision Removal of code from Base or the repository sorting Put things in order labels Sep 8, 2022
@LilithHafner LilithHafner changed the title Remove Core.Compiler.sort Move sorting from base to stdlib Sep 10, 2022
@LilithHafner LilithHafner added needs docs Documentation for this change is required needs news A NEWS entry is required for this change labels Sep 10, 2022
@nsajko
Copy link
Contributor

nsajko commented Sep 11, 2022

Regarding the bootstrapping sort, could you clarify the claim of comb sort having O(n log n) complexity? Pretty sure you're wrong, assuming you meant worst-case complexity, it seems that your variant (with a geometric sequence of increments) has a quadratic lower bound on the worst case complexity, although better worst-case complexity is possible with other sequences of increments (p. 16): https://homepages.cwi.nl/~paulv/papers/sorting.pdf

Perhaps you should fix the commit message and doc-comment?

On the other hand, why not just use the standard insertion sort or some Shell sort?

@oscardssmith
Copy link
Member

If the decision is based around ease of implementation, would a heap sort be good for the compiler? I'm pretty sure we already have some stuff in the compiler for manipulating heaps.

@nlw0
Copy link
Contributor

nlw0 commented Sep 11, 2022

Regarding the bootstrapping sort, could you clarify the claim of comb sort having O(n log n) complexity? Pretty sure you're wrong, assuming you meant worst-case complexity, it seems that your variant (with a geometric sequence of increments) has a quadratic lower bound on the worst case complexity, although better worst-case complexity is possible with other sequences of increments (p. 16): https://homepages.cwi.nl/~paulv/papers/sorting.pdf

Perhaps you should fix the commit message and doc-comment?

On the other hand, why not just use the standard insertion sort or some Shell sort?

Thanks for sharing the interesting reference. It's indeed not n log n. Please note that the final stage of the algorithm is insertion sort itself, the first stage supposedly improves on it. The main advantage of this algorithm is that it enables vectorization optimizations by the compiler.

Complexity might not matter so much in the stated context of small lists. On the other hand, maybe the lists are so small that straight insertion sort might be good enough? I'm not sure @LilithHafner tried that.

@nlw0
Copy link
Contributor

nlw0 commented Sep 11, 2022

If the decision is based around ease of implementation, would a heap sort be good for the compiler? I'm pretty sure we already have some stuff in the compiler for manipulating heaps.

Heap sort nicely guarantees n log n complexity, but the main thing about comb sort is the vectorization, like I said before. Heapsort might not be the best for small lists as well. Also, might not be as easy to implement, but if there are functions available to do it in-place, maybe... Otherwise I'd say either stick to a simpler algorithm or just go ahead with a full-on smoothsort!

base/compiler/sort.jl Outdated Show resolved Hide resolved
@LilithHafner
Copy link
Member Author

Would someone please help me with getting the docs to build? I'm having trouble figuring it out.

@ViralBShah
Copy link
Member

Can you rebase this branch to master and resolve the conflict? Happy to try help on the doc failure (but maybe it just goes away)?

Lilith Hafner and others added 3 commits October 2, 2022 07:35
Not an ideal solution because we no longer use automatic @ref, but produces the same result.
@LilithHafner LilithHafner removed the needs docs Documentation for this change is required label Oct 2, 2022
@LilithHafner
Copy link
Member Author

The documentation issues are now resolved, it's possible there is a more elegant way, but the current situation is possible and does not require making any changes to @refs to sorting functionality in the documentation that lives outside of stdlib/Sort.

I believe all prior comments have been addressed, but this PR has yet to receive a comprehensive review.

@LilithHafner
Copy link
Member Author

Bump <3

@oscardssmith
Copy link
Member

imo this looks very good. would it be possible to separate this info 2 PRs, one with all the changes to Core and the other which moves to stdlib? since there are non-trivial algorithm changes there, it would be good to get those looked at by someone familiar with those parts of the compiler.

@LilithHafner
Copy link
Member Author

Yes, and I think for the most part it already is (#46586 and #46587 implement those algorithm changes). I'll rebase and bump those PRs.

@fredrikekre
Copy link
Member

What is the motivation for this? #45222 doesn't really explain why this is needed. Will you not be able to sort with vanilla Julia now? I imaging sorting is way more common than taking a mean, and mean is moving back in because the people want it.

Also, this branch doesn't build without Sort in the sysimg.

@LilithHafner
Copy link
Member Author

Sort will stay in the sysimage and sort! and friends will remain accessible without any using or import statements (much like rand). You will continue to be able to sort with vanilla Julia.

The primary motivation from #45222 is to let Sort depend on Random. Another perk of dropping the compiler's dependence on Sort is to let Sort use views, the extrema and unsigned functions, the IEEEFloat type, and other convenient constructs that are not available during bootstrapping.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Oct 4, 2022

What conceptually is the line between what sorting functionality is available in Base and what would require loading the Sorting stdlib? Btw, I'd much rather call the stdlib Sorting since that follows the more modern Julia package naming convention (which the Sort submodule long predates). Alas, I don't think we can do this in a 1.x release since it seems quite breaking.

@gbaraldi
Copy link
Member

gbaraldi commented Oct 4, 2022

Not sure how different this would be compared to turning something from a stdlib to a normal package. Specially if the API remains the same and no extra imports/usings are needed.

@fredrikekre
Copy link
Member

sort! and friends will remain accessible without any using or import statements

That is not the case right now though, Julia doesn't even compile (UndefVarError: sort! etc)

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Oct 4, 2022

It's quite different since you can access Sort via Base.Sort without adding any packages. If it becomes a stdlib then you'd have to do add Sorting in order to access it, which is quite breaking. The only way that's acceptable is if everything that's currently working and documented as part of the sorting API continues to work. We could (silently by default) deprecate that API and instead recommend using the new Sorting stdlib and eventually hopefully people will do that, but it would be a slow transition and may not allow the desired changes for the internals.

@gbaraldi
Copy link
Member

gbaraldi commented Oct 4, 2022

I'm not sure why moving it to a stdlib is so necessary (outside of the Random dep). If the issue is bootstrap. Could we move the new sorting code to late in the bootstrap after the other things have been defined and keep the minimal sorting things in the compiler?

@StefanKarpinski
Copy link
Member

I'm in favor it in principle just because I think that any complex functionality that can be moved out should be moved out. However, it remains unclear if we can actually meaningfully move much sorting functionality out.

@LilithHafner
Copy link
Member Author

What conceptually is the line between what sorting functionality is available in Base and what would require loading the Sorting stdlib?

Everything can be done without loading the Sorting stdlib. This is necessary to avoid breaking things.

@fredrikekre, could you explain what you mean by "Julia doesn't even compile"? CI builds fine, and when I run

git clone --depth=1 --branch=remove-compiler-sort https://github.com/JuliaLang/julia.git 
cd julia
make
./julia
sort(rand(10))
include("stdlib/Sort/test/runtests.jl")

everything works fine.

@fredrikekre
Copy link
Member

without Sort in the sysimg.

@LilithHafner
Copy link
Member Author

Sorry, I still don't understand. I'm not aware of any plans to remove sorting from the sysimage.

@KristofferC
Copy link
Member

Sorry, I still don't understand. I'm not aware of any plans to remove sorting from the sysimage.

In general, the expectation is that unless you depend on a stdlib, your package should be usable without that stdlib being present (even in the sysimage). There are currently a few exceptions to that for stdlibs that do type piracy:

  • LinearAlgebra with multiplication between arrays.
  • Random with rand()

It would IMO be good not to introduce more cases like that since it makes filtering out stdlibs unsafe (e.g. via PackageCompiler).

@oscardssmith
Copy link
Member

@LilithHafner Note that currently the compiler sort is broken for large sizes. I'm going to steel the part of this PR that will fix this (i.e. the part that uses heapsort) to make into a separate PR to fix the regression.

@LilithHafner
Copy link
Member Author

Once sorting and sorting optimizations are modular and extensible, it should be possible to move short integer range optimizations, the Sort.Float submodule, and radix sort out of base. If we are willing to let sort(v, alg=MergeSort) dispatch to a stable QuickSort when the Sorting stdlib is not loaded, then we should be able to move MergeSort out as well. This PR does not do the refactoring necessary to make those components optional, though.

Should we defer this PR until sorting has gone through that refactoring?

@LilithHafner
Copy link
Member Author

The only way I can think of to do this is to have a default implementation of sort! in base that ignores most alg arguments but still works and a faster version in Sorting so that almost everyone will use the fast version but if folks excise unused stdlibs for a smaller sysimg and don't depend on Sorting then they will get the simpler, smaller algorithm with different (worse) performance characteristics.

Seems like a whole lot of complexity in a roundabout way for not much gain.

@LilithHafner LilithHafner deleted the remove-compiler-sort branch May 27, 2023 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
excision Removal of code from Base or the repository sorting Put things in order
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants