
Implements the CombSort algorithm #54

Merged: 15 commits into JuliaCollections:master on Oct 9, 2022

Conversation

@nlw0 (Contributor) commented Feb 19, 2022

This implements the comb sort algorithm. The patch was first submitted to Julia core, but it was decided that SortingAlgorithms.jl would be a better place. JuliaLang/julia#32696

Please check previous threads for details and motivation. This algorithm was discussed in a 2019 JuliaCon presentation: https://youtu.be/_bvb8X4DT90?t=402. The main motivation to use comb sort is that the algorithm happens to lend itself very well to compiler optimizations, especially vectorization. This can be checked by running e.g. @code_llvm sort!(rand(Int32, 2^12), 1, 2^12, CombSort, Base.Order.Forward) and looking for an instruction such as icmp slt <8 x i32>.
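
For reference, here is that check as a runnable snippet (assuming this PR's CombSort is loaded; the low-level sort! method is a Julia internal whose exact signature may vary across versions):

using SortingAlgorithms  # provides CombSort (this PR)

v = rand(Int32, 2^12)
# Inspect the LLVM IR generated for the comb pass; on an AVX2 machine we
# expect packed comparisons such as `icmp slt <8 x i32>`.
@code_llvm sort!(v, 1, 2^12, CombSort, Base.Order.Forward)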

Comb sort is a general-purpose, unstable comparison sort that outperforms the standard quick/intro sort for 32-bit integers. It doesn't seem to outperform radix sort for that kind of element type, though, so it's not clear whether it only outperforms quick sort in the cases where radix sort is actually optimal. The motivation is that comb sort is a simple general-purpose algorithm that the compiler seems to optimize easily to exploit modern parallel architectures.

I'd gladly perform more benchmarks if this is desired, although it would be nice to hear specific ideas about the kinds of input types and sizes we are interested in. As far as I know, none of the currently implemented algorithms had to be validated with such experiments before being merged. It would be great to hear some advice about moving forward with this contribution, if at all, since this peculiar algorithm seems to attract a high level of scrutiny, probably deservedly.

All the tests right now seem to be heavily based on floating-point numbers, and there are actually some challenges in the implementation there. The core of the implementation is the function ltminmax, which compares two values and returns an ordered pair using the min and max functions. This is perfect for integers and strings, but with floating point things get weird, as usual. The results with NaNs right now are actually not even correct, although the test is passing (!). It would be great to have some advice about how to fix that, as well as how we might extend the tests.
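
In spirit, the comparison kernel looks something like this (a minimal sketch; the PR's actual ltminmax definition may differ):

# Compare two values and return them as an ordered (min, max) pair.
# This branchless form vectorizes well for integers and strings.
ltminmax(a, b) = (min(a, b), max(a, b))

ltminmax(3, 1)      # (1, 3)
ltminmax(NaN, 1.0)  # (NaN, NaN): Julia's min and max propagate NaN,
                    # so one value is lost, which is exactly the NaN trouble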

I'm very glad to have studied this algorithm using Julia, I feel it's a great showcase for the language, and it seems to epitomize modern, parallel-focused computing. I'd love to hear suggestions about how we might highlight these ideas in this patch.

@codecov-commenter commented Feb 19, 2022

Codecov Report

Merging #54 (3da6412) into master (a17c80c) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
+ Coverage   96.40%   96.51%   +0.10%     
==========================================
  Files           1        1              
  Lines         334      344      +10     
==========================================
+ Hits          322      332      +10     
  Misses         12       12              
Impacted Files Coverage Δ
src/SortingAlgorithms.jl 96.51% <100.00%> (+0.10%) ⬆️


@nalimilan (Contributor):

Thanks. Unlike in Base, I don't think systematic benchmarks are needed to include a new algorithm in this package. The point of SortingAlgorithms.jl is to provide a variety of algorithms. However, it should give correct results or throw an error for unsupported values or types. It's surprising that the tests pass with NaN if, as you say, they currently don't work. Do you have an example of a case that fails? Would throwing an error when you detect a NaN be OK?

It would be good to add tests for other types BTW (strings, custom types, missing...). This is a real gap in the current algorithms, but it's never too late to improve.

@nlw0 (Contributor, Author) commented Feb 20, 2022

Thank you, looking forward to contributing this and perhaps other algorithms later.

I was perhaps a bit quick to judge; the implementation actually does seem to work:

julia> sort(randn_with_nans(11, 0.1), alg=CombSort)'
1×11 adjoint(::Vector{Float64}) with eltype Float64:
 -1.1997  -0.266208  -0.0248003  0.165739  0.731837  0.779865  0.801758  0.900917  1.8231  NaN  NaN
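
(For context, randn_with_nans is a test helper along these lines; this is a hypothetical reconstruction, not necessarily the exact definition used:)

# n standard-normal values, each replaced by NaN with probability p.
function randn_with_nans(n, p)
    v = randn(n)
    v[rand(n) .< p] .= NaN
    return v
end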

My confusion arose because I was testing with direct calls to the 6-parameter method, and I guess I picked the wrong choice for the ordering parameter. I suppose I just don't understand exactly how that works. What happens is that if I call it as sort!(v, 1, length(v), CombSort, Base.Sort.Float.Right()), we end up with all NaNs. I believe I can fix it with a special comparison that treats NaN as the maximum, but I'm not sure whether that is necessary. In fact, direct calls to other algorithms cause the same problem. Maybe NaN handling should even be implemented through by?

My uncertainty was later reinforced because the test checks issorted instead of comparing the output to a vector sorted by another method, so if the output were all [NaN, NaN, NaN, ...], it would still pass. So I recommend the tests always check the specific values in the output.
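
Something along these lines, for instance (a sketch; isequal is used so that NaNs compare equal elementwise):

using Test
# Compare against a trusted reference algorithm instead of just issorted:
v = randn_with_nans(1000, 0.1)
@test isequal(sort(v; alg=CombSort), sort(v; alg=MergeSort))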

It all seems to be fine, though; I just don't understand the mechanics of the orderings etc., and using the algorithm through the higher-level sort function works fine.

@nlw0 mentioned this pull request Mar 2, 2022
@nalimilan (Contributor):
Sorry for the delay. I don't think you should be concerned with Base.Sort.Float.Right() handling NaN. AFAICT it's an internal ordering used only by fpsort! after it has moved all NaNs to the end of the vector, so that they are never passed to the sorting algorithm at all. But indeed, tests should ideally be stricter than just calling issorted.
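
The pattern is roughly this (a minimal sketch of the idea, not Base's actual fpsort! code):

# Compact non-NaN values to the front, fill the tail with NaN, and let
# the core algorithm sort the NaN-free prefix, where `<` is a total order.
function sort_nans_last!(v::Vector{Float64})
    j = 0
    for x in v
        if !isnan(x)
            j += 1
            v[j] = x
        end
    end
    v[j+1:end] .= NaN
    sort!(view(v, 1:j); alg=CombSort)  # the core sort never sees a NaN
    return v
end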

@nlw0 (Contributor, Author) commented Aug 25, 2022

Can somebody help me here? What are we missing to go ahead?

@LilithHafner mentioned this pull request Aug 25, 2022
@nlw0 (Contributor, Author) commented Aug 27, 2022

Thanks for the review, I believe I have covered everything.

@LilithHafner (Member) left a comment:

I've been doing a bit of benchmarking, and this looks really good! It seems to be faster than any algorithm I know of for unstable sorting of primitives in default order of length about 10–1500. That's a very particular domain, but also a fairly common use case and a case where Julia currently struggles.

Comment on lines 41 to 42
- H. Inoue, T. Moriyama, H. Komatsu and T. Nakatani, "AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors," 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007, pp. 189-198, doi: 10.1109/PACT.2007.4336211.
- Werneck, N. L., (2020). ChipSort: a SIMD and cache-aware sorting module. JuliaCon Proceedings, 1(1), 12, https://doi.org/10.21105/jcon.00012
@LilithHafner (Member):

Both of these works describe different and much larger sorting algorithms. CombSort (albeit with a bubble-sort finish) was introduced in a one-pager from 1980: Dobosiewicz, Wlodzimierz, "An efficient variation of bubble sort", Information Processing Letters, 11(1), 1980, pp. 5-6, https://doi.org/10.1016/0020-0190(80)90022-8.

@nlw0 (Contributor, Author):

We can add further references if you like, but in that case please add them as suggestions. The current list feels sufficient to me.

This implementation is the "default" method presented in ChipSort.jl, and the inspiration came mostly from AA-Sort. I'm not sure earlier papers discuss vectorization, and the early history of all these algorithms gets a little murky. I recommend following the citations in Section 2.4 of the ChipSort paper, which include Knuth, and also this wiki, which has a good recap and cites Dobosiewicz (as Knuth himself did in a later edition of his book): https://code.google.com/archive/p/combsortcs2p-and-other-sorting-algorithms/wikis/CombSort.wiki

The algorithm in AA-Sort Figure 2 actually finishes with bubble sort as well. That's why I point out that finishing with insertion sort might be the small contribution of the ChipSort paper, although it's a pretty conventional idea, as explained in Section 2.4. What's still a mystery is whether this comb+insertion approach can be guaranteed to have n*log(n) complexity.

@LilithHafner (Member):

While I view Dobosiewicz's piece as a concise exposition of CombSort, I view the AA-Sort paper as a longer analysis of a set of modifications to CombSort. Does this PR implement the modifications the AA-Sort paper describes?

The second reference seems more appropriate, but IIUC the only part of that paper that describes this algorithm is the first half of section 2.4.

@nlw0 (Contributor, Author) commented Aug 27, 2022:

If you would like a more specific reference, it may not exist. This PR is a result of finding out, amidst the other investigations in ChipSort, that this simple algorithm can be vectorized well and offers great performance.

Indeed, very little from the AA-Sort paper is here. It was just the original inspiration, and I feel I can't just cite myself, although I wouldn't object to it.

@LilithHafner (Member):

I think it would be reasonable to just cite yourself, or to cite yourself first, followed by AA-Sort.

@nlw0 (Contributor, Author) commented Aug 27, 2022

> I've been doing a bit of benchmarking, and this looks really good! It seems to be faster than any algorithm I know of for unstable sorting of primitives in default order of length about 10–1500. That's a very particular domain, but also a fairly common use case and a case where Julia currently struggles.

Cool! Are you talking about cases where radix sort cannot be applied? I'm not sure I've ever seen a case where radix sort does not win, not even in very specific examples. That's one unfortunate omission in the ChipSort paper: I didn't get to benchmark against radix sort, which I also understand is now finally going to be offered in Base as well.

One interesting technique for small inputs is sorting networks. ChipSort.jl has them, and I believe there are other packages offering them as well.

@LilithHafner (Member):

> Cool! Are you talking about cases where radix sort cannot be applied? I'm not sure I've ever seen a case where radix sort does not win, not even in very specific examples.

What radix sort were you comparing to? The one in Base is slower than CombSort for 700 Ints on this computer:

julia> @btime sort!(x; alg=CombSort) setup=(x=rand(Int, 700)) evals=1;
  10.110 μs (0 allocations: 0 bytes)

julia> @btime sort!(x) setup=(x=rand(Int, 700)) evals=1; # Adaptive sort dispatching to radix sort
  12.163 μs (3 allocations: 7.84 KiB)

julia> versioninfo()
Julia Version 1.9.0-DEV.1035
Commit 52f5dfe3e1* (2022-07-20 20:15 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.3.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.5 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0

@nlw0 (Contributor, Author) commented Aug 27, 2022

Interesting, I'm not yet familiar with what's been added to Base. I think there's a package that specifically implements radix sort, and that's what I've tried in the past; maybe it uses a few extra tricks. One of my arguments for having comb sort in Base was that implementing radix sort is not as easy, so I'm curious to find out whether the implementation there is outperformed.

I've never actually understood how radix sort can make use of any ILP, which is simple to understand with comb sort. It might be that some small detail is missing in Base to enable vectorization, or it might just be a matter of tuning, e.g. when to switch to insertion sort. I'll definitely run some experiments later, now that you've shown me that!

@LilithHafner (Member):

We should mention the asymptotic runtime (cf. JuliaLang/julia#46679 (comment)).

@LilithHafner (Member) commented Oct 2, 2022

I think this is very close to ready. All it needs are a couple of documentation changes and a rebase/merge onto the latest master to make sure tests pass on nightly.

It's a neat algorithm that I'd like to see merged!

@nlw0 (Contributor, Author) commented Oct 2, 2022 via email

@nlw0 (Contributor, Author) commented Oct 4, 2022

@LilithHafner I've changed the docstring and rebased; I hope it's all fine now.

- *in-place* in memory.
- *parallelizable* suitable for vectorization with SIMD instructions
because it performs many independent comparisons.
- *complexity* worst-case only proven to be better than quadratic, but not `n*log(n)`.
@LilithHafner (Member) left a comment:

This algorithm has quadratic worst-case runtime.

julia> @time sort!(4^7*repeat(1:30, 4^7));
  0.027213 seconds (8 allocations: 11.258 MiB)

julia> @time sort!(4^7*repeat(1:30, 4^7); alg=CombSort);
  4.866824 seconds (4 allocations: 7.500 MiB)

Proof

Take an arbitrary k, let m = 4k, and let n = m*4^7. Consider the first 7 intervals for an input of length n: [n*(3/4)^i for i in 1:7] == [m*4^7*(3/4)^i for i in 1:7] == [m*4^(7-i)*3^i for i in 1:7]. Notice that each interval is divisible by m.

Now, construct a pathological input v = repeat(1:m, 4^7). This input has the property v[i] == v[i+j*m] for any integers i and j which yield in-bounds indices. Consequently, the first 7 passes cannot alter v at all.

Informal interlude: There are still a lot of low numbers near the end of the list, and the remaining passes will have a hard time moving them to the beginning because their intervals are fairly small.

Consider the elements 1:k that fall in the final quarter of v. There are k*4^7/4 = n/16 such elements. Each of them must end up in the first quarter of the list once sorted, so they must each travel a total of at least n/2 slots (in reality they must each travel more than this, but all I claim is a lower bound).

To recap, we have established that n/16 elements must travel at least n/2 slots, and that they do not travel at all in the first 7 passes. The remaining comb passes have intervals no greater than [n*(3/4)^i for i in 8:inf]. The furthest an element can move toward the start of the vector in a single pass is the interval size of that pass, so the furthest an element can move toward the start of the vector in all remaining passes combined is sum([n*(3/4)^i for i in 8:inf]) = n*(3/4)^8 / (1 - 3/4) = 4n*(3/4)^8 < 0.401n. Thus, after all the comb passes are complete, we will still have n/16 elements that have to move at least 0.099n slots toward the start of the vector. Insertion sort, which can only move an element one slot per swap, will require 0.099n*n/4 > .024n^2 swaps to accomplish this. Therefore, the worst-case runtime of this algorithm is Ω(n^2).

It is structurally impossible for this algorithm to take more than O(n^2) time, so we can conclude Θ(n^2) is a tight asymptotic bound on the worst case runtime of this implementation of combsort. (A similar analysis holds for any geometric interval distribution).


We can verify the math in this proof empirically:

Code
function comb!(v)
    lo, hi = extrema(eachindex(v))
    interval = (3 * (hi-lo+1)) >> 2        # start at about 3n/4
    while interval > 1
        # one comb pass: compare-exchange pairs `interval` apart
        for j in lo:hi-interval
            a, b = v[j], v[j+interval]
            v[j], v[j+interval] = b < a ? (b, a) : (a, b)
        end
        interval = (3 * interval) >> 2     # shrink geometrically by 3/4
    end
    v
end

# Insertion sort that counts the number of one-slot element moves it makes.
function count_insertion_sort!(v)
    count = 0
    lo, hi = extrema(eachindex(v))
    for i = lo+1:hi
        j = i
        x = v[i]
        while j > lo && x < v[j-1]
            count += 1       # one slot of displacement repaired
            v[j] = v[j-1]
            j -= 1
        end
        v[j] = x
    end
    count
end

K = 1:6
M = 4 .* K
N = M .* 4^7
swaps = [count_insertion_sort!(comb!(repeat(1:m, 4^7))) for m in M]

using Plots
plot(N, swaps, label="actual swaps", xlabel="n", ylabel="swaps", legend=:topleft)
plot!(N, 0.024 .* N.^2, label="theoretical minimum")

Results
[screenshot of the resulting plot]


The proof conveniently provides us with a pathological input to test. So, even more empirically, we can simply measure runtime.

Code
# multiply by a large number to avoid dispatch to counting sort
make_vector(m) = 4^7*repeat(1:m, 4^7)
ms = 1:20
n       = 4^7 .* ms
comb    = [(v = make_vector(m); @elapsed(sort!(v; alg=CombSort))) for m in ms]
default = [(v = make_vector(m); @elapsed(sort!(v              ))) for m in ms]
theory  = 0.024 .* n.^2 ./ 1.6e9 # assuming a 1.6 GHz clock, one swap per cycle

plot(n, comb, label="comb sort", xlabel="n", ylabel="time (s)", legend=:topleft)
plot!(n, default, label="default sort")
plot!(n, theory, label="theoretical minimum")

Results

[screenshot of the resulting plot]

@nlw0 (Contributor, Author) commented Oct 5, 2022 via email

@LilithHafner (Member):
I agree that the reference must have been talking about non-geometric gap sequences if it found subquadratic runtimes (i.e. some special way of reducing the intervals). I suspect that, as with shell sort, the ideal gap sequence is hard to compute.


The algorithm as written (with a geometric gap sequence) also has Θ(n^2) average case runtime.

The proof is similar to the worst case proof, but gives a much lower constant factor.

Take an arbitrary integer m ≥ 5. Our input is v = rand(m*4^7). Now, consider the views [@view(v[i:m:end]) for i in 1:m]. Because the first 7 passes have intervals that are multiples of m, they cannot swap elements from one view to another. At best, these first 7 passes sort each view independently.

Now consider which elements fall in the first quartile. Obviously, one quarter do. Less obviously, consider how many elements of a given view fall in the first quartile. Specifically, what are the odds that more than three quarters of a view falls in the first quartile? This is not an easy question to answer precisely. Note that the answer depends on m, because if a view consists of very large elements, that pushes up the median, and as m increases this effect is lessened. When m is 1, 2, or 3, the odds are 0, and as m increases, the odds increase monotonically. Let k be the probability when m = 5. We then know that the probability for all m ≥ 5 is at least k.

Consider the (on average) m*k views which have more than 3/4 of their elements in the first quartile. In passes 8:end, those elements can move at most .401*m*4^7 slots toward the beginning, but there are .049*4^7 elements in each such view that must be more than (.401+.049)*m*4^7 slots out of place, even if the view is fully sorted by the first 7 passes. After the entire comb process is complete, this leaves .049*4^7*m*k elements that are .049*m*4^7 slots out of place, for a minimum runtime of the insertion sort pass of .049^2*(4^7*m)^2*k ≈ .0024k*n^2. Thus comb sort with this (or any) geometric gap sequence has an average-case runtime of Ω(n^2), and structurally cannot be worse, so it is Θ(n^2). This completes the answer to the question Dobosiewicz posed but was unable to answer in his initial publication of the algorithm (pdf attached: https://github.com/JuliaCollections/SortingAlgorithms.jl/files/9721279/combsort.pdf): best case Θ(n log n), worst case Θ(n^2), average case Θ(n^2).

But before we write off this algorithm as quadratic and suitable only for small vectors, we should compute k. Statistical formulas I don't know off the top of my head would tell us that the odds approach something like error_function(4^7/sqrt(4^7)), but we can also compute this exactly for m = 5, which gives a precise lower bound for all m ≥ 5, which is what we seek. First, we compute a denominator: how many ways are there to choose 5*4^7*3/4 elements above the first quartile and 5*4^7*1/4 elements below the first quartile? binomial(5*4^7, 5*4^6). Then, a numerator: how many ways are there to choose those elements such that at least 4^7*3/4 of the elements in the first view are in the first quartile? If i elements from the first quartile are in the first view, then that leaves 5*4^7*1/4 - i elements in the first quartile for the remaining 4 views. The number of ways to choose all these elements is binomial(4^7, i)*binomial(4*4^7, 5*4^7*1/4 - i). We can add these up for i > 4^7*3/4 with sum(binomial(big(4^7), i)*binomial(big(4*4^7), 5*4^6 - i) for i in 4^6*3:4^7) ≈ 2.75e+14720. Dividing this by our denominator yields k ≈ 3.2e-5284, a very low constant factor.
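
As runnable Julia, that computation is (BigInt is needed to avoid overflow; this should reproduce the figures above):

# Denominator: ways to choose which 5*4^6 of the 5*4^7 elements lie in
# the first quartile.
denom = binomial(big(5*4^7), 5*4^6)
# Numerator: choices where at least 3/4 of the first view lands in the
# first quartile (i elements of the view, the rest spread over 4 views).
numer = sum(binomial(big(4^7), i) * binomial(big(4*4^7), 5*4^6 - i)
            for i in 4^6*3 : 4^7)
k = numer / denom  # BigInt / BigInt gives a BigFloat; k ≈ 3.2e-5284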

This is a proof that this algorithm has quadratic asymptotic runtime, but due to a very low constant factor, the proof is empirically vacuous.

nlw0 and others added 2 commits October 8, 2022 09:21
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
@nlw0 (Contributor, Author) commented Oct 8, 2022

@LilithHafner that's awesome; this algorithm really seems to require some different ways of thinking to analyze it, not just figuring out the "mechanics"...

I can't see any attachments; is the article you referred to somewhere online?

Regarding the impact of the interval choices, I imagine it would be nice to find a way to ensure we don't stick to a partition of views like you describe. I also imagine the main issue is whether we can guarantee that, once we reach interval 1, all the values are at most some fixed distance d from their correct positions, which would make the final insertion sort step linear. Is this correct? Or perhaps there could be an estimate of the probability distribution of the distances at that stage.

I've also been thinking about this algorithm compared to sorting networks. The geometric interval decay would serve as a kind of heuristic in the design of a complete sorting network for the full input. Knowing that there must be "optimal", n log n sorting networks for any input size, the remaining question is what modifications we would need to make to the original network to obtain an optimal one. Could there really be a systematic limitation in this approach that puts it decidedly outside the set of optimal sorting networks? Of course, insertion sort is not a sorting network; this is just how I've been thinking about it lately.

@nlw0 (Contributor, Author) commented Oct 8, 2022

I made an experiment here trying to understand what the "comb" passes do to the data, and what the effect of the partitioning is. I ran the algorithm without the final insertion sort on random permutations of 1:Nlen, with Nlen=10007 (a prime number) and Nlen=2^13, over a sample of 1001 inputs.

I'm plotting here statistics of what I'm calling the "error", which is the vector minus 1:Nlen, or the distance from the value to where it should be in the sorted array.

The general impression I have is that with the partition we actually get a hard limit on the error, although the distribution is broader. With the prime length, the distribution is more concentrated, but there can be a few strong peaks. So without the partition we can be left with a few values far from where they should be, which I believe are called "turtles" in the context of bubble sort. Other than that, values tend to be closer to where they should be in general. Anyway, there seems to be an interesting compromise between the two cases: partitioning leaves us further from the desired position overall, but seems to guarantee a maximum error.

[plots omitted]

@nlw0 (Contributor, Author) commented Oct 8, 2022

OK, there seems to be no guarantee for 2^13, actually!...

[plots omitted]

Here are the overall error ECDFs from both cases:

[plots omitted]

@nlw0 (Contributor, Author) commented Oct 8, 2022

With the Mersenne prime 2^13-1, the differences from 2^13 are less pronounced, so the length itself might actually have been a larger factor here than the partitioning of the input.

[plot omitted]

This plot highlights what the distribution of the errors looks like: 40% of the numbers are in the correct place after the "combing", and 85% are within 1 step.

[plots omitted]
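
For reference, a measurement along these lines can be reproduced with something like the following (a sketch, reusing the comb!-style pass from the earlier comment):

using Random, Statistics

# Apply only the comb passes to a random permutation and measure how far
# each value ends up from its sorted position.
v = comb!(shuffle(collect(1:2^13-1)))
err = v .- eachindex(v)                 # the "error" at each slot
mean(err .== 0), mean(abs.(err) .<= 1)  # fraction exact, fraction within 1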

A histogram with logarithmic scale; perhaps the tails are exponential?... If that were the case, what would it imply for the complexity of insertion sort?
[plot omitted]

@LilithHafner (Member):

> I can't see any attachments; is the article you referred to somewhere online?

Sorry about that:

combsort.pdf

https://pdfslide.net/download/link/an-efficient-variation-of-bubble-sort

@LilithHafner (Member) left a comment:

Your exploration of the effect of the comb pass is interesting. It seems that the theoretical problems don't really come up in random input of reasonable sizes. That's good!

If the tails were exponential (i.e. the odds of an element being x places out of place are p^x for some p < 1), then the insertion pass would run in linear time: the expected displacement per element is sum(x * p^x for x ≥ 1) = p/(1-p)^2, a constant independent of n, so the expected total work is O(n). Empirically, that seems to hold on the data you tested.

I think working on better gap sequences, and/or the theory or empirical benchmarks to back them, is a worthwhile pursuit if you are interested, but I also think it is a long pursuit, and I would prefer to merge this first and improve the gap sequence later, if that is okay with you.

Co-authored-by: Lilith Orion Hafner <lilithhafner@gmail.com>
@nlw0 (Contributor, Author) commented Oct 9, 2022

Sure, let's merge this.

If the standard deviation of that exponential distribution is linear in the input size, wouldn't that make the final insertion sort quadratic?

I think a variation of this algorithm that offers n log n worst case will probably require some big insight; there's some structural detail missing. And like I said, maybe sorting networks will offer the inspiration. In fact, the best step forward for leveraging the good parallelism we get from this code might actually be to implement a generic sorting network method such as the bitonic sorter: https://en.wikipedia.org/wiki/Bitonic_sorter
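
For the record, the classic iterative form of that network is quite compact (a sketch, power-of-two lengths only; not part of this PR):

# Bitonic sorting network: O(n log^2 n) compare-exchanges, all of them
# data-independent, which is what makes the pattern SIMD/parallel friendly.
function bitonic_sort!(v)
    n = length(v)
    ispow2(n) || throw(ArgumentError("length must be a power of two"))
    k = 2
    while k <= n          # size of the bitonic runs being merged
        j = k >> 1
        while j >= 1      # compare-exchange distance within the merge
            for i in 0:n-1
                l = i ⊻ j
                if l > i
                    # ascending block if i & k == 0, descending otherwise
                    if ((i & k) == 0) == (v[i+1] > v[l+1])
                        v[i+1], v[l+1] = v[l+1], v[i+1]
                    end
                end
            end
            j >>= 1
        end
        k <<= 1
    end
    v
end

bitonic_sort!(rand(Int32, 2^10))  # usage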

@LilithHafner (Member):
> If the standard deviation of that exponential distribution is linear in the input size, wouldn't that make the final insertion sort quadratic?

I was assuming that the coefficient of the geometric distribution was constant with input size; if it scales linearly, then that would indeed be quadratic.

@LilithHafner merged commit 80c14f5 into JuliaCollections:master Oct 9, 2022
@nlw0 mentioned this pull request Oct 11, 2022
@LilithHafner mentioned this pull request Nov 11, 2022