-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BracketedSort
a new, faster algorithm for partialsort
and friends
#52006
Conversation
This commit speeds up my_partialsort! to be faster than it was before, but partialsort! is slower than any version of my_partialsort! julia> using Random; Random.seed!(1); x = rand(1000); hash(x) 0x13843e5a90ce1e51 julia> @Btime partialsort!(copy($x), 500); 3.838 μs (3 allocations: 7.88 KiB) julia> @Btime my_partialsort!(copy($x), 500); 1.946 μs (3 allocations: 7.88 KiB)
…ble IEEE opt. because it makes the kernel slower)
julia> using Random; Random.seed!(1); x = rand(1000); hash(x) 0x13843e5a90ce1e51 julia> @Btime partialsort!(copy($x), 500); 2.671 μs (3 allocations: 7.88 KiB) julia> @Btime my_partialsort!(copy($x), 500); 2.000 μs (3 allocations: 7.88 KiB) julia> @Btime partialsort!(copy($x), 500, lt=<); 1.733 μs (3 allocations: 7.88 KiB)
Here's a much more comprehensive benchmarkusing Random, Printf
get_length() = round(Int, exp(17*rand()))
get_eltype() = rand([Float64, Float32, Int, UInt8, Int128, String])
get_function() = rand([partialsort!, partialsort])
get_element(x) = rand(x)
get_element(::Type{String}) = randstring()
get_k(n) = rand([
1,
n,
rand(1:n),
rand(1:1+n÷100),
rand(n-n÷100:n),
iseven(n) ? (n÷2:n÷2+1) : (n+1)÷2,
UnitRange(sort((rand(1:n), rand(1:n)))...),
1:rand(1:n),
1:rand(1:1+n÷100),
rand(n-n÷100:n):n,
(1:rand(1:1+n÷100)) .+ rand(0:(n-n÷100-1)),
])
new!(v, k) = partialsort!(v, k)
new(v, k) = partialsort(v, k)
function old!(v, k)
Base.Sort._sort!(v, Base.Sort.InitialOptimizations(Base.Sort.ScratchQuickSort(k)), Base.Order.Forward, (;))
Base.Sort.maybeview(v, k)
end
old(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) =
old!(Base.copymutable(v), k; kws...)
function test(length::Int, eltype, func)
func === partialsort && (length ÷= 2) # benchmarking is inefficient
data = [get_element(eltype) for _ in 1:length]
ks = [get_k(length) for _ in 1:5]
sort!(ks, by = x->(x isa Number, x))
unique!(ks)
println("\n$func(rand($eltype, $length), k)")
for k in ks
(t_old, t_new) = func === partialsort! ? test!(similar(data), data, k) : test(data, k)
a = 1e9t_old/length
b = 1e9t_new/length
c = log(b/a)*100
@printf "k = %-18s %6.3f %6.3f %+05.1f%%\n" k a b c
end
end
estimate_runtime(data::Vector{T}, k) where T =
round(Int, (10 + length(data) + length(k) * log(length(k))) * (T === String ? 3 : T == Int128 ? 2 : 1))
function test(data::Vector{T}, k::K) where {T,K}
estimated_runtime = estimate_runtime(data, k)
if estimated_runtime > 10^7
t_old = @elapsed old_res = old(data, k)
t_new = @elapsed new_res = new(data, k)
old_res == new_res || error("Bad sort")
t_old, t_new
elseif estimated_runtime > 10^5
truth = old(data, k)
t_old, t_new = Inf, Inf
for _ in 1:2*10^7 ÷ estimated_runtime
t1 = @elapsed res1 = old(data, k)
res1 == truth || error("Bad old sort")
t_old = min(t_old, t1)
t2 = @elapsed res2 = new(data, k)
res2 == truth || error("Bad sort")
t_new = min(t_new, t2)
end
t_old, t_new
else
old(data, k) == new(data, k) || error("Bad sort")
evals = max(1, 10^4 ÷ estimated_runtime)
samples = 10^7 ÷ evals ÷ estimated_runtime
t_old = @belapsed old($data, $k) evals=evals samples=samples gctrial=false
t_new = @belapsed new($data, $k) evals=evals samples=samples gctrial=false
t_old, t_new
end
end
function test!(d::Vector{T}, data::Vector{T}, k::K) where {T,K}
estimated_runtime = estimate_runtime(data, k)
if estimated_runtime > 10^7
copyto!(d, data)
t_old = @elapsed old_res = old!(d, k)
copyto!(d, data)
t_new = @elapsed new_res = new!(d, k)
old_res == new_res || error("Bad sort")
elseif estimated_runtime > 10^6
truth = old(data, k)
t_old, t_new = Inf, Inf
for _ in 1:2*10^7 ÷ estimated_runtime
copyto!(d, data)
t1 = @elapsed res1 = old!(d, k)
res1 == truth || error("Bad old sort")
t_old = min(t_old, t1)
copyto!(d, data)
t2 = @elapsed res2 = new!(d, k)
res2 == truth || error("Bad sort")
t_new = min(t_new, t2)
end
else
old!(copyto!(d, data), k) == new!(copyto!(d, data), k) || error("Bad sort")
evals = max(1, 10^4 ÷ estimated_runtime)
samples = 10^7 ÷ evals ÷ estimated_runtime
if evals == 1
t_old = @belapsed old($d, $k) setup=(copyto!($d, $data)) evals=evals samples=samples gctrial=false
t_new = @belapsed new($d, $k) setup=(copyto!($d, $data)) evals=evals samples=samples gctrial=false
else
t_old = @belapsed old(copyto!($d, $data), $k) evals=evals samples=samples gctrial=false
t_new = @belapsed new(copyto!($d, $data), $k) evals=evals samples=samples gctrial=false
end
end
t_old, t_new
end
function test(n, seed=1729)
Random.seed!(seed)
samples = [(get_length(), get_eltype(), get_function()) for _ in 1:n]
sort!(samples, by = x->(x[3] === partialsort, string(x[2]), x[1]))
try
for s in samples
test(s...)
end
catch x
x isa InterruptException || rethrow()
end
nothing
end
test(100) And the resultsThe columns are
There are a few too many regressions right now, I'm going to try to squash a few. |
…mize scalar arithmetic to avoid >100% regression when taking fallback
…tes, and remove type instability in crit-fail path
… short-circuit to avoid dispatch overhead
…sized input success rates)
… drop max trials to 3 and adjust comments
Rerunning the benchmarks
Produces better results
The noise comes from both ScratchQuickSort and PartialQuickSort. Maybe worth looking into, but niether a merge blocking regression nor particularly closely related to this PR more closely related to #45222. Edit: I looked into it a bit. I think that for this random seed, There were 4 regressions in the 30% range that look to be from the same cause. I did not investigate smaller regressions. |
@@ -91,7 +91,15 @@ issorted(itr; | |||
issorted(itr, ord(lt,by,rev,order)) | |||
|
|||
function partialsort!(v::AbstractVector, k::Union{Integer,OrdinalRange}, o::Ordering) | |||
_sort!(v, InitialOptimizations(ScratchQuickSort(k)), o, (;)) | |||
# TODO move k from `alg` to `kw` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a TODO that predates this PR, I'm just adding a note because this PR touches target range handling and reminded me that this is not the best approach.
get_next::F | ||
end | ||
|
||
# TODO: this composition between BracketedSort and ScratchQuickSort does not bring me joy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be avoided via moving k
from alg
to kw
.
There are some pretty compelling reasons to use PartialQuickSort instead of ScratchQuickSort as a base case for BracketedSort. However, PartialQuickSort can segfault for invalid but seemingly innocuous lt functions. I don't want to talk about introducing segfaults to the default sorting algorithms or introducing a regression to PartialQuickSort in this PR Revert this commit it you want `partialsort!(::Vector{Int}, ::Int)` not to allocate.
Here's my latest benchmark results
Which are mostly regression-free. The largest reported regression is a 1.76x regression on Investigating that regression more closely, it is due to random noise and I expect all the other significant regressions are as well: Methodologyusing BenchmarkTools, Random, Printf
get_length() = round(Int, exp(17*rand()))
get_eltype() = rand([Float64, Float32, Int, UInt8, Int128, String])
get_function() = rand([partialsort!, partialsort])
get_element(x) = rand(x)
get_element(::Type{String}) = randstring()
get_k(n) = rand([
1,
n,
rand(1:n),
rand(1:1+n÷100),
rand(n-n÷100:n),
iseven(n) ? (n÷2:n÷2+1) : (n+1)÷2,
UnitRange(sort((rand(1:n), rand(1:n)))...),
1:rand(1:n),
1:rand(1:1+n÷100),
rand(n-n÷100:n):n,
(1:rand(1:1+n÷100)) .+ rand(0:(n-n÷100-1)),
])
new!(v, k) = partialsort!(v, k)
new(v, k) = partialsort(v, k)
function old!(v, k)
Base.Sort._sort!(v, Base.Sort.InitialOptimizations(Base.Sort.ScratchQuickSort(k)), Base.Order.Forward, (;))
Base.Sort.maybeview(v, k)
end
old(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) =
old!(Base.copymutable(v), k; kws...)
function test(length::Int, eltype, func; zoom=false)
func === partialsort && (length ÷= 2) # benchmarking is inefficient
length == 0 && return
data = [get_element(eltype) for _ in 1:length]
ks = [get_k(length) for _ in 1:5]
sort!(ks, by = x->(x isa Number, x))
unique!(ks)
println("\n$func(rand($eltype, $length), k)")
for k in ks
(t_old, t_new) = func === partialsort! ? test!(similar(data), data, k; zoom) : test(data, k; zoom)
a = 1e9t_old/length
b = 1e9t_new/length
c = log(b/a)*100
@printf "k = %-18s %6.3f %6.3f %+05.1f%%\n" k a b c
end
end
estimate_runtime(data::Vector{T}, k) where T =
round(Int, (10 + length(data) + length(k) * log(length(k))) * (T === String ? 3 : T == Int128 ? 2 : 1))
function test(data::Vector{T}, k::K; zoom=false) where {T,K}
estimated_runtime = estimate_runtime(data, k)
if estimated_runtime > 10^7 || zoom
t_old = @elapsed old_res = old(data, k)
t_new = @elapsed new_res = new(data, k)
old_res == new_res || error("Bad sort")
t_old, t_new
elseif estimated_runtime > 10^5
truth = old(data, k)
t_old, t_new = Inf, Inf
for _ in 1:2*10^7 ÷ estimated_runtime
t1 = @elapsed res1 = old(data, k)
res1 == truth || error("Bad old sort")
t_old = min(t_old, t1)
t2 = @elapsed res2 = new(data, k)
res2 == truth || error("Bad sort")
t_new = min(t_new, t2)
end
t_old, t_new
else
old(data, k) == new(data, k) || error("Bad sort")
evals = max(1, 10^4 ÷ estimated_runtime)
samples = 10^7 ÷ evals ÷ estimated_runtime
t_old = @belapsed old($data, $k) evals=evals samples=samples gctrial=false
t_new = @belapsed new($data, $k) evals=evals samples=samples gctrial=false
t_old, t_new
end
end
function test!(d::Vector{T}, data::Vector{T}, k::K; zoom=false) where {T,K}
estimated_runtime = estimate_runtime(data, k)
if estimated_runtime > 10^7 || zoom
copyto!(d, data)
t_old = @elapsed old_res = old!(d, k)
copyto!(d, data)
t_new = @elapsed new_res = new!(d, k)
old_res == new_res || error("Bad sort")
elseif estimated_runtime > 10^6
truth = old(data, k)
t_old, t_new = Inf, Inf
for _ in 1:2*10^7 ÷ estimated_runtime
copyto!(d, data)
t1 = @elapsed res1 = old!(d, k)
res1 == truth || error("Bad old sort")
t_old = min(t_old, t1)
copyto!(d, data)
t2 = @elapsed res2 = new!(d, k)
res2 == truth || error("Bad sort")
t_new = min(t_new, t2)
end
else
old!(copyto!(d, data), k) == new!(copyto!(d, data), k) || error("Bad sort")
evals = max(1, 10^4 ÷ estimated_runtime)
samples = 10^7 ÷ evals ÷ estimated_runtime
if evals == 1
t_old = @belapsed old($d, $k) setup=(copyto!($d, $data)) evals=evals samples=samples gctrial=false
t_new = @belapsed new($d, $k) setup=(copyto!($d, $data)) evals=evals samples=samples gctrial=false
else
t_old = @belapsed old(copyto!($d, $data), $k) evals=evals samples=samples gctrial=false
t_new = @belapsed new(copyto!($d, $data), $k) evals=evals samples=samples gctrial=false
end
end
t_old, t_new
end
function test(n, seed=1729; zoom=false)
Random.seed!(seed)
samples = [(get_length(), get_eltype(), get_function()) for _ in 1:n]
sort!(samples, by = x->(x[3] === partialsort, string(x[2]), x[1]))
try
for s in samples
test(s...; zoom)
end
catch x
x isa InterruptException || rethrow()
end
nothing
end
test(100)
function validate_sample()
d = [randstring() for _ in 1:631]
k = 631
ol = @belapsed old($d, $k) evals=5 samples=100 gctrial=false
nw = @belapsed new($d, $k) evals=5 samples=100 gctrial=false
ol*1e9/631, nw*1e9/631
end
function validate(n)
# data = [validate_sample() for _ in 1:n]
mx = maximum(maximum.(data))
bins = LinRange(0, mx, 40)
histogram(first.(data); bins, label="old", alpha=.7, color=:lightblue)
histogram!(last.(data); bins, label="new", alpha=.7, color=:pink)
vline!([mean(first.(data))], label="mean old", color=:blue, linewidth=3)
vline!([mean(last.(data))], label="mean new", color=:red, linewidth=3)
vline!([4.873], label="reported old", color=:blue, linewidth=2, linestyle=:dash)
vline!([8.558], label="reported new", color=:red, linewidth=2, linestyle=:dash)
xlabel!("Runtime (ns / element)")
ylabel!("Frequency")
title!("x = [randstring() for _ in 1:631]\n@elapsed partialsort(x, 631)")
end
display(validate(1000)) At this point I'd say runtime is adequately regression-free; this PR no longer re-introduces PartialQuickSort (since 5c18e25) so there are no fun new segfaults; uses The two remaining downsides I see are increased code complexity and instability. Instability is surmountable but not without a non-negligible performance and code complexity penalty. I think it's okay to drop stability given that we have never promised it and have only provided it since ScratchQuickSort was introduced. |
Does anyone think the stability thing is a problem? If not, this is a fairly solid performance win an should be merged. |
See the docstring of
BracketedSort
for an explanation of this algorithm.The diff is kinda big but only 40% of it is code, the rest are comments, docstrings, and tests.
I produced it without reference to any sources that describe such an algorithm but I do not claim original authorship—somebody has certainly come up with this before.
Here's an earlier version: JuliaStats/Statistics.jl#154
Robust benchmarks are available further down in the thread. Here's an appetizer
Some benchmarks vs master
A detailed view of a special case using
@benchmark
and including 1.9.3I'm not touching the *perm varients because
partialsortperm!
and I will only be able to do that once the overall approach is improved.This makes
sortperm
unstable once again, with no option to make it stable. It was unstable in 1.9 with no option to make it stable, stable in 1.10 with no option to make it unstable, and has never been documented one way or the other.