Skip to content

Add sortperm with dims arg for AbstractArray, fixes #16273 #45211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jun 22, 2022

Conversation

pcjentsch
Copy link
Contributor

This PR adds a dims argument to sortperm, mirroring the API for sort, and closing issue #16273.

Given an A::AbstractArray, the function satisfies A[sortperm(A, dims = dim)] == sort(A, dims = dim).

Credit to @timholy, I used his idea and blog post in this implementation. I replaced the now-deprecated sub2ind with LinearIndices, which is just as fast (benchmarked against Base._sub2ind).

In that issue, Steven G. Johnson also suggests a special case for dims=1, but my benchmarks suggest the CartesianIndices approach is now faster.

The related issue #9258 can also be closed I think, as sortrows and sortcols are no longer in Base.

@oscardssmith oscardssmith requested a review from LilithHafner May 6, 2022 18:50
@oscardssmith oscardssmith added the arrays [a, r, r, a, y, s] label May 6, 2022
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contributions! It will be nice to close this hole in the sorting interface.

While we no longer have sortrows and sortcolumns, we do have sortslices, so to close #9258, we would need sortperm for slices.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added several small style suggestions, but I also have a larger question of whether we can utilize sort!(ix, order=..., dims=dims) without sacrificing efficiency?

@pcjentsch
Copy link
Contributor Author

Thanks for catching all those!

I've added several small style suggestions, but I also have a larger question of whether we can utilize sort!(ix, order=..., dims=dims) without sacrificing efficiency?

Yep! it seems if we use your change to sort!, then a much shorter version with equivalent performance is possible. I'll add your new sort! to the PR then, if that is ok with you?

benchmarks
function sortperm_new(A::AbstractArray; dims::Integer)
    Rpre = CartesianIndices(size(A)[1:dims-1])
    Rpost = CartesianIndices(size(A)[dims+1:end])
    ix = Base.copymutable(LinearIndices(A))
    ord = Base.Perm(
        Base.ord(Base.isless, identity, nothing, Base.Order.Forward),
        vec(reshape(A, (:, 1)))
    )
    _sortperm_new!(ix, A, Rpre, dims, Rpost, ord)
    return ix
end

@inline function _sortperm_new!(ix::AbstractArray, A::AbstractArray, Rpre, dims, Rpost, order)
    for Ipost in Rpost, Ipre in Rpre
        sort!(view(ix, Ipre, :, Ipost); order)
    end
end

function sortperm_no_loop(A::AbstractArray; dims::Integer)
    ix = Base.copymutable(LinearIndices(A))
    order = Base.Perm(
        Base.ord(Base.isless, identity, nothing, Base.Order.Forward),
        vec(reshape(A, (:, 1)))
    )
    sort!_new(ix; dims, order)
    return ix
end

function sort!_new(A::AbstractArray;
    dims::Integer,
    alg::Base.Algorithm=Base.Sort.defalg(A),
    lt=isless,
    by=identity,
    rev::Union{Bool,Nothing}=nothing,
    order::Base.Ordering=Base.Forward) where {K}
    _sort!(A, Val(dims), alg, Base.ord(lt, by, rev, order))
end
function _sort!(A::AbstractArray, ::Val{K}, alg::Base.Algorithm, order::Base.Ordering) where {K}
    nd = ndims(A)

    1 <= K <= nd || throw(ArgumentError("dimension out of range"))

    remdims = ntuple(i -> i == K ? 1 : axes(A, i), nd)
    for idx in CartesianIndices(remdims)
        Av = view(A, ntuple(i -> i == K ? Colon() : idx[i], nd)...)
        sort!(Av, alg, order)
    end
    A
end

using BenchmarkTools
function test()
    for dims in 1:3
        a = rand(10, 10, 10)
        @info "dims = $dims"
        @info "PR"
        @btime sortperm_new($a; dims=$dims)
        @info "w/ sort!"
        @btime sortperm_no_loop($a; dims=$dims)

        perm_1 = sortperm_new(a; dims=dims)
        perm_2 = sortperm_no_loop(a; dims=dims)
        @assert perm_1 == perm_2
    end
end

test()

@pcjentsch
Copy link
Contributor Author

I've added several small style suggestions,

Thanks for catching all those!

but I also have a larger question of whether we can utilize sort!(ix, order=..., dims=dims) without sacrificing efficiency?

Yes it seems if we use your change to sort!, then a much shorter version with equivalent performance is possible. I'll add your new sort! to the PR then, if that is ok with you?

benchmarks
function sortperm_new(A::AbstractArray; dims::Integer)
    Rpre = CartesianIndices(size(A)[1:dims-1])
    Rpost = CartesianIndices(size(A)[dims+1:end])
    ix = Base.copymutable(LinearIndices(A))
    ord = Base.Perm(
        Base.ord(Base.isless, identity, nothing, Base.Order.Forward),
        vec(reshape(A, (:, 1)))
    )
    _sortperm_new!(ix, A, Rpre, dims, Rpost, ord)
    return ix
end

@inline function _sortperm_new!(ix::AbstractArray, A::AbstractArray, Rpre, dims, Rpost, order)
    for Ipost in Rpost, Ipre in Rpre
        sort!(view(ix, Ipre, :, Ipost); order)
    end
end

function sortperm_no_loop(A::AbstractArray; dims::Integer)
    ix = Base.copymutable(LinearIndices(A))
    order = Base.Perm(
        Base.ord(Base.isless, identity, nothing, Base.Order.Forward),
        vec(reshape(A, (:, 1)))
    )
    sort!_new(ix; dims, order)
    return ix
end

function sort!_new(A::AbstractArray;
    dims::Integer,
    alg::Base.Algorithm=Base.Sort.defalg(A),
    lt=isless,
    by=identity,
    rev::Union{Bool,Nothing}=nothing,
    order::Base.Ordering=Base.Forward) where {K}
    _sort!(A, Val(dims), alg, Base.ord(lt, by, rev, order))
end
function _sort!(A::AbstractArray, ::Val{K}, alg::Base.Algorithm, order::Base.Ordering) where {K}
    nd = ndims(A)

    1 <= K <= nd || throw(ArgumentError("dimension out of range"))

    remdims = ntuple(i -> i == K ? 1 : axes(A, i), nd)
    for idx in CartesianIndices(remdims)
        Av = view(A, ntuple(i -> i == K ? Colon() : idx[i], nd)...)
        sort!(Av, alg, order)
    end
    A
end

using BenchmarkTools
function test()
    for dims in 1:3
        a = rand(10, 10, 10)
        @info "dims = $dims"
        @info "PR"
        @btime sortperm_new($a; dims=$dims)
        @info "w/ sort!"
        @btime sortperm_no_loop($a; dims=$dims)

        perm_1 = sortperm_new(a; dims=dims)
        perm_2 = sortperm_no_loop(a; dims=dims)
        @assert perm_1 == perm_2
    end
end

test()

@pcjentsch
Copy link
Contributor Author

pcjentsch commented May 12, 2022

Weird, make fails on this sort! function when I try to build Julia with this PR. Tests all pass with Revise.track(Base) though.

~/Work/julia sortperm-array-16273:master ⇣59⇡3 ❯ make                                                                                                                                                    Py base 22:45:22
    JULIA usr/lib/julia/corecompiler.ji
error during bootstrap:
LoadError(at "compiler/compiler.jl" line 3: LoadError(at "sort.jl" line 1290: UndefVarError(var=:Val)))
ijl_undefined_var_error at /home/workaccount/Work/julia/src/rtutils.c:132
jl_eval_global_var at /home/workaccount/Work/julia/src/interpreter.c:150 [inlined]
eval_value at /home/workaccount/Work/julia/src/interpreter.c:196
do_call at /home/workaccount/Work/julia/src/interpreter.c:125
eval_value at /home/workaccount/Work/julia/src/interpreter.c:215
eval_stmt_value at /home/workaccount/Work/julia/src/interpreter.c:166 [inlined]
...

Can you reproduce that problem by any chance?

@pcjentsch pcjentsch force-pushed the sortperm-array-16273 branch from 93920e1 to 1ed2373 Compare May 12, 2022 06:26
@pcjentsch
Copy link
Contributor Author

Moving the Val argument to the last position in _sort! like this

function _sort!(A::AbstractArray, alg::Algorithm, order::Ordering, ::Val{K}) where K

fixes the issue, and it builds now.

I'll open an issue about that when I get the chance.

@LilithHafner
Copy link
Member

LoadError(at "compiler/compiler.jl" line 3: LoadError(at "sort.jl" line 1290: UndefVarError(var=:Val)))

Exports from Base aren't automatically available in Sort.jl due to bootstrapping. You need to add Val here to use it.

@pcjentsch
Copy link
Contributor Author

Exports from Base aren't automatically available in Sort.jl due to bootstrapping. You need to add Val here to use it.

Ah I did not know that. Thanks. It's odd that it worked with Val in the last argument, then.

@LilithHafner
Copy link
Member

It's odd that it worked with Val in the last argument, then.

I think the ci build still failed.

@pcjentsch pcjentsch force-pushed the sortperm-array-16273 branch from fe6c480 to ed09ca8 Compare May 16, 2022 19:16
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice!

And very general. I wonder if it would be possible to merge this function with the sortperm(v::AbstractVector; kw...) function and handle the AbstractVector case and the AbstractArray case simultaneously.

Would it also be convenient to add sortperm! with dims arg here or is that out of scope?

@pcjentsch
Copy link
Contributor Author

And very general. I wonder if it would be possible to merge this function with the sortperm(v::AbstractVector; kw...) function and handle the AbstractVector case and the AbstractArray case simultaneously.

I think we would have to refactor sort! similarly for it to be possible to handle both cases in a nice way. Otherwise we need to have two cases which each call sort! with and without dims depending on the type of the input.

Would it also be convenient to add sortperm! with dims arg here or is that out of scope?

Yep added.

Thanks for the other comments as well.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we need to have two cases which each call sort! with and without dims depending on the type of the input.

Perhaps we could accept dims as a vararg kw..., error if (A <: AbstractVector) == (dims in keys(kw)), and pass kw... on to sort!

@LilithHafner
Copy link
Member

Hi @pcjentsch, I'm going away for the next couple of weeks so I won't be able to continue with this until early June. Feel free to ping oscardssmith, nalimilan, vtjnash, or mbauman if you want someone to review it before then.

@pcjentsch
Copy link
Contributor Author

thanks for letting me know Lilith, @oscardssmith I would appreciate a review if you have the chance!

@oscardssmith
Copy link
Member

The implementation looks good. There is some formatting changes that I'm not sure how I feel about, but I don't really care.

@pcjentsch
Copy link
Contributor Author

what else is needed for merge here?

@oscardssmith
Copy link
Member

cc @LilithHafner

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for your perseverance, @pcjentsch!

Sorry I was away from this PR for so long, thanks for the reminders, @pcjentsch and @oscardssmith. The functionality looks good to me too!

I'm not a fan of some of the formatting changes, but also don't really care.

One thing that should be changed before merging are the docstrings: I recommend revising the existing docstrings for sortperm[!](v::AbstractVector...) rather than replacing them with the docstrings for sortperm[!](A::AbstractArray...).

You will also need to resolve the merge conflicts since these files have changed since you opened this PR. Let me know if you have any confusion about that, or feel free to just guess and I can review the merge.

@pcjentsch pcjentsch force-pushed the sortperm-array-16273 branch from b710728 to 8cae75a Compare June 16, 2022 06:32
@pcjentsch pcjentsch force-pushed the sortperm-array-16273 branch from 8cae75a to f8966c8 Compare June 16, 2022 06:35
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just some minor formatting issues now.

base/sort.jl Outdated

# Examples
```jldoctest
julia> v = [3, 1, 2];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep these newlines to match REPL formatting. I care about style here because it is public facing documentation.

Co-authored-by: Lilith Orion Hafner <60898866+LilithHafner@users.noreply.github.com>
@LilithHafner
Copy link
Member

Aside from the new merge conflicts (which I believe should be straightforward), some style changes I'm mixed about but that I also don't care enough about to object to, and the freebsd64 CI fail which looks unrelated, this looks good to me!

@pcjentsch
Copy link
Contributor Author

Alright, merge conflicts sorted. I guess "style changes" refers to the spacing between operators? I reverted those.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are primarily what I meant by "style changes", but I now realize that we also need to pass through workspace which is more than a style change.

Co-authored-by: Lilith Orion Hafner <60898866+LilithHafner@users.noreply.github.com>
Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks, @pcjentsch!

@LilithHafner LilithHafner added the merge me PR is reviewed. Merge when all tests are passing label Jun 17, 2022
@LilithHafner
Copy link
Member

The apple test received a cancelation signal and the win32 test got connection reset by peer. Rerunning failed builds just to be sure.

@LilithHafner
Copy link
Member

Oops! Sorry about that.

@LilithHafner
Copy link
Member

Update: I can't figure out how to rerun the builds/get CI to pass cleanly, but I'm pretty sure the failures do not have to do with this PR. @oscardssmith, can you confirm?

@oscardssmith
Copy link
Member

the CI failures do look unrelated. I think we can merge this.

@LilithHafner
Copy link
Member

I agree that we can merge this.

@oscardssmith oscardssmith merged commit fe9ac99 into JuliaLang:master Jun 22, 2022
@LilithHafner LilithHafner removed the merge me PR is reviewed. Merge when all tests are passing label Jun 22, 2022
@LilithHafner
Copy link
Member

Thanks, @pcjentsch!

@pcjentsch
Copy link
Contributor Author

cool, thank you @oscardssmith and especially @LilithHafner for your thorough reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] sorting Put things in order
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants