
topn for efficiently doing sorted head/tail #3804

@MichaelChirico


Inspired by Matt's observation in e1ac663.

DT[topn(score, 5L)] also looks nicer than DT[order(score)[1:5]] or DT[order(score)][1:5].

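A quick search suggests two possible implementations (roughly: a selection/partial-sort pass followed by sorting only the k winners, or a size-k heap), each of which might be better depending on the input size and k: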

https://stackoverflow.com/questions/4956593/optimal-algorithm-for-returning-top-k-values-from-an-array-of-length-n

Will take a look at feasibility to implement cleanly.
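As a rough illustration of the first approach, here is a minimal base-R sketch. `topn()` and its `decreasing` argument are hypothetical (not an existing data.table function), and it assumes a numeric vector with no NAs. It uses `sort(partial = n)` to find the n-th smallest value without a full sort, then fully orders only the candidates at or below that value.

```r
# Hypothetical topn() sketch (not data.table API): returns the indices of the
# n smallest values of x (or largest, if decreasing = TRUE), already sorted.
# Assumption: x is a numeric/integer vector without NAs.
topn <- function(x, n, decreasing = FALSE) {
  stopifnot(is.numeric(x), !anyNA(x), length(n) == 1L, n >= 0)
  if (decreasing) x <- -x          # reuse the ascending logic for "largest"
  n <- min(as.integer(n), length(x))
  if (n == 0L) return(integer(0))
  # Selection step: partial sorting guarantees only that position n of the
  # result holds the correct n-th smallest value; x is not fully sorted.
  kth <- sort(x, partial = n)[n]
  # Only values <= kth can be in the result: n of them, plus any ties.
  cand <- which(x <= kth)
  # Fully sort just the candidates; order() is stable, so ties keep row order.
  cand[order(x[cand])][seq_len(n)]
}

x <- c(5, 3, 9, 1, 7, 3)
topn(x, 3L)                     # same indices as order(x)[1:3]
topn(x, 2L, decreasing = TRUE)  # same indices as order(x, decreasing = TRUE)[1:2]
```

With data.table loaded, `DT[topn(score, 5L)]` would then pick the same rows as `DT[order(score)[1:5]]` for a `score` column with at least 5 values and no NAs. Note that heavy ties at the cut-off value enlarge the candidate set, so the sketch degrades toward a full sort in that case.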

I just looked, and dplyr also has top_n, but it seems to be implemented inefficiently: it ranks the entire vector and then filters on the rank:

```r
dplyr:::top_n_rank
function (n, wt) 
{
    if (n > 0) {
        min_rank(desc(wt)) <= n
    }
    else {
        min_rank(wt) <= abs(n)
    }
}
```
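To make the cost concrete, here is a base-R approximation of the helper above. It assumes `min_rank(x)` behaves like `rank(x, ties.method = "min", na.last = "keep")` and `desc(wt)` like `-wt` for numeric input; `top_n_rank_base()` is a hypothetical name. Every element is ranked even when only a handful are kept, the result is a logical mask in row order rather than sorted indices, and ties at the cut-off can return more than n rows.

```r
# Hypothetical base-R approximation of dplyr:::top_n_rank for numeric wt.
# Assumption: min_rank() ~ rank(ties.method = "min", na.last = "keep").
top_n_rank_base <- function(n, wt) {
  if (n > 0) {
    # Ranks *all* of wt (a full sort internally) just to keep the top n
    rank(-wt, ties.method = "min", na.last = "keep") <= n
  } else {
    rank(wt, ties.method = "min", na.last = "keep") <= abs(n)
  }
}

wt <- c(5, 3, 9, 1, 7, 9)
mask <- top_n_rank_base(1, wt)
which(mask)  # 3 6: the tie at 9 means n = 1 keeps two rows, in row order
```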
