Skip to content

Calculating multiple TopK accuracies is slow/inefficient #744

Closed

Description

When calculating the top-k accuracy for multi-class classifiers the current evaluation code is pretty slow. Especially if multiple Top-K accuracies are desired (this will re-do a lot of the work unnecessarily).

Current implementation will first sort (N logN):

Then get the index:

A more efficient algorithm would be to just calculate the rank (O(N) and no memory needed) of the correct label and keeping track of the seen ranks (0 being the best-case, correct prediction). Then the Top-k accuracy can be easily returned for any k. Possibly changing the API to returning a vector of top-k predictions.

One issue to discuss: What happens when the score is equal for multiple values? There would be a "best-case" and "worst-case" top-k accuracy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    P2Priority of the issue for triage purpose: Needs to be fixed at some point.enhancementNew feature or requestgood first issueGood for newcomersperfPerformance and Benchmarking relatedup-for-grabsA good issue to fix if you are trying to contribute to the project

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions