Calculating multiple TopK accuracies is slow/inefficient

Calculating the top-k accuracy for multi-class classifiers is pretty slow with the current evaluation code, especially if multiple top-k accuracies are desired, since that re-does a lot of the same work unnecessarily.

The current implementation first sorts the scores (`O(N log N)`):

* https://github.com/dotnet/machinelearning/blob/f85e722fbd6b1710d104e85e6b3bcef4e593b5d2/src/Microsoft.ML.Data/Evaluators/MulticlassClassifierEvaluator.cs#L442-L446

It then looks up the index of the correct label in the sorted order:

* https://github.com/dotnet/machinelearning/blob/f85e722fbd6b1710d104e85e6b3bcef4e593b5d2/src/Microsoft.ML.Data/Evaluators/MulticlassClassifierEvaluator.cs#L345-L350

A more efficient algorithm would be to calculate only the rank of the correct label (`O(N)` and no extra memory needed) and keep track of how often each rank was seen (0 being the best case, a correct prediction). The top-k accuracy can then be returned easily for any `k`. This possibly means changing the API to return a vector of top-k accuracies.
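The rank-based idea above could be sketched roughly as follows. This is a minimal illustration, not the ML.NET implementation: `TopKAccumulator`, `RankOf`, `Update`, and `GetTopKAccuracies` are hypothetical names, the score vector is assumed to have exactly `numClasses` entries, and tied scores are resolved optimistically (only strictly higher scores count against the label).

```csharp
using System;

// Hypothetical helper, not part of ML.NET: accumulates a histogram of label ranks
// so that the top-k accuracy for any k can be read off afterwards.
public sealed class TopKAccumulator
{
    // _weightAtRank[r] = total weight of examples whose correct label had rank r.
    private readonly double[] _weightAtRank;
    private double _totalWeight;

    public TopKAccumulator(int numClasses)
    {
        _weightAtRank = new double[numClasses];
    }

    // Rank of the correct label = number of classes scored strictly higher (0 is best).
    // A single O(N) pass: no sorting and no index array.
    public static int RankOf(ReadOnlySpan<float> scores, int label)
    {
        float labelScore = scores[label];
        int rank = 0;
        for (int i = 0; i < scores.Length; i++)
        {
            if (scores[i] > labelScore)
                rank++;
        }
        return rank;
    }

    public void Update(ReadOnlySpan<float> scores, int label, double weight = 1.0)
    {
        _totalWeight += weight;
        // Assumption: a label outside the score vector counts as a miss at every k.
        if (label < 0 || label >= scores.Length)
            return;
        _weightAtRank[RankOf(scores, label)] += weight;
    }

    // result[k - 1] holds the top-k accuracy for k = 1..numClasses.
    public double[] GetTopKAccuracies()
    {
        var result = new double[_weightAtRank.Length];
        double cumulative = 0;
        for (int r = 0; r < result.Length; r++)
        {
            cumulative += _weightAtRank[r];
            result[r] = _totalWeight > 0 ? cumulative / _totalWeight : 0;
        }
        return result;
    }
}
```

Each example costs one pass over its scores, and all top-k accuracies come from a single cumulative sum over the rank histogram at the end, so asking for several values of `k` adds no per-example work.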

One issue to discuss: what happens when the score is equal for multiple classes? There would then be a "best-case" and a "worst-case" top-k accuracy.
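To make the tie question concrete, here is one possible way to compute both bounds in the same `O(N)` pass. It is only a sketch for discussion: `TieAwareRank` and `RankBounds` are hypothetical names, and the code assumes the scores contain no `NaN` values.

```csharp
using System;

// Hypothetical helper, not part of ML.NET.
public static class TieAwareRank
{
    // Returns the best and worst rank the correct label could receive
    // depending on how ties are broken (0 is the best possible rank).
    public static (int Best, int Worst) RankBounds(ReadOnlySpan<float> scores, int label)
    {
        float labelScore = scores[label];
        int higher = 0; // classes scored strictly above the label
        int tied = 0;   // other classes with exactly the same score
        for (int i = 0; i < scores.Length; i++)
        {
            if (scores[i] > labelScore)
                higher++;
            else if (i != label && scores[i] == labelScore)
                tied++;
        }
        // Best case: the label wins every tie. Worst case: it loses every tie.
        return (higher, higher + tied);
    }
}
```

Accumulating `Best` gives the "best-case" top-k accuracy and accumulating `Worst` gives the "worst-case" one; which of the two (or something in between) the evaluator should report is the open question.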

	if (Utils.Size(_indicesArr) < _scoresArr.Length)
	    _indicesArr = new int[_scoresArr.Length];
	int j = 0;
	foreach (var index in Enumerable.Range(0, _scoresArr.Length).OrderByDescending(i => _scoresArr[i]))
	    _indicesArr[j++] = index;

	if (OutputTopKAcc > 0)
	{
	    int idx = Array.IndexOf(indices, label);
	    if (0 <= idx && idx < OutputTopKAcc)
	        _numCorrectTopK += weight;
	}

Calculating multiple TopK accuracies is slow/inefficient #744
