-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
When calculating the top-k accuracy for multi-class classifiers the current evaluation code is pretty slow. Especially if multiple Top-K accuracies are desired (this will re-do a lot of the work unnecessarily).
Current implementation will first sort (N logN
):
machinelearning/src/Microsoft.ML.Data/Evaluators/MulticlassClassifierEvaluator.cs
Lines 442 to 446 in f85e722
if (Utils.Size(_indicesArr) < _scoresArr.Length) _indicesArr = new int[_scoresArr.Length]; int j = 0; foreach (var index in Enumerable.Range(0, _scoresArr.Length).OrderByDescending(i => _scoresArr[i])) _indicesArr[j++] = index;
Then get the index:
machinelearning/src/Microsoft.ML.Data/Evaluators/MulticlassClassifierEvaluator.cs
Lines 345 to 350 in f85e722
if (OutputTopKAcc > 0) { int idx = Array.IndexOf(indices, label); if (0 <= idx && idx < OutputTopKAcc) _numCorrectTopK += weight; }
A more efficient algorithm would be to just calculate the rank (O(N)
and no memory needed) of the correct label and keeping track of the seen ranks (0 being the best-case, correct prediction). Then the Top-k accuracy can be easily returned for any k
. Possibly changing the API to returning a vector of top-k predictions.
One issue to discuss: What happens when the score is equal for multiple values? There would be a "best-case" and "worst-case" top-k accuracy.