Description
For binary classification (and perhaps multiclass classification) the log loss can be infinite. The log loss reduction can then also be negative infinity, since it is a shifting and rescaling of the log loss. Similarly, the log loss can be NaN. The NaN case is specifically guarded against in the code, but it still seems like a bug.
The culprit for both cases lies in the initial calculation in the ProcessRow() method of the Aggregator for the BinaryClassifierEvaluator:
```csharp
Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
```
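To see where the infinity comes from, here is a sketch of the same logic in Python (the function name is mine; note that C#'s `Math.Log(0, 2)` returns -Infinity while Python's `math.log2` raises, so the zero case is mapped explicitly to match):

```python
import math

def log_loss_bits(label, prob):
    """Mirror of the evaluator snippet's logic: log base 2, no guarding."""
    if math.isnan(prob):
        return float("nan")
    # For a positive label we score prob; otherwise 1 - prob.
    p = prob if label > 0 else 1.0 - prob
    # C#'s Math.Log(0, 2) yields -Infinity, so -Math.Log(0, 2) is +Infinity.
    return float("inf") if p == 0.0 else -math.log2(p)

print(log_loss_bits(1, 0.0))  # a fully confident wrong prediction: inf
```

So any probability of exactly 0 for a positive example (or exactly 1 for a negative one) makes the per-row loss, and hence the aggregate, infinite.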
I propose that, to guard against infinities, we add an epsilon before taking the log. To guard against NaNs, we will need to fix the probability calculations (e.g. in the calibrator(s)).
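A minimal sketch of the epsilon guard, again in Python for illustration (the function name and the particular `eps` value are my assumptions, not ML.NET's; clamping the probability into [eps, 1 - eps] keeps the log argument strictly positive):

```python
import math

EPS = 1e-15  # hypothetical epsilon; the real value would be chosen in ML.NET

def safe_log_loss_bits(label, prob, eps=EPS):
    # Clamp the probability away from 0 and 1 so log2 never sees 0
    # and the per-row loss stays finite.
    p = min(max(prob, eps), 1.0 - eps)
    return -math.log2(p) if label > 0 else -math.log2(1.0 - p)
```

With this guard, even a fully confident wrong prediction produces a large but finite loss (about `-log2(eps)` bits), so the aggregated log loss and log loss reduction remain finite as well.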