## 🐛 Bug
In the unit tests, sklearn's `recall_score` and `precision_score` are used as the reference. Even though `_reference_sklearn_precision_recall_multiclass()` calls `remove_ignore_index` to drop the samples whose targets equal the `ignore_index` class before calling `recall_score`, this has no effect: with `average='macro'`, sklearn's `recall_score` and `precision_score` always return the mean over the total number of classes, because all classes are passed via the `labels` argument of `recall_score()` and `precision_score()`.
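A minimal sketch (plain sklearn; the data mirrors the reproduction below) of why removing the ignored samples is not enough: as long as `labels` lists every class, the ignored class still contributes a term to the macro mean.

```python
import numpy as np
from sklearn.metrics import precision_score

# Targets/predictions left after removing the samples whose target is the
# ignored class 0 (what remove_ignore_index produces for the example below).
y_true = np.array([1, 1])
y_pred = np.array([0, 1])

# The reference still passes every class via `labels`, so class 0 is scored
# (precision 0.0 here) and pulls the macro mean down to 0.5.
print(precision_score(y_true, y_pred, labels=[0, 1], average="macro"))  # 0.5

# Restricting `labels` to the non-ignored classes gives the expected 1.0.
print(precision_score(y_true, y_pred, labels=[1], average="macro"))  # 1.0
```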
## To Reproduce
Issue #2441 already describes the wrong behaviour of `MulticlassRecall` with macro averaging when `ignore_index` is specified. So although `ignore_index` is covered by the tests, the test case passes because the reference implementation shares the same flaw. The same error occurs for multiclass precision.
### Code Example for Multiclass Precision
```python
import torch
from torchmetrics.classification import MulticlassPrecision

metric = MulticlassPrecision(num_classes=2, ignore_index=0, average="none")
y_true = torch.tensor([0, 0, 1, 1])
# Predicted class probabilities
y_pred = torch.tensor([
    [0.9, 0.1],  # correctly predicted as class 0
    [0.9, 0.1],  # correctly predicted as class 0
    [0.9, 0.1],  # incorrectly predicted as class 0 (should be class 1)
    [0.1, 0.9],  # correctly predicted as class 1
])
metric.update(y_pred, y_true)
precision_result = metric.compute()
print(precision_result)  # tensor([0., 1.])
```
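To see where `tensor([0., 1.])` comes from, here is a manual check (my own arithmetic, not torchmetrics internals): after the two samples with target 0 are dropped, the remaining predictions are classes `[0, 1]` against targets `[1, 1]`.

```python
import numpy as np

# Remaining targets/predictions once ignore_index=0 samples are removed.
y_true = np.array([1, 1])
y_pred = np.array([0, 1])

for cls in (0, 1):
    tp = np.sum((y_pred == cls) & (y_true == cls))  # true positives
    fp = np.sum((y_pred == cls) & (y_true != cls))  # false positives
    print(cls, tp / (tp + fp))  # class 0 -> 0.0, class 1 -> 1.0
```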
With `average="macro"` on the same data, the metric averages over both classes even though class 0 is ignored:

```python
import torch
from torchmetrics.classification import MulticlassPrecision

metric = MulticlassPrecision(num_classes=2, ignore_index=0, average="macro")
y_true = torch.tensor([0, 0, 1, 1])
# Predicted class probabilities
y_pred = torch.tensor([
    [0.9, 0.1],  # correctly predicted as class 0
    [0.9, 0.1],  # correctly predicted as class 0
    [0.9, 0.1],  # incorrectly predicted as class 0 (should be class 1)
    [0.1, 0.9],  # correctly predicted as class 1
])
metric.update(y_pred, y_true)
precision_result = metric.compute()
print(precision_result)  # tensor(0.5000), expected: tensor(1.0)
```
## Expected behavior
sklearn gives the expected result when the ignored class is excluded from `labels`:

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0, 0, 1, 1])
# Predicted class probabilities
y_pred_probs = np.array([
    [0.9, 0.1],  # correctly predicted as class 0
    [0.9, 0.1],  # correctly predicted as class 0
    [0.9, 0.1],  # incorrectly predicted as class 0 (should be class 1)
    [0.1, 0.9],  # correctly predicted as class 1
])
# Convert predicted probabilities to predicted classes.
y_pred = np.argmax(y_pred_probs, axis=1)
# Only consider label 1, i.e. ignore label 0.
precision = precision_score(y_true, y_pred, average="macro", labels=[1])
print(f"Multiclass Precision: {precision:.2f}")  # 1.00
```
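A possible direction for the fix in the test reference (a simplified sketch under my reading of the bug; `reference_precision_ignoring_index` is a hypothetical helper, not the actual `_reference_sklearn_precision_recall_multiclass` signature): exclude `ignore_index` from the `labels` passed to sklearn, so the macro average only runs over the non-ignored classes.

```python
import numpy as np
from sklearn.metrics import precision_score

def reference_precision_ignoring_index(y_true, y_pred, num_classes, ignore_index):
    """Sketch of a corrected sklearn reference for macro precision."""
    # Drop samples whose target is the ignored class
    # (this part is what remove_ignore_index already does).
    keep = y_true != ignore_index
    y_true, y_pred = y_true[keep], y_pred[keep]
    # Average only over the classes that are not ignored.
    labels = [c for c in range(num_classes) if c != ignore_index]
    return precision_score(y_true, y_pred, labels=labels, average="macro")

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1])
print(reference_precision_ignoring_index(y_true, y_pred,
                                         num_classes=2, ignore_index=0))  # 1.0
```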
## Environment

- TorchMetrics version: 1.5.1
- Python version: 3.10.12
- OS: Ubuntu