Reference metric in multiclass precision/recall unit tests provides wrong answer when ignore_index is specified with average='macro' #2828

Open
@rittik9

Description

🐛 Bug

In the unit tests, sklearn's recall_score and precision_score are used as the reference. Even though _reference_sklearn_precision_recall_multiclass() calls remove_ignore_index to drop the samples whose targets belong to the ignore_index class before passing the data to recall_score and precision_score, this makes no difference: with average='macro', sklearn always averages over the total number of classes, because every class (including the ignored one) is passed via the labels argument of recall_score() and precision_score().
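
For illustration, here is a minimal sketch (not the actual test code) of what the reference computation amounts to once the ignored samples have been dropped:

import numpy as np
from sklearn.metrics import precision_score

# Targets and predictions left after dropping the samples whose target equals
# ignore_index == 0 (what remove_ignore_index does in the test helper).
y_true = np.array([1, 1])
y_pred = np.array([0, 1])

# Passing all classes in `labels` (as the reference helper effectively does)
# keeps class 0 in the macro average, so its 0.0 precision dilutes the result.
print(precision_score(y_true, y_pred, labels=[0, 1], average="macro"))  # 0.5

# Restricting `labels` to the non-ignored classes gives the expected value.
print(precision_score(y_true, y_pred, labels=[1], average="macro"))  # 1.0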

To Reproduce

Issue #2441 already describes the wrong behaviour of MulticlassRecall's macro average when ignore_index is specified. Although ignore_index is exercised by the tests, the test cases pass because the reference implementation shares the same flaw.

The same error occurs for multiclass precision.

### Code Example for Multiclass Precision

First, the per-class result with average="none":

import torch
from torchmetrics.classification import MulticlassPrecision

metric = MulticlassPrecision(num_classes=2, ignore_index=0, average="none")

y_true = torch.tensor([0, 0, 1, 1])

# Predicted probabilities
y_pred = torch.tensor([
    [0.9, 0.1],  # Correctly predicted as class 0
    [0.9, 0.1],  # Correctly predicted as class 0
    [0.9, 0.1],  # Incorrectly predicted as class 0 (should be class 1)
    [0.1, 0.9],  # Correctly predicted as class 1
])

metric.update(y_pred, y_true)
precision_result = metric.compute()
print(precision_result)  # tensor([0., 1.])

With average="macro" and ignore_index=0, the same inputs give an unexpected result:

import torch
from torchmetrics.classification import MulticlassPrecision

metric = MulticlassPrecision(num_classes=2, ignore_index=0, average="macro")

y_true = torch.tensor([0, 0, 1, 1])

# Predicted probabilities
y_pred = torch.tensor([
    [0.9, 0.1],  # Correctly predicted as class 0
    [0.9, 0.1],  # Correctly predicted as class 0
    [0.9, 0.1],  # Incorrectly predicted as class 0 (should be class 1)
    [0.1, 0.9],  # Correctly predicted as class 1
])

metric.update(y_pred, y_true)
precision_result = metric.compute()
print(precision_result)  # tensor(0.5000), expected: tensor(1.0)

Expected behavior

import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0, 0, 1, 1])

# Predicted probabilities
y_pred_probs = np.array([
    [0.9, 0.1],  # Correctly predicted as class 0
    [0.9, 0.1],  # Correctly predicted as class 0
    [0.9, 0.1],  # Incorrectly predicted as class 0 (should be class 1)
    [0.1, 0.9],  # Correctly predicted as class 1
])

# Convert predicted probabilities to predicted classes
y_pred = np.argmax(y_pred_probs, axis=1)

precision = precision_score(y_true, y_pred, average='macro', labels=[1])  # only considering label 1, i.e. ignoring label 0
print(f"Multiclass Precision: {precision:.2f}") #1.00

Environment

  • TorchMetrics version: 1.5.1
  • Python version: 3.10.12
  • OS: Ubuntu
