GeneralizedDiceScore yields 0 scores when using per_class=True for samples where class is not present #2846

Open
@nkaenzig

Description

🐛 Bug

When class-wise metrics are requested via per_class=True, the current implementation of GeneralizedDiceScore assigns a score of 0.0 to every sample that does not contain a given class.

These zeros pull the Dice score of a class down, most severely for rare classes, which makes the scores of different classes incomparable.

To Reproduce

The following code sample returns class-wise scores of tensor([0.2500, 0.2500, 0.0000]), even though every prediction matches its target:

Code sample
import torch
from torchmetrics.segmentation import GeneralizedDiceScore

N_SAMPLES = 4
N_CLASSES = 3

# Masks in (N, C, H, W) layout, initially all zeros.
target = torch.zeros((N_SAMPLES, N_CLASSES, 128, 128), dtype=torch.int8)
preds = torch.zeros((N_SAMPLES, N_CLASSES, 128, 128), dtype=torch.int8)

# Class 0 appears only in sample 0, class 1 only in sample 2; class 2 never appears.
target[0, 0], preds[0, 0] = 1, 1
target[2, 1], preds[2, 1] = 1, 1

generalized_dice = GeneralizedDiceScore(num_classes=N_CLASSES, per_class=True, include_background=True)
print(generalized_dice(preds, target))  # tensor([0.2500, 0.2500, 0.0000])
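
The reported values are consistent with a plain mean over per-sample scores in which absent classes count as 0.0. The snippet below is only a sketch of that arithmetic, not the library's code: the per-sample score table is an assumption inferred from the output above.

```python
import torch

# Assumed per-sample, per-class scores (rows: samples, columns: classes):
# 1.0 where the class is present and perfectly predicted, 0.0 where it is absent.
per_sample_scores = torch.tensor([
    [1.0, 0.0, 0.0],  # sample 0 contains class 0
    [0.0, 0.0, 0.0],  # sample 1 contains no class
    [0.0, 1.0, 0.0],  # sample 2 contains class 1
    [0.0, 0.0, 0.0],  # sample 3 contains no class
])

# Averaging over all four samples reproduces the reported result.
class_scores = per_sample_scores.mean(dim=0)
print(class_scores)  # tensor([0.2500, 0.2500, 0.0000])
```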

Expected behavior

I'd expect the above code sample to return [1.0, 1.0, nan] for the class-wise scores. The third class should be nan because it is not present in any sample, so returning 1.0 for it could also be misleading. More generally, samples in which a class does not occur should not contribute to the Dice score of that class.
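
A minimal sketch of this expected behavior in plain PyTorch, for illustration only. The helper name per_class_dice_ignoring_absent is hypothetical and not part of TorchMetrics. It computes unweighted per-class Dice rather than the weighted generalized variant, and it assumes a class counts as "absent" from a sample when it appears in neither the prediction nor the target.

```python
import torch


def per_class_dice_ignoring_absent(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: per-class Dice over (N, C, ...) masks, skipping absent classes."""
    spatial_dims = tuple(range(2, preds.ndim))
    preds, target = preds.float(), target.float()

    intersection = (preds * target).sum(dim=spatial_dims)                   # shape (N, C)
    cardinality = preds.sum(dim=spatial_dims) + target.sum(dim=spatial_dims)  # shape (N, C)

    # Assumption: a class is absent from a sample if neither mask contains it.
    present = cardinality > 0
    nan = torch.full_like(cardinality, float("nan"))
    per_sample = torch.where(present, 2 * intersection / cardinality.clamp(min=1), nan)

    # nanmean skips absent samples; a class absent everywhere stays nan.
    return torch.nanmean(per_sample, dim=0)


# Same layout as the reproduction above, with smaller masks.
target = torch.zeros((4, 3, 8, 8), dtype=torch.int8)
preds = torch.zeros((4, 3, 8, 8), dtype=torch.int8)
target[0, 0], preds[0, 0] = 1, 1
target[2, 1], preds[2, 1] = 1, 1

scores = per_class_dice_ignoring_absent(preds, target)
print(scores)  # class 0 -> 1.0, class 1 -> 1.0, class 2 -> nan
```

Defining presence over both masks keeps false positives penalized: a sample where the class is predicted but not in the target still scores 0.0 for that class instead of being skipped.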

Environment

  • TorchMetrics version: 1.6.0
  • Python version: 3.11.10 (PyTorch version not given)
  • OS: macOS 15.1.1 (24B91)

Additional context

Very similar to issue #2850.
