
Cannot use average="none" in v0.11? #1425

Closed
carbocation opened this issue Jan 3, 2023 · 9 comments · Fixed by #2725

Comments

@carbocation

carbocation commented Jan 3, 2023

🐛 Bug

Using version 0.11 and trying to use Dice with average="none", I get the following error:

ValueError: The `reduce` none is not valid.

Additional context

There is a check for the allowed average values in the Dice class. This permits "none".

However, comparing v0.11 and v0.10.3, it looks like a second check was added further down in the class in PR #1252.

if average not in ["micro", "macro", "samples"]:
    raise ValueError(f"The `reduce` {average} is not valid.")

Since that check doesn't allow average="none", and since average isn't redefined above it, I think this secondary check introduced in #1252 causes the bug.
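
For illustration, here is a minimal sketch of how the two checks interact (simplified; not the actual torchmetrics source, and the allowed values in the first check are an assumption based on the class docs). Running it with average="none" gets past the first check but raises the reported error at the second:

average = "none"
allowed_average = ("micro", "macro", "weighted", "none", None)

# First check: "none" is accepted here.
if average not in allowed_average:
    raise ValueError(f"Expected `average` to be one of {allowed_average}, got {average}.")

# Second check added in #1252: "none" is rejected here, producing the error above.
if average not in ["micro", "macro", "samples"]:
    raise ValueError(f"The `reduce` {average} is not valid.")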

@carbocation carbocation added the bug / fix and help wanted labels Jan 3, 2023
@github-actions

github-actions bot commented Jan 3, 2023

Hi! Thanks for your contribution, great first issue!

@Borda Borda added the question label and removed the bug / fix label Jan 4, 2023
@blazejdolicki

Hey,

Is there any progress on this? :)

I stumbled upon the same problem and needed a quick fix, so I dug a bit deeper. It seems more complicated than just removing this if statement. TL;DR: I wrote a quick-and-dirty fix that allows instantiating the Dice class with average set to "weighted" or "none" without any errors. However, I did not check whether the computed score is actually correct. Feel free to check it out:
blazejdolicki@5a22643

Furthermore, I don't understand some of the changes made by #1252 in the __init__ of Dice. For instance, right now it assigns kwargs["reduce"] only after calling super().__init__(**kwargs), so essentially the new value of kwargs["reduce"] is never used. I have very little idea about the library implementation, but this doesn't seem right. Looking at this PR, it seems that the "weighted" and "none" averages weren't taken into account at all in the changes.
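
As a minimal illustration of that point (hypothetical classes, not the torchmetrics code): once super().__init__(**kwargs) has been called, later assignments to kwargs cannot affect what the parent already read.

class Parent:
    def __init__(self, **kwargs):
        # the parent reads "reduce" at this point
        self.reduce = kwargs.get("reduce")

class Child(Parent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)   # parent consumes kwargs here
        kwargs["reduce"] = "macro"   # too late: the parent never sees this value

c = Child(reduce="micro")
print(c.reduce)  # prints "micro"; the later assignment is never used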

Please let me know if I'm missing something.

@blazejdolicki

Update: When updating the metric state, I ran into a new error about a shape mismatch in self.tp. It turned out I needed to change how self.reduce was initialized. My guess is that self.reduce used to be initialized in the superclass (Metric), but in #1252 it was moved to the Dice class with an incorrect initialization. Here is a commit with the update: blazejdolicki@f94b9e1
Now the training script finishes without errors.

@SkafteNicki
Member

Hi everyone, thanks for reporting this issue.
We are actually in the process of completely deprecating Dice, as it is mathematically the same as F1Score, so please use that metric instead. There is some ongoing work in PR #1090 (sadly a bit stagnant at the moment) that will deprecate Dice and instead add GeneralizedDice as a metric.
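
For reference, the equivalence mentioned here follows from the standard definitions (this derivation is added for context, not from the thread): with |A| = TP + FP predicted positives and |B| = TP + FN actual positives,

$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} = \frac{2\cdot\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} = F_1$$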

@Jesse-Phitidis

Jesse-Phitidis commented Feb 17, 2023

Hi everyone, thanks for reporting this issue. We are actually in the process of completely deprecating Dice, as it is mathematically the same as F1Score, so please use that metric instead. There is some ongoing work in PR #1090 (sadly a bit stagnant at the moment) that will deprecate Dice and instead add GeneralizedDice as a metric.

I cannot get the expected Dice result using MulticlassF1Score. See the toy example below, where I calculate the Dice for classes 1 and 2 (ignoring 0). My implementation and the MONAI implementation agree, but the F1 score does not (and it also still returns a tensor whose length includes class 0).

from monai.metrics import DiceMetric as dice
from torchmetrics.classification import MulticlassF1Score as F1_score
import torch
from torch.nn.functional import one_hot

calc = dice(include_background=False, reduction='mean_batch')
F1 = F1_score(num_classes=3, multidim_average='global', average='none', ignore_index=0)

def with_monai(pred, gt, calc):
    pred = one_hot(pred, num_classes=3)
    gt = one_hot(gt, num_classes=3)
    pred = torch.permute(pred, (0,3,1,2))
    gt = torch.permute(gt, (0,3,1,2))
    return calc(pred, gt)

def my_dice(pred, gt):
    pred1map = (pred==1).type(torch.int64)
    pred2map = (pred==2).type(torch.int64)
    gt1map = (gt==1).type(torch.int64)
    gt2map = (gt==2).type(torch.int64)
    c1 = 2*torch.sum(pred1map * gt1map)/ (torch.sum(pred1map) + torch.sum(gt1map))
    c2 = 2*torch.sum(pred2map * gt2map)/ (torch.sum(pred2map) + torch.sum(gt2map))
    return torch.tensor([c1, c2])

test_pred = torch.tensor([[[1, 1], [2, 0]]]).type(torch.int64)
test_gt = torch.tensor([[[1, 0], [2, 1]]]).type(torch.int64)

print(test_pred.shape)
# torch.Size([1, 2, 2])

print(test_pred)
# tensor([[[1, 1],
#          [2, 0]]])

print(test_gt)
# tensor([[[1, 0],
#          [2, 1]]])

print(F1(test_pred, test_gt))
# tensor([0.0000, 0.6667, 1.0000])

print(with_monai(test_pred, test_gt, calc))
# tensor([[0.5000, 1.0000]])

print(my_dice(test_pred, test_gt))
# tensor([0.5000, 1.0000])

@SkafteNicki SkafteNicki added this to the v1.2.0 milestone Aug 18, 2023
@SkafteNicki SkafteNicki modified the milestones: v1.2.0, v1.3.0 Aug 30, 2023
@Jesse-Phitidis

Jesse-Phitidis commented Sep 13, 2023 via email

@Borda Borda changed the title Cannot use average="none" in version 0.11? Cannot use average="none" in v0.11? Sep 14, 2023
@Borda
Member

Borda commented Sep 14, 2023

The issue is due to my misunderstanding of the ignore_index argument.

Would you think that some rewording is needed?

@Jesse-Phitidis

Jesse-Phitidis commented Sep 14, 2023 via email

@lirfu

lirfu commented Sep 25, 2023

I can confirm the same behaviour of ignore_index for other multiclass classification metrics (like precision, recall, F1) on v1.2.0.

When using the ignore_index value, that class is simply filtered out from the labels, so the corresponding row of the confusion matrix is zero. Filtering is done here: stat_scores.py

When calculating with macro averaging, the value is biased towards 0 because the TP count of the ignored class is 0, so its per-class score of 0 is still included in the average. Averaging is performed here: compute.py
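
A small numerical sketch of that effect, reusing the per-class F1 values from the toy example above (this illustrates the averaging step, not the torchmetrics internals):

import torch

# per-class scores as returned with ignore_index=0: the ignored class keeps a 0 entry
per_class = torch.tensor([0.0000, 0.6667, 1.0000])

print(per_class.mean())      # macro over all 3 entries -> ~0.556, dragged towards 0
print(per_class[1:].mean())  # macro over classes 1 and 2 only -> ~0.833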

The weighted averaging removes this problem because the weight of the ignored class is set to 0, but then the other classes are re-weighted as well according to their frequency.

I agree with @Jesse-Phitidis: the current implementation either accounts for all classes (bias towards class imbalance) or doesn't account for false positives (promoting trivial classifiers). The parameter ignore_index sounds like it excludes that class from the final macro or weighted averaging (which would give the more complete result), but actually it just ignores any TP & FP for that class: if y=0 and the model says 1, that is fine, but not vice versa.

My suggestion is to keep the current ignore_index (with more elaborate docs on its effect) for backwards compatibility with any unforeseen usages people might have for it, and to add a new parameter background_index that stops the background class from contributing to the final macro or weighted averaging while keeping its influence on the FPs. Combining the two parameters would also fix the bias towards 0 of the current implementation.
