Description
🐛 Bug
I'm filing this as a bug, but perhaps it is actually just an important documentation issue.
Standard usage of Accuracy, Precision, Recall, and F1Score on multiclass data produces identical results.
I am aware of #1717 but want to revisit this.
To Reproduce
Here is code (adapted from SO) replicating what happens when I try to use torchmetrics in Lightning:
import torch
import torchmetrics
from torchmetrics import MetricTracker, MetricCollection
from torchmetrics import Accuracy, F1Score, Precision, Recall, CohenKappa
num_classes = 3
list_of_metrics = [
    Accuracy(task="multiclass", num_classes=num_classes, average="micro"),
    F1Score(task="multiclass", num_classes=num_classes),
    Precision(task="multiclass", num_classes=num_classes),
    Recall(task="multiclass", num_classes=num_classes),
]
maximize_list = [True, True, True, True]
metric_coll = MetricCollection(list_of_metrics)
tracker = MetricTracker(metric_coll, maximize=maximize_list)
# The trailing comment on each row is the predicted class (argmax of the row).
pred = torch.Tensor([[0, .1, .5],   # 2
                     [0, .1, .5],   # 2
                     [0, .1, .5],   # 2
                     [0, .1, .5],   # 2
                     [0, .9, .1],   # 1
                     [.9, .1, 0]])  # 0
label = torch.Tensor([2,2,2,0,2,1])
tracker.increment()
tracker.update(pred, label)
for key, val in tracker.compute_all().items():
    print(key, val)
gives
MulticlassAccuracy tensor([0.5000])
MulticlassF1Score tensor([0.5000])
MulticlassPrecision tensor([0.5000])
MulticlassRecall tensor([0.5000])
Expected behavior
Let's do recall:
- Class 2 we recall 3/4
- Class 1 we recall 0/1
- Class 0 we recall 0/1
That is a macro-averaged recall of (3/4 + 0 + 0) / 3 = 0.25.
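To make the arithmetic concrete, here is a minimal hand computation in plain Python (no torchmetrics involved). The helper `per_class_recall` is a hypothetical name introduced only for this sketch. It also illustrates why the four reported numbers coincide: with one label per sample, every false positive for one class is a false negative for another, so micro-averaged precision, recall, and F1 all reduce to plain accuracy.

```python
# Hand computation of micro vs. macro recall for the example above.
preds = [2, 2, 2, 2, 1, 0]   # argmax of each row of `pred`
labels = [2, 2, 2, 0, 2, 1]  # same values as `label`
num_classes = 3


def per_class_recall(preds, labels, num_classes):
    """Return recall for each class: correct predictions / true samples of that class."""
    recalls = []
    for c in range(num_classes):
        support = sum(1 for y in labels if y == c)
        hits = sum(1 for p, y in zip(preds, labels) if y == c and p == c)
        recalls.append(hits / support if support else 0.0)
    return recalls


recalls = per_class_recall(preds, labels, num_classes)

# Macro: average the per-class recalls, each class weighted equally.
macro_recall = sum(recalls) / num_classes

# Micro: pool all samples first; for single-label data this equals accuracy.
micro_recall = sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(recalls)       # [0.0, 0.0, 0.75]
print(macro_recall)  # 0.25
print(micro_recall)  # 0.5
```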
The MulticlassRecall docs (and the other multiclass docs) indicate that average='macro' is the default.
But in practice it is not: the default behaves like 'micro' (!?).
This is wrong for several reasons:
- Nasty gotcha.
- Not really a best practice either.
- Intimidating to people new to the library.
Environment
- TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): main, and 1.2.1, pip
- Python & PyTorch Version (e.g., 1.0): 2.x
- Any other relevant information such as OS (e.g., Linux): OSX
Additional context
Every few months, I come back to torchmetrics and stumble on this and then switch to sklearn.metrics.
I follow the tutorial, eagerly add Multiclass, and puzzle over the docs.
I'm happy this is done, but a smooth quickstart is important for attracting users. If even I, a Lightning fanboy, have shunned this library, that suggests others have too.