
Multiclass Accuracy, Precision, Recall, and F1Score the same (documentation issue?) #2280

Open

@turian

Description

🐛 Bug

I'm filing this as a bug, but perhaps it is actually just an important documentation issue.

Standard usage of Accuracy, Precision, Recall, and F1Score on a multiclass task produces identical results for all four metrics.

I am aware of #1717 but want to revisit this.

To Reproduce

Here is code (adapted from Stack Overflow) replicating what happens when I try to use torchmetrics in Lightning:

import torch
import torchmetrics
from torchmetrics import MetricTracker, MetricCollection
from torchmetrics import Accuracy, F1Score, Precision, Recall

num_classes = 3

list_of_metrics = [Accuracy(task="multiclass", num_classes=num_classes, average="micro"),
                   F1Score(task="multiclass", num_classes=num_classes),
                   Precision(task="multiclass", num_classes=num_classes),
                   Recall(task="multiclass", num_classes=num_classes),
                   ]

maximize_list = [True, True, True, True]

metric_coll = MetricCollection(list_of_metrics)
tracker = MetricTracker(metric_coll, maximize=maximize_list)


pred = torch.Tensor([[0, .1, .5],   # argmax -> 2
                     [0, .1, .5],   # argmax -> 2
                     [0, .1, .5],   # argmax -> 2
                     [0, .1, .5],   # argmax -> 2
                     [0, .9, .1],   # argmax -> 1
                     [.9, .1, 0]])  # argmax -> 0

label = torch.tensor([2, 2, 2, 0, 2, 1])  # integer class targets

tracker.increment()
tracker.update(pred, label)

for key, val in tracker.compute_all().items():
    print(key,val)

gives

MulticlassAccuracy tensor([0.5000])
MulticlassF1Score tensor([0.5000])
MulticlassPrecision tensor([0.5000])
MulticlassRecall tensor([0.5000])
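
For context (this is standard micro-averaging arithmetic, not something the docs spell out): with micro averaging on single-label multiclass data, every misclassified sample counts exactly once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled FP and FN totals are always equal and all four metrics collapse to plain accuracy:

\mathrm{Precision}_{\text{micro}} = \frac{\sum_c \mathrm{TP}_c}{\sum_c \mathrm{TP}_c + \sum_c \mathrm{FP}_c} = \frac{\sum_c \mathrm{TP}_c}{\sum_c \mathrm{TP}_c + \sum_c \mathrm{FN}_c} = \mathrm{Recall}_{\text{micro}} = \mathrm{Accuracy}

And since micro precision equals micro recall, their harmonic mean (F1) equals them as well, which is exactly the 0.5 printed four times above.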

Expected behavior

Let's work through recall by hand:

  • Class 2: we recall 3/4
  • Class 1: we recall 0/1
  • Class 0: we recall 0/1

That is a macro-averaged recall of (0.75 + 0 + 0) / 3 = 0.25.
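
As a cross-check, here is a minimal sketch using scikit-learn (an assumption on my part that it is installed; it is not a torchmetrics dependency):

from sklearn.metrics import recall_score

y_true = [2, 2, 2, 0, 2, 1]
y_pred = [2, 2, 2, 2, 1, 0]  # argmax of each row of `pred` above

# Macro: unweighted mean of per-class recalls -> (0.75 + 0 + 0) / 3
print(recall_score(y_true, y_pred, average="macro"))  # 0.25
# Micro: pooled TP / (TP + FN) -> 3/6
print(recall_score(y_true, y_pred, average="micro"))  # 0.5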

The MulticlassRecall docs (and the other multiclass metric docs) state that average='macro' is the default. But in fact it is not: the results match average='micro' (!?).
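
You can make the discrepancy explicit by passing average yourself (reusing the pred and label tensors from the reproduction above):

from torchmetrics.classification import MulticlassRecall

# If the documented default (average="macro") were in effect, the
# tracker above would have reported 0.25, not 0.50.
print(MulticlassRecall(num_classes=num_classes, average="macro")(pred, label))  # tensor(0.2500)
print(MulticlassRecall(num_classes=num_classes, average="micro")(pred, label))  # tensor(0.5000)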

This is wrong for several reasons:

  • It's a nasty gotcha.
  • Defaulting to micro is not really a best practice either.
  • It's intimidating to people new to the library.

Environment

  • TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): main, and 1.2.1, pip
  • Python & PyTorch Version (e.g., 1.0): 2.x
  • Any other relevant information such as OS (e.g., Linux): OSX

Additional context

Every few months I come back to torchmetrics, stumble on this, and switch back to sklearn.metrics.

I follow the tutorial, eagerly add the multiclass metrics, and end up puzzling over the docs.

I'm glad this work has been done, but a smooth quickstart is important for attracting users. If even I, a Lightning fanboy, have shunned this library, then others surely have too.

Labels

help wanted · question · v1.2.x
