Multiclass Accuracy, Precision, Recall, and F1Score the same (documentation issue?)

## 🐛 Bug

I'm filing this as a bug, but perhaps it is actually just an important documentation issue.

Standard usage of Accuracy, Precision, Recall, and F1Score on a multiclass task produces identical results.

I am aware of https://github.com/Lightning-AI/torchmetrics/issues/1717 but want to revisit this.


### To Reproduce

Here is code (adapted from [SO](https://stackoverflow.com/questions/76198399/torchmetrics-multiclass-f1score-same-results-as-accuracy)) replicating what happens when I try to use torchmetrics in lightning:

```py
import torch
import torchmetrics
from torchmetrics import MetricTracker, MetricCollection
from torchmetrics import Accuracy, F1Score, Precision, Recall, CohenKappa

num_classes = 3

list_of_metrics = [Accuracy(task="multiclass", num_classes=num_classes, average="micro"),
                   F1Score(task="multiclass", num_classes=num_classes),
                   Precision(task="multiclass",num_classes=num_classes),
                   Recall(task="multiclass",num_classes=num_classes),
                   ]

maximize_list=[True,True,True,True]

metric_coll = MetricCollection(list_of_metrics)
tracker = MetricTracker(metric_coll, maximize=maximize_list)


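# Each row holds per-class scores; the trailing comment is the argmax (predicted class).
pred = torch.tensor([[0,.1,.5],  # 2
                     [0,.1,.5],  # 2
                     [0,.1,.5],  # 2
                     [0,.1,.5],  # 2
                     [0,.9,.1],  # 1
                     [.9,.1,0]]) # 0

# Integer class labels (torch.tensor infers int64 here).
label = torch.tensor([2,2,2,0,2,1])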

tracker.increment()
tracker.update(pred, label)

for key, val in tracker.compute_all().items():
    print(key,val)
```

gives

```py
MulticlassAccuracy tensor([0.5000])
MulticlassF1Score tensor([0.5000])
MulticlassPrecision tensor([0.5000])
MulticlassRecall tensor([0.5000])
```

### Expected behavior

Let's do recall:

* Class 2 we recall 3/4
* Class 1 we recall 0/1
* Class 0 we recall 0/1

That is recall 0.25.
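
The arithmetic above can be checked by hand. This is a minimal sketch in plain Python (no torchmetrics involved); the lists `predicted` and `actual` are just the argmax of each `pred` row and the labels from the reproduction, written out for illustration.

```python
# Predicted classes (argmax of each row of `pred`) and true labels from the reproduction.
predicted = [2, 2, 2, 2, 1, 0]
actual = [2, 2, 2, 0, 2, 1]
num_classes = 3

per_class_recall = []
for c in range(num_classes):
    # Number of samples whose true label is class c.
    support = sum(1 for a in actual if a == c)
    # Number of those samples that were also predicted as class c.
    hits = sum(1 for p, a in zip(predicted, actual) if a == c and p == c)
    per_class_recall.append(hits / support if support else 0.0)

# Macro averaging: unweighted mean of the per-class recalls.
macro_recall = sum(per_class_recall) / num_classes

print(per_class_recall)  # [0.0, 0.0, 0.75]
print(macro_recall)      # 0.25
```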

The [MulticlassRecall](https://lightning.ai/docs/torchmetrics/stable/classification/recall.html#torchmetrics.classification.MulticlassRecall) docs (and the other multiclass docs) indicate that `average='macro'` is the default. But the results above do not behave that way: they match `'micro'` (!?).
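
For context, micro averaging would explain why all four numbers coincide. The sketch below (plain Python, same hand-written `predicted` and `actual` lists as an illustration, not torchmetrics internals) pools true positives, false positives, and false negatives over all classes. Every misclassified sample counts once as a false positive for the predicted class and once as a false negative for the true class, so the pooled FP and FN are equal, and micro precision, recall, and F1 all collapse to plain accuracy.

```python
# Predicted classes and true labels from the reproduction.
predicted = [2, 2, 2, 2, 1, 0]
actual = [2, 2, 2, 0, 2, 1]
num_classes = 3

# Pool the counts over all classes (this is what "micro" averaging means).
tp = fp = fn = 0
for c in range(num_classes):
    for p, a in zip(predicted, actual):
        if p == c and a == c:
            tp += 1
        elif p == c and a != c:
            fp += 1
        elif p != c and a == c:
            fn += 1

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)

print(tp, fp, fn)  # 3 3 3
print(micro_precision, micro_recall, micro_f1, accuracy)  # 0.5 0.5 0.5 0.5
```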

This is wrong for several reasons:
* Nasty gotcha.
* Not really a best practice either.
* Intimidating to people new to the library.

### Environment

- TorchMetrics version (and how you installed TM, e.g. `conda`, `pip`, build from source): main, and 1.2.1, pip
- Python & PyTorch Version (e.g., 1.0): 2.x
- Any other relevant information such as OS (e.g., Linux): OSX

### Additional context

Every few months, I come back to torchmetrics and stumble on this and then switch to sklearn.metrics.

I follow the tutorial, eagerly add Multiclass, and puzzle over the docs.

I'm happy this is done, but a smooth quickstart is important for attracting users. If even I, a lightning fanboy, have shunned this library, that suggests others have too.
