Improve the accuracy of Classification models by using SOTA recipes and primitives #3995
🚀 Feature
Update the weights of all pre-trained models to improve their accuracy.
Motivation
New Recipe + FixRes mitigations
torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 176 --model-ema --val-resize-size 232
Using a recipe that combines warmup, cosine annealing, label smoothing, Mixup, CutMix, random erasing, TrivialAugment, no BN weight decay, EMA, long training cycles and the optional FixRes mitigations, we are able to improve the ResNet50 accuracy by over 4.5 points. For more information on the training recipe, check here.
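For illustration only, here is a minimal sketch of how several of these primitives map onto standard PyTorch/torchvision calls, with values copied from the command above (Mixup/CutMix are applied at the batch level inside the training loop and are omitted; the exact reference-script implementation may differ):

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Train-time augmentations from the recipe: 176px crops (FixRes mitigation),
# TrivialAugment (ta_wide) and Random Erasing with p=0.1.
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(176),
    transforms.RandomHorizontalFlip(),
    transforms.TrivialAugmentWide(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.1),
])

model = models.resnet50()

# Label smoothing is handled directly by the loss.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=2e-5)

# 5 epochs of linear warmup followed by cosine annealing over the remaining epochs.
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=595)
lr_scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5]
)

# Model EMA: keep an exponentially decayed copy of the weights for evaluation
# (the 0.999 decay is a placeholder, not the reference script's exact value).
ema_model = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda ema_p, p, num_averaged: 0.999 * ema_p + 0.001 * p
)
```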
Old ResNet50:
Acc@1 76.130 Acc@5 92.862
New ResNet50:
Acc@1 80.674 Acc@5 95.166
Running other models through the same recipe achieves the following improved accuracies:
ResNet101:
Acc@1 81.728 Acc@5 95.670
ResNet152:
Acc@1 82.042 Acc@5 95.926
ResNeXt50_32x4d:
Acc@1 81.116 Acc@5 95.478
ResNeXt101_32x8d:
Acc@1 82.834 Acc@5 96.228
MobileNetV3 Large:
Acc@1 74.938 Acc@5 92.496
Wide ResNet50_2:
Acc@1 81.602 Acc@5 95.758 (@prabhat00155)
Wide ResNet101_2:
Acc@1 82.492 Acc@5 96.110 (@prabhat00155)
regnet_x_400mf:
Acc@1 74.864 Acc@5 92.322 (@kazhang)
regnet_x_800mf:
Acc@1 77.522 Acc@5 93.826 (@kazhang)
regnet_x_1_6gf:
Acc@1 79.668 Acc@5 94.922 (@kazhang)
New Recipe (without FixRes mitigations)
torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--model-ema --val-resize-size 232
Removing the optional FixRes mitigations (i.e. training on the default 224px crops instead of 176px; see the sketch after the results below) seems to yield better results for some deeper architectures and variants with larger receptive fields:
ResNet101:
Acc@1 81.886 Acc@5 95.780
ResNet152:
Acc@1 82.284 Acc@5 96.002
ResNeXt50_32x4d:
Acc@1 81.198 Acc@5 95.340
ResNeXt101_32x8d:
Acc@1 82.812 Acc@5 96.226
MobileNetV3 Large:
Acc@1 75.152 Acc@5 92.634
Wide ResNet50_2:
Acc@1 81.452 Acc@5 95.544 (@prabhat00155)
Wide ResNet101_2:
Acc@1 82.510 Acc@5 96.020 (@prabhat00155)
regnet_x_3_2gf:
Acc@1 81.196 Acc@5 95.430
regnet_x_8gf:
Acc@1 81.682 Acc@5 95.678
regnet_x_16gf:
Acc@1 82.716 Acc@5 96.196
regnet_x_32gf:
Acc@1 83.014 Acc@5 96.288
regnet_y_400mf:
Acc@1 75.804 Acc@5 92.742
regnet_y_800mf:
Acc@1 78.828 Acc@5 94.502
regnet_y_1_6gf:
Acc@1 80.876 Acc@5 95.444
regnet_y_3_2gf:
Acc@1 81.982 Acc@5 95.972
regnet_y_8gf:
Acc@1 82.828 Acc@5 96.330
regnet_y_16gf:
Acc@1 82.886 Acc@5 96.328
regnet_y_32gf:
Acc@1 83.368 Acc@5 96.498
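For context, the FixRes mitigation in the earlier recipe trains on a smaller crop (176px) than the resolution used at validation time, reducing the train/test resolution discrepancy; the recipe in this section simply keeps the default 224px training crop. A rough sketch of the two configurations, assuming standard torchvision presets (normalization omitted for brevity):

```python
from torchvision import transforms

# Validation preset shared by both recipes: resize the shorter side to 232,
# then take the usual 224px central crop.
val_transforms = transforms.Compose([
    transforms.Resize(232),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# With the FixRes mitigation (--train-crop-size 176): train on smaller crops.
train_with_fixres = transforms.Compose([
    transforms.RandomResizedCrop(176),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Without the mitigation (this section's recipe): keep the default 224px crop.
train_without_fixres = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```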
New Recipe + Regularization tuning
torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00001 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--model-ema --val-resize-size 232
Slightly adjusting the regularization (lowering --weight-decay from 0.00002 to 0.00001) helps improve the following:
MobileNetV3 Large:
Acc@1 75.274 Acc@5 92.566
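All of the recipes above also set --norm-weight-decay 0.0. A minimal sketch of how normalization parameters can be excluded from weight decay via optimizer parameter groups, not necessarily identical to the reference-script helper:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v3_large()

# Split parameters: normalization layers get weight_decay=0, everything else
# gets the (tuned) global weight decay of 1e-5.
norm_classes = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.GroupNorm, nn.LayerNorm)
norm_params, other_params = [], []
for module in model.modules():
    for p in module.parameters(recurse=False):
        if not p.requires_grad:
            continue
        (norm_params if isinstance(module, norm_classes) else other_params).append(p)

optimizer = torch.optim.SGD(
    [
        {"params": other_params, "weight_decay": 1e-5},
        {"params": norm_params, "weight_decay": 0.0},
    ],
    lr=0.5,
    momentum=0.9,
)
```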
In addition to the regularization adjustment, we can also apply the Repeated Augmentation trick with --ra-sampler --ra-reps 4:
MobileNetV2:
Acc@1 72.154 Acc@5 90.822
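The repeated-augmentation sampler behind --ra-sampler --ra-reps 4 makes a batch contain several differently-augmented copies of the same images. The reference-script implementation is distributed-aware; the class below is only a simplified single-process sketch (the name RepeatedAugmentSampler is made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, Sampler

class RepeatedAugmentSampler(Sampler):
    """Repeat every sampled index `reps` times so each image appears several
    times per batch, each time with independently drawn augmentations."""

    def __init__(self, dataset, reps=4, shuffle=True):
        self.dataset = dataset
        self.reps = reps
        self.shuffle = shuffle

    def __iter__(self):
        n = len(self.dataset)
        order = torch.randperm(n).tolist() if self.shuffle else list(range(n))
        repeated = [idx for idx in order for _ in range(self.reps)]
        # Truncate so one "epoch" still yields len(dataset) samples, i.e. only
        # ~n/reps distinct images are visited per epoch.
        return iter(repeated[:n])

    def __len__(self):
        return len(self.dataset)

# Hypothetical usage with an existing dataset `train_set`:
# loader = DataLoader(train_set, batch_size=128, sampler=RepeatedAugmentSampler(train_set, reps=4))
```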
Post-Training Quantized models
ResNet50:
Acc@1 80.282 Acc@5 94.976
ResNeXt101_32x8d:
Acc@1 82.574 Acc@5 96.132
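The quantized accuracies above come from post-training quantization of the newly trained weights. A rough sketch of eager-mode post-training static quantization using torchvision's quantizable model variants (the calibration data here is a dummy stand-in, and loading the new float weights into the quantizable model is omitted):

```python
import torch
from torchvision.models import quantization as qmodels

# Start from the quantization-friendly variant of the architecture: it contains
# the required quant/dequant stubs and a fuse_model() helper.
model = qmodels.resnet50(quantize=False)
model.eval()
model.fuse_model()  # fuse conv+bn+relu blocks before quantization

model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
torch.ao.quantization.prepare(model, inplace=True)

# Calibrate the observers on a few batches (random tensors used as a placeholder).
calibration_loader = [(torch.randn(8, 3, 224, 224), None) for _ in range(4)]
with torch.no_grad():
    for images, _ in calibration_loader:
        model(images)

torch.ao.quantization.convert(model, inplace=True)
```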
New Recipe (LR+weight_decay+train_crop_size tuning)
torchrun --nproc_per_node=8 train.py --model $MODEL_NAME --batch-size 128 --lr 1 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.000002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 208 --model-ema --val-crop-size 240 --val-resize-size 255
EfficientNet-B1:
Acc@1 79.838 Acc@5 94.934
Pitch
To be able to improve the pre-trained model accuracy, we need to complete the "Batteries Included" work tracked in #3911. Moreover, we will need to extend our existing model builders to support multiple weights, as described in #4611. Then we will be able to:
- Update our reference scripts for classification to support the new primitives added by the "Batteries Included" initiative.
- Find a good training recipe for the most important pre-trained models and re-train them. Note that different training configurations might be required for different types of models (for example, mobile models are less likely to overfit compared to bigger models and thus may use different recipes/primitives).
- Update the weights of the models in the library (a rough sketch of how the new weights could then be selected is shown below).
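Assuming the multi-weight builders from #4611 land in roughly the shape that was prototyped, selecting between the original and the improved weights could look something like this (the enum names are illustrative, not final):

```python
from torchvision.models import resnet50, ResNet50_Weights

# Original weights (Acc@1 76.130) vs. the new recipe's weights (Acc@1 80.674).
old_model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
new_model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Each weights entry would also carry its own inference preprocessing
# (e.g. the 232px validation resize used above).
preprocess = ResNet50_Weights.IMAGENET1K_V2.transforms()
```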