feat: add model script, training recipe and pretrained weight of cmt_s #680

Merged 1 commit on Jul 10, 2023
85 changes: 85 additions & 0 deletions configs/cmt/README.md
@@ -0,0 +1,85 @@
# CMT: Convolutional Neural Networks Meet Vision Transformers

> [CMT: Convolutional Neural Networks Meet Vision Transformers](https://arxiv.org/abs/2107.06263)

## Introduction

CMT is designed to combine the strengths of CNNs and Transformers, so that the model can capture long-range
dependencies while still extracting local information. In addition, to reduce computation cost, the method uses a
lightweight multi-head self-attention (MHSA) module together with depthwise and pointwise convolutions, as in
MobileNet. By combining these parts, CMT achieves state-of-the-art performance on the ImageNet-1K dataset.
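
To make the structure above concrete, below is a minimal MindSpore sketch of the MobileNet-style convolution block (pointwise expansion, depthwise 3x3 convolution, pointwise projection). It is illustrative only: the class name `ConvFFN` and the expansion ratio of 4 are assumptions, and the actual implementation lives in `mindcv/models/cmt.py`.

```python
import mindspore.nn as nn


class ConvFFN(nn.Cell):
    """Minimal sketch of a MobileNet-style block:
    pointwise expansion -> depthwise 3x3 -> pointwise projection."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.net = nn.SequentialCell(
            nn.Conv2d(channels, hidden, kernel_size=1, has_bias=True),  # pointwise expand
            nn.GELU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, pad_mode="same",
                      group=hidden, has_bias=True),                     # depthwise 3x3
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1, has_bias=True),  # pointwise project
        )

    def construct(self, x):
        # residual connection keeps the block easy to optimize
        return x + self.net(x)
```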


## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model | Context | Top-1 (%) | Top-5 (%) | Params(M) | Recipe | Download |
|-----------| -------- |-----------|-----------|-----------|---------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| cmt_small | D910x8-G | 83.24 | 96.41 | 26.09 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/cmt/cmt_small_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/cmt/cmt_small-6858ee22.ckpt) |


</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
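
As a quick check of the pretrained weight in the table above, you can load it through MindCV's model factory. A minimal sketch, assuming the `cmt_small` weight registered by this PR is downloadable via `create_model`:

```python
import mindcv

# load cmt_small with the pretrained ImageNet-1K weight from the table above
model = mindcv.create_model("cmt_small", pretrained=True)
model.set_train(False)  # switch to inference mode before evaluation
```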

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
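
MindCV's ImageNet loader typically expects the standard train/val folder layout, with one subfolder per class. A sketch of the assumed directory structure (file names are examples):

```text
imagenet/
├── train/
│   ├── n01440764/
│   │   ├── n01440764_10026.JPEG
│   │   └── ...
│   └── ...
└── val/
    ├── n01440764/
    └── ...
```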

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/cmt/cmt_small_ascend.yaml --data_dir /path/to/imagenet
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to scale the learning rate linearly when moving to a new global batch size, as worked out below.
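
For example, linear scaling works out as follows. The base values come from `cmt_small_ascend.yaml`; the 4-device setup is hypothetical:

```python
# linear LR scaling: keep lr / global_batch_size constant
base_lr = 0.002           # lr in cmt_small_ascend.yaml
base_global_bs = 8 * 128  # 8 devices x batch_size 128 = 1024
new_global_bs = 4 * 128   # hypothetical 4-device setup = 512

new_lr = base_lr * new_global_bs / base_global_bs
print(new_lr)  # 0.001 -> pass via --lr 0.001 or edit the yaml
```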

* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/cmt/cmt_small_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/cmt/cmt_small_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```

### Deployment

Please refer to the [deployment tutorial](https://mindspore-lab.github.io/mindcv/tutorials/deployment/).

## References

<!--- Guideline: Citation format should follow GB/T 7714. -->
[1] Guo J, Han K, Wu H, et al. Cmt: Convolutional neural networks meet vision transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 12175-12185.
61 changes: 61 additions & 0 deletions configs/cmt/cmt_small_ascend.yaml
@@ -0,0 +1,61 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: "imagenet"
data_dir: "/path/to/imagenet"
shuffle: True
dataset_download: False
batch_size: 128
drop_remainder: True

# augmentation
image_resize: 224
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
re_value: "random"
hflip: 0.5
interpolation: "bicubic"
auto_augment: "randaug-m9-mstd0.5"
aug_repeats: 3
re_prob: 0.25
crop_pct: 0.9
mixup: 0.8
cutmix: 1.0

# model
model: "cmt_small"
num_classes: 1000
pretrained: False
ckpt_path: ""
keep_checkpoint_max: 10
ckpt_save_dir: "./ckpt"
epoch_size: 300
drop_path_rate: 0.1
dataset_sink_mode: True
amp_level: "O2"

# loss
loss: "ce"
label_smoothing: 0.1

# lr scheduler
scheduler: "cosine_decay"
lr: 0.002
min_lr: 0.00001
lr_epoch_stair: True
decay_epochs: 295
warmup_epochs: 5

# optimizer
opt: "adamw"
filter_bias_and_bn: True
momentum: 0.9
weight_decay: 0.05
loss_scale_type: 'dynamic'
loss_scale: 16777216.0
**Collaborator:** This is really large. How can you get this magic number?

**Author:** I use the initial loss scale (2**24) of `mindspore.amp.DynamicLossScaleManager`; see the sketch after this config.

drop_overflow_update: True
use_nesterov: False
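
To make the reviewer exchange above concrete: 16777216.0 is 2**24, the default `init_loss_scale` of MindSpore's dynamic loss scale manager. A minimal sketch of where the number comes from (the `scale_factor` and `scale_window` values shown are the MindSpore defaults, not settings from this recipe):

```python
import mindspore as ms

# 2**24 = 16777216: the default initial scale for MindSpore dynamic loss scaling
manager = ms.amp.DynamicLossScaleManager(
    init_loss_scale=2**24,  # matches loss_scale: 16777216.0 in the recipe above
    scale_factor=2,         # divide the scale on overflow, multiply after a clean window
    scale_window=2000,      # steps without overflow before scaling up
)
print(manager.get_loss_scale())  # 16777216.0
```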
3 changes: 3 additions & 0 deletions mindcv/models/__init__.py
@@ -2,6 +2,7 @@
from . import (
bit,
cait,
cmt,
coat,
convit,
convnext,
@@ -55,6 +56,7 @@
)
from .bit import *
from .cait import *
from .cmt import *
from .coat import *
from .convit import *
from .convnext import *
@@ -112,6 +114,7 @@
__all__ = []
__all__.extend(bit.__all__)
__all__.extend(cait.__all__)
__all__.extend(cmt.__all__)
__all__.extend(coat.__all__)
__all__.extend(convit.__all__)
__all__.extend(convnext.__all__)